Native Medical Services

OpenMed currently operates across three distinct planes:

  • the local operator runtime
  • the configured model-provider path for general agent reasoning
  • the native medical services that power extraction, de-identification, terminology, and HCC/RAF work

The point of this split is not locality for its own sake. It is to keep sensitive or high-volume clinical processing on dedicated medical-service planes that can be governed separately, respond faster on long unstructured inputs, and operate more cost-efficiently than sending every page through a frontier LLM path.

Reference architecture

Inspectable runtime. Protected medical services. Explicit boundary.

OpenMed keeps sessions, plans, workflow previews, and artifacts in the operator runtime while general agent reasoning uses the configured model provider. Clinical extraction and medical-coding services sit behind separate configurable endpoints, so teams do not need to route every PHI-heavy or page-heavy workload through the same frontier-model path.

Plane 01

Operator runtime

CLI, TUI, sessions, plans, provenance, and workflow artifacts remain on the operator machine, with the review loop visible before teams depend on outputs.

Local runtime · Review surface · Artifact control
Plane 02

Clinical extraction service

NER, PII detection, de-identification, and batch extraction run through a dedicated endpoint.

OPENMED_INFERENCE_URL · Private container · Accelerated hardware
Plane 03

Medical coding and terminology service

ICD-10, CPT, SNOMED, LOINC, RxNorm, MedlinePlus, PubMed, HCC mapping, and RAF scoring run through a separate service boundary.

Terminology service · LOINC / RxNorm / MedlinePlus · HCC / RAF · Protected endpoint

Service planes

Clinical extraction plane

The extraction plane is the endpoint behind OPENMED_INFERENCE_URL.

It powers native OpenMed tools such as:

  • extract_entities
  • extract_pii
  • deidentify_text
  • batch extraction workflows built on the same service
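Calling one of these tools ultimately means sending text to the endpoint behind OPENMED_INFERENCE_URL. A minimal sketch of what such a call could look like follows; OPENMED_INFERENCE_URL and OPENMED_INFERENCE_API_KEY are documented configuration, but the `/deidentify` path and the JSON body shape are illustrative assumptions, not the documented OpenMed wire format:

```python
import json
import os
import urllib.request

def build_deidentify_request(text, env=os.environ):
    """Build a de-identification request against the extraction plane.

    The /deidentify route and {"text": ...} payload are hypothetical;
    only the environment-variable names come from the configuration surface.
    """
    base = env["OPENMED_INFERENCE_URL"].rstrip("/")
    req = urllib.request.Request(
        f"{base}/deidentify",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    api_key = env.get("OPENMED_INFERENCE_API_KEY")
    if api_key:
        req.add_header("Authorization", f"Bearer {api_key}")
    return req

# Sending it would honor the documented timeout knob:
#   timeout = float(os.environ.get("OPENMED_INFERENCE_TIMEOUT_SECONDS", "30"))
#   with urllib.request.urlopen(req, timeout=timeout) as resp: ...
```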

Operationally, this is where OpenMed handles:

  • clinical NER
  • billing-oriented entity extraction
  • PII detection
  • de-identification

Current posture:

  • during preview, OpenMed provisions this endpoint on private Hugging Face accelerated infrastructure
  • access can be protected with OPENMED_INFERENCE_API_KEY
  • access can also use OPENMED_INFERENCE_HF_TOKEN or HF_TOKEN
  • packaged binaries can resolve embedded service credentials without hard-coding plaintext defaults into source
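The credential options above imply a fallback order. A small resolver sketch, assuming the explicit API key wins over the service-specific HF token, which wins over the generic HF_TOKEN (the precedence is an assumption; the packaged binary's actual resolution logic is not shown here):

```python
import os

def resolve_inference_credential(env=os.environ):
    """Return (variable name, value) for the first configured credential.

    Assumed order: OPENMED_INFERENCE_API_KEY, then
    OPENMED_INFERENCE_HF_TOKEN, then HF_TOKEN. Returns (None, None) if
    nothing is set, where a packaged binary may fall back to its
    embedded credentials.
    """
    for name in ("OPENMED_INFERENCE_API_KEY",
                 "OPENMED_INFERENCE_HF_TOKEN",
                 "HF_TOKEN"):
        value = env.get(name)
        if value:
            return name, value
    return None, None
```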

Coding and terminology plane

The coding plane is the endpoint behind OPENMED_MED_CODES_API_URL.

It powers native OpenMed tools such as:

  • PubMed search and abstract retrieval
  • ICD-10, CPT, SNOMED, and LOINC search / lookup / validation
  • RxNorm medication normalization and related-concept lookup
  • MedlinePlus patient-education topic lookup
  • code crosswalks
  • HCC mapping
  • RAF score calculation

This is the service boundary that makes the HCC and revenue-integrity story concrete: OpenMed can orchestrate clinical note review, extract coding candidates, and then hand those codes to a separate terminology/HCC service instead of collapsing everything into one generic model call.
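That handoff can be sketched as follows. OPENMED_MED_CODES_API_URL is documented configuration; the `/hcc/map` route, candidate fields, and payload layout are hypothetical illustrations of passing extracted candidates to a separate coding service rather than back into a generic model call:

```python
import os

def build_coding_handoff(candidates, env=os.environ):
    """Package extracted coding candidates for the terminology/HCC plane.

    `candidates` is a list of dicts like {"text": "E11.9", "system": "ICD-10"};
    the payload and route are illustrative assumptions.
    """
    base = env["OPENMED_MED_CODES_API_URL"].rstrip("/")
    return {
        "url": f"{base}/hcc/map",  # hypothetical route
        "payload": {
            "candidates": [
                {"text": c["text"], "system": c.get("system", "ICD-10")}
                for c in candidates
            ],
        },
    }
```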

Current posture:

  • during preview, OpenMed provisions this endpoint on private Hugging Face accelerated infrastructure
  • access can be protected with OPENMED_MED_CODES_API_KEY
  • access can also use OPENMED_MED_CODES_HF_TOKEN or HF_TOKEN
  • packaged binaries can resolve embedded service credentials without hard-coding plaintext defaults into source

Why the split matters

This architecture gives OpenMed a stronger healthcare deployment story than a single undifferentiated model endpoint:

  • extraction and de-identification can scale independently from coding and terminology lookup
  • HCC and RAF operations can live behind a dedicated protected service boundary
  • sensitive or page-heavy clinical inputs do not need to consume frontier-model context for every extraction or coding pass
  • teams can reserve the model-provider path for general reasoning while keeping protected clinical processing on dedicated services
  • the service tier can be swapped without changing the operator workflow surface

These service endpoints are native OpenMed backends. They are not remote MCP servers.

Deployment patterns

Preview reference
  Runtime: Local operator machine
  Extraction / PII: Private Hugging Face accelerated endpoint operated by OpenMed
  Coding / HCC: Separate protected med-codes endpoint operated by OpenMed

Customer cloud
  Runtime: Local operator machine or managed desktop
  Extraction / PII: Private container in VPC / private cloud
  Coding / HCC: Private terminology and HCC service in the same environment

On-prem / edge
  Runtime: Managed workstation
  Extraction / PII: Local GPU or internal inference cluster
  Coding / HCC: Internal terminology / HCC API

Lab / dev
  Runtime: Local operator machine
  Extraction / PII: Local or sandbox endpoint
  Coding / HCC: Local or sandbox endpoint

Private Hugging Face hosting is the current preview deployment, not a hard dependency. The real product boundary is the pair of configurable endpoint URLs.
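Since the boundary is the pair of URLs, moving from the preview deployment to a customer-managed environment amounts to re-pointing them; the hostnames below are placeholders, not real OpenMed endpoints:

```shell
# Customer-cloud pattern: same operator workflow, services private to the VPC.
export OPENMED_INFERENCE_URL="https://extraction.internal.example.com"
export OPENMED_MED_CODES_API_URL="https://terminology.internal.example.com"
```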

Security and access model

OpenMed itself does not claim that every deployment is automatically compliant because one part of the product is local. The defensible statement is narrower and more useful:

  • the operator runtime and review loop are inspectable
  • the model-provider path and the medical-service tier are separate, explicit boundaries
  • the actual privacy and compliance posture depends on where those services are hosted and governed
  • service access can be protected with API keys and optional bearer tokens
  • packaged binaries can carry embedded service credentials instead of shipping plaintext defaults
  • no product telemetry is built into the runtime

During preview, OpenMed serves the native medical-service tier from private Hugging Face infrastructure so evaluators do not need to deploy it themselves. The same workflow surface can later target customer-managed cloud or on-prem environments while preserving the OpenMed runtime and review loop.

Configuration surface

export OPENMED_INFERENCE_URL="https://<private-inference-endpoint>"
export OPENMED_INFERENCE_API_KEY="..."
export OPENMED_INFERENCE_TIMEOUT_SECONDS="30"

export OPENMED_MED_CODES_API_URL="https://<private-med-codes-endpoint>"
export OPENMED_MED_CODES_API_KEY="..."
export OPENMED_MED_CODES_TIMEOUT_SECONDS="10"

export OPENMED_SERVICE_MAX_RETRIES="3"
export OPENMED_SERVICE_RETRY_BACKOFF="1"
export OPENMED_SERVICE_CIRCUIT_OPEN_SECONDS="120"
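The three service-resilience knobs above suggest a retry-then-circuit-breaker pattern. A minimal sketch, assuming exponential backoff and one breaker per caller (the variable names are real; the exact semantics are an illustrative guess, not OpenMed's implementation):

```python
import os
import time

class ServiceCaller:
    """Retry + circuit-breaker sketch around the documented knobs."""

    def __init__(self, env=os.environ, clock=time.monotonic, sleep=time.sleep):
        self.max_retries = int(env.get("OPENMED_SERVICE_MAX_RETRIES", "3"))
        self.backoff = float(env.get("OPENMED_SERVICE_RETRY_BACKOFF", "1"))
        self.open_seconds = float(env.get("OPENMED_SERVICE_CIRCUIT_OPEN_SECONDS", "120"))
        self.open_until = 0.0  # breaker closed initially
        self.clock = clock
        self.sleep = sleep

    def call(self, fn):
        if self.clock() < self.open_until:
            raise RuntimeError("circuit open: service calls suppressed")
        last_exc = None
        for attempt in range(self.max_retries + 1):
            try:
                return fn()
            except Exception as exc:
                last_exc = exc
                if attempt < self.max_retries:
                    self.sleep(self.backoff * (2 ** attempt))  # exponential backoff
        # every attempt failed: open the circuit for the configured window
        self.open_until = self.clock() + self.open_seconds
        raise last_exc
```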

Optional bearer-token auth:

export OPENMED_INFERENCE_HF_TOKEN="..."
export OPENMED_MED_CODES_HF_TOKEN="..."

See Configuration for the full environment-variable reference and Privacy & Security for the runtime-boundary explanation.