Healthcare Data Semantics
Two systems can exchange syntactically valid FHIR (well-formed resources, correct REST interactions, passing schema validation) and still fail to interoperate. If one system codes diagnoses in SNOMED CT and the other codes them in ICD-10-CM, the receiving system cannot reliably interpret the data without translation. If one lab uses a local code for a serum sodium test and another uses LOINC 2951-2, no downstream system can aggregate the results without a mapping table.
This is the semantic layer: the agreement on what codes mean and which code system to use for which data type. It is an architectural decision, not an implementation detail. Getting it wrong compounds over time — every consumer of your data inherits the semantic choices you made at the source.
This article covers what the major clinical vocabularies are, when to use each, and how to approach mapping between them. For FHIR-specific mechanics — CodeSystem, ValueSet, ConceptMap, terminology server operations — see FHIR Terminology.
SNOMED CT
SNOMED CT (Systematized Nomenclature of Medicine — Clinical Terms) is the broadest clinical terminology in healthcare. Where other code systems classify concepts for administrative or billing purposes, SNOMED CT is designed to represent clinical meaning — what a clinician actually observed, did, or documented.
Concept model
Every SNOMED CT concept has:
- A concept identifier (numeric, e.g., 44054006 for Type 2 diabetes mellitus)
- One or more descriptions: a Fully Specified Name (FSN, unique and unambiguous), one Preferred Term, and optional synonyms
- Relationships to other concepts: IS A (hierarchy), finding site, associated morphology, causative agent, and dozens of others
- A definition status: primitive (not fully defined by its relationships) or fully defined
Relationships make SNOMED CT a terminology, not just a code list. A system can infer that 44054006 | Type 2 diabetes mellitus | IS A 73211009 | Diabetes mellitus | IS A 362969004 | Disorder of endocrine system |, and use this hierarchy for subsumption queries: “find all patients with any form of diabetes.”
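Subsumption is a transitive walk up the IS A relationships. The Python sketch below hard-codes only the three concepts named above as a hypothetical extract; a real system would load the relationship file from a SNOMED CT release or delegate to a terminology server. The helper names (ancestors, subsumed_by, patients_with) are illustrative, not part of any SNOMED CT API.

```python
from collections import deque

# Hypothetical three-concept extract of SNOMED CT IS A relationships
# (child -> parents). Real concepts often have several parents.
IS_A: dict[str, set[str]] = {
    "44054006": {"73211009"},   # Type 2 diabetes mellitus IS A Diabetes mellitus
    "73211009": {"362969004"},  # Diabetes mellitus IS A Disorder of endocrine system
    "362969004": set(),         # parents omitted from this extract
}


def ancestors(concept: str) -> set[str]:
    """All concepts reachable by following IS A edges upward (transitive closure)."""
    seen: set[str] = set()
    queue = deque([concept])
    while queue:
        for parent in IS_A.get(queue.popleft(), set()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def subsumed_by(concept: str, ancestor: str) -> bool:
    """True if `concept` is `ancestor` itself or any of its descendants."""
    return concept == ancestor or ancestor in ancestors(concept)


def patients_with(ancestor: str, problems: dict[str, list[str]]) -> list[str]:
    """'Find all patients with any form of <ancestor>' over coded problem lists."""
    return [pid for pid, codes in problems.items()
            if any(subsumed_by(code, ancestor) for code in codes)]
```

For example, `patients_with("73211009", {"p1": ["44054006"], "p2": ["38341003"]})` returns `["p1"]`: the Type 2 diabetes patient matches a query for any diabetes, while the hypertension code does not.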
Hierarchy
SNOMED CT has nineteen top-level hierarchies. The most commonly used in clinical integration:
| Hierarchy | Description | Examples |
|---|---|---|
| Clinical finding | Observations, findings, diagnoses | Hypertension, fracture, Type 2 diabetes |
| Procedure | Clinical activities | Appendectomy, blood pressure measurement, vaccination |
| Body structure | Anatomical locations | Left femur, hepatic artery, cerebral cortex |
| Organism | Pathogens and organisms | Staphylococcus aureus, influenza virus |
| Substance | Drugs, chemicals | Penicillin, glucose, ethanol |
| Pharmaceutical / biologic product | Medicinal products | Amoxicillin 250mg capsule |
| Observable entity | What can be measured | Serum sodium concentration, body temperature |
| Situation with explicit context | Findings with context modifiers | Family history of cancer, no known allergies |
Expressions and post-coordination
Simple SNOMED codes (single concept identifiers) are pre-coordinated — the full meaning is built into the concept definition. Post-coordination combines multiple concepts in an expression to represent nuanced meaning that no single concept captures:
64572001 |Disease| : 363698007 |Finding site| = 368209003 |Right arm structure|
This expression means “disease of the right arm structure.” Post-coordination is powerful but increases implementation complexity. Most real-world systems use pre-coordinated codes; support for post-coordination should be an explicit design decision.
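To make the extra work concrete, here is a deliberately minimal Python parser for the single-focus, ungrouped form shown above. It is a sketch under narrow assumptions, not an implementation of SNOMED Compositional Grammar: attribute groups, nested expressions, and multiple focus concepts are rejected, and a production system would also validate the expression against the SNOMED CT concept model.

```python
import re

# SNOMED CT identifiers are 6 to 18 digits.
SCTID = re.compile(r"^\d{6,18}$")


def parse_simple_expression(expr: str) -> tuple[str, list[tuple[str, str]]]:
    """Parse 'focus : attribute = value, ...' into (focus, [(attribute, value), ...]).

    Deliberately limited: one focus concept and ungrouped refinements only.
    Anything else (attribute groups in braces, nested expressions, '+')
    fails the identifier check and raises ValueError.
    """
    bare = re.sub(r"\|[^|]*\|", "", expr)  # drop |term| labels; IDs carry the meaning
    focus, _, refinement = bare.partition(":")
    focus = focus.strip()
    if not SCTID.match(focus):
        raise ValueError(f"unsupported focus: {focus!r}")
    pairs: list[tuple[str, str]] = []
    for clause in filter(None, (c.strip() for c in refinement.split(","))):
        attribute, sep, value = (part.strip() for part in clause.partition("="))
        if not sep or not SCTID.match(attribute) or not SCTID.match(value):
            raise ValueError(f"unsupported refinement: {clause!r}")
        pairs.append((attribute, value))
    return focus, pairs
```

Applied to the expression above, it returns the focus `64572001` and one refinement pair, `("363698007", "368209003")`; a pre-coordinated code alone parses to a focus with no refinements.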
Release cadence and licensing
SNOMED International publishes the International Edition of SNOMED CT on a fixed schedule: historically twice yearly (January and July), and monthly since 2022. National editions, such as the US Edition, follow their own schedules. Access requires a license, available through the National Release Center in each SNOMED International member country (the US NRC is at NLM). Clinical software deployed in a member country can access SNOMED CT at no additional cost through the national license. Verify your jurisdiction’s terms before deployment.
When to use SNOMED CT
SNOMED CT is the right choice for:
- Problem lists and clinical diagnoses in provider-facing systems
- Clinical decision support (subsumption queries require a proper terminology)
- Procedures in clinical (non-billing) contexts
- Clinical findings and observations that do not have a LOINC code
SNOMED CT is harder to implement than ICD-10: using it well typically requires a terminology server capable of subsumption and expression queries. For US billing submissions, ICD-10-CM is mandated regardless of what your clinical system uses internally.
LOINC
LOINC (Logical Observation Identifiers Names and Codes) is the standard for laboratory tests, clinical measurements, and clinical observations. Where SNOMED CT represents findings and diagnoses, LOINC identifies what was observed or measured. A useful way to state the division: LOINC codes the question (e.g., which bacteria grew in a blood culture), and SNOMED CT often codes the answer (e.g., Staphylococcus aureus).
LOINC is produced by the Regenstrief Institute in Indianapolis and is free to use under its license terms; the full database can be downloaded from loinc.org after registering for a free account. There is no licensing fee.
The six-part naming convention
Every LOINC code is defined by six axes. Together, they constitute a unique, fully specified observation. No two LOINC codes share the same combination of all six parts.
| Axis | Name | Description | Example |
|---|---|---|---|
| 1 | Component | What is measured | Sodium |
| 2 | Property | Characteristic of what is measured | SCnc (substance concentration) |
| 3 | Time Aspect | Interval or point in time | Pt (point in time) |
| 4 | System | Specimen or system | Ser (serum) |
| 5 | Scale | How the result is expressed | Qn (quantitative) |
| 6 | Method | How the measurement was made | (optional — often blank for general methods) |
Applying this to serum sodium: Component=Sodium, Property=SCnc (substance concentration), Time=Pt, System=Ser/Plas (serum or plasma), Scale=Qn (quantitative), Method=(unspecified) → LOINC 2951-2 Sodium [Moles/volume] in Serum or Plasma.
The six-part structure is why LOINC codes are unique: 2951-2 (serum/plasma sodium) is distinct from 2955-3 (urine sodium), because the System axis differs.
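Because the six parts jointly identify a term, they behave like a composite key. This Python sketch models them as a NamedTuple and looks codes up in a hand-built two-row table; the table is illustrative, and a real lookup would run against the LOINC table file from Regenstrief. The urine row assumes LOINC 2955-3 (Sodium [Moles/volume] in Urine).

```python
from typing import NamedTuple, Optional


class LoincParts(NamedTuple):
    """The six LOINC axes; Method is often blank."""
    component: str
    property_: str  # trailing underscore keeps the field distinct from the builtin name
    time: str
    system: str
    scale: str
    method: str = ""


# Illustrative two-row table; real lookups would use the LOINC table file.
LOINC_BY_PARTS: dict[LoincParts, str] = {
    LoincParts("Sodium", "SCnc", "Pt", "Ser/Plas", "Qn"): "2951-2",
    LoincParts("Sodium", "SCnc", "Pt", "Urine", "Qn"): "2955-3",
}


def find_loinc(parts: LoincParts) -> Optional[str]:
    """Return the code whose six parts match exactly, or None."""
    return LOINC_BY_PARTS.get(parts)
```

Changing only the System axis yields a different code, and an unlisted combination returns None rather than a near match.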
LOINC code categories
| Category | Count (approx.) | Use Cases |
|---|---|---|
| Laboratory | ~85,000 | Blood chemistry, hematology, microbiology, pathology, urinalysis |
| Clinical (vital signs) | ~2,000 | Blood pressure, heart rate, temperature, oxygen saturation, BMI |
| Survey instruments | ~7,000 | PHQ-9, GAD-7, AUDIT, Apgar, PROMIS scales |
| Document ontology | ~5,000 | Document type codes used in CDA and FHIR DocumentReference |
| Imaging | ~2,000 | Radiology study types (used in DICOM metadata and FHIR ImagingStudy) |
Why lab interoperability fails without LOINC
Every laboratory has a local test catalog. The local code for “Complete Metabolic Panel” at Hospital A is different from Hospital B’s local code for the same panel. Without a common code, a receiving EHR cannot know that both systems are reporting the same test.
LOINC solves this: both labs map their local codes to the corresponding LOINC code. The receiving system understands LOINC and can aggregate, trend, and alert on results regardless of which lab generated them.
This mapping is not trivial. A lab may have hundreds or thousands of local codes: some map one-to-one to a LOINC code, some require different LOINC codes depending on the specimen type, and some have no exact LOINC match. This is real work that needs review by someone with LOINC mapping expertise.
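A minimal Python sketch of such a table's shape, assuming a lab keys its catalogue by local code plus specimen. The local code NA and all table rows are hypothetical; what matters is that the specimen is part of the key and that a missing row is reported as unmapped instead of being defaulted to a near match.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MapResult:
    loinc: Optional[str]  # None when no standard equivalent was found
    note: str


# Hypothetical catalogue: local code "NA" (sodium) maps differently by specimen.
LOCAL_TO_LOINC: dict[tuple[str, str], str] = {
    ("NA", "serum"): "2951-2",
    ("NA", "plasma"): "2951-2",
    ("NA", "urine"): "2955-3",
}


def map_local(local_code: str, specimen: str) -> MapResult:
    """Map a local test code plus specimen to LOINC, or report the gap."""
    loinc = LOCAL_TO_LOINC.get((local_code, specimen.lower()))
    if loinc is None:
        # Flag for expert review instead of guessing a "closest" code.
        return MapResult(None, f"unmapped: {local_code}/{specimen} needs review")
    return MapResult(loinc, "mapped")
```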
LOINC for order vs. result codes
LOINC distinguishes order codes from result codes. The order code represents what the clinician requested; the result code represents what was actually measured. For a BMP (Basic Metabolic Panel), the order code might be 24320-4 (Basic metabolic panel) and the result components include individual codes for sodium, potassium, chloride, bicarbonate, BUN, creatinine, and glucose.
When building FHIR Observation resources, use the result LOINC code on the Observation, not the order code. The order belongs on the ServiceRequest.
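A sketch of the split as FHIR R4 JSON, built as Python dictionaries. Resource ids, the patient reference, and the sodium value are placeholders. The panel code sits on the ServiceRequest, and the sodium Observation carries its own result code and points back to the order through basedOn.

```python
LOINC = "http://loinc.org"
PATIENT = {"reference": "Patient/example"}  # placeholder patient


def loinc_concept(code: str, display: str = "") -> dict:
    """Build a CodeableConcept with a single LOINC coding."""
    coding = {"system": LOINC, "code": code}
    if display:
        coding["display"] = display
    return {"coding": [coding]}


# The order: what the clinician requested (the panel code).
service_request = {
    "resourceType": "ServiceRequest",
    "id": "bmp-order-1",  # placeholder id
    "status": "active",
    "intent": "order",
    "code": loinc_concept("24320-4"),
    "subject": PATIENT,
}

# One result component: what was actually measured.
sodium = {
    "resourceType": "Observation",
    "id": "sodium-1",  # placeholder id
    "status": "final",
    "basedOn": [{"reference": f"ServiceRequest/{service_request['id']}"}],
    "code": loinc_concept("2951-2", "Sodium [Moles/volume] in Serum or Plasma"),
    "subject": PATIENT,
    "valueQuantity": {"value": 140, "unit": "mmol/L",
                      "system": "http://unitsofmeasure.org", "code": "mmol/L"},
}
```

The other BMP components would be separate Observations with their own result codes; if a panel-level Observation is wanted, it can group them with hasMember.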
RxNorm
RxNorm is the US standard terminology for drugs. It is produced and distributed by the National Library of Medicine (NLM), is free to use, and is published as a full monthly release plus weekly updates for newly approved drugs. RxNorm provides normalised names and codes for medications and, critically, links across the different drug coding systems used in US healthcare.
Concept types (Term Types / TTY)
RxNorm organises drug concepts into Term Types (TTYs) at different levels of specificity:
| TTY | Name | Description | Example |
|---|---|---|---|
| IN | Ingredient | Active ingredient only | Metformin |
| PIN | Precise Ingredient | Specific salt or ester form | Metformin hydrochloride |
| MIN | Multiple Ingredients | Combination ingredient set | Metformin / sitagliptin |
| BN | Brand Name | Trade name | Glucophage |
| SCD | Semantic Clinical Drug | Ingredient + strength + dose form | Metformin 500 MG Oral Tablet |
| SBD | Semantic Branded Drug | Brand + strength + dose form | Glucophage 500 MG Oral Tablet |
| GPCK | Generic Pack | Pack of one or more clinical drugs dispensed as a unit | Methylprednisolone 4 MG Oral Tablet dose pack (21 tablets) |
| BPCK | Branded Pack | Branded version of a generic pack | Medrol Dosepak |
Which TTY to use
For prescriptions and medication orders: use SCD (Semantic Clinical Drug) for generic prescribing or SBD for brand-required prescribing. SCD and SBD encode the information a pharmacist needs to dispense: the drug, the strength, and the form.
Using IN (ingredient only) for a prescription is insufficient: “Metformin” doesn’t tell the pharmacy what to dispense. Using NDC as the prescription’s primary identifier is also a poor choice: NDC is package-specific and changes when a manufacturer reformulates or repackages, which breaks medication history comparisons.
For dispensing records: NDC (National Drug Code) is appropriate because it identifies the specific packaged product dispensed. RxNorm provides mappings from NDC to RxNorm, enabling normalisation.
For medication history and reconciliation: SCD or IN level codes provide stable identifiers that aggregate across brand changes and reformulations.
NDC relationship
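The NDC is a 10-digit code in three segments (labeler, product, package) that identifies a specific drug product from a specific manufacturer in a specific package size; billing transactions commonly use an 11-digit 5-4-2 form of the same code. Multiple NDCs can map to the same SCD: generic products with the same active ingredient, strength, and form from different manufacturers or in different package sizes all map to one SCD, and branded products map to an SBD that RxNorm links back to that SCD.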
RxNorm is the normalisation layer that makes “Metformin 500mg tablet from Manufacturer A, 100-count bottle” and “Metformin 500mg tablet from Manufacturer B, 60-count bottle” comparable in clinical and analytics systems.
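The normalisation can be sketched in Python in two steps. First, a hyphenated 10-digit NDC (4-4-2, 5-3-2, or 5-4-1 layout) is converted to the 11-digit 5-4-2 form by zero-padding the short segment; the hyphens are required, because an unhyphenated 10-digit NDC cannot be padded unambiguously. Second, the normalised NDC is looked up in an NDC-to-SCD table. All NDCs and the SCD label below are invented placeholders; a real table would be built from RxNorm’s NDC data.

```python
from collections import Counter

# 10-digit NDC layouts, plus the 11-digit form itself.
LAYOUTS = {(4, 4, 2), (5, 3, 2), (5, 4, 1), (5, 4, 2)}


def ndc_to_11(ndc: str) -> str:
    """Convert a hyphenated NDC to the 11-digit 5-4-2 form by zero-padding the short segment."""
    parts = ndc.strip().split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected a hyphenated NDC, got {ndc!r}")
    if tuple(len(p) for p in parts) not in LAYOUTS:
        raise ValueError(f"unrecognised NDC layout: {ndc!r}")
    labeler, product, package = parts
    return labeler.zfill(5) + product.zfill(4) + package.zfill(2)


# Hypothetical NDC -> SCD table: the NDCs and the SCD label are placeholders.
NDC_TO_SCD = {
    "01111222233": "SCD: metformin 500 MG Oral Tablet",
    "44444055566": "SCD: metformin 500 MG Oral Tablet",
}


def dispensed_by_scd(ndcs: list[str]) -> Counter:
    """Count dispensing records per SCD; unknown NDCs are counted as UNMAPPED."""
    return Counter(NDC_TO_SCD.get(ndc_to_11(n), "UNMAPPED") for n in ndcs)
```

Two differently packaged placeholder products collapse to one SCD count, which is exactly the comparison a history or analytics view needs.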
FHIR system URI
Use http://www.nlm.nih.gov/research/umls/rxnorm as the CodeSystem URI when coding medications in FHIR resources.
ICD-10
ICD-10 is the tenth revision of the International Classification of Diseases, produced by the World Health Organization. In the United States, two related code sets are used:
- ICD-10-CM (Clinical Modification): diagnoses — used by all care settings for billing and reporting
- ICD-10-PCS (Procedure Coding System): inpatient procedures — used by hospitals for inpatient billing
ICD-10-CM replaced ICD-9-CM in the US on 1 October 2015.
ICD-10-CM structure
ICD-10-CM codes are 3–7 characters. The structure encodes clinical specificity hierarchically:
E11.649
- E11 = Type 2 diabetes mellitus
- E11.6 = with other specified complications
- E11.64 = with hypoglycemia
- E11.649 = without coma
Position 1 is always a letter and position 2 is always a digit; position 3 is usually a digit but can be a letter (e.g., C4A). Positions 4 through 7 follow the decimal point, which is shown in human-readable contexts but omitted in electronic claim transactions, and add specificity: etiology, manifestation, laterality, severity, encounter type.
| Code length | Level of specificity | Example |
|---|---|---|
| 3 characters | Category (broadest) | E11 — Type 2 diabetes mellitus |
| 4 characters | Etiology / body system | E11.6 — Type 2 diabetes mellitus with other specified complications |
| 5–7 characters | Manifestation, laterality, severity | E11.641 — Type 2 diabetes mellitus with hypoglycemia with coma |
In electronic systems, always use the most specific applicable code. A category or subcategory that has subcodes, such as E11, is not a valid billing code; submitting it when a more specific code exists will typically result in claim rejection.
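The format rules are easy to encode, and doing so catches obvious errors before a claim is built. This Python sketch checks only the shape of an ICD-10-CM code and converts between the dotted form and the undotted form used in claim transactions. Whether a code exists and is billable must come from the official code files; the three-entry HAS_SUBCODES set here is an illustrative stand-in, not a complete list.

```python
import re

# Shape only: letter, digit, letter-or-digit, then 1 to 4 characters after the
# decimal. Says nothing about whether the code exists in this year's release.
ICD10CM_SHAPE = re.compile(r"^[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?$")

# Illustrative subset of codes that have subcodes (and so are not billable).
HAS_SUBCODES = frozenset({"E11", "E11.6", "E11.64"})


def with_decimal(code: str) -> str:
    """Restore the decimal omitted in claim transactions: E11649 -> E11.649."""
    bare = code.strip().upper().replace(".", "")
    return bare if len(bare) <= 3 else f"{bare[:3]}.{bare[3:]}"


def without_decimal(code: str) -> str:
    """Form used in claim transactions: E11.649 -> E11649."""
    return code.strip().upper().replace(".", "")


def looks_like_icd10cm(code: str) -> bool:
    return bool(ICD10CM_SHAPE.match(with_decimal(code)))


def is_billable(code: str, has_subcodes: frozenset[str] = HAS_SUBCODES) -> bool:
    """Well-formed and not a header that still has more specific children."""
    code = with_decimal(code)
    return looks_like_icd10cm(code) and code not in has_subcodes
```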
ICD-10-PCS structure
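ICD-10-PCS codes are exactly 7 alphanumeric characters. The letters I and O are never used, to avoid confusion with the digits 1 and 0. In the Medical and Surgical section, the seven positions are:
Section | Body System | Root Operation | Body Part | Approach | Device | Qualifier
Every character has meaning. 0FB43ZX decodes as:

| Position | Axis | Value | Meaning |
|---|---|---|---|
| 1 | Section | 0 | Medical and Surgical |
| 2 | Body System | F | Hepatobiliary System and Pancreas |
| 3 | Root Operation | B | Excision |
| 4 | Body Part | 4 | Gallbladder |
| 5 | Approach | 3 | Percutaneous |
| 6 | Device | Z | No Device |
| 7 | Qualifier | X | Diagnostic |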
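Because meaning is positional, decoding is a walk over the seven characters. This Python sketch assumes the Medical and Surgical axis names (other sections use different axes) and knows only the values needed for 0FB43ZX; a real decoder would consult the official ICD-10-PCS tables, because what a character means depends on the characters before it.

```python
import re

# Letters I and O never appear, to avoid confusion with 1 and 0.
PCS_SHAPE = re.compile(r"^[0-9A-HJ-NP-Z]{7}$")

# Axis names for the Medical and Surgical section (first character 0).
AXES = ("Section", "Body System", "Root Operation", "Body Part",
        "Approach", "Device", "Qualifier")

# Values for the example code only. A character's meaning depends on the
# characters before it, so a real decoder walks the official PCS tables.
EXAMPLE_VALUES = {
    "Section": {"0": "Medical and Surgical"},
    "Body System": {"F": "Hepatobiliary System and Pancreas"},
    "Root Operation": {"B": "Excision"},
    "Body Part": {"4": "Gallbladder"},
    "Approach": {"3": "Percutaneous"},
    "Device": {"Z": "No Device"},
    "Qualifier": {"X": "Diagnostic"},
}


def decode_pcs(code: str) -> list[tuple[str, str, str]]:
    """Return (axis, character, meaning) for each of the seven positions."""
    if not PCS_SHAPE.match(code):
        raise ValueError(f"not a 7-character ICD-10-PCS code: {code!r}")
    return [(axis, ch, EXAMPLE_VALUES[axis].get(ch, "unknown"))
            for axis, ch in zip(AXES, code)]
```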
ICD-10 for billing, not clinical systems
ICD-10 was designed for administrative classification and billing. It is excellent for that purpose. It is not designed for clinical precision — multiple distinct clinical conditions may share an ICD-10 code, and the code does not carry the semantic relationships needed for clinical decision support.
The implication: ICD-10 is mandatory for claims and encounter reporting. For the clinical problem list, SNOMED CT is preferable. When building FHIR Condition resources, you often need both: SNOMED for the clinical system of record, ICD-10 for billing-facing systems. See Clinical Data Mapping for how to handle dual coding on a single Condition resource.
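One common pattern, sketched here as FHIR R4 JSON built in Python, is to carry both codings in a single CodeableConcept and let each consumer pick the system it needs. The ICD-10-CM code E11.9, the ids, and the helper name are illustrative assumptions. The same approach applies to CPT and SNOMED CT codings on Procedure.code.

```python
from typing import Optional

SNOMED = "http://snomed.info/sct"
ICD10CM = "http://hl7.org/fhir/sid/icd-10-cm"

condition = {
    "resourceType": "Condition",
    "id": "t2dm-1",                               # placeholder id
    "subject": {"reference": "Patient/example"},  # placeholder patient
    "code": {
        "coding": [
            # Clinical system of record: SNOMED CT.
            {"system": SNOMED, "code": "44054006",
             "display": "Type 2 diabetes mellitus"},
            # Billing-facing systems: ICD-10-CM (assumed code for this example).
            {"system": ICD10CM, "code": "E11.9",
             "display": "Type 2 diabetes mellitus without complications"},
        ],
        "text": "Type 2 diabetes mellitus",
    },
}


def code_in_system(resource: dict, system: str) -> Optional[str]:
    """Return the first code from `system` in resource.code, or None."""
    for coding in resource.get("code", {}).get("coding", []):
        if coding.get("system") == system:
            return coding.get("code")
    return None
```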
CPT
CPT (Current Procedural Terminology) is the standard for coding outpatient and physician procedures in the United States. It is maintained and licensed by the American Medical Association (AMA). Unlike SNOMED CT, LOINC, and RxNorm, CPT is not free — use requires a license from the AMA. Most EHR vendors include a CPT license in their software agreements, but redistribution and use in custom applications require separate licensing.
CPT categories
| Category | Description | Example Range |
|---|---|---|
| Category I | Procedures and services | 00100–99499 |
| Category II | Performance measurement tracking codes | 0001F–9007F |
| Category III | Emerging technology, services, and procedures | 0001T–0780T |
Category I codes are the main billing codes. Category II codes are supplementary tracking codes for performance measurement and are not separately reimbursed. Category III codes are temporary codes for emerging technologies and procedures; they can be reported on claims (coverage varies by payer) and may later be promoted to Category I or retired.
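The category shows in the code’s format, which allows a quick triage check. This Python sketch classifies by format and by the Category I range in the table above only; it does not confirm that a code is active in the current release (that needs the licensed AMA data), and anything outside these three formats is reported as unknown.

```python
import re


def cpt_category(code: str) -> str:
    """Classify a CPT code by format and range only, not by whether it is active."""
    code = code.strip().upper()
    if re.fullmatch(r"\d{5}", code) and 100 <= int(code) <= 99499:
        return "Category I"
    if re.fullmatch(r"\d{4}F", code):
        return "Category II"
    if re.fullmatch(r"\d{4}T", code):
        return "Category III"
    return "unknown"
```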
The CPT code set is updated annually, with changes taking effect on 1 January; Category III codes are additionally released twice a year.
CPT for billing, not clinical characterisation
A single CPT code can cover many clinically distinct procedures or encounters. 99213 (office or other outpatient visit, established patient, low level of medical decision making) covers an enormous range of encounters. A CPT code tells you what was billed; it does not describe what happened clinically with the precision that SNOMED CT provides.
For clinical analytics and decision support, SNOMED CT codes for procedures carry the semantics. For billing, CPT is mandated by Medicare and virtually all commercial payers. When building Procedure resources in FHIR, model both when both are present — the CPT code for billing context, the SNOMED code for clinical context.
Code system comparison
| Code System | Maintained By | Cost | Primary Use | Scope | FHIR System URI |
|---|---|---|---|---|---|
| SNOMED CT | SNOMED International / national release centers | Free in member countries via national license | Clinical findings, diagnoses, procedures | Global; clinical precision | http://snomed.info/sct |
| LOINC | Regenstrief Institute | Free | Lab tests, observations, document types | Global; laboratory and clinical measurements | http://loinc.org |
| RxNorm | NLM | Free | Drug terminology, normalization | US; links NDC, brand, generic | http://www.nlm.nih.gov/research/umls/rxnorm |
| ICD-10-CM | WHO / CDC (US version) | Free | Diagnosis coding for billing | US clinical modification; billing | http://hl7.org/fhir/sid/icd-10-cm |
| ICD-10-PCS | CMS | Free | Inpatient procedure coding | US inpatient procedures; billing | http://www.cms.gov/Medicare/Coding/ICD10 |
| CPT | AMA | Licensed | Outpatient procedure and service coding | US; billing | http://www.ama-assn.org/go/cpt |
Mapping strategy
When to map
Cross-system semantic mapping is required whenever data crosses a vocabulary boundary: sending ICD-10 diagnoses to a clinical system that expects SNOMED, receiving lab results with local codes that must be normalised to LOINC, aggregating medication data from systems using different drug terminologies.
Mapping is not free. Every mapping table must be created, validated, maintained, and versioned. A mapping created against ICD-10-CM 2023 may be incorrect for ICD-10-CM 2025 if codes were added, revised, or retired. Budget mapping maintenance as an ongoing operational cost, not a one-time project.
Mapping relationship types
| Relationship | Description | Example |
|---|---|---|
| Equivalent (1:1 exact) | Source and target have the same meaning | Local serum/plasma sodium code ↔ LOINC 2951-2 |
| Narrower than (1:many expansion) | Source is broader; multiple target codes required | One LOINC panel code maps to multiple LOINC component codes |
| Broader than (many:1 aggregation) | Multiple source codes collapse to one target | Multiple SNOMED finding codes → one ICD-10-CM billing code |
| Inexact / related | Approximate; loss of meaning acknowledged | Free-text diagnosis mapped to closest ICD-10 code |
| Unmappable | No target equivalent; must be documented | Local proprietary code with no standard equivalent |
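Note that SNOMED CT and ICD-10 are built for different purposes, so exact equivalence between them is the exception rather than the rule. SNOMED 44054006 | Type 2 diabetes mellitus | and ICD-10-CM E11 are related, but the SNOMED concept is more precise and the ICD-10 category broader; claiming these are exact equivalents is incorrect. The FHIR ConceptMap resource records the relationship explicitly, so use its codes accurately: in FHIR R4, target.equivalence (equivalent, wider, narrower, inexact, unmatched, and others); in FHIR R5, target.relationship (equivalent, source-is-narrower-than-target, source-is-broader-than-target, related-to, not-related-to).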
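A sketch of how that relationship could be recorded, as a FHIR R4 ConceptMap built in Python. The single mapping is illustrative rather than an authoritative map, and translate is a hypothetical local helper, not the terminology server’s $translate operation. A caller that needs exact matches filters on equivalence instead of assuming every mapped code is a synonym.

```python
SNOMED = "http://snomed.info/sct"
ICD10CM = "http://hl7.org/fhir/sid/icd-10-cm"

concept_map = {
    "resourceType": "ConceptMap",
    "status": "draft",
    "group": [{
        "source": SNOMED,
        "target": ICD10CM,
        "element": [{
            "code": "44054006",
            "display": "Type 2 diabetes mellitus",
            "target": [{
                "code": "E11",
                "display": "Type 2 diabetes mellitus",
                # R4: the target is wider in meaning than the source.
                "equivalence": "wider",
            }],
        }],
    }],
}

EXACT = {"equivalent", "equal"}


def translate(cm: dict, system: str, code: str,
              exact_only: bool = False) -> list[tuple[str, str]]:
    """Return (target code, equivalence) pairs for a source code."""
    results = []
    for group in cm.get("group", []):
        if group.get("source") != system:
            continue
        for element in group.get("element", []):
            if element.get("code") != code:
                continue
            for target in element.get("target", []):
                if exact_only and target.get("equivalence") not in EXACT:
                    continue
                results.append((target["code"], target["equivalence"]))
    return results
```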
NullFlavor for unmappable concepts
When a source concept cannot be mapped to any target code, do not omit the coding or substitute a placeholder code. Use a NullFlavor code to record explicitly why no standard code is present:
| NullFlavor Code | Meaning |
|---|---|
| UNK | Unknown: a value exists but is not known |
| OTH | Other: a value exists but is not in the target code system |
| NASK | Not asked: the data was not collected |
| ASKU | Asked but unknown: asked, but the patient or source could not provide it |
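In FHIR, represent an unmappable concept with a CodeableConcept that keeps the original text (plus, where the binding permits, a NullFlavor coding from http://terminology.hl7.org/CodeSystem/v3-NullFlavor), or record the absence with the dataAbsentReason element on resources that have one, such as Observation, or with the data-absent-reason extension elsewhere. Document every unmappable concept in your mapping specification; these are semantic gaps that accumulate as technical debt.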
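A Python sketch of the fallback, assuming the target binding allows a coded null: a mapped concept gets its SNOMED CT coding, and an unmapped one gets the NullFlavor code OTH alongside the original text, with the gap recorded for the mapping specification. The function name and the local text are hypothetical; where NullFlavor codes are not allowed, send the text alone or use data-absent-reason.

```python
from typing import Optional

SNOMED = "http://snomed.info/sct"
NULL_FLAVOR = "http://terminology.hl7.org/CodeSystem/v3-NullFlavor"


def to_codeable_concept(original_text: str, target_code: Optional[str],
                        gaps: list[str]) -> dict:
    """Coded concept when a mapping exists; explicit OTH null plus text when not."""
    if target_code is not None:
        return {"coding": [{"system": SNOMED, "code": target_code}],
                "text": original_text}
    gaps.append(original_text)  # feeds the mapping specification's gap list
    return {
        # OTH: a value exists but is not in the target code system.
        "coding": [{"system": NULL_FLAVOR, "code": "OTH", "display": "other"}],
        "text": original_text,  # keep the source wording so nothing is lost
    }
```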
Free-text to code is a different problem
Mapping between code systems is structured-to-structured translation. Mapping free text to a code (e.g., “patient has hypertension” → SNOMED 38341003) requires natural language processing (NLP). NLP-based coding is probabilistic, not deterministic, and requires human review workflows for clinical use. Do not conflate the two problems or assume that a mapping table solves free-text inputs.
Value set governance
A value set is a curated set of codes drawn from one or more code systems, assembled for a specific clinical purpose. The problem list value set defines which SNOMED codes are valid for problem list entries. The vital signs value set defines which LOINC codes are valid for vital sign observations.
Value sets are where vocabulary policy meets implementation. A FHIR profile binds a data element to a value set with a binding strength (required, extensible, preferred, example). A “required” binding means the system must use a code from that value set; any other code fails validation. An “extensible” binding means the system must use a code from the value set if one fits, but may use another code (or text alone) when the value set has no suitable concept.
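The practical difference between binding strengths can be sketched as a small Python validator. It assumes the value set has already been expanded into (system, code) pairs, and because “no suitable concept exists” is a human judgement, a miss under an extensible binding is flagged for review rather than passed or failed. The names are illustrative, not a FHIR validator API.

```python
from enum import Enum


class Strength(Enum):
    REQUIRED = "required"
    EXTENSIBLE = "extensible"
    PREFERRED = "preferred"
    EXAMPLE = "example"


def check_binding(system: str, code: str,
                  expansion: set[tuple[str, str]], strength: Strength) -> str:
    """Return 'ok', 'error', or 'review' for one coded value."""
    if (system, code) in expansion:
        return "ok"
    if strength is Strength.REQUIRED:
        return "error"   # only codes from the value set are allowed
    if strength is Strength.EXTENSIBLE:
        return "review"  # allowed only if the value set has no suitable concept
    return "ok"          # preferred / example: guidance, not a constraint
```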
Governance failures cause interoperability failures
Governance failure happens when:
- A value set is not versioned, so different systems pin different snapshots
- A value set includes retired codes that downstream systems reject
- A value set excludes legitimate codes, forcing systems into unmappable situations
- Value set updates are not distributed to trading partners
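Two of these failures, unpinned snapshots and undistributed updates, come down to comparing expansions. A minimal Python sketch, assuming each version is available as a flat set of codes; the value set URL, version strings, and codes are hypothetical. Diffing two versions yields the change list to send to partners, and checking outgoing codes against a partner’s pinned version shows what that partner would reject.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueSetVersion:
    url: str
    version: str
    codes: frozenset[str]  # flattened expansion of this version


def change_list(old: ValueSetVersion, new: ValueSetVersion) -> dict[str, list[str]]:
    """What to announce to trading partners when a value set changes."""
    if old.url != new.url:
        raise ValueError("these are versions of different value sets")
    return {"added": sorted(new.codes - old.codes),
            "removed": sorted(old.codes - new.codes)}


def rejected_by_partner(sent: list[str], partner_pin: ValueSetVersion) -> list[str]:
    """Codes we send that are absent from the version a partner has pinned."""
    return [code for code in sent if code not in partner_pin.codes]
```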
The plumbing can work perfectly — messages are delivered, validated, processed — and the integration still fails because the codes in those messages are not mutually understood.
VSAC
The Value Set Authority Center (VSAC) is the authoritative US repository for value sets used in quality measures, clinical decision support, and regulatory programs. Value sets used in eCQMs (electronic Clinical Quality Measures), CCDA templates, and ONC-endorsed programs are published in VSAC. Access requires a UMLS license (free).
When building integrations for US clinical programs, check VSAC for authoritative value sets before defining your own. Using a VSAC-published value set rather than a locally defined one increases interoperability with other systems implementing the same measures or programs.
Cross-reference
For the FHIR mechanics of terminology — how to represent CodeSystem, ValueSet, and ConceptMap resources; how to use $expand and $validate-code; how to structure coded elements in resources — see FHIR Terminology.
For how vocabulary choices affect Observation, Condition, and Procedure mapping specifically — including the coding anti-patterns most commonly encountered in clinical data — see Clinical Data Mapping.