Patient Identity Matching

Patient matching is the problem of determining whether two records in different systems (or in the same system at different times) refer to the same person. It sounds straightforward. It is not.

The United States has no universal patient identifier. The 1996 HIPAA legislation originally called for one, but since the late 1990s Congress has used annual appropriations language to bar federal funding for its development, citing privacy concerns. Every provider organisation assigns its own Medical Record Number (MRN). The same patient has a different MRN at every hospital they have visited, a different identifier with their insurer, and possibly no shared identifier at all with a specialist they were referred to.

Demographic data compounds the problem. Names change through marriage, divorce, or preference. Addresses change. Dates of birth are transcribed incorrectly. Middle names and suffixes are captured inconsistently. A patient named “María García-Lopez” may appear as “Maria Garcia Lopez,” “M. Lopez,” or “Garcia Maria” depending on which member of the front-desk staff entered the registration data.

The consequences of getting matching wrong are asymmetric. A false positive — linking the wrong patient’s records — exposes the receiving clinician to another patient’s allergies, diagnoses, and medications. That is a patient safety event. A false negative — failing to match records that belong to the same person — results in fragmented records, repeated tests, and incomplete clinical pictures. False negatives are bad; false positives are worse. Matching algorithms must be calibrated to prioritise avoiding false positives.


Deterministic matching

Deterministic matching asserts a match when one or more identifiers are exactly equal across records. It is the simplest and most reliable form of matching when it works.

Common identifier types used for deterministic matching:

Identifier Type | Reliability | Limitation
Medical Record Number (MRN) | High within one organisation | Facility-specific; not portable across organisations
Social Security Number (SSN) | High | Often partially collected; privacy-sensitive; errors in entry
Insurance Member ID | High within one payer | Changes when insurance changes; same patient has multiple IDs across payers
National Provider Identifier (NPI) | N/A | This is a provider identifier, not a patient identifier; do not confuse them
Driver’s License Number | Medium | Not consistently collected; varies by state format
Passport Number | Low | Infrequently collected in clinical settings

Deterministic matching fails when:

  • Source systems use different identifier namespaces (your MRN is not the same as the referring hospital’s MRN)
  • The identifier was not collected at registration
  • The identifier was entered incorrectly

Never rely solely on MRN for cross-organisational matching. An MRN is a facility-local identifier. Sending it to another organisation is meaningless unless both systems share the same identifier namespace — which they generally do not.

In FHIR, every Patient.identifier should include a system URI that identifies the issuing organisation’s namespace. Base FHIR makes Identifier.system optional, but profiles such as US Core require it on Patient.identifier, and for good reason: an identifier without a system is unresolvable across systems and cannot be used for deterministic matching. Omitting it is one of the most common FHIR implementation failures.
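As a minimal sketch of the rule above, the following compares identifiers as (system, value) pairs and ignores any identifier that lacks a system. The function names and the example.org namespaces are illustrative assumptions, not part of any FHIR library.

```python
def identifier_keys(patient: dict) -> set:
    """Return (system, value) pairs; identifiers missing either part are skipped."""
    keys = set()
    for ident in patient.get("identifier", []):
        system = ident.get("system")
        value = ident.get("value")
        if system and value:
            keys.add((system.strip(), value.strip()))
    return keys


def deterministic_match(a: dict, b: dict) -> bool:
    """True when the two records share at least one namespaced identifier."""
    return bool(identifier_keys(a) & identifier_keys(b))


# Hypothetical namespaces for two hospitals that happen to issue the same MRN value.
local = {"identifier": [{"system": "https://hospital-a.example.org/mrn", "value": "12345"}]}
remote = {"identifier": [{"system": "https://hospital-b.example.org/mrn", "value": "12345"}]}
no_system = {"identifier": [{"value": "12345"}]}

print(deterministic_match(local, remote))     # False: same value, different namespace
print(deterministic_match(local, no_system))  # False: no system, so not comparable
print(deterministic_match(local, local))      # True
```

Comparing the pair rather than the bare value is what prevents two unrelated patients with MRN 12345 at different facilities from being linked.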


Probabilistic matching

Probabilistic matching assigns a composite similarity score across multiple demographic fields and uses threshold rules to decide: match, possible match (review), or no match.

Demographic fields and matching algorithms

Field | Algorithm | Notes
Last name | Jaro-Winkler similarity | Handles transpositions and typos; weight heavily
First name | Jaro-Winkler similarity | Middle name sometimes more distinctive than first
Date of birth | Exact match or digit transposition check | High discriminating power; transcription errors common
Sex / gender | Exact match | Low discriminating power alone
Address (street) | Jaro-Winkler or standardised comparison | Normalise with postal address standardisation first
Address (city) | Exact or fuzzy | Low discriminating power alone
Address (ZIP code) | Exact match | Moderate discriminating power
Phone number | Exact match (normalised) | Changes frequently; use with low weight
SSN (last 4) | Exact match | Moderate discriminating power; privacy constraints
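One possible way to turn a table like this into a composite score is a weighted average of per-field comparators. The sketch below is an illustration only: the weights are invented placeholders, the 0.8 credit for a transposed date is arbitrary, and `difflib` is used as a stand-in string comparator rather than Jaro-Winkler.

```python
from difflib import SequenceMatcher

# Hypothetical weights; real weights should come from calibration on local data.
WEIGHTS = {"family": 0.35, "given": 0.20, "birth_date": 0.30, "postal_code": 0.10, "phone": 0.05}


def text_similarity(a: str, b: str) -> float:
    """Stand-in string comparator (difflib ratio), not Jaro-Winkler."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def dob_similarity(a: str, b: str) -> float:
    """1.0 for equal dates, 0.8 when two adjacent characters are swapped, else 0.0."""
    if a == b:
        return 1.0
    if len(a) == len(b):
        diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
        if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and a[diffs[0]] == b[diffs[1]] and a[diffs[1]] == b[diffs[0]]):
            return 0.8
    return 0.0


def exact(a: str, b: str) -> float:
    return 1.0 if a == b else 0.0


COMPARATORS = {"family": text_similarity, "given": text_similarity,
               "birth_date": dob_similarity, "postal_code": exact, "phone": exact}


def composite_score(a: dict, b: dict) -> float:
    """Weighted average over the fields that are populated on both records."""
    total = 0.0
    used = 0.0
    for name, weight in WEIGHTS.items():
        if a.get(name) and b.get(name):  # a field missing on either side is skipped
            total += weight * COMPARATORS[name](a[name], b[name])
            used += weight
    return total / used if used else 0.0


rec_a = {"family": "Smith", "given": "John", "birth_date": "1978-03-15", "postal_code": "62701"}
rec_b = {"family": "Smith", "given": "John", "birth_date": "1978-03-51", "postal_code": "62701"}
print(round(composite_score(rec_a, rec_b), 3))
```

Renormalising by the weight actually used treats a missing field as neutral. That is a design choice with a cost: two sparse records can score highly on very little evidence, so a production matcher would likely also require a minimum number of compared fields.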

Jaro-Winkler is preferred over Levenshtein for name matching in patient data because it gives higher weight to prefix agreement (the first few characters are more likely to be correct than later characters in a transcription error scenario) and is robust to character transpositions.
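To make the prefix weighting concrete, here is a small pure-Python sketch of Jaro and Jaro-Winkler using the commonly cited parameters (prefix scale 0.1, prefix capped at four characters). The `normalise` helper, which strips accents, punctuation, and spaces before comparison, is an assumption of this example; in practice a tested library implementation is usually preferable.

```python
import unicodedata


def normalise(name: str) -> str:
    """Upper-case, strip accents, and drop everything that is not a letter."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if ch.isalpha()).upper()


def jaro(s1: str, s2: str) -> float:
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        # A character matches only if the same character sits within the window.
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order in the two strings.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    # Agreement on the leading characters boosts the score towards 1.0.
    return base + prefix * prefix_scale * (1 - base)


print(round(jaro_winkler("MARTHA", "MARHTA"), 3))  # 0.961
print(jaro_winkler(normalise("María García-Lopez"), normalise("Maria Garcia Lopez")))  # 1.0
```

The first call shows a single transposition costing very little because the first three characters agree. The second shows that much of the variation in the earlier “María García-Lopez” example disappears once both sides are normalised the same way.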

Threshold configuration

A probabilistic matcher produces a score between 0 and 1 (or an equivalent weighted score). You define two thresholds:

score < lower threshold  → No match
lower threshold ≤ score < upper threshold  → Possible match (manual review queue)
score ≥ upper threshold  → Automatic match

The gap between the two thresholds is your review zone. A narrow gap reduces manual review volume but increases both false positives and false negatives. A wide gap increases review volume but catches more uncertain cases before they become errors.

Calibrate thresholds against your actual patient population and your actual data quality. A paediatric hospital has a different name distribution than an urban safety-net hospital. A system that collects SSN consistently can weight it more heavily than one that only captures it for Medicare patients. Threshold values from a reference implementation are not safe to adopt without validation against your data.
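The two-threshold rule can be sketched as a small function. The values 0.70 and 0.92 used below are placeholders for illustration only and carry exactly the caveat described above: they are not safe to adopt without validation.

```python
from enum import Enum


class Decision(Enum):
    NO_MATCH = "no-match"
    REVIEW = "possible-match"
    AUTO_MATCH = "match"


def classify(score: float, lower: float, upper: float) -> Decision:
    """Map a 0..1 score onto the three outcomes using two thresholds."""
    if not 0.0 <= lower <= upper <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= lower <= upper <= 1")
    if score >= upper:
        return Decision.AUTO_MATCH
    if score >= lower:
        return Decision.REVIEW
    return Decision.NO_MATCH


# Placeholder thresholds, not recommendations.
for score in (0.97, 0.71, 0.40):
    print(score, classify(score, lower=0.70, upper=0.92).value)
```

Setting `lower` equal to `upper` collapses the review zone entirely, which is the configuration the preceding paragraphs warn against.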

Phonetic algorithms

Soundex and Metaphone are phonetic encoding algorithms — they encode names by how they sound rather than how they are spelled. They are useful as a blocking strategy (pre-filter candidates before computing Jaro-Winkler) but should not be the primary matching algorithm. Jaro-Winkler is more accurate for clinical name matching.
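As an illustration of phonetic blocking, the sketch below implements American Soundex and groups records by the code of the family name. It assumes names have already been reduced to unaccented Latin letters (anything outside A to Z is dropped), and the record layout with `id` and `family` keys is hypothetical.

```python
from collections import defaultdict

_CODES = {}
for digit, letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                       "4": "L", "5": "MN", "6": "R"}.items():
    for letter in letters:
        _CODES[letter] = digit


def soundex(name: str) -> str:
    """American Soundex: first letter plus three digits, zero-padded."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    encoded = letters[0]
    previous = _CODES.get(letters[0], "")
    for c in letters[1:]:
        code = _CODES.get(c, "")
        if code and code != previous:
            encoded += code
        if c not in "HW":  # H and W do not separate letters with the same code
            previous = code
    return (encoded + "000")[:4]


def block_by_soundex(records: list) -> dict:
    """Group record ids by the Soundex code of the family name."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[soundex(rec["family"])].append(rec["id"])
    return dict(blocks)


records = [
    {"id": "p1", "family": "Smith"},
    {"id": "p2", "family": "Smyth"},
    {"id": "p3", "family": "Garcia"},
]
print(soundex("Robert"), soundex("Rupert"))  # R163 R163
print(block_by_soundex(records))             # {'S530': ['p1', 'p2'], 'G620': ['p3']}
```

Only records that land in the same block would then be scored against each other with the more expensive string comparison.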


Hybrid approach

In practice, the most reliable matching strategy combines deterministic and probabilistic matching in sequence:

  1. Attempt deterministic match on shared identifiers (insurance ID, SSN if available)
  2. If no identifier match, apply probabilistic scoring across demographics
  3. Score above upper threshold → auto-match
  4. Score in review zone → route to manual review queue
  5. Score below lower threshold → treat as new patient; create new record

The sequence matters. Identifier matching is the first gate: if two records share the same identifier value within the same namespace (for example, the same SSN, or the same member ID issued by the same payer), probabilistic scoring of demographics is redundant. Only invoke probabilistic matching when identifier matching produces no result.
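The sequence above might be sketched as follows. The scoring function is passed in as a parameter so that any probabilistic scorer can be plugged in; the function names, the record layout, and the threshold defaults are assumptions made for the example.

```python
from typing import Callable, Optional, Tuple


def identifier_keys(record: dict) -> set:
    """(system, value) pairs for identifiers that carry both parts."""
    return {(i["system"], i["value"]) for i in record.get("identifier", [])
            if i.get("system") and i.get("value")}


def resolve(incoming: dict, candidates: list, score_fn: Callable[[dict, dict], float],
            lower: float = 0.70, upper: float = 0.92) -> Tuple[str, Optional[str]]:
    """Return (decision, candidate_id) where decision is match, review, or new."""
    # Step 1: deterministic gate on shared, namespaced identifiers.
    incoming_ids = identifier_keys(incoming)
    for cand in candidates:
        if incoming_ids & identifier_keys(cand):
            return "match", cand["id"]

    # Step 2: probabilistic scoring, reached only when no identifier matched.
    best, best_score = None, 0.0
    for cand in candidates:
        score = score_fn(incoming, cand)
        if score > best_score:
            best, best_score = cand, score

    # Steps 3 to 5: apply the two thresholds to the best candidate.
    if best is not None and best_score >= upper:
        return "match", best["id"]
    if best is not None and best_score >= lower:
        return "review", best["id"]
    return "new", None
```

A caller would pass its own calibrated scorer as `score_fn`. Returning the candidate id alongside a `review` decision gives the manual review queue something concrete to show the reviewer.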


Master Patient Index (MPI)

A Master Patient Index is the architectural component responsible for patient identity management: storing identity records, executing matching logic, maintaining links between records across systems, and providing a golden record.

Enterprise MPI vs facility MPI

Type | Scope | Typical Deployment
Facility MPI | One organisation or one EHR instance | Built into the EHR registration module
Enterprise MPI | Multiple facilities, systems, or organisations | Standalone MPI product (IBM InfoSphere, NextGate, Rhapsody EMPI)
Regional / HIE MPI | Patient population across an entire region or state | Health Information Exchange infrastructure

A facility MPI handles the internal problem: deduplication within one organisation’s system. An enterprise MPI handles the cross-system problem: linking a patient’s records across the organisation’s multiple facilities, acquired practices, and partner organisations.

The MPI integration pattern

The correct integration pattern for any new patient encounter:

  1. Registration system receives patient demographics
  2. Before creating a new local record, query the MPI: “Does this patient already exist?”
  3. If MPI returns a match above the confidence threshold, link the new encounter to the existing MPI record
  4. If MPI returns a possible match, route to a review workflow
  5. If MPI returns no match, create a new MPI record and return the enterprise identifier
  6. Use the MPI-assigned enterprise identifier for all downstream linking

This pattern must be synchronous for registration workflows — the registration system needs the MPI result before completing the encounter. For asynchronous enrichment (e.g., adding an MPI enterprise ID to records that predate the MPI implementation), the MPI processes a batch and sends back enrichment updates.
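The synchronous registration flow can be illustrated with a toy in-memory MPI. Everything here is hypothetical: `ToyMpi`, `register`, the field-equality scorer, and the thresholds are stand-ins for what would in reality be a call to an MPI service with a calibrated matcher.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyMpi:
    """In-memory stand-in for an MPI service (for illustration only)."""
    golden: dict = field(default_factory=dict)        # enterprise_id -> demographics
    links: dict = field(default_factory=dict)         # (source_system, local_id) -> enterprise_id
    review_queue: list = field(default_factory=list)  # pending possible matches

    def best_candidate(self, demographics: dict, score_fn: Callable) -> tuple:
        """Return (enterprise_id, score) of the best-scoring record, or (None, 0.0)."""
        best_id, best_score = None, 0.0
        for enterprise_id, known in self.golden.items():
            score = score_fn(demographics, known)
            if score > best_score:
                best_id, best_score = enterprise_id, score
        return best_id, best_score

    def create(self, demographics: dict) -> str:
        enterprise_id = str(uuid.uuid4())
        self.golden[enterprise_id] = dict(demographics)
        return enterprise_id


def register(mpi: ToyMpi, source_system: str, local_id: str, demographics: dict,
             score_fn: Callable, lower: float = 0.70, upper: float = 0.92) -> tuple:
    """Query the MPI before creating anything; return (status, enterprise_id)."""
    enterprise_id, score = mpi.best_candidate(demographics, score_fn)
    if enterprise_id is not None and score >= upper:
        status = "linked"
    elif enterprise_id is not None and score >= lower:
        # Possible match: a person decides, so no enterprise id is returned yet.
        mpi.review_queue.append((source_system, local_id, enterprise_id, score))
        return "review", None
    else:
        enterprise_id = mpi.create(demographics)
        status = "created"
    mpi.links[(source_system, local_id)] = enterprise_id
    return status, enterprise_id


def same_fields(a: dict, b: dict) -> float:
    """Toy scorer: fraction of three fields that are exactly equal."""
    keys = ("family", "given", "birth_date")
    return sum(a.get(k) == b.get(k) for k in keys) / len(keys)


mpi = ToyMpi()
john = {"family": "Smith", "given": "John", "birth_date": "1978-03-15"}
print(register(mpi, "ehr", "A-1", john, same_fields)[0])  # created
print(register(mpi, "lab", "L-9", john, same_fields)[0])  # linked
```

The key property is ordering: the MPI is consulted before any local record is created, and downstream systems only ever see the enterprise identifier it returns.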

Golden record

A golden record (also called a master record) is the authoritative composite view of a patient’s identity, constructed by survivorship rules from the linked source records. When two records are merged, survivorship rules determine which field value wins:

  • Most recently updated field value
  • Most complete (non-null) value
  • Source system priority (EHR takes precedence over registration kiosk)
  • Most frequently occurring value across linked records

Survivorship rules must be defined explicitly in your MPI configuration. The defaults in most MPI products are not clinically validated — review them with clinical and operational stakeholders.
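To show why the order of survivorship rules needs to be explicit, here is a sketch that applies three of the rules above in a fixed order: non-empty values first, then source priority, then recency. The source ranking, record layout, and function name are invented for the example and do not reflect any MPI product’s defaults.

```python
from datetime import date

# Hypothetical source ranking: lower number wins.
SOURCE_PRIORITY = {"ehr": 0, "registration-kiosk": 1}


def build_golden_record(linked_records: list, fields: list) -> dict:
    golden = {}
    for name in fields:
        # Rule 1: only non-empty values can survive.
        populated = [r for r in linked_records if r["values"].get(name) not in (None, "")]
        if not populated:
            golden[name] = None
            continue
        # Rule 2: prefer the most trusted source. Rule 3: then the most recent update.
        winner = min(
            populated,
            key=lambda r: (SOURCE_PRIORITY.get(r["source"], 99), -r["updated"].toordinal()),
        )
        golden[name] = winner["values"][name]
    return golden


records = [
    {"source": "registration-kiosk", "updated": date(2024, 5, 1),
     "values": {"family": "Garcia Lopez", "phone": "555-0100", "postal_code": "62701"}},
    {"source": "ehr", "updated": date(2022, 1, 10),
     "values": {"family": "García-Lopez", "phone": "", "postal_code": "62704"}},
]
print(build_golden_record(records, ["family", "phone", "postal_code"]))
```

With this ordering the two-year-old EHR ZIP code beats the newer kiosk value, which may or may not be what the organisation wants. Swapping rules 2 and 3 would reverse that outcome, and that is the kind of decision that should be made deliberately with stakeholders.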


FHIR $match operation

FHIR R4 defines the $match operation on the Patient resource, enabling a FHIR-native patient identity query. The operation takes a candidate patient record and returns a scored list of potential matches.

Request

POST /Patient/$match

The request body is a Parameters resource containing:

  • resource: a Patient resource representing the search candidate (does not need to exist in the server)
  • onlyCertainMatches: boolean — if true, return only matches above the certain-match threshold
  • count: maximum number of results to return

{
  "resourceType": "Parameters",
  "parameter": [
    {
      "name": "resource",
      "resource": {
        "resourceType": "Patient",
        "name": [
          {
            "family": "Smith",
            "given": ["John", "A"]
          }
        ],
        "birthDate": "1978-03-15",
        "gender": "male",
        "address": [
          {
            "line": ["123 Main St"],
            "city": "Springfield",
            "state": "IL",
            "postalCode": "62701"
          }
        ],
        "identifier": [
          {
            "system": "http://hl7.org/fhir/sid/us-ssn",
            "value": "999-32-1234"
          }
        ]
      }
    },
    {
      "name": "onlyCertainMatches",
      "valueBoolean": false
    },
    {
      "name": "count",
      "valueInteger": 5
    }
  ]
}
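A client might assemble this request body programmatically. The sketch below builds the Parameters resource and prepares, without sending, an HTTP POST using only the standard library. The base URL is a placeholder, the helper names are hypothetical, and real servers will typically also require authentication headers that are not shown.

```python
import json
import urllib.request


def build_match_parameters(candidate: dict, only_certain: bool = False, count: int = 5) -> dict:
    """Wrap a candidate Patient in the Parameters resource that $match expects."""
    if candidate.get("resourceType") != "Patient":
        raise ValueError("candidate must be a Patient resource")
    return {
        "resourceType": "Parameters",
        "parameter": [
            {"name": "resource", "resource": candidate},
            {"name": "onlyCertainMatches", "valueBoolean": only_certain},
            {"name": "count", "valueInteger": count},
        ],
    }


def build_match_request(base_url: str, parameters: dict) -> urllib.request.Request:
    """Prepare (but do not send) the POST; base_url is a placeholder."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/Patient/$match",
        data=json.dumps(parameters).encode("utf-8"),
        headers={"Content-Type": "application/fhir+json", "Accept": "application/fhir+json"},
        method="POST",
    )


candidate = {
    "resourceType": "Patient",
    "name": [{"family": "Smith", "given": ["John", "A"]}],
    "birthDate": "1978-03-15",
    "gender": "male",
}
request = build_match_request("https://fhir.example.org", build_match_parameters(candidate))
print(request.get_method(), request.full_url)
```

Sending the prepared request would be a matter of passing it to `urllib.request.urlopen` or an equivalent HTTP client.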

Response

The server returns a Bundle of type searchset. Each matched Patient entry includes a search.score (0.0 to 1.0) and an extension carrying the match grade:

{
  "resourceType": "Bundle",
  "type": "searchset",
  "total": 2,
  "entry": [
    {
      "fullUrl": "https://example.org/fhir/Patient/pat-001",
      "resource": {
        "resourceType": "Patient",
        "id": "pat-001",
        "name": [{ "family": "Smith", "given": ["John", "Arthur"] }],
        "birthDate": "1978-03-15",
        "gender": "male"
      },
      "search": {
        "extension": [
          {
            "url": "http://hl7.org/fhir/StructureDefinition/match-grade",
            "valueCode": "certain"
          }
        ],
        "mode": "match",
        "score": 0.97
      }
    },
    {
      "fullUrl": "https://example.org/fhir/Patient/pat-142",
      "resource": {
        "resourceType": "Patient",
        "id": "pat-142",
        "name": [{ "family": "Smith", "given": ["Jon"] }],
        "birthDate": "1978-03-15",
        "gender": "male"
      },
      "search": {
        "extension": [
          {
            "url": "http://hl7.org/fhir/StructureDefinition/match-grade",
            "valueCode": "possible"
          }
        ],
        "mode": "match",
        "score": 0.71
      }
    }
  ]
}

The match-grade extension values are: certain, probable, possible, certainly-not. The score is the server’s confidence (0.0–1.0). The HL7 Patient Matching IG specifies expected server behaviour in detail; server implementations vary — document your server’s threshold configuration.
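On the client side, the score and match grade have to be pulled out of each entry’s search element. A minimal sketch, assuming the Bundle has already been parsed into a Python dict; `summarise_matches` is a hypothetical helper name.

```python
MATCH_GRADE_URL = "http://hl7.org/fhir/StructureDefinition/match-grade"


def summarise_matches(bundle: dict) -> list:
    """Return one {id, score, grade} dict per entry, best score first."""
    results = []
    for entry in bundle.get("entry", []):
        search = entry.get("search", {})
        grade = next(
            (ext.get("valueCode") for ext in search.get("extension", [])
             if ext.get("url") == MATCH_GRADE_URL),
            None,  # servers are not guaranteed to send the extension
        )
        results.append({
            "id": entry.get("resource", {}).get("id"),
            "score": search.get("score"),
            "grade": grade,
        })
    results.sort(key=lambda r: r["score"] or 0.0, reverse=True)
    return results


bundle = {
    "resourceType": "Bundle",
    "type": "searchset",
    "entry": [
        {"resource": {"resourceType": "Patient", "id": "pat-142"},
         "search": {"extension": [{"url": MATCH_GRADE_URL, "valueCode": "possible"}],
                    "mode": "match", "score": 0.71}},
        {"resource": {"resourceType": "Patient", "id": "pat-001"},
         "search": {"extension": [{"url": MATCH_GRADE_URL, "valueCode": "certain"}],
                    "mode": "match", "score": 0.97}},
    ],
}
for match in summarise_matches(bundle):
    print(match["id"], match["score"], match["grade"])
```

Treating a missing grade or score as absent, rather than raising, reflects the point above that server behaviour varies.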

What $match does not do

$match is a query operation, not a merge operation. It returns candidate matches for client-side decision-making. The client is responsible for deciding whether to create a link, route to manual review, or treat the candidate as a new patient. The server does not automatically merge records or update Patient.link as a result of $match.


Patient.link

FHIR’s Patient.link element represents a known relationship between two Patient records on the same or different servers. It is how you express “this record is a duplicate of that record” or “queries about this patient should be redirected to that record.”

linkType values

linkType | Meaning | When to use
replaced-by | This record has been superseded; use the linked record | Deprecated local record replaced by enterprise MPI record
replaces | This record is the replacement for the linked record | The enterprise golden record that superseded a local record
refer | Redirect queries to the linked record | Local stub record pointing to authoritative record on another server
seealso | The linked record concerns the same patient but is not a replacement | Related records in distinct systems where neither supersedes the other

A correct merge pattern uses replaced-by on the deprecated record and replaces on the surviving record. Both links must be populated — replaced-by without the corresponding replaces on the other record is an inconsistent state.
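The reciprocal-link requirement can be sketched as a pair of helpers operating on Patient resources held as Python dicts. The function names are hypothetical. Marking the deprecated record inactive is an assumption of this example, reflecting common practice rather than anything stated above.

```python
import copy


def link_merged(survivor: dict, deprecated: dict) -> tuple:
    """Return updated copies of both records carrying reciprocal links."""
    survivor = copy.deepcopy(survivor)
    deprecated = copy.deepcopy(deprecated)
    survivor.setdefault("link", []).append(
        {"other": {"reference": f"Patient/{deprecated['id']}"}, "type": "replaces"})
    deprecated.setdefault("link", []).append(
        {"other": {"reference": f"Patient/{survivor['id']}"}, "type": "replaced-by"})
    deprecated["active"] = False  # assumption: superseded records are deactivated
    return survivor, deprecated


def has_link(patient: dict, link_type: str, other_id: str) -> bool:
    return any(
        link.get("type") == link_type
        and link.get("other", {}).get("reference") == f"Patient/{other_id}"
        for link in patient.get("link", [])
    )


def merge_is_consistent(survivor: dict, deprecated: dict) -> bool:
    """True only when both directions of the merge link are present."""
    return (has_link(survivor, "replaces", deprecated["id"])
            and has_link(deprecated, "replaced-by", survivor["id"]))


golden = {"resourceType": "Patient", "id": "pat-001"}
duplicate = {"resourceType": "Patient", "id": "pat-142"}
new_golden, new_duplicate = link_merged(golden, duplicate)
print(merge_is_consistent(new_golden, new_duplicate))  # True
print(merge_is_consistent(golden, new_duplicate))      # False: one-sided link
```

A check like `merge_is_consistent` is cheap to run as a periodic audit over merged records to catch the one-sided state described above.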


Deduplication strategy

Upstream prevention

The cheapest deduplication is preventing duplicates at creation. At registration, always query the MPI before creating a new record. Enforce required identifier capture at registration (insurance ID or government-issued ID). Train registration staff on duplicate prevention. Automated prevention is more reliable than manual deduplication after the fact.

Downstream resolution

When duplicates already exist — and in any organisation of meaningful size, they do — resolution requires a structured deduplication pipeline:

  1. Candidate generation: use blocking strategies to identify candidate pairs without comparing every record against every other record (quadratic complexity). Block on phonetic name encoding, ZIP code, DOB year.
  2. Pair scoring: apply probabilistic matching to each candidate pair.
  3. Survivor selection: apply survivorship rules to determine which record’s field values appear on the golden record.
  4. Merge execution: create or update the golden record; set Patient.link on all constituent records; mark superseded records with replaced-by links.
  5. Downstream notification: notify downstream systems of the merge so they can update their local references.

Step 5 is frequently omitted and causes downstream inconsistency. When a merge occurs in the MPI, every system that holds a reference to the deprecated patient identifier must be notified. In HL7 v2, this is an ADT^A40 (Merge Patient) message. In FHIR, this can be modelled as a subscription notification or as an explicit merge operation on the MPI server.
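Candidate generation (step 1) is where the quadratic cost is avoided. The sketch below blocks on ZIP code plus year of birth and emits only within-block pairs; the record layout and the choice of blocking key are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record: dict) -> tuple:
    """Hypothetical blocking key: ZIP code plus year of birth."""
    return (record["postal_code"], record["birth_date"][:4])


def candidate_pairs(records: list) -> set:
    """Return id pairs that share a blocking key; other pairs are never compared."""
    blocks = defaultdict(list)
    for record in records:
        blocks[blocking_key(record)].append(record["id"])
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


records = [
    {"id": "A", "postal_code": "62701", "birth_date": "1978-03-15"},
    {"id": "B", "postal_code": "62701", "birth_date": "1978-03-51"},
    {"id": "C", "postal_code": "62704", "birth_date": "1978-03-15"},
    {"id": "D", "postal_code": "62701", "birth_date": "1990-07-02"},
]
all_pairs = len(records) * (len(records) - 1) // 2
print(len(candidate_pairs(records)), "of", all_pairs, "possible pairs")  # 1 of 6 possible pairs
```

The trade-off is visible in the data: record C shares a date of birth with A but is never compared with it because the ZIP code differs. For that reason, pipelines often run several passes with different blocking keys and take the union of the resulting pairs.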


Common failure modes

Relying on MRN as a universal identifier. An MRN is meaningful only within the issuing organisation’s system. Sending an MRN to another organisation without the identifier system URI makes it unresolvable. Populate system on every Patient.identifier you send, always.

Missing identifier system URIs. A FHIR Patient.identifier with a value but no system cannot be matched deterministically. The system URI defines the namespace — without it, you cannot know whether two identifiers with the same value are the same patient or a collision across different systems.

Demographic drift. Patient demographics change over time — address, phone, insurance, name. Records created years ago diverge from current reality. A probabilistic matcher calibrated on current data may fail to match old records that have drifted. Include an identifier-based match as the primary path; do not rely solely on demographics for long-lived records.

Name and gender changes not propagated. When a patient legally changes their name or gender, some systems update the record while others do not. This creates demographic inconsistency that breaks probabilistic matching. Propagating demographic updates across systems is a patient safety requirement, not just a data quality concern.

Treating match score as binary. A score of 0.97 is not the same as a score of 0.72. Implement a manual review workflow for scores in the uncertain range. Auto-accepting every score above a single cut-off, with no review zone, produces false positive merges. The review queue is not a failure of the matching system; it is correct behaviour.

Section: interop | Content Type: pattern | Audience: technical
Interoperability Level: Structural
Published: 18/03/2024 | Modified: 26/11/2025
Keywords: patient matching, MPI, master patient index, FHIR $match, probabilistic matching, deduplication, patient identity, demographic matching, golden record