Patient Identity Matching

Patient matching is the problem of determining whether two records in different systems (or in the same system at different times) refer to the same person. It sounds straightforward. It is not.

The United States has no universal patient identifier. The 1996 HIPAA legislation originally called for one, but since the late 1990s Congress has used annual appropriations language to bar the Department of Health and Human Services from spending federal funds on developing it, citing privacy concerns. Every provider organisation therefore assigns its own Medical Record Number (MRN). The same patient has a different MRN at every hospital they have visited, a different identifier with their insurer, and possibly no shared identifier at all with a specialist they were referred to.

Demographic data compounds the problem. Names change — through marriage, divorce, or preference. Addresses change. Date of birth is transcribed wrong. Middle names and suffixes are inconsistently captured. A patient named “María García-Lopez” may appear as “Maria Garcia Lopez,” “M. Lopez,” or “Garcia Maria” depending on which front-desk staff entered the registration data.

The consequences of getting matching wrong are asymmetric. A false positive — linking the wrong patient’s records — exposes the receiving clinician to another patient’s allergies, diagnoses, and medications. That is a patient safety event. A false negative — failing to match records that belong to the same person — results in fragmented records, repeated tests, and incomplete clinical pictures. False negatives are bad; false positives are worse. Matching algorithms must be calibrated to prioritise avoiding false positives.

Deterministic matching

Deterministic matching asserts a match when one or more identifiers are exactly equal across records. It is the simplest and most reliable form of matching when it works.

Common identifier types used for deterministic matching:

Identifier Type	Reliability	Limitation
Medical Record Number (MRN)	High within one organisation	Facility-specific; not portable across organisations
Social Security Number (SSN)	High	Often partially collected; privacy-sensitive; errors in entry
Insurance Member ID	High within one payer	Changes when insurance changes; same patient has multiple IDs across payers
National Provider Identifier (NPI)	N/A	This is a provider identifier, not a patient identifier — do not confuse them
Driver’s License Number	Medium	Not consistently collected; varies by state format
Passport Number	Low	Infrequently collected in clinical settings

Deterministic matching fails when:

Source systems use different identifier namespaces (your MRN is not the same as the referring hospital’s MRN)
The identifier was not collected at registration
The identifier was entered incorrectly

Never rely solely on MRN for cross-organisational matching. An MRN is a facility-local identifier. Sending it to another organisation is meaningless unless both systems share the same identifier namespace — which they generally do not.

In FHIR, every Patient.identifier should include a system URI that identifies the issuing organisation’s namespace. The base specification leaves Identifier.system optional, so a server will usually accept an identifier without one, but such an identifier is unresolvable across systems and cannot be used for deterministic matching. This is one of the most common FHIR implementation failures.
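
As a minimal sketch of why the namespace matters, the comparison below treats an identifier as a (system, value) pair and ignores any identifier that lacks a system. The function names and the two hospital namespace URIs are hypothetical, chosen only for illustration.

```python
def identifier_keys(patient: dict) -> set[tuple[str, str]]:
    """Return the (system, value) pairs usable for deterministic matching.

    Identifiers that lack a system or a value are skipped, because the
    namespace of a bare value is unknown.
    """
    keys = set()
    for ident in patient.get("identifier", []):
        system = ident.get("system")
        value = ident.get("value")
        if system and value:
            keys.add((system.strip(), value.strip()))
    return keys


def deterministic_match(a: dict, b: dict) -> bool:
    """True when the two records share at least one namespaced identifier."""
    return bool(identifier_keys(a) & identifier_keys(b))


# Hypothetical MRN namespaces for two unrelated hospitals.
rec_a = {"identifier": [{"system": "https://hospital-a.example/mrn", "value": "12345"}]}
rec_b = {"identifier": [{"system": "https://hospital-b.example/mrn", "value": "12345"}]}
rec_c = {"identifier": [{"value": "12345"}]}  # no system: cannot be compared

print(deterministic_match(rec_a, rec_b))  # False: same value, different namespace
print(deterministic_match(rec_a, rec_c))  # False: the bare value is ignored
print(deterministic_match(rec_a, rec_a))  # True
```

Comparing values alone would have linked all three records, which is exactly the false positive that the system URI exists to prevent.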

Probabilistic matching

Probabilistic matching assigns a composite similarity score across multiple demographic fields and uses threshold rules to decide: match, possible match (review), or no match.

Demographic fields and matching algorithms

Field	Algorithm	Notes
Last name	Jaro-Winkler similarity	Handles transpositions and typos; weight heavily
First name	Jaro-Winkler similarity	Middle name sometimes more distinctive than first
Date of birth	Exact match or digit transposition check	High discriminating power; transcription errors common
Sex / gender	Exact match	Low discriminating power alone
Address — street	Jaro-Winkler or standardised comparison	Normalise with postal address standardisation first
Address — city	Exact or fuzzy	Low discriminating power alone
Address — ZIP code	Exact match	Moderate discriminating power
Phone number	Exact match (normalised)	Changes frequently; use with low weight
SSN (last 4)	Exact match	Moderate discriminating power; privacy constraints
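
The "digit transposition check" for date of birth can be sketched as follows. This is one possible interpretation, not a standard algorithm: it assumes ISO dates (YYYY-MM-DD) and treats a month/day swap or a swap of two adjacent digits as a likely transcription error. The function name and the three return labels are hypothetical.

```python
def dob_agreement(a: str, b: str) -> str:
    """Compare two ISO dates and return 'exact', 'transposed' or 'different'."""
    da, db = a.replace("-", ""), b.replace("-", "")
    if len(da) != 8 or len(db) != 8 or not (da + db).isdigit():
        raise ValueError("expected dates in YYYY-MM-DD form")
    if da == db:
        return "exact"
    # Month and day swapped, e.g. 1978-03-12 entered as 1978-12-03.
    if da[:4] == db[:4] and da[4:6] == db[6:8] and da[6:8] == db[4:6]:
        return "transposed"
    # Exactly two adjacent digits swapped, e.g. 1978 entered as 1987.
    diffs = [i for i in range(8) if da[i] != db[i]]
    if (
        len(diffs) == 2
        and diffs[1] == diffs[0] + 1
        and da[diffs[0]] == db[diffs[1]]
        and da[diffs[1]] == db[diffs[0]]
    ):
        return "transposed"
    return "different"


print(dob_agreement("1978-03-15", "1978-03-15"))  # exact
print(dob_agreement("1978-03-15", "1987-03-15"))  # transposed
print(dob_agreement("1978-03-15", "1979-03-15"))  # different
```

A matcher would typically give a "transposed" result partial credit rather than full agreement; how much credit is a calibration decision.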

Jaro-Winkler is preferred over Levenshtein for name matching in patient data because it gives higher weight to prefix agreement (the first few characters are more likely to be correct than later characters in a transcription error scenario) and is robust to character transpositions.
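
For readers who want to see the prefix bonus in action, here is a compact pure-Python sketch of Jaro and Jaro-Winkler similarity. It uses the conventional prefix scale of 0.1 and a maximum prefix length of 4; production systems would normally use a tested library and normalise case and diacritics first.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and close enough in position.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (
        matches / len1 + matches / len2 + (matches - transpositions) / matches
    ) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to four characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


print(round(jaro("MARTHA", "MARHTA"), 4))          # 0.9444
print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```

The transposed "TH" costs little, and the shared "MAR" prefix lifts the score further, which is the behaviour the paragraph above describes.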

Threshold configuration

A probabilistic matcher produces a score between 0 and 1 (or an equivalent weighted score). You define two thresholds:

score < lower threshold  → No match
lower threshold ≤ score < upper threshold  → Possible match (manual review queue)
score ≥ upper threshold  → Automatic match

The gap between the two thresholds is your review zone. A narrow gap reduces manual review volume but increases both false positives and false negatives. A wide gap increases review volume but catches more uncertain cases before they become errors.
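
The scoring and threshold rules can be sketched as a weighted sum followed by a three-way classification. The weights and the two threshold values below are placeholders invented for this example and are not recommendations. Fields missing from the comparison contribute zero agreement, which is a deliberately conservative assumption.

```python
# Hypothetical weights that sum to 1.0; real values come from calibration.
WEIGHTS = {
    "last_name": 0.30,
    "first_name": 0.20,
    "birth_date": 0.30,
    "postal_code": 0.10,
    "phone": 0.10,
}
LOWER, UPPER = 0.60, 0.90  # illustrative thresholds only


def composite_score(similarities: dict[str, float]) -> float:
    """Weighted sum of per-field similarities, each expected in [0, 1]."""
    return sum(w * similarities.get(field, 0.0) for field, w in WEIGHTS.items())


def classify(score: float, lower: float = LOWER, upper: float = UPPER) -> str:
    """Map a score to 'match', 'review' or 'no-match'."""
    if not 0.0 <= lower <= upper <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= lower <= upper <= 1")
    if score >= upper:
        return "match"
    if score >= lower:
        return "review"
    return "no-match"


pair = {"last_name": 1.0, "first_name": 0.93, "birth_date": 1.0}
score = composite_score(pair)
print(round(score, 3), classify(score))
```

With address and phone unavailable, a pair that agrees strongly on name and date of birth still lands in the review zone here, which illustrates how the gap absorbs uncertain cases.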

Calibrate thresholds against your actual patient population and your actual data quality. A paediatric hospital has a different name distribution than an urban safety-net hospital. A system that collects SSN consistently can weight it more heavily than one that only captures it for Medicare patients. Threshold values from a reference implementation are not safe to adopt without validation against your data.

Phonetic algorithms

Soundex and Metaphone are phonetic encoding algorithms — they encode names by how they sound rather than how they are spelled. They are useful as a blocking strategy (pre-filter candidates before computing Jaro-Winkler) but should not be the primary matching algorithm. Jaro-Winkler is more accurate for clinical name matching.
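
A minimal sketch of Soundex used as a blocking key is shown below. It implements the common American Soundex rules and simply drops any character outside A to Z, so accented names should be normalised beforehand. The record layout (`id`, `family`) is an assumption made for the example.

```python
from collections import defaultdict

_CODES = {}
for _letters, _digit in (
    ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
    ("L", "4"), ("MN", "5"), ("R", "6"),
):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name: str) -> str:
    """Return the four-character Soundex code, or '' for an empty name."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = [letters[0]]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code.append(digit)
        prev = digit  # a vowel resets prev, so a repeated code is kept
        if len(code) == 4:
            break
    return "".join(code).ljust(4, "0")


def block_by_surname(records: list[dict]) -> dict[str, list[str]]:
    """Group record ids by the Soundex code of the family name."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[soundex(rec["family"])].append(rec["id"])
    return dict(blocks)


people = [
    {"id": "a", "family": "Smith"},
    {"id": "b", "family": "Smyth"},
    {"id": "c", "family": "Jones"},
]
print(block_by_surname(people))
```

Only records that fall in the same block would then be compared with Jaro-Winkler, which keeps the expensive comparison off most pairs.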

Hybrid approach

In practice, the most reliable matching strategy combines deterministic and probabilistic matching in sequence:

Attempt deterministic match on shared identifiers (insurance ID, SSN if available)
If no identifier match, apply probabilistic scoring across demographics
Score above upper threshold → auto-match
Score in review zone → route to manual review queue
Score below lower threshold → treat as new patient; create new record

The sequence matters. Identifier matching is the first gate: if two records share the same identifier value within the same namespace (for example, the same member ID issued by the same payer), probabilistic scoring of demographics is redundant. Only invoke probabilistic matching when identifier matching produces no result.
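
The sequence above can be sketched as a single resolution function. Everything here is illustrative: `resolve`, the toy scoring function, the record layout, and the threshold defaults are assumptions, and a real scorer would use the weighted field comparison described earlier.

```python
from typing import Callable, Optional


def _id_keys(patient: dict) -> set:
    """Namespaced identifiers only; bare values are ignored."""
    return {
        (i["system"], i["value"])
        for i in patient.get("identifier", [])
        if i.get("system") and i.get("value")
    }


def resolve(
    incoming: dict,
    candidates: list[dict],
    score_fn: Callable[[dict, dict], float],
    lower: float = 0.60,
    upper: float = 0.90,
) -> tuple[str, Optional[dict]]:
    """Return ('match' | 'review' | 'new', candidate or None)."""
    # Step 1: deterministic match on a shared namespaced identifier.
    for cand in candidates:
        if _id_keys(incoming) & _id_keys(cand):
            return "match", cand
    # Step 2: probabilistic scoring, only reached when step 1 found nothing.
    best, best_score = None, 0.0
    for cand in candidates:
        s = score_fn(incoming, cand)
        if s > best_score:
            best, best_score = cand, s
    if best is not None and best_score >= upper:
        return "match", best
    if best is not None and best_score >= lower:
        return "review", best
    return "new", None


def toy_score(a: dict, b: dict) -> float:
    """Stand-in scorer: fraction of three fields that agree exactly."""
    fields = ("family", "given", "birth_date")
    return sum(a.get(f) == b.get(f) for f in fields) / len(fields)


existing = [{
    "id": "p1", "family": "Smith", "given": "John", "birth_date": "1978-03-15",
    "identifier": [{"system": "urn:example:payer-a", "value": "M1"}],
}]
walk_in = {"family": "Smith", "given": "Jon", "birth_date": "1978-03-15"}
print(resolve(walk_in, existing, toy_score)[0])  # review
```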

Master Patient Index (MPI)

A Master Patient Index is the architectural component responsible for patient identity management: storing identity records, executing matching logic, maintaining links between records across systems, and providing a golden record.

Enterprise MPI vs facility MPI

Type	Scope	Typical Deployment
Facility MPI	One organisation or one EHR instance	Built into the EHR registration module
Enterprise MPI	Multiple facilities, systems, or organisations	Standalone MPI product (IBM InfoSphere, NextGate, Rhapsody EMPI)
Regional / HIE MPI	Patient population across an entire region or state	Health Information Exchange infrastructure

A facility MPI handles the internal problem: deduplication within one organisation’s system. An enterprise MPI handles the cross-system problem: linking a patient’s records across the organisation’s multiple facilities, acquired practices, and partner organisations.

The MPI integration pattern

The correct integration pattern for any new patient encounter:

Registration system receives patient demographics
Before creating a new local record, query the MPI: “Does this patient already exist?”
If MPI returns a match above the confidence threshold, link the new encounter to the existing MPI record
If MPI returns a possible match, route to a review workflow
If MPI returns no match, create a new MPI record and return the enterprise identifier
Use the MPI-assigned enterprise identifier for all downstream linking

This pattern must be synchronous for registration workflows — the registration system needs the MPI result before completing the encounter. For asynchronous enrichment (e.g., adding an MPI enterprise ID to records that predate the MPI implementation), the MPI processes a batch and sends back enrichment updates.
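
The synchronous registration flow might look like the sketch below. `InMemoryMpi` is a toy stand-in for a real MPI service, and its method names, the `EID-` identifier format, and the thresholds are all invented for illustration.

```python
import itertools
from typing import Callable, Optional


class InMemoryMpi:
    """Toy MPI: stores demographics keyed by an enterprise identifier."""

    def __init__(self, match_fn: Callable[[dict, dict], float],
                 lower: float = 0.60, upper: float = 0.90) -> None:
        self._records: dict[str, dict] = {}
        self._ids = itertools.count(1)
        self._match_fn = match_fn
        self._lower, self._upper = lower, upper

    def query(self, demographics: dict) -> tuple[str, Optional[str]]:
        """Return ('match' | 'possible' | 'none', enterprise id or None)."""
        best_id, best_score = None, 0.0
        for eid, rec in self._records.items():
            s = self._match_fn(demographics, rec)
            if s > best_score:
                best_id, best_score = eid, s
        if best_id is not None and best_score >= self._upper:
            return "match", best_id
        if best_id is not None and best_score >= self._lower:
            return "possible", best_id
        return "none", None

    def create(self, demographics: dict) -> str:
        eid = f"EID-{next(self._ids):06d}"  # hypothetical identifier format
        self._records[eid] = dict(demographics)
        return eid


def register_encounter(mpi: InMemoryMpi, demographics: dict) -> dict:
    """Query the MPI first; create a record only when nothing matches."""
    status, eid = mpi.query(demographics)
    if status == "match":
        return {"action": "linked", "enterprise_id": eid}
    if status == "possible":
        return {"action": "review", "enterprise_id": None, "candidate": eid}
    return {"action": "created", "enterprise_id": mpi.create(demographics)}


exact = lambda a, b: 1.0 if a == b else 0.0  # stand-in matcher
mpi = InMemoryMpi(exact)
first = register_encounter(mpi, {"family": "Smith", "birth_date": "1978-03-15"})
second = register_encounter(mpi, {"family": "Smith", "birth_date": "1978-03-15"})
print(first["action"], second["action"])  # created linked
```

The second registration reuses the enterprise identifier created by the first, so no duplicate record is produced.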

Golden record

A golden record (also called a master record) is the authoritative composite view of a patient’s identity, constructed by survivorship rules from the linked source records. When two records are merged, survivorship rules determine which field value wins:

Most recently updated field value
Most complete (non-null) value
Source system priority (EHR takes precedence over registration kiosk)
Most frequently occurring value across linked records

Survivorship rules must be defined explicitly in your MPI configuration. The defaults in most MPI products are not clinically validated — review them with clinical and operational stakeholders.
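
As an illustration of making survivorship explicit, the sketch below applies three of the rules in a fixed order: non-null values first, then source priority, then recency. The ordering, the source names, and the record layout are assumptions made for the example, not defaults of any MPI product.

```python
from datetime import date

# Hypothetical source ranking: lower number wins.
SOURCE_PRIORITY = {"ehr": 0, "kiosk": 1}


def survive(field: str, records: list[dict]):
    """Pick the surviving value for one field across linked records."""
    # Rule 1: only non-null, non-empty values are eligible.
    eligible = [r for r in records if r["fields"].get(field) not in (None, "")]
    if not eligible:
        return None
    # Rule 2: higher-priority source wins; rule 3: most recent breaks ties.
    best = min(
        eligible,
        key=lambda r: (SOURCE_PRIORITY.get(r["source"], 99),
                       -r["updated"].toordinal()),
    )
    return best["fields"][field]


def golden_record(records: list[dict], fields: list[str]) -> dict:
    return {f: survive(f, records) for f in fields}


linked = [
    {"source": "ehr", "updated": date(2023, 1, 10),
     "fields": {"city": "Springfield", "phone": None}},
    {"source": "kiosk", "updated": date(2024, 6, 2),
     "fields": {"city": "Shelbyville", "phone": "555-0100"}},
]
print(golden_record(linked, ["city", "phone"]))
```

Here the EHR's older city value beats the kiosk's newer one because source priority is applied before recency. Whether that is the right order is the kind of question the stakeholder review should answer.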

FHIR $match operation

FHIR R4 defines the $match operation on the Patient resource, enabling a FHIR-native patient identity query. The operation takes a candidate patient record and returns a scored list of potential matches.

Request

POST /Patient/$match

The request body is a Parameters resource containing:

resource: a Patient resource representing the search candidate (does not need to exist in the server)
onlyCertainMatches: boolean; if true, the server returns a result only when it is certain of the match, and should return nothing when several candidates remain plausible
count: maximum number of results to return

{
  "resourceType": "Parameters",
  "parameter": [
    {
      "name": "resource",
      "resource": {
        "resourceType": "Patient",
        "name": [
          {
            "family": "Smith",
            "given": ["John", "A"]
          }
        ],
        "birthDate": "1978-03-15",
        "gender": "male",
        "address": [
          {
            "line": ["123 Main St"],
            "city": "Springfield",
            "state": "IL",
            "postalCode": "62701"
          }
        ],
        "identifier": [
          {
            "system": "http://hl7.org/fhir/sid/us-ssn",
            "value": "999-32-1234"
          }
        ]
      }
    },
    {
      "name": "onlyCertainMatches",
      "valueBoolean": false
    },
    {
      "name": "count",
      "valueInteger": 5
    }
  ]
}
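
A client could assemble the same request in Python along these lines. The sketch only builds the request object and does not send it; the base URL is a placeholder, and `build_match_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_match_request(base_url: str, patient: dict,
                        only_certain: bool = False,
                        count: int = 5) -> urllib.request.Request:
    """Wrap a candidate Patient in a Parameters resource for $match."""
    body = {
        "resourceType": "Parameters",
        "parameter": [
            {"name": "resource", "resource": patient},
            {"name": "onlyCertainMatches", "valueBoolean": only_certain},
            {"name": "count", "valueInteger": count},
        ],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/Patient/$match",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/fhir+json",
            "Accept": "application/fhir+json",
        },
        method="POST",
    )


candidate = {
    "resourceType": "Patient",
    "name": [{"family": "Smith", "given": ["John", "A"]}],
    "birthDate": "1978-03-15",
    "gender": "male",
}
# Placeholder server address; replace with a real FHIR base URL.
req = build_match_request("https://example.org/fhir", candidate)
print(req.get_method(), req.full_url)
```

Sending it would be a call to `urllib.request.urlopen(req)`, with whatever authentication the target server requires added to the headers.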

Response

The server returns a Bundle of type searchset. Each matched Patient entry includes a search.score (0.0 to 1.0) and an extension carrying the match grade:

{
  "resourceType": "Bundle",
  "type": "searchset",
  "total": 2,
  "entry": [
    {
      "fullUrl": "https://example.org/fhir/Patient/pat-001",
      "resource": {
        "resourceType": "Patient",
        "id": "pat-001",
        "name": [{ "family": "Smith", "given": ["John", "Arthur"] }],
        "birthDate": "1978-03-15",
        "gender": "male"
      },
      "search": {
        "extension": [
          {
            "url": "http://hl7.org/fhir/StructureDefinition/match-grade",
            "valueCode": "certain"
          }
        ],
        "mode": "match",
        "score": 0.97
      }
    },
    {
      "fullUrl": "https://example.org/fhir/Patient/pat-142",
      "resource": {
        "resourceType": "Patient",
        "id": "pat-142",
        "name": [{ "family": "Smith", "given": ["Jon"] }],
        "birthDate": "1978-03-15",
        "gender": "male"
      },
      "search": {
        "extension": [
          {
            "url": "http://hl7.org/fhir/StructureDefinition/match-grade",
            "valueCode": "possible"
          }
        ],
        "mode": "match",
        "score": 0.71
      }
    }
  ]
}

The match-grade extension values are: certain, probable, possible, certainly-not. The score is the server’s confidence (0.0–1.0). The HL7 Patient Matching IG specifies expected server behaviour in detail; server implementations vary — document your server’s threshold configuration.
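
Reading the score and grade out of a response Bundle can be sketched as below. The helper name `summarise_matches` is hypothetical, and the function assumes the Bundle layout shown in the example response above.

```python
MATCH_GRADE_URL = "http://hl7.org/fhir/StructureDefinition/match-grade"


def summarise_matches(bundle: dict) -> list[dict]:
    """Return id, score and match grade per entry, highest score first."""
    results = []
    for entry in bundle.get("entry", []):
        search = entry.get("search", {})
        # The grade is carried in an extension on entry.search.
        grade = next(
            (ext.get("valueCode")
             for ext in search.get("extension", [])
             if ext.get("url") == MATCH_GRADE_URL),
            None,
        )
        results.append({
            "id": entry["resource"]["id"],
            "score": search.get("score"),
            "grade": grade,
        })
    return sorted(results, key=lambda r: r["score"] or 0.0, reverse=True)


sample = {
    "resourceType": "Bundle",
    "type": "searchset",
    "entry": [
        {"resource": {"resourceType": "Patient", "id": "pat-142"},
         "search": {"mode": "match", "score": 0.71, "extension": [
             {"url": MATCH_GRADE_URL, "valueCode": "possible"}]}},
        {"resource": {"resourceType": "Patient", "id": "pat-001"},
         "search": {"mode": "match", "score": 0.97, "extension": [
             {"url": MATCH_GRADE_URL, "valueCode": "certain"}]}},
    ],
}
for row in summarise_matches(sample):
    print(row["id"], row["score"], row["grade"])
```

Servers are not guaranteed to populate both the score and the grade, so the helper returns `None` for whichever is missing rather than failing.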

What $match does not do

$match is a query operation, not a merge operation. It returns candidate matches for client-side decision-making. The client is responsible for deciding whether to create a link, route to manual review, or treat the candidate as a new patient. The server does not automatically merge records or update Patient.link as a result of $match.

Patient.link

FHIR’s Patient.link element represents a known relationship between two Patient records on the same or different servers. It is how you express “this record is a duplicate of that record” or “queries about this patient should be redirected to that record.”

linkType values

linkType	Meaning	When to use
`replaced-by`	This record has been superseded; use the linked record	Deprecated local record replaced by enterprise MPI record
`replaces`	This record is the replacement for the linked record	The enterprise golden record that superseded a local record
`refer`	Redirect queries to the linked record	Local stub record pointing to authoritative record on another server
`seealso`	The linked record concerns the same patient but is not a replacement	Related records in distinct systems where neither supersedes the other

A correct merge pattern uses replaced-by on the deprecated record and replaces on the surviving record. Both links must be populated — replaced-by without the corresponding replaces on the other record is an inconsistent state.
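
The paired links can be written and checked with a small helper like the one below. It operates on Patient resources held as plain dictionaries and uses relative references such as `Patient/pat-001`; the function names are hypothetical, and persisting both resources (ideally in one transaction) is left to the caller.

```python
def link_merge(survivor: dict, deprecated: dict) -> None:
    """Add the reciprocal replaces / replaced-by links in place."""
    survivor.setdefault("link", []).append({
        "other": {"reference": f"Patient/{deprecated['id']}"},
        "type": "replaces",
    })
    deprecated.setdefault("link", []).append({
        "other": {"reference": f"Patient/{survivor['id']}"},
        "type": "replaced-by",
    })


def has_link(patient: dict, other_id: str, link_type: str) -> bool:
    return any(
        link.get("type") == link_type
        and link.get("other", {}).get("reference") == f"Patient/{other_id}"
        for link in patient.get("link", [])
    )


def merge_is_consistent(survivor: dict, deprecated: dict) -> bool:
    """True only when both directions of the merge are recorded."""
    return (has_link(survivor, deprecated["id"], "replaces")
            and has_link(deprecated, survivor["id"], "replaced-by"))


golden = {"resourceType": "Patient", "id": "pat-001"}
local = {"resourceType": "Patient", "id": "pat-142"}
print(merge_is_consistent(golden, local))  # False: nothing linked yet
link_merge(golden, local)
print(merge_is_consistent(golden, local))  # True
```

Running a check like `merge_is_consistent` over existing data is one way to find the half-linked records described above.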

Deduplication strategy

Upstream prevention

The cheapest deduplication is preventing duplicates at creation. At registration, always query the MPI before creating a new record. Enforce required identifier capture at registration (insurance ID or government-issued ID). Train registration staff on duplicate prevention. Automated prevention is more reliable than manual deduplication after the fact.

Downstream resolution

When duplicates already exist — and in any organisation of meaningful size, they do — resolution requires a structured deduplication pipeline:

1. Candidate generation: use blocking strategies to identify candidate pairs without comparing every record against every other record (quadratic complexity). Block on phonetic name encoding, ZIP code, DOB year.
2. Pair scoring: apply probabilistic matching to each candidate pair.
3. Survivor selection: apply survivorship rules to determine which record’s field values appear on the golden record.
4. Merge execution: create or update the golden record; set Patient.link on all constituent records; mark superseded records with replaced-by links.
5. Downstream notification: notify downstream systems of the merge so they can update their local references.

Step 5 is frequently omitted and causes downstream inconsistency. When a merge occurs in the MPI, every system that holds a reference to the deprecated patient identifier must be notified. In HL7 v2, this is an ADT^A40 (Merge Patient) message. In FHIR, this can be modelled as a subscription notification or as an explicit merge operation on the MPI server.
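
The candidate generation step of the pipeline can be sketched as follows, using ZIP code plus year of birth as the blocking key. The record layout and the choice of key are assumptions for illustration; real pipelines usually run several blocking passes with different keys so that an error in one field does not hide a true duplicate.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(rec: dict) -> tuple[str, str]:
    """ZIP code plus year of birth (birth_date assumed to be YYYY-MM-DD)."""
    return rec["postal_code"], rec["birth_date"][:4]


def candidate_pairs(records: list[dict]) -> set[tuple[str, str]]:
    """Return id pairs that share a blocking key."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[blocking_key(rec)].append(rec["id"])
    pairs = set()
    for ids in blocks.values():
        # Only records inside the same block are compared with each other.
        pairs.update(combinations(sorted(ids), 2))
    return pairs


registry = [
    {"id": "r1", "postal_code": "62701", "birth_date": "1978-03-15"},
    {"id": "r2", "postal_code": "62701", "birth_date": "1978-11-02"},
    {"id": "r3", "postal_code": "62701", "birth_date": "1990-07-21"},
    {"id": "r4", "postal_code": "60601", "birth_date": "1978-03-15"},
]
pairs = candidate_pairs(registry)
all_pairs = len(registry) * (len(registry) - 1) // 2
print(sorted(pairs), f"{len(pairs)} of {all_pairs} possible pairs")
```

Four records give six possible pairs, but only one pair shares a block and goes forward to scoring. On a registry of millions of records that reduction is what makes the pipeline feasible.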

Common failure modes

Relying on MRN as a universal identifier. An MRN is meaningful only within the issuing organisation’s system. Sending an MRN to another organisation without the identifier system URI makes it unresolvable. Include a system URI on every Patient.identifier, always.

Missing identifier system URIs. A FHIR Patient.identifier with a value but no system cannot be matched deterministically. The system URI defines the namespace — without it, you cannot know whether two identifiers with the same value are the same patient or a collision across different systems.

Demographic drift. Patient demographics change over time — address, phone, insurance, name. Records created years ago diverge from current reality. A probabilistic matcher calibrated on current data may fail to match old records that have drifted. Include an identifier-based match as the primary path; do not rely solely on demographics for long-lived records.

Name and gender changes not propagated. When a patient legally changes their name or gender, some systems update the record while others do not. This creates demographic inconsistency that breaks probabilistic matching. Propagating demographic updates across systems is a patient safety requirement, not just a data quality concern.

Treating match score as binary. A score of 0.97 is not the same as a score of 0.72. Implement a manual review workflow for scores in the uncertain range. Auto-accepting every score above a single cut-off, with no review zone, produces false-positive merges. The review queue is not a failure of the matching system; it is correct behaviour.

Published: 18/03/2024 Modified: 26/11/2025

Keywords: patient matching, MPI, master patient index, FHIR $match, probabilistic matching, deduplication, patient identity, demographic matching, golden record

Sources:

HL7 FHIR Patient Matching Implementation Guide (Accessed: 26/11/2025)
FHIR R4 Patient $match Operation (Accessed: 26/11/2025)
FHIR R4 Patient.link (Accessed: 26/11/2025)
ONC Patient Matching Project (Accessed: 26/11/2025)