Using RAG and MCP to Accelerate Healthcare Data Mapping

A practical look at using authoritative legacy mappings, retrieval, and human review to help implementation engineers map customer EMR data into a common analytics model.

May 1, 2026

AI Systems, Knowledge Systems, Healthcare

Healthcare analytics platforms depend on a hard, unglamorous step before the interesting analysis can begin: getting customer data mapped into a common model.

For a new customer, that usually means understanding the source EMR schema, finding the right target tables and columns, writing transformation logic, validating the result, and repeating that process until the customer’s data can support meaningful analytics work.

In this environment, that mapping process often took four to five weeks. The delay mattered because most downstream analytics work could not begin until the mapping was far enough along to trust.

The company also had a major advantage: nearly 15 years of mapping history embedded in a legacy healthcare analytics platform. Over that time, every major EMR had been mapped into the system. The knowledge existed, but it was scattered across code repositories, SQL, Python notebooks, YAML definitions, historical client mappings, and the experience of implementation engineers.

The opportunity was not to replace those engineers. The opportunity was to give them better starting context.

Making Old Mappings Useful Again

The project used retrieval-augmented generation and MCP to expose existing mapping knowledge inside an AI-assisted Databricks workflow.

The corpus came primarily from two source repositories. Those repositories included SQL statements, Python notebooks, and YAML files that described target column names and the intended meaning of each column. We treated three sources as authoritative: canonical mappings from existing clients, vendor-to-target crosswalk data, and the target model descriptions.

The retrieval structure followed the way implementation engineers think about the work: EMR vendor first, then target table and column. Metadata attached to retrieved chunks included vendor, source table and column, target table and column, domain area, mapping type, and client or project context.
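As a rough illustration of that vendor-first structure, here is a minimal sketch of chunk metadata and a filter that narrows by vendor before target table and column. It is not the project's code. The class name, the function name, and every vendor, table, and column value are hypothetical, and a real service would apply a filter like this alongside vector search rather than on a plain list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingChunk:
    """One retrievable chunk plus the metadata fields described above."""
    text: str
    vendor: str
    source_table: str
    source_column: str
    target_table: str
    target_column: str
    domain_area: str
    mapping_type: str
    client_context: str


def filter_chunks(chunks, vendor, target_table=None, target_column=None):
    """Narrow by vendor first, then optionally by target table and column."""
    hits = [c for c in chunks if c.vendor == vendor]
    if target_table is not None:
        hits = [c for c in hits if c.target_table == target_table]
    if target_column is not None:
        hits = [c for c in hits if c.target_column == target_column]
    return hits


# Illustrative records only; all names here are made up.
CHUNKS = [
    MappingChunk("CAST(enc.admit_dt AS DATE)", "vendor_a", "enc", "admit_dt",
                 "encounter", "admit_date", "encounters", "direct", "client_1"),
    MappingChunk("pt.birth_dt", "vendor_a", "pt", "birth_dt",
                 "patient", "birth_date", "demographics", "direct", "client_1"),
    MappingChunk("TO_DATE(visit.adm, 'yyyyMMdd')", "vendor_b", "visit", "adm",
                 "encounter", "admit_date", "encounters", "transform", "client_2"),
]

matches = filter_chunks(CHUNKS, "vendor_a", target_table="encounter")
```

Filtering on vendor before anything else keeps a pattern from one EMR family from being offered as a candidate for another.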

That structure mattered because healthcare source systems are not uniform. Even within one EMR family, different versions can have meaningfully different schemas. A mapping pattern that works for one version may be wrong for another.

The MCP Interface

I built the prototype as a Python FastAPI service that exposes MCP tools to the Databricks coding workflow.

The service extracts knowledge from the source repositories, builds a FAISS search index, and makes that index available through tools such as mapping search, field-context retrieval, prior-example retrieval, target-model suggestions, and notebook-scaffolding support. It runs as an Azure Container App.

The index refresh process is automated. On every push to the main branch of either source repository, and on a nightly schedule, the service clones the repositories, reruns extraction, and generates a fresh index.
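The refresh rule can be sketched as a single decision function. This is a simplified illustration under stated assumptions: the repository names are placeholders, the event shape (a dict with `repo` and `ref`) is invented for the example, and the nightly schedule is approximated as "at least 24 hours since the last refresh" rather than a real scheduler.

```python
from datetime import datetime, timedelta

# Hypothetical repository names; the real ones are not given in this post.
WATCHED_REPOS = {"mappings-repo", "target-model-repo"}
NIGHTLY_INTERVAL = timedelta(hours=24)


def should_refresh(event, last_refresh, now):
    """Decide whether to rebuild the index.

    event: a dict with 'repo' and 'ref' for a push, or None for a scheduled check.
    """
    if event is not None:
        # Only pushes to main on a watched repository trigger a rebuild.
        return (event.get("repo") in WATCHED_REPOS
                and event.get("ref") == "refs/heads/main")
    # Scheduled check: rebuild once the nightly interval has elapsed.
    return now - last_refresh >= NIGHTLY_INTERVAL
```

When this returns True, the service would run the clone, extract, and index steps described above.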

In practice, an implementation engineer can identify the customer’s EMR vendor and ask the Databricks assistant for candidate mappings. The MCP server retrieves relevant prior examples and target model context. The assistant then uses that context to propose mapping logic, usually SQL, occasionally Python, along with source and target fields and a confidence score.

The intended output is not production code dropped blindly into place. It is a better first draft: mapping logic the engineer can inspect, adjust, validate, and turn into the Databricks Asset Bundle notebooks used for implementation work.

Keeping Humans In The Loop

The most important design constraint was review.

Nothing writes directly to production. The system can propose mappings, explain why a prior example looks relevant, and generate candidate SQL, but an implementation engineer still validates the work.

We evaluate output through three practical measures: fill rate, cardinality, and human review. Fill rate shows whether a mapping produces usable coverage in the target column. Cardinality checks catch shape problems, such as a join that duplicates or drops rows. Human review catches the domain-specific issues that metrics alone miss.
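To make the two metrics concrete, here is a minimal sketch over a list of row dictionaries. The function names and the sample rows are hypothetical and synthetic. In the actual workflow these checks would run against mapped tables inside Databricks, not against Python lists.

```python
def fill_rate(rows, column):
    """Fraction of rows where the column is neither None nor an empty string."""
    if not rows:
        return 0.0
    filled = sum(1 for r in rows if r.get(column) not in (None, ""))
    return filled / len(rows)


def duplicate_keys(rows, key):
    """Key values that appear more than once; any result suggests row fan-out."""
    counts = {}
    for r in rows:
        counts[r[key]] = counts.get(r[key], 0) + 1
    return sorted(k for k, n in counts.items() if n > 1)


# Synthetic output of a candidate mapping; not real patient data.
MAPPED = [
    {"encounter_id": "e1", "admit_date": "2024-01-03"},
    {"encounter_id": "e2", "admit_date": None},
    {"encounter_id": "e3", "admit_date": "2024-02-11"},
    {"encounter_id": "e3", "admit_date": "2024-02-11"},
]

rate = fill_rate(MAPPED, "admit_date")          # 0.75
dupes = duplicate_keys(MAPPED, "encounter_id")  # ["e3"]
```

A low fill rate or a duplicated key does not prove the mapping is wrong, but it tells the reviewing engineer where to look first.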

The system also lets engineers submit observations back into the application. That matters because some of the highest-value mapping knowledge is still tribal. An engineer might know, for example, that a specific version of a vendor system stores patient encounter data in a way that differs from other versions. That observation can become future retrieval context.

Those observations are stored separately and queued for human review before they become part of the corpus. Review happens through a Python Databricks notebook, so the feedback loop improves the system without letting unreviewed notes quietly become authoritative.
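The separation between submitted and approved observations can be sketched as follows. This is an in-memory illustration with hypothetical class and field names. The real system persists observations and runs the review step from a Databricks notebook.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    author: str
    vendor: str
    note: str
    status: str = "pending"  # pending, approved, or rejected


@dataclass
class ObservationQueue:
    pending: list = field(default_factory=list)  # awaiting human review
    corpus: list = field(default_factory=list)   # approved, retrievable

    def submit(self, obs):
        """Store a new observation outside the retrieval corpus."""
        self.pending.append(obs)

    def review(self, obs, approve):
        """Record a reviewer's decision; only approved notes reach the corpus."""
        self.pending.remove(obs)
        obs.status = "approved" if approve else "rejected"
        if approve:
            self.corpus.append(obs)
```

The only path into `corpus` goes through `review`, which is the property the workflow depends on.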

Security Boundaries

The system is designed around metadata rather than patient data.

The retrieval corpus includes mappings, schema context, transformation logic, and target-model descriptions. It does not require sending source data values to the model. Tool calls can create data frames that describe tables and fields without exposing the actual contents of customer data.
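A metadata-only description of a table might look like the sketch below. The function name and the sample table are hypothetical. In practice the column names and types would come from the platform's schema metadata rather than a hand-written list.

```python
def describe_schema(table_name, columns):
    """Build one metadata row per column from (name, type) pairs.

    Only names, types, and positions are recorded; no data values are read.
    """
    return [
        {"table": table_name, "column": name, "data_type": dtype, "ordinal": i}
        for i, (name, dtype) in enumerate(columns, start=1)
    ]


# Hypothetical source table; names and types are illustrative.
description = describe_schema(
    "enc",
    [("enc_id", "string"), ("admit_dt", "timestamp"), ("pt_id", "string")],
)
```

Everything passed to the model in this form describes structure, which is what keeps customer data values out of the prompt.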

The workflow also stays within the Databricks environment covered by the company’s existing agreement, which keeps the AI-assisted mapping process inside an approved operating boundary.

What I Learned

The hardest part was not just building the retrieval service. It was learning enough of the data engineering domain to know what a valid mapping looked like.

I am not a data engineer by trade, so the prototype required close work with the people who live in this process every day. That collaboration shaped the corpus, the retrieval model, the confidence signals, and the review workflow.

Adoption also required care. Some engineers were understandably skeptical of AI-generated mapping suggestions. That skepticism was healthy. The point was never to ask them to trust a black box. The point was to give them cited, reviewable, context-rich suggestions based on prior authoritative work.

The design target is to reduce new-client mapping from four to five weeks to a few days by starting engineers with a partially generated set of mapping logic. That work is still in progress, but the direction is clear.

AI is most useful when it is attached to authoritative context. In this case, the value is not that a model can invent mappings. The value is that it can help engineers reuse years of prior implementation knowledge, enrich it with reviewed tribal context, and keep humans responsible for the final decision.