Overview
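The Harmonia prototype outperforms baseline primitives in one biomedical use case (schema-matching accuracy 1.0 vs 0.88; value-mapping F1 0.68 vs 0.57) and produces reproducible harmonization plans, but robustness, scaling, and consistent automation still require more work and user oversight.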
Citations: 1
Evidence Strength: 0.70
Confidence: 0.80
Risk Signals: 11
Trust Signals
Findings with numeric evidence: 3/3
Findings with evidence refs: 3/3
Results with explicit delta: 2/2
Reproducibility
Status: Code + data available
Open source: Yes
At A Glance
Cost impact: 50%
Production readiness: 40%
Novelty: 60%
Why It Matters For Business
Agentic harmonization can speed up combining heterogeneous datasets and produce reusable, publishable transformation scripts that improve reproducibility and reduce manual engineering time.
Who Should Care
Summary TLDR
This paper proposes "agentic" data harmonization: LLM-based agents that interact with users and modular data-integration routines to synthesize reusable harmonization pipelines. The authors implement Harmonia, a prototype that combines a library of primitives (bdi-kit), an LLM agent (GPT-4o via Archytas), and a chat-style UI (Beaker). In a clinical use case mapping a cohort to the GDC standard, Harmonia outperforms baseline primitives (schema-matching accuracy 1.0 vs 0.88; value-mapping F1 0.68 vs 0.57). The paper discusses practical limits: LLM brittleness, context-window and evaluation gaps, and the need for provenance, uncertainty, and better benchmarks.
Problem Statement
Combining tables from different sources requires mapping column names to a common schema and standardizing the values in each column. Existing practice relies on scripts and ad hoc tools that are slow to write, brittle, and poorly documented. LLMs can help with language and code, but they are inconsistent, sensitive to prompts, and do not by themselves provide scalable, reproducible harmonization pipelines.
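The two steps named above, column mapping and value standardization, can be sketched in a few lines. This is a minimal illustration in plain Python, not Harmonia or bdi-kit code; the column names, target names, and value dictionaries are hypothetical placeholders and are not taken from the GDC data dictionary.

```python
# Hypothetical source rows; names and values are illustrative only.
rows = [
    {"pt_sex": "F", "tumor_stage": "Stage II"},
    {"pt_sex": "M", "tumor_stage": "stage iii"},
]

# Schema mapping: source column -> target column (assumed names).
column_map = {"pt_sex": "gender", "tumor_stage": "stage"}

# Value mapping: per target column, source value -> standard value.
value_map = {
    "gender": {"F": "female", "M": "male"},
    "stage": {"stage ii": "Stage II", "stage iii": "Stage III"},
}


def harmonize(rows, column_map, value_map):
    """Rename columns, then standardize string values where a mapping exists."""
    out = []
    for row in rows:
        new_row = {}
        for src_col, value in row.items():
            tgt_col = column_map.get(src_col)
            if tgt_col is None:
                continue  # unmapped source columns are dropped
            lookup = value_map.get(tgt_col, {})
            # Try an exact match, then a lowercase match, else keep the raw value.
            new_row[tgt_col] = lookup.get(value, lookup.get(value.lower(), value))
        out.append(new_row)
    return out


harmonized = harmonize(rows, column_map, value_map)
```

Writing such dictionaries by hand for every dataset is the manual work that an agentic approach aims to reduce; the structure of the output (a column map plus per-column value maps) stays the same.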
Main Contribution
Present a vision for agentic data harmonization that combines LLM reasoning, user interaction, and composable primitives.
Introduce Harmonia, a working prototype that integrates bdi-kit primitives, Archytas-based LLM tool-calling, and a Beaker chat UI.
Key Findings
Harmonia achieved perfect schema-matching accuracy (1.0 vs 0.88 for the baseline primitives) on the evaluated use case.
The LLM-augmented pipeline improved value-mapping F1 from 0.57 (baseline primitives) to 0.68.
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Schema-matching accuracy | 1.00 | 0.88 | +0.12 | Dou et al. cohort → GDC mapping | Table 1 reports Harmonia accuracy 1.00 and baseline accuracy 0.88 | Table 1 |
| Value-mapping F1 | 0.68 | 0.57 | +0.11 | Dou et al. cohort → GDC mapping | Table 1 reports Harmonia F1 0.68 and baseline F1 0.57 | Table 1 |
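For readers who want to reproduce this kind of comparison on their own data, the sketch below shows one common way to compute the two metrics and the deltas in the table. The definitions are assumptions: accuracy is taken as the fraction of gold columns mapped to the correct target, and F1 is computed over (source value, target value) pairs. The paper's exact evaluation protocol may differ, and the toy inputs are invented.

```python
def accuracy(predicted, gold):
    """Fraction of gold source columns mapped to the correct target column."""
    correct = sum(1 for col, tgt in gold.items() if predicted.get(col) == tgt)
    return correct / len(gold)


def f1_score(predicted_pairs, gold_pairs):
    """F1 over sets of (source_value, target_value) pairs."""
    predicted_pairs, gold_pairs = set(predicted_pairs), set(gold_pairs)
    true_positives = len(predicted_pairs & gold_pairs)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted_pairs)
    recall = true_positives / len(gold_pairs)
    return 2 * precision * recall / (precision + recall)


def delta(value, baseline):
    """Difference between a result and its baseline, rounded to two decimals."""
    return round(value - baseline, 2)


# Deltas from the reported numbers in the table.
accuracy_delta = delta(1.00, 0.88)
f1_delta = delta(0.68, 0.57)

# Toy inputs (hypothetical) to exercise the metric functions.
toy_accuracy = accuracy(
    {"a": "x", "b": "y", "c": "z", "d": "q"},
    {"a": "x", "b": "y", "c": "z", "d": "w"},
)
toy_f1 = f1_score(
    [("F", "female"), ("M", "male"), ("U", "unknown")],
    [("F", "female"), ("M", "male"), ("U", "not reported")],
)
```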
What To Try In 7 Days
Clone the Harmonia repo and run the demo mapping a CSV to the GDC schema.
Replace one manual mapping script with a harmonization plan and materialize the mapping to save reproducible output.
Measure time and error rate vs your current manual harmonization process.
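The last two steps can be prototyped before adopting any tool. The sketch below stores a harmonization plan as JSON so it can be versioned and re-applied, then measures a cell-level error rate against a manually harmonized reference. The plan layout, function names, and sample rows are assumptions made for illustration; they are not Harmonia's plan format.

```python
import json

# Hypothetical plan layout: column renames plus per-column value maps.
plan = {
    "columns": {"pt_sex": "gender"},
    "values": {"gender": {"F": "female", "M": "male"}},
}


def apply_plan(rows, plan):
    """Apply column renames and value maps; unmapped values pass through."""
    out = []
    for row in rows:
        new_row = {}
        for src, tgt in plan["columns"].items():
            if src in row:
                raw = row[src]
                new_row[tgt] = plan["values"].get(tgt, {}).get(raw, raw)
        out.append(new_row)
    return out


def cell_error_rate(produced, reference):
    """Share of reference cells whose produced value differs."""
    total = wrong = 0
    for got, want in zip(produced, reference):
        for col, expected in want.items():
            total += 1
            wrong += got.get(col) != expected
    return wrong / total if total else 0.0


# Serialize the plan (this string is what would be committed or published).
plan_json = json.dumps(plan, indent=2, sort_keys=True)
reloaded = json.loads(plan_json)

rows = [{"pt_sex": "F"}, {"pt_sex": "M"}, {"pt_sex": "unk"}]
reference = [{"gender": "female"}, {"gender": "male"}, {"gender": "unknown"}]

produced = apply_plan(rows, reloaded)
error_rate = cell_error_rate(produced, reference)
```

Running the same `cell_error_rate` on the output of the existing manual script gives a like-for-like comparison; wall-clock time can be recorded separately for both processes.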
Agent Features
Memory
Planning
Tool Use
Frameworks
Is Agentic
Yes
Architectures
Collaboration
Optimization Features
Token Efficiency
Infra Optimization
System Optimization
Reproducibility
Data URLs
Risks & Boundaries
Limitations
LLM brittleness and inconsistent corrections across runs
Context-window problems for large tables and long workflows
When Not To Use
When you require fully automated, unattended harmonization for critical systems without human review
On extremely large schemas without external storage to manage context
Failure Modes
LLM hallucinations leading to incorrect mappings
Prompt sensitivity producing different pipelines for the same task
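One inexpensive guard against both failure modes is to run the same task several times and send any column on which the runs disagree to a human reviewer. The sketch below is one possible way to do that, not a feature described in the paper; the function name, threshold, and sample mappings are hypothetical.

```python
from collections import Counter


def review_queue(runs, min_agreement=1.0):
    """Return columns whose most common mapping falls below min_agreement.

    `runs` is a list of dicts, each mapping source column -> proposed target.
    """
    columns = sorted({col for run in runs for col in run})
    flagged = {}
    for col in columns:
        # A run that omits the column votes for None.
        votes = Counter(run.get(col) for run in runs)
        top_count = votes.most_common(1)[0][1]
        agreement = top_count / len(runs)
        if agreement < min_agreement:
            flagged[col] = {"candidates": dict(votes), "agreement": agreement}
    return flagged


# Three hypothetical runs of the same schema-matching task.
runs = [
    {"pt_sex": "gender", "stage": "tumor_stage"},
    {"pt_sex": "gender", "stage": "tumor_grade"},
    {"pt_sex": "gender", "stage": "tumor_stage"},
]

flagged = review_queue(runs)
```

Agreement across runs does not prove a mapping is correct, since a model can repeat the same hallucination, so this check complements human review rather than replacing it.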

