Overview
Production Readiness: 0.6
Novelty Score: 0.7
Cost Impact Score: 0.6
Citation Count: 0
Why It Matters For Business
Relink reduces multi-hop QA errors and increases robustness by building only the facts a query needs, cutting wrong reasoning and making answers easier to verify.
Summary TLDR
Relink replaces the usual build-then-reason pipeline over a static knowledge graph with a "reason-and-construct" flow that builds a compact, query-specific evidence graph. It combines a high-precision KG backbone with a high-recall pool of latent relations derived from entity co-occurrence and pointwise mutual information (PMI). A query-driven ranker (a trainable coarse ranker followed by an LLM re-ranker) iteratively selects edges; when needed, an LLM instantiates selected latent relations into factual triples. On five multi-hop QA benchmarks, Relink improves average EM by 5.4% and F1 by 5.2% over strong GraphRAG baselines and stays robust when most KG edges are removed.
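The latent relation pool can be sketched as follows. This is a minimal illustration, assuming each passage has already been reduced to a set of entity names and that an entity pair enters the pool when it co-occurs at least `min_count` times with PMI above `min_pmi`. The function name and both thresholds are hypothetical, not the paper's settings.

```python
import math
from collections import Counter
from itertools import combinations


def latent_relation_pool(passages, min_count=2, min_pmi=0.0):
    """passages: list of entity-name sets, one per passage.
    Returns {(entity_a, entity_b): pmi} for sufficiently associated pairs."""
    n = len(passages)
    ent_count = Counter()
    pair_count = Counter()
    for ents in passages:
        ents = sorted(set(ents))  # sorted so each pair has one canonical order
        ent_count.update(ents)
        pair_count.update(combinations(ents, 2))
    pool = {}
    for (a, b), c_ab in pair_count.items():
        if c_ab < min_count:
            continue  # too rare for a reliable estimate
        # PMI = log P(a,b) / (P(a) P(b)) with passage-level probabilities.
        pmi = math.log((c_ab * n) / (ent_count[a] * ent_count[b]))
        if pmi > min_pmi:
            pool[(a, b)] = pmi
    return pool


passages = [{"A", "B"}, {"A", "B"}, {"A", "C"}, {"D"}]
print(latent_relation_pool(passages))
```

Each surviving pair is only a candidate link with no relation label; in the pipeline described above, an LLM is what turns a selected pair into a typed triple.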
Problem Statement
GraphRAG methods rely on a static, pre-built knowledge graph. Static KGs are often incomplete and contain many query-relevant but misleading facts. This breaks multi-hop reasoning chains and amplifies distractors, so systems need a way to dynamically repair missing links and filter out misleading KG facts.
Main Contribution
Diagnose the limits of the build-then-reason paradigm: KG incompleteness and distractor facts break GraphRAG reasoning.
Propose Relink: a reason-and-construct framework that dynamically builds a compact, query-specific evidence graph from a factual KG plus a latent relation pool.
Design a unified query-aware ranking and LLM-based instantiation pipeline, and show consistent improvements across five open-domain QA (ODQA) benchmarks, backed by robustness experiments and ablations.
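A hedged sketch of the query-driven construction loop described above. `coarse_score`, `llm_rerank` and `llm_instantiate` are hypothetical callables standing in for the trainable coarse ranker, the LLM re-ranker and the LLM instantiation step; the hop budget and the `coarse_k`/`keep_k` cutoffs are assumptions. The usage part replaces the LLM with trivial stubs so the loop runs offline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    head: str
    relation: str
    tail: str
    latent: bool = False  # True for an uninstantiated co-occurrence/PMI link


def build_evidence_graph(query, seeds, kg_edges, latent_edges,
                         coarse_score, llm_rerank, llm_instantiate,
                         hops=3, coarse_k=10, keep_k=2):
    """Grow a query-specific evidence graph outward from `seeds`, one hop at a time."""
    pool = list(kg_edges) + list(latent_edges)
    frontier, used, evidence = set(seeds), set(), []
    for _ in range(hops):
        # Candidate edges touching the current frontier that were not yet selected.
        cands = [e for e in pool
                 if e not in used and (e.head in frontier or e.tail in frontier)]
        if not cands:
            break
        # Stage 1: cheap trainable ranker keeps the top coarse_k candidates.
        shortlist = sorted(cands, key=lambda e: coarse_score(query, e),
                           reverse=True)[:coarse_k]
        # Stage 2: LLM re-ranker orders the shortlist; keep the best keep_k.
        for e in llm_rerank(query, shortlist)[:keep_k]:
            used.add(e)
            # Latent links must become a concrete triple, or be rejected (None).
            fact = llm_instantiate(query, e) if e.latent else e
            if fact is None:
                continue
            evidence.append(fact)
            frontier.update((fact.head, fact.tail))
    return evidence


# Hypothetical offline stub: token overlap between query and edge.
def overlap_score(query, e):
    q = set(query.lower().replace("?", "").split())
    toks = [e.head.lower(), e.tail.lower(), *e.relation.lower().split("_")]
    return sum(tok in q for tok in toks)


kg = [Edge("Inception", "directed_by", "Nolan"),
      Edge("Nolan", "born_in", "London"),
      Edge("Paris", "capital_of", "France")]
latent = [Edge("London", "LATENT", "England", latent=True)]

graph = build_evidence_graph(
    "Where was the director of Inception born?", ["Inception"], kg, latent,
    coarse_score=overlap_score,
    llm_rerank=lambda q, edges: list(edges),
    llm_instantiate=lambda q, e: Edge(e.head, "located_in", e.tail))
for e in graph:
    print(e.head, e.relation, e.tail)
```

The unrelated "Paris" edge is never reached because expansion only follows edges that touch entities already in the evidence graph, which is one way to keep the graph compact.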
Key Findings
Relink yields consistent accuracy gains over leading GraphRAG baselines on five multi-hop QA datasets.
On 2WikiMultiHopQA Relink achieves EM=0.628 and F1=0.722.
Relink is robust when the explicit KG is heavily degraded: it retains high F1 even with most edges removed.
Each core component (explicit KG, latent relation pool, query-driven ranker, contrastive alignment) contributes measurably.
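The robustness finding implies an edge-ablation protocol. The source does not say how edges were removed; the sketch below assumes uniform random deletion with a fixed seed, and `drop_edges` is a hypothetical helper.

```python
import random


def drop_edges(edges, drop_frac, seed=0):
    """Return a copy of `edges` with round(drop_frac * len(edges)) edges removed
    uniformly at random; surviving edges keep their original order."""
    if not 0.0 <= drop_frac <= 1.0:
        raise ValueError("drop_frac must be in [0, 1]")
    rng = random.Random(seed)  # fixed seed so every method sees the same degraded KG
    n_keep = len(edges) - round(drop_frac * len(edges))
    kept = sorted(rng.sample(range(len(edges)), n_keep))
    return [edges[i] for i in kept]


# Hypothetical sweep: rebuild the retriever on each degraded KG and re-score EM/F1.
kg = [("e%d" % i, "rel", "e%d" % (i + 1)) for i in range(100)]
for frac in (0.0, 0.25, 0.5, 0.75, 0.9):
    degraded = drop_edges(kg, frac, seed=42)
    print(frac, len(degraded))
```

Reusing one seed across methods keeps the comparison fair: each system is evaluated on the same surviving edges.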
Results
EM (2WikiMultiHopQA): 0.628
EM (HotpotQA)
Average improvement: +5.4% EM, +5.2% F1 over GraphRAG baselines
Who Should Care
What To Try In 7 Days
Run a Relink-style pipeline on a small QA slice, starting by adding a PMI-based latent relation pool built from your corpus.
Train a lightweight coarse ranker to prioritize candidate edges for few-hop paths, then compare EM/F1 against your static-KG baseline.
Use an LLM to instantiate top latent relations and inspect provenance for a handful of failing queries.
Reproducibility
Code Available
Data Available
Open Source Status
- partial
Risks & Boundaries
Limitations
- Relies on LLM quality to instantiate latent relations; poor LLM outputs can introduce false facts.
- Latent relation pool built from co-occurrence + PMI may surface spurious links without semantic filtering.
- Runtime cost rises due to LLM re-ranking and on-the-fly instantiation compared with static KG methods.
- Evaluation uses 500 sampled questions per dataset, so variance estimates are coarse and small differences between methods may fall within sampling noise.
When Not To Use
- When strict, immutable provenance is required and generated relations are unacceptable.
- In low-latency or low-cost environments where extra LLM calls are prohibitive.
- When your corpus is too small for reliable co-occurrence statistics.
Failure Modes
- LLM instantiation produces plausible but incorrect triples (hallucinated relations).
- The ranker fails to separate useful facts from merely related ones, letting distractors through.
- High computation and latency from repeated LLM scoring and generation.
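A cheap guard against the hallucination failure mode is to require textual support before an instantiated triple enters the evidence graph. The sketch below is a heuristic of our own, not part of Relink: it keeps a triple only if some retrieved passage mentions both the head and the tail. Co-mention does not prove the relation itself; it only discards triples with no supporting text at all. The function names are hypothetical.

```python
def has_provenance(triple, passages):
    """Heuristic: accept a (head, relation, tail) triple only if at least one
    retrieved passage mentions both the head and the tail entity."""
    head, _relation, tail = triple
    h, t = head.lower(), tail.lower()
    return any(h in p.lower() and t in p.lower() for p in passages)


def filter_instantiated(triples, passages):
    # Drop LLM-instantiated triples that have no supporting passage.
    return [tr for tr in triples if has_provenance(tr, passages)]


passages = ["Christopher Nolan was born in London in 1970.",
            "Inception is a 2010 science fiction film."]
proposed = [("Christopher Nolan", "born_in", "London"),
            ("Inception", "filmed_in", "London")]
print(filter_instantiated(proposed, passages))
```

Rejected triples are also a useful list to inspect by hand when auditing failing queries.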
Core Entities
Models
- deepseek-v3-0324
- gpt-4o-2024-07-06
- RAPTOR
- GraphRAG
- HippoRAG
- G-Retriever
- ToG
- Vanilla RAG
Metrics
- EM
- F1
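EM and F1 are listed without definitions. The sketch below assumes the SQuAD-style convention commonly used for HotpotQA and 2WikiMultiHopQA (lowercase, strip punctuation and English articles, then compare tokens); the exact normalization used in the paper is not stated.

```python
import re
import string
from collections import Counter

_PUNCT = set(string.punctuation)


def normalize_answer(s):
    # Lowercase, drop punctuation, remove articles, collapse whitespace.
    s = "".join(ch for ch in s.lower() if ch not in _PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(pred, gold):
    return float(normalize_answer(pred) == normalize_answer(gold))


def f1_score(pred, gold):
    p = normalize_answer(pred).split()
    g = normalize_answer(gold).split()
    if not p or not g:
        return float(p == g)  # both empty counts as a match
    overlap = sum((Counter(p) & Counter(g)).values())  # shared token count
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Eiffel Tower!", "eiffel tower"), f1_score("Paris, France", "Paris"))
```

Dataset-level scores such as EM=0.628 are then the mean of these per-question values.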
Datasets
- 2WikiMultiHopQA
- HotpotQA
- ConcurrentQA
- MuSiQue-Ans
- MuSiQue-Full
Benchmarks
- 2WikiMultiHopQA
- HotpotQA
- ConcurrentQA
- MuSiQue-Ans
- MuSiQue-Full
Context Entities
Models
- OpenAI text-embedding-3-small
Datasets
- 2WikiMultiHopQA
- HotpotQA

