Overview
Results show consistent gains on five benchmarks, ablations confirm the role of each component, and robustness tests show the method holds up under heavy KG sparsity.
Citations: 0
Evidence Strength: 0.80
Confidence: 0.90
Risk Signals: 10
Trust Signals
Findings with numeric evidence: 4/4
Findings with evidence refs: 4/4
Results with explicit delta: 3/3
Reproducibility
Status: Code + data available
Open source: Partial
At A Glance
Cost impact: 60%
Production readiness: 60%
Novelty: 70%
Why It Matters For Business
Relink reduces multi-hop QA errors and increases robustness by building only the facts a query needs, cutting wrong reasoning and making answers easier to verify.
Who Should Care
Summary TLDR
Relink replaces the usual static build-then-reason knowledge-graph pipeline with a "reason-and-construct" flow that builds a compact, query-specific evidence graph. It combines a high-precision KG backbone with a high-recall pool of latent relations (from entity co-occurrence + PMI). A query-driven ranker (coarse trainable ranker + LLM re-ranker) iteratively selects edges; when needed, an LLM instantiates latent relations into factual triples. On five multi-hop QA benchmarks, Relink improves average EM by 5.4% and F1 by 5.2% over strong GraphRAG baselines and stays robust when most KG edges are removed.
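As an illustration of the "entity co-occurrence + PMI" idea, here is a minimal sketch of how a latent relation pool could be built. It assumes each passage has already been reduced to a set of entity names and that probabilities are estimated as passage frequencies; the function name `build_latent_pool`, the `min_pmi` threshold, and the toy passages are hypothetical and do not come from the paper.

```python
import math
from collections import Counter
from itertools import combinations

def build_latent_pool(passages, min_pmi=0.0):
    """Return {(entity_a, entity_b): pmi} for co-occurring pairs with pmi > min_pmi.

    passages: iterable of sets of entity names (assumed pre-extracted).
    """
    passages = list(passages)
    n = len(passages)
    entity_counts = Counter()
    pair_counts = Counter()
    for entities in passages:
        unique = sorted(set(entities))  # sort so each pair has one canonical order
        entity_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    pool = {}
    for (a, b), joint in pair_counts.items():
        # PMI(a, b) = log( p(a, b) / (p(a) * p(b)) )
        p_ab = joint / n
        p_a = entity_counts[a] / n
        p_b = entity_counts[b] / n
        pmi = math.log(p_ab / (p_a * p_b))
        if pmi > min_pmi:
            pool[(a, b)] = pmi
    return pool

# Toy corpus (hypothetical data, for illustration only).
passages = [
    {"Marie Curie", "Warsaw"},
    {"Marie Curie", "Warsaw", "Sorbonne"},
    {"Sorbonne", "Paris"},
    {"Paris", "France"},
]
pool = build_latent_pool(passages)
for pair, value in sorted(pool.items()):
    print(pair, round(value, 3))
```

Pairs that co-occur no more often than chance get a PMI of zero or below and are dropped, which is one simple way to keep the pool high-recall without admitting every co-occurrence.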
Problem Statement
GraphRAG methods rely on a static, pre-built knowledge graph. Static KGs are often incomplete and contain many query-relevant but misleading facts. This breaks multi-hop reasoning chains and amplifies distractors, so systems need a way to dynamically repair missing links and filter out misleading KG facts.
Main Contribution
Diagnose the limits of the build-then-reason paradigm: KG incompleteness and distractor facts break GraphRAG reasoning.
Propose Relink: a reason-and-construct framework that dynamically builds a compact, query-specific evidence graph from a factual KG plus a latent relation pool.
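The reason-and-construct idea can be sketched as a greedy expansion loop. This is a simplified illustration under stated assumptions, not the paper's algorithm: the `score` callable stands in for the coarse ranker plus LLM re-ranker, `instantiate` stands in for the LLM call that turns a latent pair into a triple, and the fixed step budget `max_steps` is an assumed stopping rule. All names and the toy data are hypothetical.

```python
def build_evidence_graph(query, seeds, kg_edges, latent_pairs,
                         score, instantiate, max_steps=3):
    """Greedily grow a query-specific evidence graph from seed entities.

    kg_edges: list of (head, relation, tail) factual triples.
    latent_pairs: list of (head, tail) pairs with no relation label yet.
    """
    evidence = []
    frontier = set(seeds)
    used = set()
    for _ in range(max_steps):
        # Candidates are unused edges touching the current frontier.
        candidates = [
            (h, r, t) for h, r, t in kg_edges
            if (h in frontier or t in frontier) and (h, r, t) not in used
        ]
        candidates += [
            (h, None, t) for h, t in latent_pairs
            if (h in frontier or t in frontier) and (h, None, t) not in used
        ]
        if not candidates:
            break
        best = max(candidates, key=lambda edge: score(query, edge))
        used.add(best)
        h, r, t = best
        if r is None:
            # Latent relation: ask the (stubbed) LLM for a concrete triple.
            h, r, t = instantiate(query, h, t)
        evidence.append((h, r, t))
        frontier.update((h, t))
    return evidence

# Toy stand-ins (hypothetical): a lookup-table scorer and a fixed-answer "LLM".
toy_scores = {
    ("Inception", "directed_by", "Christopher Nolan"): 0.9,
    ("Inception", "released_in", "2010"): 0.2,
    ("Christopher Nolan", None, "London"): 0.8,
}

def toy_score(query, edge):
    return toy_scores.get(edge, 0.0)

def toy_instantiate(query, head, tail):
    return (head, "born_in", tail)

evidence = build_evidence_graph(
    query="Where was the director of Inception born?",
    seeds={"Inception"},
    kg_edges=[
        ("Inception", "directed_by", "Christopher Nolan"),
        ("Inception", "released_in", "2010"),
    ],
    latent_pairs=[("Christopher Nolan", "London")],
    score=toy_score,
    instantiate=toy_instantiate,
    max_steps=2,
)
print(evidence)
```

In this toy run the KG lacks the birthplace edge, so the second hop is filled from the latent pool, while the low-scoring `released_in` distractor never enters the evidence graph.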
Key Findings
Relink yields consistent accuracy gains over leading GraphRAG baselines on five multi-hop QA datasets.
On 2WikiMultiHopQA Relink achieves EM=0.628 and F1=0.722.
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| EM (2WikiMultiHopQA) | 0.628 | HippoRAG 0.578 | +0.050 | 2WikiMultiHopQA test (500 samples) | Table 1 shows Relink EM 0.628 vs HippoRAG 0.578 | Table 1 |
| EM (HotpotQA) | 0.558 | HippoRAG 0.498 | +0.060 | HotpotQA test (500 samples) | Table 1 shows Relink EM 0.558 vs HippoRAG 0.498 | Table 1 |
What To Try In 7 Days
Run a Relink-style pipeline on a small QA slice: add a PMI-based latent relation pool built from your corpus.
Train a lightweight coarse ranker to prioritize candidates for few-hop paths and compare EM/F1 to your static KG baseline.
Use an LLM to instantiate top latent relations and inspect provenance for a handful of failing queries.
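For the EM/F1 comparison against a static KG baseline, a small scoring helper is enough. The sketch below assumes the answer normalization commonly used in SQuAD-style evaluation (lowercasing, stripping punctuation and English articles); the source does not say which evaluation script the paper used, so treat this as one reasonable choice rather than the paper's exact metric.

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))

def f1_score(prediction, gold):
    """Token-level F1 between normalized prediction and gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

def average(metric, predictions, golds):
    """Mean of a metric over paired predictions and gold answers."""
    return sum(metric(p, g) for p, g in zip(predictions, golds)) / len(golds)
```

Running `average(exact_match, ...)` and `average(f1_score, ...)` once on the baseline's predictions and once on the Relink-style pipeline's predictions, over the same QA slice, gives the two numbers to compare.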
Risks & Boundaries
Limitations
Relies on LLM quality to instantiate latent relations; poor LLM outputs can introduce false facts.
Latent relation pool built from co-occurrence + PMI may surface spurious links without semantic filtering.
When Not To Use
When strict, immutable provenance is required and generated relations are unacceptable.
In low-latency or low-cost environments where extra LLM calls are prohibitive.
Failure Modes
LLM-instantiated relations hallucinate plausible but incorrect triples.
Ranker fails to distinguish useful vs. merely related facts, letting distractors through.

