Build query-specific evidence graphs on the fly to fix missing links and filter distractor facts

Overview

Decision Snapshot: Ready For Pilot

Results show consistent gains on five benchmarks, ablations confirm the role of each component, and robustness tests show the method holds up under heavy KG sparsity.

Citations: 0

Evidence Strength: 0.80

Confidence: 0.90

Risk Signals: 10

Trust Signals

Findings with numeric evidence: 4/4

Findings with evidence refs: 4/4

Results with explicit delta: 3/3

Reproducibility

Status: Code + data available

Open source: Partial

At A Glance

Cost impact: 60%

Production readiness: 60%

Novelty: 70%

Authors

Manzong Huang, Chenyang Bu, Yi He, Xingrui Zhuo, Xindong Wu

Links

Abstract / PDF / Code

Why It Matters For Business

Relink reduces multi-hop QA errors and increases robustness by building only the facts a query needs, cutting wrong reasoning and making answers easier to verify.

Who Should Care

ML Engineer, Data Scientist, CTO, Product Manager, Engineering Lead

Summary TLDR

Relink replaces the usual build-then-reason pipeline over a static knowledge graph with a "reason-and-construct" flow that builds a compact, query-specific evidence graph. It combines a high-precision KG backbone with a high-recall pool of latent relations (derived from entity co-occurrence and PMI). A query-driven ranker (a coarse trainable ranker followed by an LLM re-ranker) iteratively selects edges; when needed, an LLM instantiates latent relations into factual triples. On five multi-hop QA benchmarks, Relink improves average EM by 5.4% and F1 by 5.2% over strong GraphRAG baselines and stays robust when most KG edges are removed.
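To make the "entity co-occurrence + PMI" idea concrete, here is a minimal sketch of how a latent relation pool could be scored. It is an illustration, not the authors' implementation: the function name `pmi_latent_pool`, the passage-level co-occurrence window, and the `min_pmi` threshold are all assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_latent_pool(docs, min_pmi=0.0):
    """Score entity pairs by pointwise mutual information (PMI).

    docs: list of entity collections, one per passage (assumed to come
          from an upstream entity extractor, which is not shown here).
    Returns {(entity_a, entity_b): pmi} for pairs with pmi > min_pmi.
    """
    n = len(docs)
    entity_count = Counter()
    pair_count = Counter()
    for entities in docs:
        unique = sorted(set(entities))  # sorted so each pair has one canonical order
        entity_count.update(unique)
        pair_count.update(combinations(unique, 2))

    pool = {}
    for (a, b), c_ab in pair_count.items():
        p_ab = c_ab / n
        p_a = entity_count[a] / n
        p_b = entity_count[b] / n
        pmi = math.log(p_ab / (p_a * p_b))
        if pmi > min_pmi:
            pool[(a, b)] = pmi
    return pool


# Toy corpus: each set holds the entities found in one passage.
docs = [
    {"Marie Curie", "Pierre Curie", "Paris"},
    {"Marie Curie", "Pierre Curie"},
    {"Paris", "France"},
    {"Marie Curie", "Warsaw"},
]
pool = pmi_latent_pool(docs)
```

Pairs that co-occur more often than chance would predict get a positive PMI and enter the pool; pairs at or below the threshold are dropped. The pool entries are untyped links, so they are candidates for later instantiation rather than facts.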

Problem Statement

GraphRAG methods rely on a static, pre-built knowledge graph. Static KGs are often incomplete and contain many query-relevant but misleading facts. This breaks multi-hop reasoning chains and amplifies distractors, so systems need a way to dynamically repair missing links and filter out misleading KG facts.

Main Contribution

Diagnose the limits of the build-then-reason paradigm: KG incompleteness and distractor facts break GraphRAG reasoning.

Propose Relink: a reason-and-construct framework that dynamically builds a compact, query-specific evidence graph from a factual KG plus a latent relation pool.
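The reason-and-construct idea can be sketched as a hop-by-hop selection loop over two candidate sources. This is a simplified illustration under stated assumptions, not the paper's algorithm: `build_evidence_graph`, `score_fn`, `top_k`, and `min_score` are hypothetical names, and `score_fn` stands in for the coarse ranker plus LLM re-ranker with hand-assigned scores.

```python
def build_evidence_graph(seeds, kg_edges, latent_pairs, score_fn,
                         hops=2, top_k=1, min_score=0.5):
    """Greedily grow a small evidence graph outward from the query entities.

    kg_edges:     list of (head, relation, tail) factual triples.
    latent_pairs: list of (a, b) untyped candidate links.
    score_fn:     hypothetical query-aware scorer; takes a candidate tuple.
    Returns the selected edges, each tagged "kg" or "latent".
    """
    candidates = [(h, r, t, "kg") for h, r, t in kg_edges]
    candidates += [(a, "LATENT", b, "latent") for a, b in latent_pairs]

    frontier, evidence = set(seeds), []
    for _ in range(hops):
        # Keep only unused edges that touch the frontier and clear the threshold.
        reachable = [
            c for c in candidates
            if c not in evidence
            and (c[0] in frontier or c[2] in frontier)
            and score_fn(c) >= min_score
        ]
        reachable.sort(key=score_fn, reverse=True)
        picked = reachable[:top_k]
        if not picked:
            break
        evidence.extend(picked)
        for head, _rel, tail, _src in picked:
            frontier.update((head, tail))
    return evidence


# Toy inputs with hand-assigned relevance scores (illustrative only).
kg_edges = [
    ("Inception", "directed_by", "Christopher Nolan"),
    ("Inception", "released_in", "2010"),
]
latent_pairs = [("Christopher Nolan", "London")]
scores = {
    ("Inception", "directed_by", "Christopher Nolan"): 0.9,
    ("Inception", "released_in", "2010"): 0.2,
    ("Christopher Nolan", "LATENT", "London"): 0.7,
}


def score_fn(candidate):
    return scores.get(candidate[:3], 0.0)


evidence = build_evidence_graph({"Inception"}, kg_edges, latent_pairs, score_fn)
to_instantiate = [e for e in evidence if e[3] == "latent"]
```

In this toy run the low-scoring `released_in` edge is filtered out as a distractor, and the selected latent link is the one that would be handed to an LLM to be turned into a typed triple.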

Key Findings

Relink yields consistent accuracy gains over leading GraphRAG baselines on five multi-hop QA datasets.

Numbers: avg +5.4% EM; avg +5.2% F1 across five benchmarks

Practical Use: Switching to dynamic, query-driven graph construction can raise multi-hop QA accuracy noticeably in practice.

Evidence Ref: Abstract; Experiments; Table 1

On 2WikiMultiHopQA Relink achieves EM=0.628 and F1=0.722.

Numbers: 2WikiMultiHopQA EM 0.628, F1 0.722

Practical Use: Expect strong per-dataset gains for structured multi-hop queries when using Relink-like pipelines.

Evidence Ref: Table 1

Results

Metric	Value	Baseline	Delta	Split / Dataset	Evidence	Evidence Ref
EM (2WikiMultiHopQA)	0.628	HippoRAG 0.578	+0.050	2WikiMultiHopQA test (500 samples)	Table 1 shows Relink EM 0.628 vs HippoRAG 0.578	Table 1
EM (HotpotQA)	0.558	HippoRAG 0.498	+0.060	HotpotQA test (500 samples)	Table 1 shows Relink EM 0.558 vs HippoRAG 0.498	Table 1
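The Delta column above is an absolute difference in EM. As a quick arithmetic check, the sketch below recomputes it from the two rows and also derives the relative gain over the baseline, which is not a figure reported in the table.

```python
# (dataset, Relink EM, HippoRAG EM) copied from the two rows above.
rows = [
    ("2WikiMultiHopQA", 0.628, 0.578),
    ("HotpotQA", 0.558, 0.498),
]

results = {}
for name, relink_em, baseline_em in rows:
    absolute = round(relink_em - baseline_em, 3)                      # EM delta
    relative = round(100 * (relink_em - baseline_em) / baseline_em, 1)  # % over baseline
    results[name] = (absolute, relative)
    print(f"{name}: +{absolute:.3f} EM absolute, +{relative}% relative")
```

Keeping the two notions apart matters when comparing against your own baseline: a gain of 0.050 EM is 5 points in absolute terms but a larger percentage when measured relative to the baseline score.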

What To Try In 7 Days

Run a Relink-style pipeline on a small QA slice: add a PMI-based latent relation pool built from your corpus.

Train a lightweight coarse ranker to prioritize candidates for few-hop paths and compare EM/F1 to your static KG baseline.

Use an LLM to instantiate top latent relations and inspect provenance for a handful of failing queries.
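For the EM/F1 comparison in the steps above, a small scorer is enough for a pilot. The sketch below follows the widely used SQuAD-style answer normalization (lowercase, strip punctuation and articles); whether the paper uses exactly this normalization is an assumption.

```python
import re
import string
from collections import Counter

_PUNCT = set(string.punctuation)


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction, gold):
    """Token-level F1 between normalized prediction and gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


em = exact_match("The Eiffel Tower", "eiffel tower.")
f1 = f1_score("Paris, France", "Paris")
```

Average both metrics over the same query slice for the static-KG baseline and the Relink-style run so the deltas are directly comparable.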

Reproducibility

Code Available: Yes

Data Available: Yes

Open Source Status: Partial

License: Unknown

Code URLs

https://github.com/DMiC-Lab-HFUT/Relink

Risks & Boundaries

Limitations

Relies on LLM quality to instantiate latent relations; poor LLM outputs can introduce false facts.

Latent relation pool built from co-occurrence + PMI may surface spurious links without semantic filtering.

When Not To Use

When strict, immutable provenance is required and generated relations are unacceptable.

In low-latency or low-cost environments where extra LLM calls are prohibitive.

Failure Modes

LLM-instantiated relations hallucinate plausible but incorrect triples.

Ranker fails to distinguish useful vs. merely related facts, letting distractors through.
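One possible guard against the first failure mode is to tag every triple with its origin and drop LLM-generated triples that cite no supporting passage. This is a sketch of a mitigation, not something the source describes: the `Triple` type, the `source` and `support` fields, and the `auditable` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    head: str
    relation: str
    tail: str
    source: str          # "kg" for backbone facts, "llm" for instantiated ones
    support: tuple = ()  # ids of passages backing the triple (assumed available)


def auditable(triples, require_support=True):
    """Keep KG triples; keep LLM triples only if they cite a supporting passage."""
    kept = []
    for triple in triples:
        if triple.source == "llm" and require_support and not triple.support:
            continue  # unsupported generated fact: exclude from the evidence graph
        kept.append(triple)
    return kept


triples = [
    Triple("Inception", "directed_by", "Christopher Nolan", "kg"),
    Triple("Christopher Nolan", "born_in", "London", "llm", support=("passage-17",)),
    Triple("Christopher Nolan", "born_in", "Paris", "llm"),
]
kept = auditable(triples)
```

Carrying the `source` tag through to the final answer also makes it easy to show reviewers which facts came from the KG and which were generated.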

Core Entities

Models

deepseek-v3-0324, gpt-4o-2024-07-06, RAPTOR, GraphRAG, HippoRAG, G-Retriever, TOG, Vanilla RAG

Metrics

EM, F1

Datasets

2WikiMultiHopQA, HotpotQA, ConcurrentQA, MuSiQue-Ans, MuSiQue-Full

Benchmarks

2WikiMultiHopQA, HotpotQA, ConcurrentQA, MuSiQue-Ans, MuSiQue-Full

Context Entities

Models

OpenAI text-embedding-3-small

Datasets

2WikiMultiHopQA, HotpotQA

