Add causal graphs and what-if checks to RAG to reduce hallucinations and improve causal answers

Overview

Decision Snapshot: Needs Validation

The idea is novel in combining explicit causal graphs with programmatic counterfactual checks. Results show clear metric gains on the authors' evaluations, but those evaluations use a custom dataset and an LLM judge, which limits external validation.

Citations: 0

Evidence Strength: 0.60

Confidence: 0.85

Risk Signals: 10

Trust Signals

Findings with numeric evidence: 2/3

Findings with evidence refs: 3/3

Results with explicit delta: 4/4

Reproducibility

Status: No open assets linked

Open source: Unknown

At A Glance

Cost impact: 30%

Production readiness: 40%

Novelty: 70%

Authors

Harshad Khadilkar, Abhay Gupta

Links

Abstract / PDF / Data

Why It Matters For Business

If your product needs trustworthy causal answers (for diagnostics, policy, medical reasoning, or financial analysis), adding causal graphs plus counterfactual checks can cut incorrect causal claims and improve interpretability. Expect higher compute and latency costs.

Who Should Care

ML Engineer, Product Manager, Data Scientist, CTO

Summary TLDR

This paper builds a Retrieval-Augmented Generation (RAG) pipeline that stores cause-effect pairs in a causal knowledge graph (CKG), retrieves candidates with a two-stage vector+LLM check, and then runs programmatic counterfactual simulations to test whether retrieved causes are truly necessary. On their evaluations, this approach raises precision and causal reasoning scores versus a standard semantic-similarity RAG, at the cost of extra LLM calls and higher latency.

Problem Statement

Standard RAG fetches text by semantic similarity, which often returns superficially relevant but causally incorrect information. RAG systems lack explicit causal grounding and rarely test counterfactuals, so they can produce plausible-looking but unreliable causal claims.

Main Contribution

A pipeline (Causal-Counterfactual RAG) that constructs a Causal Knowledge Graph (CKG) from documents and stores traceable cause-effect pairs.

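A two-stage retrieval: fast vector search followed by LLM-based semantic and polarity verification that filters out matches whose context or causal direction does not fit the query.

Programmatic counterfactual simulations that test whether each retrieved cause is actually necessary for the effect.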
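A minimal sketch of the graph store and two-stage retrieval, under loudly stated assumptions: a toy bag-of-words `embed` stands in for the SentenceTransformer model, and a hypothetical `keyword_verifier` stands in for the LLM semantic+polarity check. `CausalEdge`, `two_stage_retrieve`, and the example edges are illustrative names, not the authors' code.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CausalEdge:
    """One traceable cause-effect pair stored in the (hypothetical) CKG."""
    cause: str
    effect: str
    source_doc: str  # provenance: the document the pair was extracted from


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; an assumed stand-in for a sentence-embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


UP = {"increase", "increases", "rise", "higher"}
DOWN = {"decrease", "decreases", "fall", "lower"}


def keyword_verifier(query: str, edge: CausalEdge) -> bool:
    """Hypothetical stand-in for the LLM semantic + polarity verification step."""
    q, e = set(embed(query)), set(embed(edge.effect))
    # Reject candidates whose effect points in the opposite direction to the query.
    opposite = (q & UP and e & DOWN) or (q & DOWN and e & UP)
    return not opposite


def two_stage_retrieve(
    query: str,
    graph: list[CausalEdge],
    verify: Callable[[str, CausalEdge], bool],
    k: int = 3,
) -> list[CausalEdge]:
    # Stage 1: rank every edge by similarity between the query and the edge's effect text.
    q = embed(query)
    candidates = sorted(graph, key=lambda edge: cosine(q, embed(edge.effect)), reverse=True)[:k]
    # Stage 2: keep only the candidates the verifier accepts.
    return [edge for edge in candidates if verify(query, edge)]


graph = [
    CausalEdge("higher interest rates", "lower inflation", "doc-1"),
    CausalEdge("supply shortages", "higher inflation", "doc-2"),
    CausalEdge("heavy rainfall", "higher crop yields", "doc-3"),
]
hits = two_stage_retrieve("what causes higher inflation", graph, keyword_verifier, k=2)
print([(edge.cause, edge.source_doc) for edge in hits])
```

Stage 1 alone would return both inflation edges, because "lower inflation" is lexically close to the query; the polarity check in stage 2 is what drops it. This sketch ranks on effect text to suit "what causes X" questions; a fuller system would presumably also index causes.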

Key Findings

Causal-Counterfactual RAG yields substantially higher precision than Regular RAG on the authors' evaluations.

Numbers: Precision 80.57 vs 60.13 (Regular RAG)

Practical Use: Expect fewer irrelevant or incorrect retrieved documents when answering causally framed queries; useful when correctness matters more than raw coverage.

Evidence Ref: Section 5.2, Figure 3

Causal-Counterfactual RAG improves causal reasoning metrics over Regular RAG.

Numbers: Causal Chain Integrity Score 75.58 vs 53.62; Counterfactual Robustness 69.90 vs 49.12 (Regular RAG)

Practical Use: Models produce more logically consistent causal explanations on the tested datasets; a better choice for causal analysis or diagnosis tasks.

Evidence Ref: Section 5.2, Figure 3

Results

Metric	Value	Baseline	Delta	Split / Dataset	Evidence	Evidence Ref
Precision	80.57	Regular RAG: 60.13	+20.44	custom causal QA + OpenAlex experiments	Higher precision reported for Causal-Counterfactual RAG vs Regular RAG	Section 5.2, Figure 3
Recall	78.18	Regular RAG: 74.58	+3.60	custom causal QA + OpenAlex experiments	Recall slightly higher for proposed method	Section 5.2, Figure 3
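The table lists absolute deltas in points. As plain arithmetic on the four score pairs quoted in this summary (not a re-evaluation), the gains can also be expressed relative to the Regular RAG baseline. `REPORTED` and `deltas` are illustrative names.

```python
# Reported scores as (Causal-Counterfactual RAG, Regular RAG) pairs, from Section 5.2, Figure 3.
REPORTED = {
    "Precision": (80.57, 60.13),
    "Recall": (78.18, 74.58),
    "Causal Chain Integrity Score": (75.58, 53.62),
    "Counterfactual Robustness": (69.90, 49.12),
}


def deltas(proposed: float, baseline: float) -> tuple[float, float]:
    """Return (absolute delta in points, relative gain over the baseline in percent)."""
    absolute = proposed - baseline
    return round(absolute, 2), round(100 * absolute / baseline, 1)


for name, (proposed, baseline) in REPORTED.items():
    abs_delta, rel_delta = deltas(proposed, baseline)
    print(f"{name}: +{abs_delta} points ({rel_delta}% relative)")
```

Expressed this way, recall barely moves while precision and both causal scores improve by roughly a third or more relative to the baseline.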

What To Try In 7 Days

Build a tiny Causal Knowledge Graph from 100 domain documents: use an LLM to extract (cause, effect) pairs, encode them with an embedding model, and store each pair with a link to its source document.

Add two-stage retrieval: vector nearest-neighbor search, then a small LLM prompt that verifies polarity and semantic match.

Implement one counterfactual check per query: generate a plausible opposite of a top-ranked cause and re-run retrieval to see whether the outcome persists; if it does, that cause is probably not necessary.
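A minimal sketch of the necessity test, with a swapped-in technique stated plainly: instead of asking an LLM to write an opposite scenario, it models the counterfactual by dropping the candidate cause from the observed causes and checking whether the effect is still reachable in the graph (a simple "but-for" test). `supports`, `is_necessary`, and the example edges are hypothetical.

```python
from collections import defaultdict


def supports(edges: list[tuple[str, str]], active_causes: list[str], effect: str) -> bool:
    """True if `effect` is reachable from any active cause by following cause -> effect edges."""
    graph = defaultdict(list)
    for cause, eff in edges:
        graph[cause].append(eff)
    stack, seen = list(active_causes), set(active_causes)
    while stack:
        node = stack.pop()
        if node == effect:
            return True
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def is_necessary(edges: list[tuple[str, str]], observed_causes: list[str], candidate: str, effect: str) -> bool:
    """But-for test: the effect holds with the candidate, but not in the counterfactual without it."""
    counterfactual = [c for c in observed_causes if c != candidate]
    return supports(edges, observed_causes, effect) and not supports(edges, counterfactual, effect)


edges = [
    ("disk full", "write errors"),
    ("write errors", "service outage"),
    ("bad deploy", "service outage"),
]
# The outage persists without "disk full" because "bad deploy" also explains it.
print(is_necessary(edges, ["disk full", "bad deploy"], "disk full", "service outage"))
# With only "disk full" observed, removing it also removes the outage.
print(is_necessary(edges, ["disk full"], "disk full", "service outage"))
```

Removing a node is deterministic and avoids the implausible-counterfactual failure mode listed under Risks & Boundaries, but it can only reason about causes already in the graph.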

Agent Features

Memory

retrieval memory

Tool Use

LLMs for extraction and verification, Neo4j vector index for fast nearest-neighbor search, embedding models for semantic encoding

Frameworks

RAG, Causal Knowledge Graph (CKG)

Architectures

retrieval + LLM pipeline, knowledge-graph-backed retrieval

Optimization Features

Infra Optimization

Neo4j vector index for fast search; judge LLM served via Groq

Reproducibility

Code Available: No

Data Available: No

Open Source Status: Unknown

License: Unknown

Data URLs

OpenAlex (used as corpus)

Risks & Boundaries

Limitations

Relies on LLMs to construct the CKG; errors can enshrine false causal links.

Counterfactual generation can produce implausible alternatives, corrupting validation.

When Not To Use

When strict low-latency, real-time responses are required.

For simple fact lookups where semantic retrieval suffices.

Failure Modes

Graph contains fabricated or misinterpreted cause-effect pairs -> wrong 'ground truth'.

LLM generates illogical counterfactuals -> wrong necessity judgments.

Core Entities

Models

Gemini 1.5, SentenceTransformer all-MiniLM-L6-v2, LLaMA-3.1-8B-Instant

Metrics

Precision, Recall, Causal Chain Integrity Score (CCIS/CIS), Counterfactual Robustness Score (CRS)
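For reference, the textbook per-query definitions of retrieval precision and recall are sketched below. The authors' evaluation relies on an LLM judge, so their exact computation may differ; CCIS and CRS are paper-specific scores and are not reproduced here. The function name and document IDs are hypothetical.

```python
def retrieval_precision_recall(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    """Per-query set-based precision and recall over document IDs."""
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)
    # Precision: share of retrieved documents that are relevant.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    # Recall: share of relevant documents that were retrieved.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Two of four retrieved documents are relevant; two of three relevant documents were found.
p, r = retrieval_precision_recall(["d1", "d2", "d3", "d4"], {"d1", "d3", "d5"})
print(f"precision={p:.2f} recall={r:.2f}")
```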

Datasets

OpenAlex corpuscustom causal QA dataset (generated per-document)

You May Also Want to Read

Turn an LLM output into a mini knowledge graph, check each fact with an NLI model, and get explainable hallucination flags

Combine LLMs with a medical knowledge graph to get more accurate, verifiable scientific answers

Use a personal causal graph so an LLM recommends foods that better lower your post-meal glucose

A practical survey showing how knowledge graphs can make LLMs better at complex question answering

MindMap: prompt LLMs with knowledge-graph evidence to produce explicit graph-style reasoning and reduce hallucination
