Add causal graphs and what-if checks to RAG to reduce hallucinations and improve causal answers

September 17, 2025 · 6 min read

Overview

Decision Snapshot: Needs Validation

The idea is novel in combining explicit causal graphs with programmatic counterfactual checks. Results show clear metric gains on the authors' evaluations, but those use a custom dataset and an LLM judge, limiting external validation.

Citations: 0

Evidence Strength: 0.60

Confidence: 0.85

Risk Signals: 10

Trust Signals

Findings with numeric evidence: 2/3

Findings with evidence refs: 3/3

Results with explicit delta: 4/4

Reproducibility

Status: No open assets linked

Open source: Unknown

At A Glance

Cost impact: 30%

Production readiness: 40%

Novelty: 70%

Authors

Harshad Khadilkar, Abhay Gupta

Why It Matters For Business

If your product needs trustworthy causal answers (for diagnostics, policy, medical reasoning, or financial analysis), adding causal graphs plus counterfactual checks can cut incorrect causal claims and improve interpretability. Expect higher compute and latency costs from the extra LLM calls.

Summary (TL;DR)

This paper builds a Retrieval-Augmented Generation (RAG) pipeline that stores cause-effect pairs in a causal knowledge graph (CKG), retrieves candidates with a two-stage vector+LLM check, and then runs programmatic counterfactual simulations to test whether retrieved causes are truly necessary. On their evaluations, this approach raises precision and causal reasoning scores versus a standard semantic-similarity RAG, at the cost of extra LLM calls and higher latency.

Problem Statement

Standard RAG fetches text by semantic similarity, which often returns superficially relevant but causally incorrect information. RAG systems lack explicit causal grounding and rarely test counterfactuals, so they can produce plausible-looking but unreliable causal claims.

Main Contribution

A pipeline (Causal-Counterfactual RAG) that constructs a Causal Knowledge Graph (CKG) from documents and stores traceable cause-effect pairs.

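A two-stage retrieval: fast vector search, then LLM-based verification of semantic match and polarity to filter out candidates that look similar but do not fit the query's context.

A counterfactual validation step that programmatically perturbs a retrieved cause and checks whether the effect still holds, testing whether the cause is truly necessary.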
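To make the CKG and two-stage retrieval concrete, here is a minimal, dependency-free sketch. It assumes a bag-of-words vector in place of a real embedding model such as all-MiniLM-L6-v2, and a keyword-based polarity rule in place of the LLM verification prompt. `CausalKnowledgeGraph`, `verify_stub`, `two_stage_retrieve`, and the word lists are hypothetical illustrations, not the authors' code.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical direction words used by the toy polarity check.
POSITIVE = {"increase", "increases", "higher", "more", "rise", "rises"}
NEGATIVE = {"decrease", "decreases", "lower", "less", "fall", "falls", "reduces"}


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


def embed(text: str) -> Counter:
    # Stand-in for a sentence-embedding model: a bag-of-words count vector.
    return Counter(tokens(text))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[t] for t, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class CausalPair:
    cause: str
    effect: str
    source_doc: str  # keeps every pair traceable to the document it came from


class CausalKnowledgeGraph:
    def __init__(self) -> None:
        self.pairs: list[CausalPair] = []
        self.vectors: list[Counter] = []

    def add(self, cause: str, effect: str, source_doc: str) -> None:
        self.pairs.append(CausalPair(cause, effect, source_doc))
        self.vectors.append(embed(f"{cause} {effect}"))

    def vector_search(self, query: str, k: int = 5) -> list[CausalPair]:
        # Stage 1: fast nearest-neighbour search on the (toy) embeddings.
        q = embed(query)
        ranked = sorted(range(len(self.pairs)), key=lambda i: cosine(q, self.vectors[i]), reverse=True)
        return [self.pairs[i] for i in ranked[:k]]


def polarity(text: str) -> int:
    words = tokens(text)
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return (score > 0) - (score < 0)


def verify_stub(query: str, pair: CausalPair) -> bool:
    # Stage 2: stand-in for the LLM prompt that checks semantic match and polarity.
    q_pol, e_pol = polarity(query), polarity(pair.effect)
    if q_pol and e_pol and q_pol != e_pol:
        return False  # query asks about "higher", pair explains "lower" (or vice versa)
    shared = (set(tokens(query)) & set(tokens(pair.effect))) - POSITIVE - NEGATIVE
    return bool(shared)


def two_stage_retrieve(ckg: CausalKnowledgeGraph, query: str, k: int = 5) -> list[CausalPair]:
    return [p for p in ckg.vector_search(query, k) if verify_stub(query, p)]


ckg = CausalKnowledgeGraph()
ckg.add("higher interest rates", "lower housing demand", "doc1")
ckg.add("lower interest rates", "higher housing demand", "doc2")
ckg.add("population growth", "higher housing demand", "doc3")

query = "what causes higher housing demand"
print([p.source_doc for p in ckg.vector_search(query, k=3)])        # ['doc3', 'doc1', 'doc2']
print([p.source_doc for p in two_stage_retrieve(ckg, query, k=3)])  # ['doc3', 'doc2']
```

Stage 1 ranks doc1 second because it shares "higher", "housing", and "demand" with the query, even though its effect is lower demand; the stage-2 check drops it. In a real pipeline the stub would be an LLM call and the vectors would come from the embedding model.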

Key Findings

Causal-Counterfactual RAG yields substantially higher precision than Regular RAG on the authors' evaluations.

Numbers: Precision 80.57 vs 60.13 (Regular RAG)

Practical Use: Expect fewer irrelevant or incorrect retrieved documents when answering causally framed queries; useful when correctness matters more than raw coverage.

Evidence Ref: Section 5.2, Figure 3

Causal-Counterfactual RAG improves causal reasoning metrics over Regular RAG.

Numbers: Causal Chain Integrity Score 75.58 vs 53.62; Counterfactual Robustness 69.90 vs 49.12 (Regular RAG)

Practical Use: More logically consistent causal explanations on the tested datasets; a better choice for causal analysis or diagnosis tasks.

Evidence Ref: Section 5.2, Figure 3

Results

Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref
Precision | 80.57 | Regular RAG: 60.13 | +20.44 | Custom causal QA + OpenAlex experiments | Higher precision for Causal-Counterfactual RAG than Regular RAG | Section 5.2, Figure 3
Recall | 78.18 | Regular RAG: 74.58 | +3.60 | Custom causal QA + OpenAlex experiments | Slightly higher recall for the proposed method | Section 5.2, Figure 3
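The Delta column is a plain difference in score points. The short script below recomputes it from the reported numbers, adds the two causal-reasoning scores from Key Findings, and expresses each gap as a relative gain. It only does arithmetic on the published figures; it does not reproduce the evaluation, which relied on a custom dataset and an LLM judge.

```python
# Scores reported in Section 5.2 / Figure 3: (Causal-Counterfactual RAG, Regular RAG)
reported = {
    "Precision": (80.57, 60.13),
    "Recall": (78.18, 74.58),
    "Causal Chain Integrity Score": (75.58, 53.62),
    "Counterfactual Robustness": (69.90, 49.12),
}


def deltas(scores: dict[str, tuple[float, float]]) -> dict[str, tuple[float, float]]:
    """Return (absolute delta in points, relative gain in %) for each metric."""
    out = {}
    for metric, (proposed, baseline) in scores.items():
        absolute = round(proposed - baseline, 2)
        relative = round(100 * (proposed - baseline) / baseline, 1)
        out[metric] = (absolute, relative)
    return out


for metric, (absolute, relative) in deltas(reported).items():
    print(f"{metric}: {absolute:+.2f} points ({relative:+.1f}% relative)")
```

Precision's +20.44-point gap works out to roughly a 34% relative improvement over Regular RAG, while recall moves by about 4.8%.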

What To Try In 7 Days

Build a tiny Causal Knowledge Graph from 100 domain docs using an embedding model and store (cause, effect) pairs.

Add a two-stage retrieval: vector nearest neighbors then a small LLM prompt to verify polarity and semantic match.

Implement one counterfactual check per query: generate a plausible opposite of a top cause and re-run retrieval to see if the outcome persists.
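A minimal sketch of the one-check-per-query idea, assuming a toy antonym table in place of the LLM that writes the "plausible opposite" and a Jaccard-overlap lookup in place of the full retrieval step. `ANTONYMS`, `counterfactual`, `retrieve_effects`, `is_necessary`, and the 0.75 threshold are all hypothetical; the paper's own counterfactual simulation is LLM-driven and is not reproduced here.

```python
import re
from typing import Optional

# Hypothetical antonym table: stands in for the LLM prompt that writes a
# "plausible opposite" of a cause.
ANTONYMS = {
    "higher": "lower", "lower": "higher",
    "increase": "decrease", "decrease": "increase",
    "more": "less", "less": "more",
}


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def counterfactual(cause: str) -> Optional[str]:
    """Flip directional words; return None if nothing could be flipped."""
    words = cause.lower().split()
    flipped = [ANTONYMS.get(w, w) for w in words]
    return " ".join(flipped) if flipped != words else None


def retrieve_effects(kb: list[tuple[str, str]], cause: str, min_jaccard: float = 0.75) -> list[str]:
    """Re-run a (toy) retrieval: effects of stored pairs whose cause closely matches."""
    query = tokens(cause)
    hits = []
    for stored_cause, effect in kb:
        stored = tokens(stored_cause)
        union = query | stored
        if union and len(query & stored) / len(union) >= min_jaccard:
            hits.append(effect)
    return hits


def is_necessary(kb: list[tuple[str, str]], cause: str, effect: str) -> Optional[bool]:
    """True if the effect disappears under the counterfactual cause, False if it
    persists, None if no counterfactual could be generated."""
    cf = counterfactual(cause)
    if cf is None:
        return None
    return effect not in retrieve_effects(kb, cf)


kb = [
    ("higher interest rates", "lower housing demand"),
    ("lower interest rates", "higher housing demand"),
    ("higher advertising spend", "higher brand awareness"),
    ("lower advertising spend", "higher brand awareness"),
]

print(is_necessary(kb, "higher interest rates", "lower housing demand"))       # True
print(is_necessary(kb, "higher advertising spend", "higher brand awareness"))  # False
print(is_necessary(kb, "population growth", "higher housing demand"))          # None
```

The threshold matters: with min_jaccard=0.5, retrieving for "lower interest rates" would also match the stored cause "higher interest rates", the effect would appear to persist, and the cause would be wrongly judged unnecessary. This is a small instance of the sensitivity flagged under Risks & Boundaries.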

Agent Features

Memory
retrieval memory
Tool Use
LLMs for extraction and verification; Neo4j vector index for fast nearest-neighbor search; embedding models for semantic encoding
Frameworks
RAG; Causal Knowledge Graph (CKG)
Architectures
retrieval + LLM pipeline; knowledge-graph-backed retrieval

Optimization Features

Infra Optimization
Neo4j vector index for fast nearest-neighbor search; judge LLM served via Groq
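As a rough sketch of what the Neo4j layer could look like, assuming Neo4j 5.x with native vector indexes and the official `neo4j` Python driver, the snippet below defines a 384-dimension cosine index (the output size of all-MiniLM-L6-v2) and a k-nearest-neighbor query that follows each matched cause to its effect. The graph schema (`:Cause`, `:Effect`, `:CAUSES`), the index name `cause_embeddings`, and the helpers `knn_params` and `search` are hypothetical; the paper's actual schema is not described here.

```python
# Hypothetical schema: (:Cause {text, embedding})-[:CAUSES]->(:Effect {text}).
# Index name, labels, and relationship type are illustrative assumptions.
EMBEDDING_DIM = 384  # output size of all-MiniLM-L6-v2

CREATE_INDEX = f"""
CREATE VECTOR INDEX cause_embeddings IF NOT EXISTS
FOR (c:Cause) ON (c.embedding)
OPTIONS {{indexConfig: {{
  `vector.dimensions`: {EMBEDDING_DIM},
  `vector.similarity_function`: 'cosine'
}}}}
"""

KNN_QUERY = """
CALL db.index.vector.queryNodes('cause_embeddings', $k, $embedding)
YIELD node, score
MATCH (node)-[:CAUSES]->(e:Effect)
RETURN node.text AS cause, e.text AS effect, score
ORDER BY score DESC
"""


def knn_params(embedding: list[float], k: int = 5) -> dict:
    """Build query parameters, rejecting vectors of the wrong size."""
    if len(embedding) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM}-dim embedding, got {len(embedding)}")
    return {"k": k, "embedding": embedding}


def search(uri: str, auth: tuple[str, str], embedding: list[float], k: int = 5):
    """Run the k-NN query against a live Neo4j instance."""
    from neo4j import GraphDatabase  # imported lazily; requires the neo4j driver
    with GraphDatabase.driver(uri, auth=auth) as driver:
        records, _, _ = driver.execute_query(KNN_QUERY, knn_params(embedding, k))
        return [(r["cause"], r["effect"], r["score"]) for r in records]
```

The index is created once (run `CREATE_INDEX` through the same driver); `search` then returns (cause, effect, score) tuples that a verification step could filter.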

Reproducibility

Code Available: No
Data Available: No
Open Source Status: Unknown
License: Unknown

Data URLs

OpenAlex (used as corpus)

Risks & Boundaries

Limitations

Relies on LLMs to construct the CKG; errors can enshrine false causal links.

Counterfactual generation can produce implausible alternatives, corrupting validation.

When Not To Use

When strict low-latency, real-time responses are required.

For simple fact lookups where semantic retrieval suffices.

Failure Modes

Graph contains fabricated or misinterpreted cause-effect pairs -> wrong 'ground truth'.

LLM generates illogical counterfactuals -> wrong necessity judgments.

Core Entities

Models

Gemini 1.5; SentenceTransformer all-MiniLM-L6-v2; LLaMA-3.1-8B-Instant

Metrics

Precision; Recall; Causal Chain Integrity Score (CCIS/CIS); Counterfactual Robustness Score (CRS)

Datasets

OpenAlex corpus; custom causal QA dataset (generated per document)