E-KELL: a KG-backed LLM system that guides decisions with standards-based evidence to cut hallucinations

Overview

Decision Snapshot: Needs Validation

The KG + prompt-chain approach shows clear gains in a focused case study and in expert ratings, but the evidence is limited to a small prototype, a semi-automatically built KG, and 10 queries; expect further engineering and data work before production.

Citations: 4

Evidence Strength: 0.60

Confidence: 0.85

Risk Signals: 10

Trust Signals

Findings with numeric evidence: 3/4

Findings with evidence refs: 4/4

Results with explicit delta: 0/4

Reproducibility

Status: No open assets linked

Open source: No

At A Glance

Cost impact: 60%

Production readiness: 50%

Novelty: 60%

Authors

Minze Chen, Zhenxiang Tao, Weitong Tang, Tingxin Qin, Rui Yang, Chunli Zhu

Links

Abstract / PDF

Why It Matters For Business

For safety-critical operations, an E-KELL-style KG+LLM setup can reduce hallucination and tie answers back to specific standards, lowering legal and operational risk while making guidance faster to obtain and easier to audit.

Who Should Care

Product Manager, ML Engineer, CTO

Summary TLDR

E-KELL is a prototype emergency decision support system that stores Chinese emergency standards as a structured knowledge graph (2264 triples) and uses a prompt-chain to make an LLM reason over relevant KG segments. In a hazardous-chemical leakage case study (10 representative queries), E-KELL's answers complied with the standards, contained no factual errors, and scored about 9/10 from 19 domain experts on comprehensibility, accuracy, conciseness, and instructiveness. The approach reduces LLM hallucination and yields auditable answers, but building and updating the KG required semi-automatic extraction plus manual curation, and the system currently covers only a limited set of documents.
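The prompt-chain idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the three steps (entity selection, KG-segment filtering, grounded answering), the function name `prompt_chain`, the stub model `echo_llm`, and the placeholder triples are all hypothetical. A real LLM client would replace the stub.

```python
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def prompt_chain(query: str, kg: List[Triple], llm: Callable[[str], str]) -> str:
    """Hypothetical 3-step chain: pick entities -> select KG segment -> grounded answer."""
    entities = sorted({h for h, _, _ in kg} | {t for _, _, t in kg})
    # Step 1: let the LLM map the query onto entities that exist in the KG.
    picked = llm(
        f"Query: {query}\nKnown entities: {', '.join(entities)}\n"
        "Reply with the relevant entities, comma-separated."
    )
    # Discard anything the model names that is not actually in the KG.
    chosen = {e.strip() for e in picked.split(",")} & set(entities)
    # Step 2: the KG segment is every triple that touches a chosen entity.
    segment = [tr for tr in kg if tr[0] in chosen or tr[2] in chosen]
    facts = "\n".join(f"({h}, {r}, {t})" for h, r, t in segment)
    # Step 3: ask for an answer restricted to that segment.
    return llm(
        f"Facts:\n{facts}\n\nUsing only these facts, answer: {query}\n"
        "If the facts are insufficient, say so."
    )


# Placeholder triples for illustration only; not taken from the paper's KG.
KG = [
    ("gas_leak", "requires_step", "isolate_area"),
    ("gas_leak", "governed_by", "standard_X"),
    ("flood", "requires_step", "raise_barriers"),
]


def echo_llm(prompt: str) -> str:
    # Stub model: answers step 1 with "gas_leak" and echoes every other prompt,
    # so the step-3 prompt (and hence the selected segment) can be inspected.
    return "gas_leak" if prompt.startswith("Query:") else prompt


print(prompt_chain("What must be done first at a gas leak?", KG, echo_llm))
```

Restricting step 3 to the filtered segment is what makes the answer traceable: anything the model says should map back to one of the printed triples.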

Problem Statement

Emergency decisions must follow laws and technical standards, but raw LLM outputs can hallucinate and miss logical links spread across fragmented documents (including tables and diagrams). Emergency decision support systems (EDSS) need fast, auditable, standards-compliant guidance; LLMs alone lack reliable referencing and structured reasoning over heterogeneous regulatory texts.

Main Contribution

A practical EDSS framework (E-KELL) that stores emergency standards in a knowledge graph (KG) and guides an LLM to reason over KG segments via a prompt-chain.

A semi-automatic pipeline to extract triples from Chinese emergency documents and a curated KG (2264 triples) used as the authoritative knowledge base.
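One way to read "semi-automatic" is: accept high-confidence extractor output, merge near-duplicates, and queue everything else for a human curator. The sketch below assumes an upstream extractor that emits scored candidate triples; the `Candidate` type, the `triage` helper, and the 0.9 threshold are hypothetical choices, not details from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Candidate:
    head: str
    relation: str
    tail: str
    confidence: float  # score from a hypothetical automatic extractor


def normalize(text: str) -> str:
    # Collapse whitespace and lowercase so near-duplicate strings merge.
    return " ".join(text.split()).lower()


def triage(
    candidates: List[Candidate], threshold: float = 0.9
) -> Tuple[List[Tuple[str, str, str]], List[Candidate]]:
    """Split extractor output into auto-accepted triples and a manual-review queue."""
    best: Dict[Tuple[str, str, str], Candidate] = {}
    for c in candidates:
        key = (normalize(c.head), normalize(c.relation), normalize(c.tail))
        # Keep only the highest-confidence copy of each duplicate triple.
        if key not in best or c.confidence > best[key].confidence:
            best[key] = c
    accepted = [key for key, c in best.items() if c.confidence >= threshold]
    review = [c for c in best.values() if c.confidence < threshold]
    return accepted, review


raw = [
    Candidate("Gas Leak", "requires step", "isolate area", 0.95),
    Candidate("gas  leak", "Requires Step", "Isolate Area", 0.70),
    Candidate("flood", "requires step", "raise barriers", 0.60),
]
accepted, review = triage(raw)
print(accepted)                 # [('gas leak', 'requires step', 'isolate area')]
print([c.head for c in review])  # ['flood']
```

The threshold is the main knob: raising it shifts more work to curators, which matches the paper's observation that automatic fusion alone was insufficient.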

Key Findings

E-KELL produced factually correct and standards-compliant answers on the 10 evaluated queries.

Numbers: Factually correct 10/10; in compliance with standards 10/10 (Table 1)

Practical Use: Use KG-backed retrieval plus prompt guidance to reduce LLM hallucinations and keep responses aligned with local regulations.

Evidence Ref: Table 1, Section 5.2

Domain experts rated E-KELL higher on usability metrics than the baselines.

Numbers: Comprehensibility 9.06; Accuracy 9.09; Conciseness 9.03; Instructiveness 9.06 (Table 2)

Practical Use: Clearer, more actionable guidance may help frontline staff decide faster under stress.

Evidence Ref: Table 2, Section 5.2
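As a sanity check on the "about 9/10" summary figure, an unweighted mean of the four Table 2 dimension scores gives the following (this aggregation is illustrative; the paper may report or combine the scores differently):

```latex
\bar{s} = \frac{9.06 + 9.09 + 9.03 + 9.06}{4} = \frac{36.24}{4} = 9.06
```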

Results

Metric	Value	Baseline	Delta	Split / Dataset	Evidence	Evidence Ref
Grammatically correct (10 queries)	E-KELL 10/10	ChatGLM-6b 10/10; GPT-3.5 10/10	—	10 use-case queries (hazardous chemical leakage)	Table 1 objective attributes	Table 1
Factually correct (10 queries)	E-KELL 10/10	ChatGLM-6b 8/10; GPT-3.5 9/10	—	10 use-case queries (hazardous chemical leakage)	Table 1 objective attributes	Table 1
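The Table 1 counts can be turned into correct-answer rates and count differences with a few lines of Python. The numbers are copied from the table above; the `gaps` helper is hypothetical, and with only 10 queries a difference of one or two answers should not be read as statistically meaningful.

```python
# Table 1 correct-answer counts out of 10 queries, copied from the results table.
TABLE1 = {
    "Grammatically correct": {"E-KELL": 10, "ChatGLM-6b": 10, "GPT-3.5": 10},
    "Factually correct": {"E-KELL": 10, "ChatGLM-6b": 8, "GPT-3.5": 9},
}
N_QUERIES = 10


def gaps(table, system="E-KELL"):
    """Difference in correct-answer counts between `system` and each baseline."""
    return {
        metric: {name: scores[system] - n for name, n in scores.items() if name != system}
        for metric, scores in table.items()
    }


for metric, diff in gaps(TABLE1).items():
    rate = TABLE1[metric]["E-KELL"] / N_QUERIES
    print(f"{metric}: E-KELL {rate:.0%}, count difference vs baselines {diff}")
```

Aggregate counts only show how many answers each system got right, not which queries a baseline failed, so the differences are not per-query wins.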

What To Try In 7 Days

Extract 1–2 critical local standards and build a tiny KG (10–100 triples) for a frequent emergency scenario.

Connect that KG to an LLM through a retrieval index (e.g., LlamaIndex), then run 10 representative queries against both the KG-backed setup and the plain LLM to compare factual correctness and standards compliance.

Publish a prompt template that instructs the model to cite its source triples, and iterate on the template based on user feedback.
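The three steps above can be prototyped end to end before wiring in a real retrieval library. In this sketch, a keyword-overlap `retrieve` function stands in for LlamaIndex vector retrieval (it does not use the LlamaIndex API), the prompt template and the `[T1]`-style citation format are hypothetical, and `compare` uses substring matching as a crude proxy for expert review. Because a prompt can only instruct the model, `check_citations` verifies citations after the fact.

```python
import re
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def tokens(text: str) -> set:
    # Lowercase word tokens; underscores split, so "gas_leak" -> {"gas", "leak"}.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, kg: List[Triple], k: int = 3) -> List[Triple]:
    """Keyword-overlap retriever: a simple stand-in for a vector index."""
    q = tokens(query)
    scored = [(len(q & tokens(" ".join(t))), t) for t in kg]
    hits = [st for st in scored if st[0] > 0]
    hits.sort(key=lambda st: -st[0])  # stable sort keeps KG order on ties
    return [t for _, t in hits[:k]]


TEMPLATE = (
    "Facts:\n{facts}\n\nQuestion: {question}\n"
    "Use only the facts above and end each sentence with the fact ID(s) it uses, "
    "e.g. [T1]. If the facts do not cover the question, reply INSUFFICIENT EVIDENCE."
)


def build_prompt(question: str, facts: List[Triple]) -> str:
    lines = [f"[T{i}] ({h}, {r}, {t})" for i, (h, r, t) in enumerate(facts, 1)]
    return TEMPLATE.format(facts="\n".join(lines), question=question)


def check_citations(answer: str, n_facts: int) -> List[str]:
    """Return problems found; an empty list means every citation is valid."""
    if answer.strip() == "INSUFFICIENT EVIDENCE":
        return []
    cited = [int(m) for m in re.findall(r"\[T(\d+)\]", answer)]
    problems = [] if cited else ["no citations"]
    problems += [f"unknown fact T{c}" for c in cited if not 1 <= c <= n_facts]
    return problems


def compare(queries: List[Tuple[str, str]], kg: List[Triple],
            llm: Callable[[str], str]) -> Dict[str, int]:
    """Count answers mentioning the expected fact: plain prompt vs KG-backed prompt."""
    hits = {"plain": 0, "kg": 0, "bad_citations": 0}
    for question, expected in queries:
        if expected.lower() in llm(question).lower():
            hits["plain"] += 1
        facts = retrieve(question, kg)
        answer = llm(build_prompt(question, facts))
        if expected.lower() in answer.lower():
            hits["kg"] += 1
        if check_citations(answer, len(facts)):
            hits["bad_citations"] += 1
    return hits
```

Any function that maps a prompt string to a completion string can be passed as `llm`; in a local deployment it would wrap the hosted model. Swapping `retrieve` for a real vector index changes recall, not the evaluation loop.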

Agent Features

Tool Use

LlamaIndex (vector retrieval); OCR for document ingestion; Mixed Reality UI for frontline staff

Frameworks

LLM + Knowledge Graph prompt-chain

Optimization Features

Infra Optimization

Local deployment on NVIDIA A100

Reproducibility

Code Available: No

Data Available: No

Open Source Status: No

License: Unknown

Risks & Boundaries

Limitations

Limited document coverage: prototype built on a small set of Chinese standards and quick references.

KG construction required substantial manual curation; automatic fusion was insufficient.

When Not To Use

As the sole or authoritative decision-maker without human review.

For emergencies outside the documents and standards loaded into the KG.

Failure Modes

Incomplete or outdated KG leads to incorrect or non-compliant advice.

Retrieval misses relevant triples, causing the LLM to hallucinate from its base weights.
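Both failure modes can be partly guarded against before the LLM is called. The sketch below assumes each triple records the standard and version it was extracted from (the `SourcedTriple` fields, the version registry, and `min_triples` are hypothetical): triples from superseded or unregistered standards are dropped, and if too little evidence remains the function returns `None` so the query can be routed to a human reviewer instead of the model.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SourcedTriple:
    head: str
    relation: str
    tail: str
    standard: str  # hypothetical standard identifier
    version: str   # version of the standard the triple was extracted from


def usable_context(retrieved: List[SourcedTriple],
                   current_versions: Dict[str, str],
                   min_triples: int = 1) -> Optional[List[SourcedTriple]]:
    """Drop stale triples; return None to signal 'escalate to a human'."""
    # A triple is kept only if its standard is registered AND at the current version.
    fresh = [t for t in retrieved if current_versions.get(t.standard) == t.version]
    if len(fresh) < min_triples:
        return None  # too little grounded evidence: do not let the LLM answer alone
    return fresh


registry = {"STD-X": "2023"}  # placeholder registry of current standard versions
old = SourcedTriple("gas_leak", "requires_step", "isolate_area", "STD-X", "2019")
new = SourcedTriple("gas_leak", "requires_step", "isolate_area", "STD-X", "2023")
print(usable_context([old, new], registry))  # only the 2023 triple survives
print(usable_context([old], registry))       # None: route to a human reviewer
```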

