E-KELL: a KG-backed LLM system that guides decisions with standards-based evidence to cut hallucinations

November 15, 2023 · 7 min

Overview

Decision Snapshot: Needs Validation

The KG + prompt-chain approach shows clear gains in a focused case study and in expert ratings, but the evidence is limited to a small prototype, a semi-automatically built KG, and 10 queries. Expect more engineering and data work before production.

Citations: 4

Evidence Strength: 0.60

Confidence: 0.85

Risk Signals: 10

Trust Signals

Findings with numeric evidence: 3/4

Findings with evidence refs: 4/4

Results with explicit delta: 0/4

Reproducibility

Status: No open assets linked

Open source: No

At A Glance

Cost impact: 60%

Production readiness: 50%

Novelty: 60%

Authors

Minze Chen, Zhenxiang Tao, Weitong Tang, Tingxin Qin, Rui Yang, Chunli Zhu

Links

Abstract / PDF

Why It Matters For Business

For safety-critical operations, an E-KELL-style KG + LLM setup reduces hallucination and makes answers traceable to standards, which lowers legal and operational risk and makes guidance faster and more auditable.

Summary TLDR

E-KELL is a prototype emergency decision support system that stores Chinese emergency standards as a structured knowledge graph (2264 triples) and uses a prompt-chain to make an LLM reason over relevant KG segments. In a hazardous-chemical leakage case study (10 representative queries), E-KELL matched standards, avoided factual errors, and scored ~9/10 from 19 domain experts on comprehensibility, accuracy, conciseness, and instructiveness. The approach reduces LLM hallucination and yields auditable answers, but building and updating the KG required semi-automatic extraction plus manual curation, and the system currently covers only a limited set of documents.

Problem Statement

Emergency decisions must follow laws and technical standards, but raw LLM outputs can hallucinate and miss logical links that are spread across fragmented documents (tables, diagrams). Emergency decision support systems (EDSS) need fast, auditable, standards-compliant guidance; current LLMs alone lack reliable referencing and structured reasoning over heterogeneous regulatory texts.

Main Contribution

A practical EDSS framework (E-KELL) that stores emergency standards in a knowledge graph (KG) and guides an LLM to reason over KG segments via a prompt-chain.

A semi-automatic pipeline to extract triples from Chinese emergency documents and a curated KG (2264 triples) used as the authoritative knowledge base.
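
The idea of storing standards as triples and feeding a relevant KG segment into a prompt-chain can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Triple` fields, the `EXAMPLE-STD-*` source labels, the placeholder triples, and the keyword match, which merely stands in for the vector retrieval the summary mentions. It is not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str
    source: str  # hypothetical provenance field (which standard the fact came from)


# Invented placeholder triples for illustration; NOT taken from the E-KELL KG.
KG = [
    Triple("chemical leak", "requires", "isolate the area", "EXAMPLE-STD-1"),
    Triple("chemical leak", "reported_to", "site commander", "EXAMPLE-STD-1"),
    Triple("flood", "requires", "move to high ground", "EXAMPLE-STD-2"),
]


def retrieve_segment(kg, query):
    """Toy retrieval: keep triples whose subject occurs in the query text."""
    text = query.lower()
    return [t for t in kg if t.subject.lower() in text]


def build_prompt_chain(query, segment):
    """Return the ordered prompts: first the KG facts, then the constrained question."""
    facts = "\n".join(
        f"- ({t.subject}, {t.predicate}, {t.obj}) [source: {t.source}]" for t in segment
    )
    return [
        f"Facts from the knowledge graph:\n{facts}",
        "Using only the facts above, answer the question and name the source "
        f"of every fact you rely on.\nQuestion: {query}",
    ]


question = "What should we do after a chemical leak?"
segment = retrieve_segment(KG, question)
prompts = build_prompt_chain(question, segment)
```

Keeping a `source` on every triple is what makes the final answer auditable: each statement the model makes can be checked against the standard it came from.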

Key Findings

E-KELL produced factually correct and standards-compliant answers on the 10 evaluated queries.

Numbers: Factually correct 10/10; In compliance with standards 10/10 (Table 1)

Practical Use: Use KG-backed retrieval + prompt guidance to avoid LLM hallucinations and ensure responses follow local regulations.

Evidence Ref: Table 1, Section 5.2

Domain experts rated E-KELL higher on usability metrics than the baselines.

Numbers: Comprehensibility 9.06; Accuracy 9.09; Conciseness 9.03; Instructiveness 9.06 (Table 2)

Practical Use: Expect clearer, more actionable guidance for frontline staff, which can speed decisions under stress.

Evidence Ref: Table 2, Section 5.2

Results

Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref
Grammatically correct (10 queries) | E-KELL 10/10; ChatGLM-6B 10/10; GPT-3.5 10/10 | - | - | 10 use-case queries (hazardous chemical leakage) | Table 1 objective attributes | Table 1
Factually correct (10 queries) | E-KELL 10/10; ChatGLM-6B 8/10; GPT-3.5 9/10 | - | - | 10 use-case queries | Table 1 objective attributes | Table 1
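
The results report no explicit delta, but one follows directly from the factual-correctness counts (E-KELL 10/10, ChatGLM-6B 8/10, GPT-3.5 9/10). The short sketch below does that arithmetic; the function name `deltas_vs` is hypothetical, and the counts are the only values taken from the source.

```python
TOTAL_QUERIES = 10

# Factually correct answers out of 10 queries, as reported in Table 1.
factually_correct = {"E-KELL": 10, "ChatGLM-6B": 8, "GPT-3.5": 9}


def deltas_vs(system, counts, total):
    """Difference between `system` and every other entry, in queries and percentage points."""
    ref = counts[system]
    return {
        name: {
            "queries": ref - count,
            "percentage_points": (ref - count) * 100 // total,
        }
        for name, count in counts.items()
        if name != system
    }


deltas = deltas_vs("E-KELL", factually_correct, TOTAL_QUERIES)
```

With only 10 queries, a gap of one or two answers is small in absolute terms, so these deltas are indicative and not statistically robust.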

What To Try In 7 Days

Extract 1–2 critical local standards and build a tiny KG (10–100 triples) for a frequent emergency scenario.

Connect that KG to an LLM via a retrieval index (Llama Index) and run 10 representative queries vs the plain LLM to compare factual compliance.

Publish a prompt template that forces the model to cite its source triples, then iterate on the template based on user feedback.
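
One way to run the comparison above is to number the retrieved triples in the prompt and then check mechanically whether each answer cites them. The sketch below is illustrative only: the template wording, the `[T<id>]` citation format, and the helper names are assumptions, and the answers would come from whichever LLMs are being compared.

```python
import re

# Hypothetical template; the wording and the [T<id>] format are assumptions.
PROMPT_TEMPLATE = (
    "You are assisting emergency responders.\n"
    "Answer using ONLY the triples listed below and cite each one you use as [T<id>].\n"
    "If the triples do not cover the question, reply exactly: NOT COVERED.\n\n"
    "Triples:\n{triples}\n\nQuestion: {question}\n"
)


def render_prompt(question, triples):
    """Number each (subject, predicate, object) triple so the model can cite it."""
    lines = [f"[T{i}] {s} | {p} | {o}" for i, (s, p, o) in enumerate(triples, start=1)]
    return PROMPT_TEMPLATE.format(triples="\n".join(lines), question=question)


def citations_valid(answer, n_triples):
    """True if the answer cites at least one triple and every cited id exists."""
    cited = {int(m) for m in re.findall(r"\[T(\d+)\]", answer)}
    return bool(cited) and all(1 <= c <= n_triples for c in cited)


def compare(answers_by_system, n_triples):
    """Count, per system, how many answers carry valid citations."""
    return {
        name: sum(citations_valid(a, n_triples) for a in answers)
        for name, answers in answers_by_system.items()
    }
```

A valid citation only shows that the answer points at a real triple; whether the answer actually complies with the standard still needs human review.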

Agent Features

Tool Use
Llama Index (vector retrieval)
OCR for document ingestion
Mixed Reality UI for frontline
Frameworks
LLM + Knowledge Graph prompt-chain

Optimization Features

Infra Optimization
Local deployment on NVIDIA A100

Reproducibility

Code Available: No
Data Available: No
Open Source Status: No
License: Unknown

Risks & Boundaries

Limitations

Limited document coverage: prototype built on a small set of Chinese standards and quick references.

KG construction required substantial manual curation; automatic fusion was insufficient.

When Not To Use

As the sole or authoritative decision-maker without human review.

For emergencies outside the documents and standards loaded into the KG.

Failure Modes

Incomplete or outdated KG leads to incorrect or non-compliant advice.

Retrieval misses relevant triples, causing the LLM to hallucinate from its base weights.
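
A simple mitigation for the retrieval-miss failure mode is to refuse to generate when retrieval returns too little, so the model never answers from its base weights alone. This is a minimal sketch under stated assumptions: `retrieve` and `generate` are hypothetical callables supplied by the caller, and the `min_triples` threshold and fallback message are illustrative and not part of E-KELL.

```python
# Hypothetical fallback message returned when the KG has no coverage.
NOT_COVERED = "No matching standard found in the knowledge graph; escalate to a human expert."


def answer_with_guard(query, retrieve, generate, min_triples=1):
    """Call the LLM only when retrieval returned enough triples to ground the answer."""
    triples = retrieve(query)
    if len(triples) < min_triples:
        return NOT_COVERED
    return generate(query, triples)


# Stub retriever and generator, for demonstration only.
def stub_retrieve(query):
    return [("chemical leak", "requires", "isolate the area")] if "leak" in query else []


def stub_generate(query, triples):
    return f"Grounded answer based on {len(triples)} triple(s)."


covered = answer_with_guard("chemical leak response", stub_retrieve, stub_generate)
uncovered = answer_with_guard("earthquake response", stub_retrieve, stub_generate)
```

The guard does not help when retrieval returns triples that are present but outdated, so it complements KG maintenance and does not replace it.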

Core Entities

Models

ChatGLM-6B

GPT-3.5

Metrics

Accuracy

objective attribute scores (grammatical, factual, compliance)