Overview
The method is straightforward to reproduce: reinforcement learning (RL) extracts concise knowledge-graph (KG) paths, and a bandit selects the prompt format. Experiments on three public datasets back the claims, but real deployment depends on KG quality and API budgets.
Citations: 4
Evidence Strength: 0.80
Confidence: 0.78
Risk Signals: 9
Trust Signals
Findings with numeric evidence: 4/4
Findings with evidence refs: 4/4
Results with explicit delta: 4/5
Reproducibility
Status: Partial assets available
Open source: Unknown
At A Glance
Cost impact: 70%
Production readiness: 60%
Novelty: 60%
Why It Matters For Business
KnowGPT upgrades closed‑box LLM accuracy using existing KGs while trimming prompt size and API costs. It lets teams improve domain QA without fine‑tuning large models or owning model weights.
Summary TLDR
KnowGPT is a practical pipeline to inject structured facts from knowledge graphs (KGs) into closed‑box LLMs via prompts. It uses a reinforcement‑learning agent to extract short, relevant KG paths and a contextual multi‑armed bandit to pick how to present those facts (triples, sentences, or graph descriptions). Built on GPT‑3.5 and tested on CommonsenseQA, OpenBookQA and MedQA, it gives large accuracy gains (e.g., ~92.4% on OpenBookQA test; leaderboard 92.6%) while cutting average prompt size and API cost versus other KG‑prompting methods. Main limits: noisy or incomplete KGs and remaining API cost.
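The three presentation formats mentioned above (triples, sentences, graph descriptions) can be illustrated with a minimal sketch. The templates, function names, and the `build_prompt` wrapper below are assumptions made for illustration; the summary does not give the exact templates KnowGPT uses.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def as_triples(path: List[Triple]) -> str:
    """Render each fact as a raw (head, relation, tail) tuple, one per line."""
    return "\n".join(f"({h}, {r}, {t})" for h, r, t in path)


def as_sentences(path: List[Triple]) -> str:
    """Render each fact as a short sentence (hypothetical template)."""
    return " ".join(f"{h} {r.replace('_', ' ')} {t}." for h, r, t in path)


def as_graph_description(path: List[Triple]) -> str:
    """Describe the path as a small graph: entity list plus labelled edges."""
    entities: List[str] = []
    for h, _, t in path:
        for e in (h, t):
            if e not in entities:
                entities.append(e)
    edges = "; ".join(f"{h} -[{r}]-> {t}" for h, r, t in path)
    return f"Entities: {', '.join(entities)}. Edges: {edges}."


# Hypothetical registry; the keys double as bandit arm names.
FORMATTERS: Dict[str, Callable[[List[Triple]], str]] = {
    "triples": as_triples,
    "sentences": as_sentences,
    "graph": as_graph_description,
}


def build_prompt(question: str, path: List[Triple], fmt: str) -> str:
    """Prepend the formatted KG facts to the question (assumed layout)."""
    background = FORMATTERS[fmt](path)
    return f"Background knowledge:\n{background}\n\nQuestion: {question}\nAnswer:"
```

Calling `build_prompt(question, path, "sentences")` yields a prompt whose background block can be compared for token count against the `"triples"` and `"graph"` variants of the same path.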
Problem Statement
Given a question, a large KG and only API access to an LLM, create a short, factual prompt that improves QA accuracy. Challenges: KGs are huge, API calls cost money and tokens, and hand‑crafted hard prompts are brittle across questions and KG structures.
Main Contribution
Define KG‑based prompting for black‑box LLMs that builds prompts from subgraphs.
P_RL: a deep reinforcement learning policy that extracts concise, context‑relevant KG paths as reasoning background.
Key Findings
KnowGPT raises QA accuracy substantially over baseline LLMs on three datasets
OpenBookQA leaderboard accuracy (92.6%) slightly exceeds the reported human level (91.7%)
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Accuracy | 0.924 | — | — | OpenBookQA test | Table 1 main results | Table 1 |
| Accuracy | 0.926 | Human 0.917 | +0.009 | OpenBookQA leaderboard | Section 4.2.1 and Table 2 | Table 2 |
What To Try In 7 Days
Run simple entity linking and 2‑hop subgraph extraction (P_sub) on your domain KG, then feed the result as 'sentence' prompts to your LLM API to check for accuracy gains.
Implement an offline RL path sampler on a small KG region to extract short paths, and compare accuracy and token use against the full subgraph.
Train a light MAB or UCB selector to pick prompt format per question and measure API cost and accuracy tradeoffs.
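For the format selector in the last step, a plain UCB1 bandit is a reasonable first experiment. The sketch below is a non-contextual stand-in, not the contextual multi-armed bandit described in the paper; the class name, the `reward` function, and the `token_penalty` weight are assumptions chosen for illustration.

```python
import math
from typing import Dict, List


class UCB1FormatSelector:
    """Non-contextual UCB1 over prompt formats (simplified stand-in)."""

    def __init__(self, arms: List[str], c: float = 2.0) -> None:
        self.arms = list(arms)
        self.c = c  # exploration weight; 2.0 gives the classic UCB1 bonus
        self.counts: Dict[str, int] = {a: 0 for a in self.arms}
        self.means: Dict[str, float] = {a: 0.0 for a in self.arms}

    def select(self) -> str:
        # Play every arm once before using confidence bounds.
        for arm in self.arms:
            if self.counts[arm] == 0:
                return arm
        total = sum(self.counts.values())

        def score(arm: str) -> float:
            bonus = math.sqrt(self.c * math.log(total) / self.counts[arm])
            return self.means[arm] + bonus

        return max(self.arms, key=score)

    def update(self, arm: str, reward: float) -> None:
        # Incremental running mean of observed rewards for this arm.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def reward(correct: bool, prompt_tokens: int, token_penalty: float = 0.001) -> float:
    """Assumed reward: 1 for a correct answer minus a small per-token cost."""
    return (1.0 if correct else 0.0) - token_penalty * prompt_tokens
```

In a trial loop, call `select()` per question, send the prompt in that format, then call `update(arm, reward(correct, prompt_tokens))`. Raising `token_penalty` shifts the selector toward shorter formats, which makes the cost versus accuracy tradeoff directly measurable.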
Risks & Boundaries
Limitations
Real‑world KGs contain noisy or incorrect triples that can mislead the LLM (Section 5).
RL retrieval fails when KG is sparse or entities have few neighbors; P_sub fallback is needed (C.4).
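The fallback described in the second limitation can be sketched as follows: try the path retriever first and, if it returns nothing, fall back to the 2-hop subgraph (P_sub) around the linked entities. The `path_finder` callable stands in for the RL policy and is hypothetical, as are the function names and the undirected treatment of edges.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def build_index(triples: Iterable[Triple]) -> Dict[str, List[Triple]]:
    """Index triples by both endpoints so edges can be followed either way."""
    index: Dict[str, List[Triple]] = defaultdict(list)
    for h, r, t in triples:
        index[h].append((h, r, t))
        index[t].append((h, r, t))
    return index


def two_hop_subgraph(
    index: Dict[str, List[Triple]], seeds: Iterable[str], max_hops: int = 2
) -> List[Triple]:
    """Collect all triples reachable within max_hops of the seed entities."""
    frontier: Set[str] = set(seeds)
    visited: Set[str] = set(frontier)
    seen: Set[Triple] = set()
    collected: List[Triple] = []
    for _hop in range(max_hops):
        next_frontier: Set[str] = set()
        for entity in sorted(frontier):  # sorted for deterministic output
            for triple in index.get(entity, []):
                if triple not in seen:
                    seen.add(triple)
                    collected.append(triple)
                for node in (triple[0], triple[2]):
                    if node not in visited:
                        visited.add(node)
                        next_frontier.add(node)
        frontier = next_frontier
    return collected


def retrieve(
    index: Dict[str, List[Triple]],
    seeds: List[str],
    path_finder: Callable[[List[str]], List[Triple]],
) -> List[Triple]:
    """Use the path retriever when it finds something, else fall back to P_sub."""
    path = path_finder(seeds)
    return path if path else two_hop_subgraph(index, seeds)
```

Logging how often `retrieve` takes the fallback branch gives a quick read on how sparse the KG is around typical question entities.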
When Not To Use
When no reliable KG exists for the domain.
When strict low latency or zero external API usage is mandatory.
Failure Modes
Noisy KG facts cause confident but wrong LLM outputs.
RL policy cannot find reachable paths in sparse graphs and yields poor prompts.

