Overview
Production Readiness
0.7
Novelty Score
0.5
Cost Impact Score
0.6
Citation Count
0
Why It Matters For Business
Graph‑backed retrieval plus a small LLM turns a curated safety database into an almost error‑free lookup service for side‑effect presence, cutting clinician search time and reducing misinformation risk.
Summary TLDR
This paper builds two retrieval-augmented systems to answer binary questions such as “Is X a side effect of Y?” using the SIDER 4.1 drug–side-effect database. A vector-based RAG pipeline (Pinecone + OpenAI ada-002 embeddings) and a graph-based GraphRAG pipeline (Neo4j + Cypher) each feed a Llama-3 8B model. On a balanced subset of 19,520 pairs (976 drugs, 3,851 side effects), GraphRAG reached 0.9999 accuracy and RAG with the pairwise format reached 0.998, while standalone Llama-3 8B reached only 0.529. Code is available on GitHub.
Problem Statement
Off-the-shelf LLMs hallucinate and lack reliable domain knowledge for pharmacovigilance. Clinicians need fast, accurate answers about whether a drug is known to cause a specific side effect. The paper asks: can retrieval (text or graph) plus a small LLM deliver reliable, binary drug–side-effect retrieval?
Main Contribution
- Design and implement two retrieval-augmented pipelines for drug–side-effect lookup, vector RAG and GraphRAG, both using SIDER 4.1 as the knowledge base.
- Show that GraphRAG (Neo4j graph + Cypher) plus Llama-3 8B gives near-perfect binary retrieval on a 19,520-pair balanced test set.
- Demonstrate that representation choices matter: pairwise text format (Data Format B) vastly outperforms aggregated lists (Data Format A) in RAG.
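To make the difference between the two representations concrete, the sketch below serializes the same toy associations both ways. The sentence templates are assumptions for illustration; the paper's actual wording for Data Format A and Data Format B may differ, and the pairs are toy examples rather than SIDER records.

```python
from collections import defaultdict

# Toy (drug, side effect) pairs; illustrative only, not taken from SIDER.
PAIRS = [
    ("aspirin", "nausea"),
    ("aspirin", "tinnitus"),
    ("metformin", "diarrhea"),
]


def to_format_a(pairs):
    """Aggregated style: one document per drug listing all its side effects."""
    by_drug = defaultdict(list)
    for drug, effect in pairs:
        by_drug[drug].append(effect)
    # Assumed template; one long sentence per drug.
    return [
        f"The drug {drug} causes the following side effects: {', '.join(effects)}."
        for drug, effects in sorted(by_drug.items())
    ]


def to_format_b(pairs):
    """Pairwise style: one short document per (drug, side effect) pair."""
    # Assumed template; one fact per document.
    return [
        f"The drug {drug} may cause {effect} as a side effect."
        for drug, effect in pairs
    ]
```

In the pairwise form each embedded document carries exactly one fact, so a question about a single drug and side effect can match one short document. That is one plausible reading of why Data Format B retrieves better, not a claim taken from the paper.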
Key Findings
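- GraphRAG (Neo4j graph + Llama‑3 8B) achieved near‑perfect retrieval accuracy: 0.9999 on the 19,520-pair test set.
- Data representation strongly affects RAG performance: the pairwise format (Data Format B) reached 0.998 accuracy and vastly outperformed aggregated lists (Data Format A).
- Standalone Llama‑3 8B performs poorly without retrieval: 0.529 accuracy, close to the 0.5 chance level on a balanced binary set.
- Large hosted LLMs (ChatGPT 3.5 and ChatGPT 4) still underperform without domain retrieval.
- Evaluation used a balanced, constrained subset of SIDER 4.1: 19,520 pairs covering 976 drugs and 3,851 side effects.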
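A balanced evaluation set of this kind could be assembled roughly as follows. This is a minimal sketch under stated assumptions: the per-drug count `k`, the skip rule, and the sampling strategy are hypothetical, and the summary does not describe the paper's actual sampling procedure.

```python
import random


def build_balanced_pairs(known, all_effects, k, seed=0):
    """Return (drug, effect, label) triples with k positives and k negatives per drug.

    known: dict mapping drug -> set of documented side effects.
    all_effects: iterable of every side-effect term in the vocabulary.
    k: hypothetical number of examples per class and per drug.
    """
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    vocabulary = sorted(all_effects)
    examples = []
    for drug in sorted(known):
        positives = sorted(known[drug])
        negatives = [e for e in vocabulary if e not in known[drug]]
        if len(positives) < k or len(negatives) < k:
            continue  # skip drugs that cannot supply k examples of each class
        examples += [(drug, e, True) for e in rng.sample(positives, k)]
        examples += [(drug, e, False) for e in rng.sample(negatives, k)]
    return examples
```

Because both classes are equally represented, a model that guesses would land near 0.5 accuracy, which gives the standalone Llama-3 8B result its context.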
Results
Accuracy, GraphRAG: 0.9999
Accuracy, RAG with Data Format B: 0.998
Accuracy, standalone Llama-3 8B: 0.529
F1 was also reported for GraphRAG.
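The metrics named in this summary (accuracy, F1, precision, sensitivity, specificity) all derive from the four confusion-matrix counts of a binary classifier. The sketch below uses the standard definitions; the function name and the zero-division convention are choices made here, not details from the paper.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary metrics from confusion-matrix counts.

    Any ratio with a zero denominator is reported as 0.0 (a convention chosen here).
    """
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # also called recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }
```

As a rough sense of scale, an accuracy of 0.9999 on 19,520 pairs corresponds to only one or two misclassified pairs, assuming the reported figure is rounded to four decimals.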
Who Should Care
What To Try In 7 Days
- Index a small, curated drug–side‑effect table as pairwise text (Data Format B) and test RAG similarity retrieval.
- Load the same pairs into a simple Neo4j graph and run direct existence queries with Cypher.
- Add entity extraction and a binary prompt to a small LLM to compare results quickly.
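The graph lookup and binary prompt steps above can be prototyped without a running database. In this sketch the node labels `Drug` and `SideEffect`, the relationship type `HAS_SIDE_EFFECT`, the `ToyGraph` class, and the prompt wording are all assumptions for illustration, not the paper's schema or prompt.

```python
# Hypothetical Cypher existence query; the labels Drug and SideEffect and the
# relationship type HAS_SIDE_EFFECT are assumed names, not the paper's schema.
EXISTS_QUERY = (
    "MATCH (d:Drug {name: $drug})-[:HAS_SIDE_EFFECT]->"
    "(s:SideEffect {name: $effect}) "
    "RETURN count(*) > 0 AS present"
)


class ToyGraph:
    """In-memory stand-in for the graph store: drug -> set of side effects."""

    def __init__(self, pairs):
        self.edges = {}
        for drug, effect in pairs:
            self.edges.setdefault(drug.lower(), set()).add(effect.lower())

    def has_side_effect(self, drug, effect):
        """Mirror the existence check that EXISTS_QUERY would perform."""
        return effect.lower() in self.edges.get(drug.lower(), set())


def build_binary_prompt(drug, effect, present):
    """Compose a prompt that passes the lookup result to the LLM as context."""
    evidence = "is listed" if present else "is not listed"
    return (
        f"Context: {effect} {evidence} as a side effect of {drug} "
        "in the knowledge base.\n"
        f"Question: Is {effect} a side effect of {drug}?\n"
        "Answer with exactly one word: YES or NO."
    )
```

Against a real Neo4j instance, the same query string could be run through the official Python driver with something like `session.run(EXISTS_QUERY, drug=..., effect=...)`, reading the `present` field of the single returned record.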
Reproducibility
Code Available
Data Available
Open Source Status
- partial
Risks & Boundaries
Limitations
- Evaluation uses a balanced subset of SIDER 4.1; real-world reports and underreported events are not covered.
- System only supports single‑drug queries; no multi‑drug, class, or reverse queries yet.
- LLM output was constrained to binary responses for evaluation rather than richer explanations.
- Potential mismatches from drug name variants, brand names, or typos are not fully addressed.
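One way to reduce name-variant mismatches is to normalize the extracted drug name before the lookup. The sketch below is an assumption-laden illustration, not part of the paper's system: the alias table, the drug list, and the similarity cutoff are hypothetical, and fuzzy matching uses Python's standard `difflib`.

```python
import difflib

# Hypothetical alias table mapping brand names to generic names.
ALIASES = {"tylenol": "acetaminophen", "advil": "ibuprofen"}
# Hypothetical canonical drug vocabulary (in practice, the drug names in the database).
KNOWN_DRUGS = ["acetaminophen", "ibuprofen", "metformin"]


def normalize_drug(name, cutoff=0.85):
    """Map a user-supplied drug name to a canonical entry, or None if no match."""
    key = name.strip().lower()
    key = ALIASES.get(key, key)  # resolve brand names first
    if key in KNOWN_DRUGS:
        return key
    # Fall back to fuzzy matching to absorb small typos.
    close = difflib.get_close_matches(key, KNOWN_DRUGS, n=1, cutoff=cutoff)
    return close[0] if close else None
```

Returning `None` lets the caller report an unrecognized drug instead of silently answering "no", which would otherwise surface as a false negative.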
When Not To Use
- When you need to discover novel or unreported adverse events from noisy real‑world data.
- For causal inference about whether a drug caused an event rather than documented association.
- When multi‑drug interactions or class‑level summaries are needed.
Failure Modes
- Missed new or underreported side effects because SIDER lacks post‑marketing signals.
- Entity-recognition errors (drug or side‑effect spelling/variant mismatch) leading to false negatives.
- Index or database corruption could produce incorrect existence checks.
- Binary output hides uncertainty and nuanced evidence found in literature.
Core Entities
Models
- Llama-3 8B
- ChatGPT 3.5
- ChatGPT 4
Metrics
- Accuracy
- F1
- precision
- sensitivity
- specificity
Datasets
- SIDER 4.1 (subset of 19,520 pairs)

