Ask-EDA: a Slack-ready design chatbot that combines hybrid retrieval and an abbreviation lookup to reduce hallucinations

Overview

Decision Snapshot: Needs Validation

The system is implemented end-to-end and integrated into Slack; numeric evidence shows sizable recall gains on small in-domain test sets, but the data and code are internal and evaluations are limited to 100-item subsets.

Citations: 1

Evidence Strength: 0.75

Confidence: 0.85

Risk Signals: 11

Trust Signals

Findings with numeric evidence: 3/4

Findings with evidence refs: 4/4

Results with explicit delta: 0/5

Reproducibility

Status: No open assets linked

Open source: Partial

At A Glance

Cost impact: 60%

Production readiness: 70%

Novelty: 50%

Authors

Luyao Shi, Michael Kazda, Bradley Sears, Nick Shropshire, Ruchir Puri

Why It Matters For Business

A hybrid RAG layer plus a small abbreviation lookup can cut wrong answers and boost recall on internal technical queries, speeding engineering work and reducing time spent hunting docs.

Who Should Care

Product Manager, Engineering Lead, ML Engineer, Founder

Summary TLDR

Ask-EDA is a domain chat assistant for chip design that pairs an LLM with hybrid retrieval-augmented generation (dense + sparse retrieval merged by reciprocal rank fusion) and an abbreviation de-hallucination (ADH) module. Evaluated on three 100-item, in-domain test sets, hybrid RAG improved recall over no RAG (>40% on q2a-100, >60% on cmds-100), and ADH improved recall on abbr-100 by >70%. The system runs in Slack and returns sources for user review. Key limits: a small tailored knowledge base (≈400 MB, IBM-specific), a 249-entry abbreviation dictionary, and remaining LLM recall/hallucination issues.

Problem Statement

Design engineers struggle to find correct, up-to-date technical guidance and command syntax across scattered internal docs and Slack. Off-the-shelf LLMs hallucinate or lack current/institutional knowledge. The goal is a 24/7 assistant that returns accurate, sourced answers and reduces hallucinated abbreviation expansions.

Main Contribution

Built Ask-EDA: a chat assistant for electronic design that combines an LLM, hybrid retrieval (dense + sparse), and abbreviation de-hallucination.

Implemented a hybrid search pipeline using sentence-transformer dense vectors, BM25 sparse index, and reciprocal rank fusion (RRF).

Key Findings

Hybrid RAG substantially increases answer recall versus no retrieval.

Numbers: q2a-100: >40% recall improvement vs no-RAG; cmds-100: >60% recall improvement vs no-RAG

Practical Use: Add hybrid retrieval (dense + sparse + RRF) to an LLM pipeline when you need higher recall on domain Q&A and command lookup.

Evidence Ref: Abstract; III.C Results (Fig.3)

Abbreviation de-hallucination (ADH) greatly reduces wrong expansions.

Numbers: abbr-100: >70% recall improvement with ADH

Practical Use: Include a curated abbreviation dictionary and inject exact matches into prompts to sharply reduce abbreviation hallucinations.

Evidence Ref: Abstract; III.C Results (Fig.4)
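
The dictionary-lookup idea behind ADH can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the two dictionary entries, the function names `find_abbreviations` and `build_prompt`, and the prompt wording are all assumptions, and the real 249-entry dictionary is internal.

```python
import re

# Hypothetical entries for illustration; the real dictionary is internal.
ABBREVIATIONS = {
    "LVS": "Layout Versus Schematic",
    "DRC": "Design Rule Check",
}

def find_abbreviations(query, dictionary):
    """Return exact, case-sensitive whole-token matches found in the query."""
    found = {}
    for token in re.findall(r"[A-Za-z0-9]+", query):
        if token in dictionary and token not in found:
            found[token] = dictionary[token]
    return found

def build_prompt(query, dictionary):
    """Prepend known expansions so the LLM does not have to guess them."""
    matches = find_abbreviations(query, dictionary)
    if not matches:
        return f"Question: {query}"
    lines = [f"- {abbr}: {expansion}" for abbr, expansion in matches.items()]
    return "Known abbreviations:\n" + "\n".join(lines) + f"\n\nQuestion: {query}"

prompt = build_prompt("What does LVS check that DRC does not?", ABBREVIATIONS)
print(prompt)
```

Exact matching keeps the lookup cheap and avoids injecting expansions for words that only resemble an abbreviation; it will miss lowercase or misspelled variants.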

Results

Metric	Value	Baseline	Delta	Split / Dataset	Evidence	Evidence Ref
q2a-100 Recall improvement (hybrid vs none)	>40% relative increase	no RAG	not reported	q2a-100	Abstract; III.C Results (Fig.3)	Abstract; III.C
cmds-100 Recall improvement (hybrid vs none)	>60% increase	no RAG (Recall=0)	not reported	cmds-100	Abstract; III.C Results (Fig.3) and text	III.C

What To Try In 7 Days

Build a small hybrid index (dense + BM25) over your most-used internal docs and test recall on 50 common queries.

Add a curated abbreviation dictionary and inject exact matches into prompts for abbreviation-heavy domains.

Expose retrieval sources in the UI so engineers can verify answers quickly.
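
One way to run the 50-query check in the first item is a retrieval hit rate at k: the fraction of queries whose expected document shows up in the top k results. This is a sketch under assumptions: `hit_rate_at_k`, `toy_retrieve`, and the toy index are hypothetical stand-ins, and this retrieval-level measure is not the answer-level Recall reported for Ask-EDA.

```python
def hit_rate_at_k(labeled_queries, retrieve, k=5):
    """Fraction of (query, expected_doc_id) pairs whose doc is in the top-k results."""
    if not labeled_queries:
        return 0.0
    hits = sum(
        1 for query, expected in labeled_queries if expected in retrieve(query)[:k]
    )
    return hits / len(labeled_queries)

# Toy stand-in for a real hybrid retriever (hypothetical index contents).
def toy_retrieve(query):
    index = {
        "how to run lvs": ["doc_lvs", "doc_drc"],
        "set clock period": ["doc_misc", "doc_sdc"],
    }
    return index.get(query, [])

labeled = [
    ("how to run lvs", "doc_lvs"),
    ("set clock period", "doc_sdc"),
    ("unknown question", "doc_x"),
]

print(hit_rate_at_k(labeled, toy_retrieve, k=1))
print(hit_rate_at_k(labeled, toy_retrieve, k=2))
```

Swapping `toy_retrieve` for dense-only, sparse-only, and fused retrievers gives a quick side-by-side comparison on the same labeled queries.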

Agent Features

Memory

short-term chat history: recent prior questions are included as context

Tool Use

Slack API (conversational interface)

Source listing for user verification

Frameworks

LangChain for ingestion

ChromaDB for dense storage

Is Agentic

Yes

Architectures

single-turn LLM with retrieval-augmented context

Collaboration

SME-built abbreviation dictionary; feedback collection via Slack (not used in eval)

Optimization Features

Token Efficiency

chunking documents to control context length (2048 chunk size, 256 overlap)
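
A sliding-window splitter shows how a 2048 chunk size with 256 overlap plays out. This is a plain-Python sketch, not the LangChain splitter used in the system; it assumes the sizes are measured in characters, which the summary does not state, and `chunk_text` is a hypothetical helper.

```python
def chunk_text(text, chunk_size=2048, overlap=256):
    """Split text into windows of chunk_size characters that share `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # each window starts 1792 characters after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

doc = "".join(str(i % 10) for i in range(5000))  # 5000-character stand-in document
chunks = chunk_text(doc)
print([len(c) for c in chunks])
```

The overlap means a sentence that straddles a boundary appears whole in at least one chunk, at the cost of storing some text twice.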

System Optimization

reciprocal rank fusion (RRF) to merge dense and sparse results
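
Reciprocal rank fusion scores each document by summing 1 / (k + rank) over every ranked list it appears in. The sketch below assumes k = 60, a commonly used default that the summary does not confirm for Ask-EDA; the function name and document ids are illustrative.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc ids: score(d) = sum of 1 / (k + rank) over the lists."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))

dense_hits = ["doc_a", "doc_b", "doc_c"]   # e.g. from the vector store
sparse_hits = ["doc_c", "doc_a", "doc_d"]  # e.g. from BM25
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Because only ranks are used, the dense similarity scores and BM25 scores never need to be normalized against each other.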

Reproducibility

Code Available: No

Data Available: No

Open Source Status: Partial

License: Unknown

Risks & Boundaries

Limitations

Knowledge base is IBM-specific and ~400MB; results may not generalize to other orgs.

Abbreviation dictionary has 249 entries; only ~25% are general industry terms.

When Not To Use

When you need perfect recall on open-ended, up-to-the-minute sources not ingested into the index.

When handling highly sensitive or confidential data unless retrieval and access controls are hardened.

Failure Modes

LLM ignores injected abbreviation info and hallucinates expansions despite ADH.

Hybrid context overwhelms the LLM, leading to lower F1 even when recall is higher.

Core Entities

Models

Granite-13b-chat-v2.1

Llama2-13b-chat

all-MiniLM-L6-v2 (embedder)

Metrics

ROUGE-Lsum F1

Recall
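
The relationship between recall and F1 is easier to see with a small longest-common-subsequence calculation. This sketch is a simplified ROUGE-L on lowercase whitespace tokens, not the full ROUGE-Lsum used in the evaluation, and the reference and answers are invented examples.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l(reference, candidate):
    """LCS-based recall, precision, and F1 between a reference and a candidate answer."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    lcs = lcs_length(ref, cand)
    recall = lcs / len(ref) if ref else 0.0
    precision = lcs / len(cand) if cand else 0.0
    f1 = 2 * precision * recall / (precision + recall) if lcs else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}

reference = "run the timing report command"
short = rouge_l(reference, "run the timing report")
long = rouge_l(reference, "you can run the timing report command after loading the design first")
print(short)
print(long)
```

The longer answer covers every reference token, so its recall rises, but the extra words lower precision and pull F1 below the shorter answer's.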

Datasets

q2a-100

cmds-100

abbr-100

internal doc corpus (≈400 MB; ~10.2k command pages; ~5k params; 30 Slack channels; 18k Q&A)
