Ask-EDA: a Slack-ready design chatbot that combines hybrid retrieval and an abbreviation lookup to reduce hallucinations

June 3, 2024 · 8 min

Overview

Decision Snapshot: Needs Validation

The system is implemented end-to-end and integrated into Slack; numeric evidence shows sizable recall gains on small in-domain test sets, but the data and code are internal and evaluations are limited to 100-item subsets.

Citations: 1

Evidence Strength: 0.75

Confidence: 0.85

Risk Signals: 11

Trust Signals

Findings with numeric evidence: 3/4

Findings with evidence refs: 4/4

Results with explicit delta: 0/5

Reproducibility

Status: No open assets linked

Open source: Partial

At A Glance

Cost impact: 60%

Production readiness: 70%

Novelty: 50%

Authors

Luyao Shi, Michael Kazda, Bradley Sears, Nick Shropshire, Ruchir Puri

Links

Abstract / PDF

Why It Matters For Business

A hybrid RAG layer plus a small abbreviation lookup can cut wrong answers and boost recall on internal technical queries, speeding engineering work and reducing time spent hunting docs.

Who Should Care

Summary TLDR

Ask-EDA is a domain chat assistant for chip design that pairs an LLM with a hybrid RAG retrieval layer (dense + sparse + reciprocal rank fusion) and an abbreviation de-hallucination module. Evaluated on three 100-item, in-domain test sets, hybrid RAG improved recall vs no-RAG (40%+ on q2a-100, 60%+ on cmds-100) and abbreviation lookup (ADH) improved recall on abbr-100 by >70%. The system runs over Slack and returns sources for user review. Key limits: a small tailored knowledge base (≈400 MB, IBM-specific), 249 abbreviations, and remaining LLM recall/hallucination issues.

Problem Statement

Design engineers struggle to find correct, up-to-date technical guidance and command syntax across scattered internal docs and Slack. Off-the-shelf LLMs hallucinate or lack current/institutional knowledge. The goal is a 24/7 assistant that returns accurate, sourced answers and reduces hallucinated abbreviation expansions.

Main Contribution

Built Ask-EDA: a chat assistant for electronic design that combines an LLM, hybrid retrieval (dense + sparse), and abbreviation de-hallucination.

Implemented a hybrid search pipeline using sentence-transformer dense vectors, BM25 sparse index, and reciprocal rank fusion (RRF).
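
As a rough illustration of the fusion step, the sketch below merges a dense ranking and a sparse ranking with reciprocal rank fusion. The document IDs, the function name, and the constant k=60 (a value commonly used for RRF) are assumptions for illustration; the summary does not state which constant Ask-EDA uses.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default, not a value taken
    from the Ask-EDA paper.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical top-3 results from the dense (embedding) and sparse (BM25) retrievers.
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Because RRF works on ranks rather than raw scores, the dense similarity scores and BM25 scores never need to be put on a common scale.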

Key Findings

Hybrid RAG substantially increases answer recall versus no retrieval.

Numbers: q2a-100: >40% recall improvement vs no-RAG; cmds-100: >60% recall improvement vs no-RAG

Practical Use: Add hybrid retrieval (dense+sparse+RRF) to an LLM pipeline when you need higher recall on domain Q&A and command lookup.

Evidence Ref: Abstract; III.C Results (Fig.3)

Abbreviation de-hallucination (ADH) greatly reduces wrong expansions.

Numbers: abbr-100: >70% recall improvement with ADH

Practical Use: Include a curated abbreviation dictionary and inject exact matches into prompts to sharply reduce abbreviation hallucinations.

Evidence Ref: Abstract; III.C Results (Fig.4)
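
The dictionary-lookup idea behind ADH can be sketched roughly as follows. The two dictionary entries, the prompt wording, and the function names are hypothetical; the real dictionary is SME-curated and internal, and the exact prompt format used by Ask-EDA is not given here.

```python
import re

# Hypothetical entries standing in for the curated, internal dictionary.
ABBREVIATIONS = {
    "LVS": "layout versus schematic",
    "DRC": "design rule check",
}


def find_abbreviations(query, dictionary):
    """Return dictionary entries whose key appears as a whole token in the query.

    Matching is exact and case-sensitive, so "lvs" or "LVSX" will not match "LVS".
    """
    tokens = set(re.findall(r"[A-Za-z0-9]+", query))
    return {abbr: dictionary[abbr] for abbr in sorted(tokens) if abbr in dictionary}


def build_prompt(query, dictionary):
    """Prepend known expansions to the user question before it reaches the LLM."""
    matches = find_abbreviations(query, dictionary)
    if not matches:
        return query
    lines = [f"{abbr} stands for {meaning}." for abbr, meaning in matches.items()]
    return "Known abbreviations:\n" + "\n".join(lines) + "\n\nQuestion: " + query


prompt = build_prompt("How do I debug LVS errors?", ABBREVIATIONS)
print(prompt)
```

Exact matching keeps the injected text small and avoids adding expansions for words that only resemble an abbreviation.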

Results

Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref
q2a-100 Recall improvement (hybrid vs none) | >40% relative increase | no RAG | - | q2a-100 | Abstract; III.C Results (Fig.3) | Abstract; III.C
cmds-100 Recall improvement (hybrid vs none) | >60% relative increase | no RAG (Recall=0) | - | cmds-100 | Abstract; III.C Results (Fig.3) and text | III.C

What To Try In 7 Days

Build a small hybrid index (dense + BM25) over your most-used internal docs and test recall on 50 common queries.

Add a curated abbreviation dictionary and inject exact matches into prompts for abbreviation-heavy domains.

Expose retrieval sources in the UI so engineers can verify answers quickly.
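
One way to run the first trial above is a small harness that checks whether the known-relevant document shows up in the top k results for each labeled query. This is a minimal sketch under assumptions: it measures retrieval hit rate rather than the answer-level recall reported in the paper, the keyword-overlap retriever is only a stand-in for a real dense + BM25 index, and the documents and queries are made up.

```python
def keyword_retriever(docs):
    """Build a toy retriever that ranks documents by shared lowercase words.

    Stand-in only: replace with the real hybrid (dense + BM25) retriever.
    """
    def retrieve(query):
        q = set(query.lower().split())
        scored = [(len(q & set(text.lower().split())), doc_id)
                  for doc_id, text in docs.items()]
        ranked = sorted(scored, key=lambda s: (-s[0], s[1]))
        return [doc_id for score, doc_id in ranked if score > 0]
    return retrieve


def hit_rate_at_k(retrieve, labeled_queries, k=5):
    """Fraction of queries whose relevant document appears in the top k results."""
    if not labeled_queries:
        return 0.0
    hits = sum(1 for query, relevant_id in labeled_queries
               if relevant_id in retrieve(query)[:k])
    return hits / len(labeled_queries)


# Hypothetical documents and (query, relevant document ID) pairs.
docs = {
    "d1": "report timing for the selected path",
    "d2": "check design rules in the layout",
    "d3": "set clock period constraint",
}
labeled = [
    ("how to report timing", "d1"),
    ("set the clock period", "d3"),
    ("power analysis flow", "d2"),
]

rate = hit_rate_at_k(keyword_retriever(docs), labeled, k=1)
print(f"hit rate@1: {rate:.2f}")
```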

Agent Features

Memory
short-term chat history included from recent prior questions
Tool Use
Slack API (conversational interface); source listing for user verification
Frameworks
LangChain for ingestion; ChromaDB for dense storage
Is Agentic

Yes

Architectures
single-turn LLM with retrieval-augmented context
Collaboration
SME-built abbreviation dictionary; feedback collection via Slack (not used in eval)

Optimization Features

Token Efficiency
chunking documents to control context length (2048 chunk size, 256 overlap)
System Optimization
reciprocal rank fusion (RRF) to merge dense and sparse results
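
The chunking settings listed above (2048 chunk size, 256 overlap) can be illustrated with a plain sliding window. This is a sketch, not the Ask-EDA ingestion code: it assumes the sizes are measured in characters (the summary does not say whether they are characters or tokens) and it splits at fixed offsets rather than at sentence or paragraph boundaries.

```python
def chunk_text(text, chunk_size=2048, overlap=256):
    """Split text into windows of chunk_size characters.

    Consecutive windows share `overlap` characters so that content near a
    boundary appears in both neighbouring chunks.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window just emitted already reaches the end of the text
        start += step
    return chunks


# Synthetic 5000-character document for demonstration.
document = "".join(str(i % 10) for i in range(5000))
chunks = chunk_text(document)
print([len(c) for c in chunks])
```

Larger chunks give the LLM more surrounding context per hit, while the overlap reduces the chance that an answer is cut in half at a chunk boundary.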

Reproducibility

Code Available: No
Data Available: No
Open Source Status: Partial
License: Unknown

Risks & Boundaries

Limitations

Knowledge base is IBM-specific and ~400MB; results may not generalize to other orgs.

Abbreviation dictionary has 249 entries; only ~25% are general industry terms.

When Not To Use

When you need perfect recall on open-ended, up-to-the-minute sources not ingested into the index.

When handling highly sensitive or confidential data unless retrieval and access controls are hardened.

Failure Modes

LLM ignores injected abbreviation info and hallucinates expansions despite ADH.

Hybrid context overwhelms the LLM leading to lower F1 even with higher recall.

Core Entities

Models

Granite-13b-chat-v2.1; Llama2-13b-chat; all-MiniLM-L6-v2 (embedder)

Metrics

ROUGE-Lsum F1; Recall
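
To make the two metrics concrete, the sketch below computes a simplified ROUGE-L style precision, recall, and F1 from the longest common subsequence (LCS) of whitespace tokens. It is an approximation for illustration, not the official ROUGE-Lsum implementation (which also handles sentence splitting and stemming), and the example strings are invented. It also shows how a longer answer can raise recall while lowering F1.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate):
    """Return (precision, recall, f1) based on LCS over lowercase tokens."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    lcs = lcs_length(ref, cand)
    recall = lcs / len(ref) if ref else 0.0
    precision = lcs / len(cand) if cand else 0.0
    f1 = 2 * precision * recall / (precision + recall) if lcs else 0.0
    return precision, recall, f1


# Invented reference answer and two candidate answers.
reference = "run the timing report command"
short_answer = "run the timing command"
long_answer = "you can run the timing report command to see all of the many results"

short_scores = rouge_l(reference, short_answer)
long_scores = rouge_l(reference, long_answer)
print("short:", short_scores)
print("long: ", long_scores)
```

Here the long answer covers every reference token in order, so its recall is higher, but the extra words reduce precision enough that its F1 falls below that of the short answer.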

Datasets

q2a-100; cmds-100; abbr-100; internal doc corpus (≈400MB, ~10.2k command pages, ~5k params, 30 Slack channels, 18k Q&A)