Overview
Production Readiness
0.7
Novelty Score
0.5
Cost Impact Score
0.6
Citation Count
1
Why It Matters For Business
A hybrid RAG layer plus a small abbreviation lookup can cut wrong answers and boost recall on internal technical queries, speeding engineering work and reducing time spent hunting docs.
Summary TLDR
Ask-EDA is a domain chat assistant for chip design that pairs an LLM with a hybrid RAG retrieval layer (dense and sparse retrieval merged by reciprocal rank fusion) and an abbreviation de-hallucination (ADH) module. On three in-domain test sets of 100 items each, hybrid RAG improved recall over no RAG (by 40%+ on q2a-100 and 60%+ on cmds-100), and ADH abbreviation lookup improved recall on abbr-100 by more than 70%. The system runs over Slack and returns its sources for user review. Key limits: a small, tailored knowledge base (≈400 MB, IBM-specific), an abbreviation dictionary of only 249 entries, and residual LLM recall and hallucination issues.
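Reciprocal rank fusion merges the dense and sparse result lists using ranks only: each document scores the sum of 1 / (k + rank) over every list it appears in. The sketch below assumes each retriever returns an ordered list of document IDs and uses k = 60, a commonly used default; the summary does not state which k Ask-EDA uses, and the document IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of doc IDs with RRF.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. A list that lacks a document contributes nothing.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical outputs of a dense (embedding) retriever and a sparse (BM25) retriever.
dense_hits = ["doc_timing", "doc_drc", "doc_place"]
sparse_hits = ["doc_drc", "doc_route", "doc_timing"]

for doc_id, score in reciprocal_rank_fusion([dense_hits, sparse_hits]):
    print(f"{doc_id}: {score:.5f}")
```

Documents found by both retrievers rise to the top. Because only ranks are used, there is no need to normalize cosine similarities against BM25 scores before merging.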
Problem Statement
Design engineers struggle to find correct, up-to-date technical guidance and command syntax across scattered internal docs and Slack. Off-the-shelf LLMs hallucinate or lack current/institutional knowledge. The goal is a 24/7 assistant that returns accurate, sourced answers and reduces hallucinated abbreviation expansions.
Main Contribution
- Built Ask-EDA: a chat assistant for electronic design that combines an LLM, hybrid retrieval (dense + sparse), and abbreviation de-hallucination.
- Implemented a hybrid search pipeline using sentence-transformer dense vectors, a BM25 sparse index, and reciprocal rank fusion (RRF).
- Created three domain test sets (q2a-100, cmds-100, abbr-100) and measured ROUGE-Lsum F1 and recall to quantify gains.
- Integrated with Slack for conversational use and source review; provided a practical system prompt and deployment details.
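The sparse half of the pipeline is a BM25 index. Below is a small from-scratch Okapi BM25 scorer, assuming simple regex tokenization (no stemming), the common defaults k1 = 1.5 and b = 0.75, and the Lucene-style IDF that stays positive for frequent terms. The summary does not say which BM25 implementation or parameters Ask-EDA uses, and the command names in the sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase alphanumeric/underscore tokens, so names like "check_drc" stay whole.
    return re.findall(r"[a-z0-9_]+", text.lower())


class BM25:
    """Minimal Okapi BM25 over an in-memory list of documents."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.doc_tokens = [tokenize(d) for d in docs]
        self.doc_tf = [Counter(toks) for toks in self.doc_tokens]
        self.doc_len = [len(toks) for toks in self.doc_tokens]
        self.avgdl = sum(self.doc_len) / len(docs)
        n_docs = len(docs)
        # Document frequency: number of documents containing each term.
        df = Counter(term for toks in self.doc_tokens for term in set(toks))
        # Lucene-style IDF: ln(1 + (N - n + 0.5) / (n + 0.5)), always > 0.
        self.idf = {t: math.log(1 + (n_docs - n + 0.5) / (n + 0.5)) for t, n in df.items()}

    def score(self, query, i):
        tf, dl = self.doc_tf[i], self.doc_len[i]
        total = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            f = tf[term]
            # Term-frequency saturation (k1) with document-length normalization (b).
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def search(self, query, top_k=3):
        scored = [(i, self.score(query, i)) for i in range(len(self.doc_tokens))]
        scored = [pair for pair in scored if pair[1] > 0]
        return sorted(scored, key=lambda p: (-p[1], p[0]))[:top_k]


docs = [
    "report_timing prints the worst timing paths of the design",
    "check_drc runs design rule checks on the layout",
    "place_opt performs placement and optimization",
]
index = BM25(docs)
print(index.search("how do I run design rule checks"))
```

Exact-term matching is what lets the sparse side catch command names and flags that an embedding model may blur together, which is one reason to pair it with dense retrieval.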
Key Findings
- Hybrid RAG substantially increases answer recall compared with no retrieval.
- Abbreviation de-hallucination (ADH) greatly reduces wrong abbreviation expansions.
- Model choice affects extraction quality: Granite-13b-chat-v2.1 gives higher F1, while Llama2-13b-chat can match or exceed its recall at lower F1.
- Without RAG, the LLMs had zero recall on command lookup (cmds-100).
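The gap between F1 and recall is easy to see with an LCS-based score: a verbose answer that contains the whole reference keeps full recall but loses precision. The sketch below is simplified sentence-level ROUGE-L over whitespace tokens, not the ROUGE-Lsum variant the study reports (which splits text on newlines and uses a union LCS across sentences); the reference and answers are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """Return (precision, recall, f1) for sentence-level ROUGE-L."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0, 0.0, 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


reference = "use check_drc -full"
concise = "use check_drc -full"
verbose = "you can use the check_drc command with -full to run all rule decks"

for name, answer in [("concise", concise), ("verbose", verbose)]:
    p, r, f = rouge_l(answer, reference)
    print(f"{name}: recall={r:.2f} f1={f:.2f}")
```

Both answers have full recall, but the verbose one has precision 3/13 and an F1 of 0.375, mirroring how one model can match another on recall while trailing on F1.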
Results
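- q2a-100 recall improvement (hybrid RAG vs no RAG): 40%+
- cmds-100 recall improvement (hybrid RAG vs no RAG): 60%+
- abbr-100 recall improvement with ADH: >70%
- cmds-100 recall without RAG: zero for the tested LLMs
- Model comparison (F1 vs recall): Granite-13b-chat-v2.1 has higher F1; Llama2-13b-chat has similar or better recall but lower F1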
Who Should Care
What To Try In 7 Days
- Build a small hybrid index (dense + BM25) over your most-used internal docs and test recall on 50 common queries.
- Add a curated abbreviation dictionary and inject exact matches into prompts for abbreviation-heavy domains.
- Expose retrieval sources in the UI so engineers can verify answers quickly.
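Abbreviation injection can be sketched as exact whole-token lookup against a curated dictionary, with the matched expansions placed in the prompt ahead of the question. The dictionary entries, token pattern, and prompt wording below are hypothetical; the summary only says that exact matches are injected.

```python
import re

# Hypothetical entries; Ask-EDA's real dictionary has 249 SME-curated entries.
ABBREVIATIONS = {
    "DRC": "design rule check",
    "LVS": "layout versus schematic",
    "STA": "static timing analysis",
}


def find_abbreviations(question, dictionary):
    """Return entries whose key appears as a whole, case-sensitive token in the question."""
    tokens = re.findall(r"[A-Za-z0-9]+", question)
    matched = []
    for tok in tokens:
        if tok in dictionary and tok not in matched:
            matched.append(tok)
    return {abbr: dictionary[abbr] for abbr in matched}


def build_prompt(question, dictionary):
    """Prepend exact dictionary expansions (if any) to the user question."""
    matches = find_abbreviations(question, dictionary)
    lines = []
    if matches:
        lines.append("Use these exact abbreviation expansions; do not invent others:")
        lines.extend(f"- {abbr}: {expansion}" for abbr, expansion in matches.items())
        lines.append("")
    lines.append(f"Question: {question}")
    return "\n".join(lines)


print(build_prompt("Why does LVS pass but DRC fail?", ABBREVIATIONS))
```

Case-sensitive whole-token matching keeps short keys such as "STA" from firing inside ordinary words like "static"; an organization with lowercase abbreviations would need a different rule.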
Agent Features
Memory
- short-term chat history: recent prior questions are included with each new query
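Short-term memory can be approximated with a bounded window of recent question/answer turns that is prepended to each new request. The window size and the "User:/Assistant:" formatting below are assumptions, as is the sample conversation; the summary does not say how many prior questions Ask-EDA keeps.

```python
from collections import deque


class ChatMemory:
    """Keeps only the most recent `max_turns` question/answer pairs."""

    def __init__(self, max_turns=3):
        # A bounded deque drops the oldest turn automatically once full.
        self.turns = deque(maxlen=max_turns)

    def add(self, question, answer):
        self.turns.append((question, answer))

    def as_context(self):
        """Render the retained turns as plain text for the prompt."""
        return "\n".join(f"User: {q}\nAssistant: {a}" for q, a in self.turns)


memory = ChatMemory(max_turns=2)
memory.add("What does DRC stand for?", "Design rule check.")
memory.add("Which command runs it?", "See the check_drc page.")
memory.add("Does it support -full?", "Yes.")
print(memory.as_context())
```

A fixed turn count keeps prompt length predictable, which matters when retrieved chunks already take up most of the context window.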
Tool Use
- Slack API (conversational interface)
- Source listing for user verification
Frameworks
- LangChain for ingestion
- ChromaDB for dense storage
Is Agentic
true
Architectures
- single-turn LLM with retrieval-augmented context
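A single-turn retrieval-augmented call reduces to building one prompt from the retrieved chunks and returning their sources with the answer so users can check them. The sketch assumes a hypothetical `Chunk` record with `text` and `source` fields, uses made-up document paths, and leaves out the LLM call itself; the instruction text is illustrative, not Ask-EDA's actual system prompt.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str  # hypothetical: a doc path or Slack permalink


def build_rag_prompt(question, chunks):
    """Number each chunk so the answer can cite [1], [2], ... and collect unique sources."""
    context = "\n\n".join(f"[{i}] {c.text}" for i, c in enumerate(chunks, start=1))
    prompt = (
        "Answer using only the context below. If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    # Deduplicate sources while preserving retrieval order for the source listing.
    sources = list(dict.fromkeys(c.source for c in chunks))
    return prompt, sources


chunks = [
    Chunk("check_drc -full runs all rule decks.", "docs/commands/check_drc"),
    Chunk("DRC results are written to drc.rpt.", "docs/commands/check_drc"),
]
prompt, sources = build_rag_prompt("How do I run a full DRC?", chunks)
print(prompt)
print("Sources:", sources)
```

Returning the source list separately from the generated text means the UI can always show where the context came from, even if the model omits citations.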
Collaboration
- SME-built abbreviation dictionary; feedback collection via Slack (not used in eval)
Optimization Features
Token Efficiency
- chunking documents to control context length (2048 chunk size, 256 overlap)
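The chunking settings describe a sliding window: each chunk holds up to 2048 units and starts 2048 − 256 = 1792 units after the previous one, so neighbouring chunks share 256 units. The sketch counts characters, which is an assumption (the summary does not say whether the unit is characters or tokens), and it is a plain fixed-window splitter, not LangChain's splitter, which also tries to break on separators such as paragraph boundaries.

```python
def chunk_text(text, chunk_size=2048, overlap=256):
    """Split text into fixed-size windows; consecutive chunks share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


doc = "x" * 5000  # stand-in for a 5,000-character document page
parts = chunk_text(doc)
print([len(p) for p in parts])
```

The overlap means a sentence cut at a chunk boundary still appears whole in one of the two neighbouring chunks, at the cost of indexing roughly 14% more text at these settings.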
System Optimization
- reciprocal rank fusion (RRF) to merge dense and sparse results
Reproducibility
Open Source Status
- partial
Risks & Boundaries
Limitations
- Knowledge base is IBM-specific and ~400MB; results may not generalize to other orgs.
- Abbreviation dictionary has 249 entries; only ~25% are general industry terms.
- Evaluations use small 100-item test sets per task; results may be noisy.
- Some LLMs still fail to recall injected abbreviations or to produce concise final answers.
- No RLHF or fine-tuning on the ingested design data was applied in this study.
When Not To Use
- When you need perfect recall on open-ended, up-to-the-minute sources not ingested into the index.
- When handling highly sensitive or confidential data unless retrieval and access controls are hardened.
- When you require full public reproducibility — datasets and code are internal.
Failure Modes
- LLM ignores injected abbreviation info and hallucinates expansions despite ADH.
- Hybrid context overwhelms the LLM, leading to lower F1 even with higher recall.
- Sparse or dense retrieval misses key command docs if chunking or indexing parameters are suboptimal.
Core Entities
Models
- Granite-13b-chat-v2.1
- Llama2-13b-chat
- all-MiniLM-L6-v2 (embedder)
Metrics
- ROUGE-Lsum F1
- Recall
Datasets
- q2a-100
- cmds-100
- abbr-100
- internal doc corpus (≈400 MB; ~10.2k command pages; ~5k params; 30 Slack channels; 18k Q&A)

