ChipExpert: Open-source LLM tuned for integrated-circuit design

July 26, 2024 · 7 min

Overview

Decision Snapshot: Ready For Pilot

The model is mature enough for experimental deployment in IC workflows and R&D, given the publicly released code and benchmark; expect integration work for private data, RAG, and verification.

Citations: 0

Evidence Strength: 0.75

Confidence: 0.80

Risk Signals: 10

Trust Signals

Findings with numeric evidence: 3/4

Findings with evidence refs: 4/4

Results with explicit delta: 3/3

Reproducibility

Status: Partial assets available

Open source: Partial

At A Glance

Cost impact: 60%

Production readiness: 60%

Novelty: 60%

Authors

Ning Xu, Zhaoyang Zhang, Lei Qi, Wensuo Wang, Chao Zhang, Zihao Ren, Huaiyuan Zhang, Xin Cheng, Yanqi Zhang, Zhichao Liu, Qingwen Wei, Shiyang Wu, Lanlan Yang, Qianfeng Lu, Yiqun Ma, Mengyao Zhao, Junbo Liu, Yufan Song, Xin Geng, Jun Yang

Links

Abstract / PDF / Code / Data

Why It Matters For Business

ChipExpert provides an open, lower-cost assistant focused on IC design knowledge; it can speed onboarding, reduce expert time for Q&A, and be adapted into internal tools.

Who Should Care

Summary TLDR

ChipExpert is an open-source 8B-parameter LLM adapted from Llama-3 and tuned specifically for integrated-circuit (IC) design. The authors built a 4.7B-token IC corpus and generated >70k domain QA pairs with a multi-agent system (ChipInstruct), then trained the model in three stages: continued pretraining, supervised fine-tuning (SFT), and Direct Preference Optimization (DPO) alignment. They add a RAG layer (top-3 retrieved passages) and release ChatICD-Bench to evaluate IC knowledge. On their benchmark, ChipExpert matches or exceeds GPT-4 on many IC tasks (notably EDA and several advanced subdomains). Model, code, and benchmark are available online.

Problem Statement

General LLMs lack deep, usable IC design knowledge. Students and engineers face high learning costs and limited accessible, accurate domain materials. The paper aims to build and evaluate an open-source LLM tailored to IC design so practitioners get more accurate, domain-aware answers.

Main Contribution

ChipExpert: an open-source IC-design-focused LLM built from Llama-3 8B and released on HuggingFace.

A 4.7B-token IC corpus (blended, with domain knowledge repeated 4x to 11.2B effective tokens) used for continued pretraining.
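The repeat strategy can be pictured with a toy example. This is a minimal sketch, assuming the blend is formed by duplicating each domain document four times and shuffling it in with general text; the function name `build_mix`, the document names, and the counts are hypothetical, and the summary does not spell out how the 4.7B and 11.2B figures relate, so no real token counts are modeled.

```python
import random

def build_mix(domain_docs, general_docs, domain_repeat=4, seed=0):
    """Return a shuffled training mix in which each domain doc appears
    `domain_repeat` times (hypothetical helper, not the authors' code)."""
    mix = list(general_docs) + list(domain_docs) * domain_repeat
    random.Random(seed).shuffle(mix)  # fixed seed keeps the sketch reproducible
    return mix

# Toy stand-ins for IC-domain and general-purpose documents.
domain = ["ic_doc_a", "ic_doc_b"]
general = ["gen_doc_1", "gen_doc_2", "gen_doc_3"]

mix = build_mix(domain, general)
domain_share = sum(doc in domain for doc in mix) / len(mix)
print(len(mix), round(domain_share, 2))
```

Repeating the domain text raises its share of the mix (here 8 of 11 documents) without collecting more data, which is the trade-off the 4x figure describes.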

Key Findings

ChipExpert beats GPT-4 on foundational EDA questions.

Numbers: ChipExpert 0.93 vs GPT-4 0.87

Practical Use: For EDA foundational Q&A, use ChipExpert to get answers comparable to or better than GPT-4 at lower cost and with open-source control.

Evidence Ref: Section 5.2; Fig. 6

ChipExpert outperforms GPT-4 in many advanced IC subdomains.

Numbers: Outperforms GPT-4 in 6 of 9 advanced subdomains; +0.28 in CIM

Practical Use: R&D teams working on cutting-edge IC topics (e.g., compute-in-memory) can use ChipExpert as a domain-aware assistant for idea exploration and literature Q&A.

Evidence Ref: Section 5.2; Fig. 7

Results

Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref
Human-eval score on EDA foundational questions | 0.93 (ChipExpert) | 0.87 (GPT-4) | +0.06 | ChatICD-Bench (foundational, EDA) | Section 5.2; Fig. 6 | Fig. 6
Human-eval delta on compute-in-memory (advanced) | ChipExpert improves by +0.28 | GPT-4 | +0.28 | ChatICD-Bench (advanced, CIM) | Section 5.2; Fig. 7 | Fig. 7
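The delta reported for the EDA row is simply the candidate score minus the baseline score. A small sketch of that arithmetic, using the two EDA scores from the results; the helper name `delta` and the dictionary layout are illustrative only.

```python
# EDA foundational scores as reported in the results (0-1 human rating).
scores = {"ChipExpert": 0.93, "GPT-4": 0.87}

def delta(candidate, baseline, ndigits=2):
    """Score difference, rounded to hide floating-point noise."""
    return round(candidate - baseline, ndigits)

eda_delta = delta(scores["ChipExpert"], scores["GPT-4"])
print(f"{eda_delta:+.2f}")  # prints +0.06
```

The same helper can be reused when comparing the model against internal expert ratings collected on the same 0-1 scale.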

What To Try In 7 Days

Run the released ChipExpert model on a few internal IC Q&A cases and compare its answers against your experts' answers

Evaluate ChatICD-Bench on your use cases and add representative prompts

Add a RAG layer using your internal docs (embed + ANN, top-3 passages) to improve factuality quickly
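The RAG step in the list above can be prototyped before committing to any vector database. The following is a rough sketch under loud assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, the search is brute-force cosine similarity rather than ANN, and all function names, passages, and the prompt layout are hypothetical. Only the top-3 cutoff comes from the summary.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, passages, k=3):
    """Brute-force top-k retrieval (an ANN index would replace this at scale)."""
    q = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]

def build_prompt(query, passages, k=3):
    """Prepend the retrieved passages to the question (hypothetical layout)."""
    hits = retrieve(query, passages, k)
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(hits))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = [
    "Static timing analysis checks setup and hold constraints.",
    "Clock tree synthesis balances skew across the chip.",
    "Setup time violations are fixed by reducing path delay.",
    "The cafeteria opens at noon.",
    "Hold time violations are fixed by adding delay buffers.",
]
question = "How do I fix setup time violations?"
top = retrieve(question, docs)
prompt = build_prompt(question, docs)
```

The resulting `prompt` string would then be passed to the model. Swapping `embed` for a real embedding model and `retrieve` for an ANN index changes the quality and speed of retrieval but not the overall shape of the pipeline.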

Agent Features

Tool Use
RAG (vector DB + ANN retrieval), LoRA, Flash Attention / GQA for efficiency
Frameworks
ChipInstruct, ModelLink, Marker
Architectures
Autoregressive transformer (Llama-3 8B base), Instruction-tuned assistant
Collaboration
Multi-agent pipeline for data synthesis (ChipInstruct)

Optimization Features

Token Efficiency
Domain repeat strategy (repeat domain knowledge 4x in mix)
Model Optimization
LoRA
System Optimization
Trained on 8 Ascend-910B NPUs
Training Optimization
Continued pretraining on domain corpus, SFT, Two-phase DPO alignment (preference tuning)
Inference Optimization
Flash Attention, GQA (Grouped-Query Attention)
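For readers unfamiliar with the DPO alignment step listed under Training Optimization, the standard DPO objective for a single preference pair can be sketched as follows. This is the generic textbook form, not the authors' training code: the function name, the `beta=0.1` default, and the example log-probabilities are assumptions, and the summary gives no details on how the two phases differ.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are sequence log-probabilities under the policy being tuned
    and under the frozen reference model. beta is an assumed value.
    """
    # How much more the policy favors the chosen answer than the reference does.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = beta * margin
    # -log(sigmoid(x)) computed as a numerically stable softplus(-x).
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))

# Policy identical to the reference: loss equals log(2).
neutral = dpo_loss(-10.0, -10.0, -10.0, -10.0)
# Policy has shifted probability toward the chosen answer: loss drops.
improved = dpo_loss(-5.0, -15.0, -10.0, -10.0)
print(round(neutral, 4), round(improved, 4))
```

The loss falls as the policy assigns relatively more probability to the preferred answer than the reference model does, which is what preference tuning optimizes.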

Risks & Boundaries

Limitations

Weaker performance reported on analog circuit domain compared to GPT-4

Pretraining relies on publicly available texts, so the model may miss proprietary material and the most recent datasets

When Not To Use

High-assurance analog circuit design decisions without human verification

Tasks requiring diagram/graph interpretation (no multimodal model yet)

Failure Modes

Hallucinations when RAG retrieval misses or returns irrelevant passages

Overfitting if further fine-tuned without fresh domain data

Core Entities

Models

ChipExpert-8B-Instruct (fine-tuned Llama-3 8B), Llama-3 8B (base)

Metrics

Human expert rating (0-1), Automatic LLM multi-agent scoring + referee debate

Datasets

Custom IC continued-pretraining corpus (4.7B tokens), Supervised QA pairs (>70k), ChatICD-Bench (released)

Benchmarks

ChatICD-Bench