ChipExpert: Open-source LLM tuned for integrated-circuit design

Overview

Decision SnapshotReady For Pilot

The model is mature enough for experimental deployment in IC workflows and R&D, given public code and benchmark; expect integration work for private data, RAG, and verification.

Citations0

Evidence Strength0.75

Confidence0.80

Risk Signals10

Trust Signals

Findings with numeric evidence: 3/4

Findings with evidence refs: 4/4

Results with explicit delta: 3/3

Reproducibility

Status: Partial assets available

Open source: Partial

At A Glance

Cost impact: 60%

Production readiness: 60%

Novelty: 60%

Authors

Ning Xu, Zhaoyang Zhang, Lei Qi, Wensuo Wang, Chao Zhang, Zihao Ren, Huaiyuan Zhang, Xin Cheng, Yanqi Zhang, Zhichao Liu, Qingwen Wei, Shiyang Wu, Lanlan Yang, Qianfeng Lu, Yiqun Ma, Mengyao Zhao, Junbo Liu, Yufan Song, Xin Geng, Jun Yang

Links

Abstract / PDF / Code / Data

Why It Matters For Business

ChipExpert provides an open, lower-cost assistant focused on IC design knowledge; it can speed onboarding, reduce expert time for Q&A, and be adapted into internal tools.

Who Should Care

ML Engineer Product Manager Engineering Lead Data Scientist Founder

Summary TLDR

ChipExpert is an open-source 8B-parameter LLM adapted from Llama-3 and tuned specifically for integrated-circuit (IC) design. The authors built a 4.7B-token IC corpus, generated >70k domain QA pairs with a multi-agent system (ChipInstruct), continued pretraining, supervised fine-tuning, and Direct Preference Optimization (DPO) alignment. They add a RAG layer (top-3 retrieved passages) and release ChatICD-Bench to evaluate IC knowledge. On their benchmark ChipExpert matches or exceeds GPT-4 on many IC tasks (notably EDA and several advanced subdomains). Model, code, and benchmark are available online.

Problem Statement

General LLMs lack deep, usable IC design knowledge. Students and engineers face high learning costs and limited accessible, accurate domain materials. The paper aims to build and evaluate an open-source LLM tailored to IC design so practitioners get more accurate, domain-aware answers.

Main Contribution

ChipExpert: an open-source IC-design-focused LLM built from Llama-3 8B and released on HuggingFace.

A 4.7B-token IC corpus (blended, with domain knowledge repeated 4x to 11.2B effective tokens) used for continued pretraining.

Key Findings

ChipExpert beats GPT-4 on foundational EDA questions.

NumbersChipExpert 0.93 vs GPT-4 0.87

Practical UseFor EDA foundational Q&A, use ChipExpert to get answers comparable or better than GPT-4 at lower cost and with open-source control.

Evidence RefSection 5.2; Fig.6

ChipExpert outperforms GPT-4 in many advanced IC subdomains.

NumbersOutperforms GPT-4 in 6 of 9 advanced subdomains; +0.28 in CIM

Practical UseR&D teams working on cutting-edge IC topics (e.g., compute-in-memory) can use ChipExpert as a domain-aware assistant for idea exploration and literature Q&A.

Evidence RefSection 5.2; Fig.7

Results

Metric	Value	Baseline	Delta	Split / Dataset	Evidence	Evidence Ref
Human-eval score on EDA foundational questions	0.93 (ChipExpert)	0.87 (GPT-4)	+0.06	ChatICD-Bench (foundational, EDA)	Section 5.2; Fig.6	Fig.6
Human-eval delta on compute-in-memory (advanced)	ChipExpert improves by +0.28	GPT-4	+0.28	ChatICD-Bench (advanced, CIM)	Section 5.2; Fig.7	Fig.7

What To Try In 7 Days

Run the released ChipExpert model on a few internal IC Q&A cases to compare answers vs your expert results

Evaluate ChatICD-Bench on your use cases and add representative prompts

Add a RAG layer using your internal docs (embed + ANN, top-3 passages) to improve factuality quickly

Agent Features

Tool Use

RAG (vector DB + ANN retrieval)LoRAFlash Attention / GQA for efficiency

Frameworks

ChipInstructModelLinkMarker

Architectures

Autoregressive transformer (Llama-3 8B base)Instruction-tuned assistant

Collaboration

Multi-agent pipeline for data synthesis (ChipInstruct)

Optimization Features

Token Efficiency

Domain repeat strategy (repeat domain knowledge 4x in mix)

Model Optimization

LoRA

System Optimization

Trained on 8 Ascend-910B NPUs

Training Optimization

Continued pretraining on domain corpusSFTTwo-phase DPO alignment (preference tuning)

Inference Optimization

Flash AttentionGQA (Group Query Attention)

Reproducibility

Code AvailableYes

Data AvailableNo

Open Source StatusPartial

LicenseUnknown

Code URLs

https://github.com/NCTIE/ChipExpert https://huggingface.co/China-NCTIEDA/ChipExpert-8B-Instruct

Data URLs

https://huggingface.co/datasets/China-NCTIEDA/ChatICD-Bench

Risks & Boundaries

Limitations

Weaker performance reported on analog circuit domain compared to GPT-4

Pretraining relies on publicly available texts; may miss proprietary or newest datasets

When Not To Use

High-assurance analog circuit design decisions without human verification

Tasks requiring diagram/graph interpretation (no multimodal model yet)

Failure Modes

Hallucinations when RAG retrieval misses or returns irrelevant passages

Overfitting if further fine-tuned without fresh domain data

Core Entities

Models

ChipExpert-8B-Instruct (fine-tuned Llama-3 8B)Llama-3 8B (base)

Metrics

Human expert rating (0-1)Automatic LLM multi-agent scoring + referee debate

Datasets

Custom IC continue-pretraining corpus (4.7B tokens)Supervised QA pairs (>70k)ChatICD-Bench (released)

Benchmarks

ChatICD-Bench

Overview

Trust Signals

Reproducibility

At A Glance

Authors

Links

Why It Matters For Business

Who Should Care

Summary TLDR

Problem Statement

Main Contribution

Key Findings

ChipExpert beats GPT-4 on foundational EDA questions.

ChipExpert outperforms GPT-4 in many advanced IC subdomains.

Results

What To Try In 7 Days

Agent Features

Optimization Features

Reproducibility

Code URLs

Data URLs

Risks & Boundaries

Limitations

When Not To Use

Failure Modes

Core Entities

Models

Metrics

Datasets

Benchmarks

You May Also Want to Read

Chemistry foundation models power structure-focused multimodal RAG inside hierarchical multi-agent workflows

Key finding

Survey of financial LLMs: techniques, benchmarks, and practical gaps

Key finding

SNFinLLM: Chinese financial LLM with domain pretraining, instruction tuning, DPO alignment, and calculator integration

Key finding

PIXIU: open financial LLM + 136K instruction examples and FLARE benchmark

Key finding

Build a modular Chinese financial LLM by instruction data and four task-specific LoRA experts

Key finding