Use formal EDA feedback inside a multi-agent controller to improve Verilog generation without expensive fine-tuning.

September 24, 2025 · 6 min

Overview

Decision Snapshot: Needs Validation

The method is a practical system-level recipe: clear engineering value, public prototype, and numeric gains on benchmarks. Results are empirical and benchmark-limited; some trade-offs (tokens, power, pass@10) appear in tables.

Citations: 0

Evidence Strength: 0.70

Confidence: 0.85

Risk Signals: 9

Trust Signals

Findings with numeric evidence: 3/3

Findings with evidence refs: 3/3

Results with explicit delta: 3/3

Reproducibility

Status: Code + data available

Open source: Partial

At A Glance

Cost impact: 70%

Production readiness: 60%

Novelty: 50%

Authors

Amulya Bhattaram, Janani Ramamoorthy, Ranit Gupta, Diana Marculescu, Dimitrios Stamoulis


Why It Matters For Business

You can improve RTL code generation accuracy and tune for hardware metrics without expensive model re-training. That cuts data and compute cost and lets engineering teams iterate quickly with existing LLMs.

Summary TLDR

VeriMaAS is a multi-agent controller that adaptively composes prompting operators (CoT, ReAct, Self-Refine, Debate, etc.) and uses synthesis/verification logs (Yosys, OpenSTA) as feedback. On standard RTL benchmarks it raises pass@k accuracy versus single-agent prompting and some fine-tuned baselines, while needing only a few hundred examples to tune the controller instead of tens of thousands for full fine-tuning.

Problem Statement

HDL/RTL code generation suffers from scarce public data and high fine-tuning costs. Existing approaches, whether single-agent prompting or fine-tuned models, either require large amounts of supervision or incur high inference cost. The paper asks: can automated multi-agent workflows that read EDA tool feedback find good Verilog without heavy fine-tuning?

Main Contribution

VeriMaAS: a cascading multi-agent controller that adaptively selects prompting operators and uses synthesis/verification logs to guide generation.

A lightweight tuning procedure for the controller that needs only a few hundred examples to set per-stage thresholds (versus tens of thousands for full fine-tuning).
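The cascade described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the summary only says that the controller applies prompting operators stage by stage, reads synthesis/verification logs, and uses per-stage thresholds. The operator and checker signatures, the stop rule (stop when the failure fraction is at or below the stage threshold), and the toy stand-ins are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical types: an operator maps (spec, feedback logs) to candidate designs;
# a checker returns (passed, log_text) for one candidate, e.g. by wrapping an EDA tool.
Operator = Callable[[str, List[str]], List[str]]
Checker = Callable[[str], Tuple[bool, str]]


def run_cascade(
    spec: str,
    stages: Sequence[Tuple[str, Operator, float]],
    check: Checker,
) -> Tuple[str, List[str]]:
    """Try stages in order; stop once the failure fraction is within the threshold.

    Each stage is (name, operator, max_failure_fraction). Returns the name of the
    last stage that ran and the candidates from that stage that passed the check.
    """
    feedback: List[str] = []
    passing: List[str] = []
    last_stage = ""
    for name, operator, max_fail in stages:
        last_stage = name
        candidates = operator(spec, feedback)
        results = [(c, *check(c)) for c in candidates]
        passing = [c for c, ok, _ in results if ok]
        failed_logs = [log for _, ok, log in results if not ok]
        # An empty candidate list counts as total failure, not as success.
        fail_fraction = len(failed_logs) / len(candidates) if candidates else 1.0
        if fail_fraction <= max_fail:
            break  # good enough: do not pay for a more expensive operator
        feedback.extend(failed_logs)  # hand the tool logs to the next stage
    return last_stage, passing


# Toy stand-ins so the sketch runs without an LLM or EDA tools.
def toy_check(code: str) -> Tuple[bool, str]:
    ok = "endmodule" in code
    return ok, "" if ok else "error: missing endmodule"


def toy_cot(spec: str, feedback: List[str]) -> List[str]:
    return ["module m;"] * 4  # every candidate fails the toy check


def toy_refine(spec: str, feedback: List[str]) -> List[str]:
    # Pretends to repair the design once it has seen any feedback.
    return ["module m; endmodule"] * 4 if feedback else ["module m;"] * 4


stage, designs = run_cascade(
    "toy spec",
    [("cot", toy_cot, 0.25), ("self_refine", toy_refine, 0.25)],
    toy_check,
)
print(stage, len(designs))
```

In this toy run the first stage fails on every candidate, so the controller escalates and passes the collected logs to the second stage. A real setup would replace `toy_check` with a wrapper around the synthesis and timing tools.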

Key Findings

VeriMaAS increases top-1 syntactic/functional accuracy (pass@1) on evaluated RTL benchmarks.

Numbers: Qwen2.5-7B: VeriThoughts pass@1 44.90 -> 56.62 (+11.72) (Table 1).

Practical Use: If you run open LLMs on RTL tasks, wrapping them with VeriMaAS can yield large absolute gains in top-1 correct designs without changing model weights. Try the controller before investing in fine-tuning.

Evidence Ref: Table 1

Controller tuning needs only a few hundred examples instead of tens of thousands required for fine-tuning.

Numbers: The controller was tuned on 500 sampled VeriThoughts datapoints, which the paper describes as 'a few hundred' examples and 'order-of-magnitude' fewer than fine-tuning needs.

Practical Use: You can get most of the multi-agent benefit by collecting a small validation set (~500 tasks) and tuning thresholds, saving large compute and data costs.

Evidence Ref: Section 2 (controller tuning)

Results

Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref
VeriThoughts pass@1 | Qwen2.5-7B + VeriMaAS: 56.62 | Qwen2.5-7B: 44.90 | +11.72 | VeriThoughts | Table 1 (pass@1 numbers for Qwen2.5-7B) | Table 1
VeriThoughts pass@1 | GPT 4o-mini + VeriMaAS: 83.09 | GPT 4o-mini: 80.64 | +2.45 | VeriThoughts | Table 1 (GPT 4o-mini rows) | Table 1

What To Try In 7 Days

Run VeriMaAS prototype on a small internal set (≈500 tasks) and tune per-stage thresholds.

Plug Yosys and OpenSTA into the generator loop to collect synthesis and timing logs as feedback.

Test PPA-aware tuning on a few high-value kernels to see area/delay trade-offs before broad rollout.
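As a starting point for wiring a synthesis tool into the loop, the sketch below runs Yosys on a generated file and keeps only the lines that look like errors or warnings. It is a minimal sketch, not the VeriMaAS prototype: the function names are hypothetical, and the filter assumes the tool marks such lines with the words `ERROR` or `Warning`, which you should check against the output of your own installation. An OpenSTA step could be wrapped in the same way and is omitted here.

```python
import shutil
import subprocess
from typing import List, Optional, Tuple


def yosys_command(verilog_path: str, top: str) -> List[str]:
    """Build a Yosys call that reads the file, synthesizes it, and prints statistics."""
    script = f"read_verilog {verilog_path}; synth -top {top}; stat"
    return ["yosys", "-p", script]


def extract_feedback(log_text: str, max_lines: int = 20) -> List[str]:
    """Keep lines that look like errors or warnings so the prompt stays short.

    Assumption: such lines contain 'ERROR' or 'Warning'; adjust for your tool.
    """
    keep = [
        line.strip()
        for line in log_text.splitlines()
        if "ERROR" in line or "Warning" in line
    ]
    return keep[:max_lines]


def run_yosys(
    verilog_path: str, top: str, timeout_s: int = 60
) -> Optional[Tuple[bool, List[str]]]:
    """Return (synthesized_ok, feedback_lines), or None if Yosys is not installed."""
    if shutil.which("yosys") is None:
        return None
    proc = subprocess.run(
        yosys_command(verilog_path, top),
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    # A non-zero exit code is treated as a failed synthesis run.
    return proc.returncode == 0, extract_feedback(proc.stdout + proc.stderr)
```

The feedback lines returned by `run_yosys` can be appended to the next prompt. Capping them with `max_lines` keeps the token overhead of each retry bounded.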

Agent Features

Planning
stage-wise cascade planning
Tool Use
Yosys, OpenSTA, Skywater PDK
Frameworks
VeriMaAS
Is Agentic

Yes

Architectures
cascading controller, multi-agent operator sequences
Collaboration
multi-agent coordination via controller

Optimization Features

Token Efficiency
moderate token overhead vs single CoT; lower than iterative Self-Refine in many cases
System Optimization
PPA-aware controller objective to reduce area/delay
Training Optimization
threshold tuning using a few hundred examples
Inference Optimization
adaptive stopping based on failure percentage; controller trades tokens vs utility (λ=1e-3)

Reproducibility

Code Available: Yes
Data Available: Yes
Open Source Status: Partial
License: Unknown

Risks & Boundaries

Limitations

Benchmarks and PPA gains are limited to the evaluated datasets and the Skywater 130nm flow.

PPA-aware tuning can trade accuracy for area/power on some tasks.

When Not To Use

If you already have a heavily fine-tuned RTL model and cannot afford any extra inference tokens.

When commercial PDKs or proprietary EDA tools are required and not available to integrate.

Failure Modes

Controller may add token cost and latency compared to a single prompt chain.

PPA optimization can increase power or slightly reduce pass@10 for some benchmarks.

Core Entities

Models

GPT 4o-mini, o4-mini, Qwen2.5-7B, Qwen2.5-14B, Qwen3-8B, Qwen3-14B, RTLCoder-7B, RTLCoder-DeepSeek-7B, VeriThoughts-14B, DeepSeek-R1-Qwen-14B

Metrics

pass@1, pass@10, tokens per query, Area (post-synthesis), Power (static), Delay
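For readers unfamiliar with pass@k, code-generation benchmarks commonly use the unbiased estimator pass@k = 1 - C(n-c, k) / C(n, k), where n samples are drawn per task and c of them are correct. This summary does not say which estimator the paper uses, so the sketch below is an assumption about the metric, averaged per task in the usual way.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples, drawn from n with c correct, passes."""
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # comb(n - c, k) is 0 when fewer than k samples are incorrect, giving 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: 20 samples for one task, 5 of them correct.
print(pass_at_k(20, 5, 1))
print(round(pass_at_k(20, 5, 10), 3))
```

With 5 correct samples out of 20, pass@1 is 0.25 while pass@10 is about 0.984, which shows why a method can move pass@1 substantially and leave pass@10 almost unchanged.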

Datasets

VeriThoughts, VerilogEval, MetRex (synthesis benchmark)

Benchmarks

VeriThoughts, VerilogEval, MetRex