Overview
VeriMaAS is a practical, system-level recipe with clear engineering value, a public prototype, and measured gains on RTL benchmarks. The results are empirical and limited to the evaluated benchmarks; the tables also show trade-offs in token usage, power, and pass@10.
Citations: 0
Evidence Strength: 0.70
Confidence: 0.85
Risk Signals: 9
Trust Signals
Findings with numeric evidence: 3/3
Findings with evidence refs: 3/3
Results with explicit delta: 3/3
Reproducibility
Status: Code + data available
Open source: Partial
At A Glance
Cost impact: 70%
Production readiness: 60%
Novelty: 50%
Why It Matters For Business
You can improve RTL code generation accuracy and tune for hardware metrics without expensive model re-training. That cuts data and compute cost and lets engineering teams iterate quickly with existing LLMs.
Summary TLDR
VeriMaAS is a multi-agent controller that adaptively composes prompting operators (CoT, ReAct, Self-Refine, Debate, etc.) and uses synthesis/verification logs (Yosys, OpenSTA) as feedback. On standard RTL benchmarks it raises pass@k accuracy versus single-agent prompting and some fine-tuned baselines, while needing only a few hundred examples to tune the controller instead of tens of thousands for full fine-tuning.
Problem Statement
HDL/RTL code generation suffers from scarce public data and high fine-tuning costs. Existing approaches either need large amounts of supervision (fine-tuned models) or incur high inference cost (single-agent prompting). The paper asks: can automated multi-agent workflows that read EDA tool feedback find good Verilog without heavy fine-tuning?
Main Contribution
VeriMaAS: a cascading multi-agent controller that adaptively selects prompting operators and uses synthesis/verification logs to guide generation.
A lightweight tuning procedure for the controller that needs only a few hundred examples to set per-stage thresholds (versus tens of thousands for full fine-tuning).
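The cascading idea can be sketched as a loop that tries cheaper prompting operators first and escalates only when a tool-derived score falls below that stage's threshold. This is a minimal sketch, not the paper's implementation: `Stage`, `Outcome`, `run_cascade`, the operator signature, and the scoring scheme are all hypothetical names and assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces: an operator maps (task, feedback_log) to Verilog text,
# and a checker maps Verilog text to (score in [0, 1], tool log).
Operator = Callable[[str, str], str]
Checker = Callable[[str], Tuple[float, str]]


@dataclass
class Stage:
    name: str
    operator: Operator
    threshold: float  # accept the candidate when score >= threshold


@dataclass
class Outcome:
    code: str
    stage: str
    score: float
    accepted: bool


def run_cascade(task: str, stages: List[Stage], check: Checker) -> Outcome:
    """Run stages in order; stop at the first candidate that clears its threshold.

    If no stage is accepted, the last stage's candidate is returned
    with accepted=False.
    """
    if not stages:
        raise ValueError("at least one stage is required")
    feedback = ""
    outcome: Optional[Outcome] = None
    for stage in stages:
        code = stage.operator(task, feedback)
        score, log = check(code)
        outcome = Outcome(code, stage.name, score, score >= stage.threshold)
        if outcome.accepted:
            break
        feedback = log  # escalate: the next operator sees this stage's tool log
    return outcome
```

In a real setup the checker would wrap synthesis and verification runs, and each operator would wrap an LLM call using a prompting strategy such as CoT or Self-Refine.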
Key Findings
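VeriMaAS raises top-1 syntactic/functional accuracy (pass@1) on the evaluated RTL benchmarks; on VeriThoughts the gain is +11.72 points for Qwen2.5-7B (44.90 to 56.62) and +2.45 points for GPT 4o-mini (80.64 to 83.09), per Table 1.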
Controller tuning needs only a few hundred examples instead of tens of thousands required for fine-tuning.
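One way to picture threshold tuning on a small calibration set is a grid search per stage. The sketch below assumes each stage yields a score per example plus a ground-truth correctness label, and it picks the threshold whose accept/escalate decision most often matches correctness. The objective, the record format, and the function names are assumptions; the summary does not describe the paper's actual tuning procedure.

```python
from typing import Dict, Sequence, Tuple

Record = Tuple[float, bool]  # (stage score, candidate was actually correct)


def tune_threshold(records: Sequence[Record], grid: Sequence[float]) -> float:
    """Pick the grid value whose accept/escalate decisions best match correctness.

    A decision is right when a correct candidate is accepted (score >= t)
    or an incorrect one is escalated (score < t).
    """
    if not records or not grid:
        raise ValueError("records and grid must not be empty")

    def agreement(t: float) -> int:
        return sum((score >= t) == ok for score, ok in records)

    # max() returns the first maximum, so ties go to the lowest threshold.
    return max(sorted(grid), key=agreement)


def tune_all(per_stage: Dict[str, Sequence[Record]], grid: Sequence[float]) -> Dict[str, float]:
    """Tune each stage independently on its own calibration records."""
    return {name: tune_threshold(recs, grid) for name, recs in per_stage.items()}
```

Because only a handful of scalar thresholds are fitted, a calibration set of a few hundred examples is plausible for this kind of search, in contrast to the much larger datasets needed to update model weights.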
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| VeriThoughts pass@1 (Qwen2.5-7B + VeriMaAS) | 56.62 | 44.90 (Qwen2.5-7B) | +11.72 | VeriThoughts | pass@1 numbers for Qwen2.5-7B | Table 1 |
| VeriThoughts pass@1 (GPT 4o-mini + VeriMaAS) | 83.09 | 80.64 (GPT 4o-mini) | +2.45 | VeriThoughts | GPT 4o-mini rows | Table 1 |
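
The deltas in the table are absolute differences in pass@1 points. The short check below recomputes them and also derives the relative gain over each baseline. The relative figures are plain arithmetic on the table values, not numbers reported in the summary, and the `gains` helper is an illustrative name.

```python
from typing import Tuple

# (model, baseline pass@1, pass@1 with VeriMaAS), copied from the table above.
ROWS = [
    ("Qwen2.5-7B", 44.90, 56.62),
    ("GPT 4o-mini", 80.64, 83.09),
]


def gains(baseline: float, value: float) -> Tuple[float, float]:
    """Return (absolute gain in points, relative gain in percent of baseline)."""
    absolute = round(value - baseline, 2)
    relative = round(100.0 * (value - baseline) / baseline, 1)
    return absolute, relative


for model, baseline, value in ROWS:
    absolute, relative = gains(baseline, value)
    print(f"{model}: +{absolute:.2f} points, {relative:.1f}% relative")
```

The weaker base model gains far more in relative terms, which is consistent with the stronger model starting closer to the ceiling on this benchmark.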
What To Try In 7 Days
Run the VeriMaAS prototype on a small internal set (≈500 tasks) and tune the per-stage thresholds.
Plug Yosys and OpenSTA into the generator loop to collect synthesis and timing logs as feedback.
Test PPA-aware tuning on a few high-value kernels to see area/delay trade-offs before broad rollout.
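As a starting point for the Yosys step above, the sketch below runs a synthesis pass from Python and captures the log as feedback text. It uses the standard `yosys -p` invocation with the `read_verilog`, `synth -top`, and `stat` commands; the helper names, the timeout, and the choice to return the raw log are assumptions, and OpenSTA integration is not covered here.

```python
import shutil
import subprocess
from typing import List, Optional, Tuple


def yosys_command(verilog_path: str, top: str) -> List[str]:
    """Build a Yosys invocation that reads a file, synthesizes it, and prints stats.

    Assumes the path and module name contain no spaces or semicolons.
    """
    script = f"read_verilog {verilog_path}; synth -top {top}; stat"
    return ["yosys", "-p", script]


def collect_synthesis_log(
    verilog_path: str, top: str, timeout_s: int = 120
) -> Optional[Tuple[int, str]]:
    """Run Yosys and return (return code, combined log), or None if Yosys is missing."""
    if shutil.which("yosys") is None:
        return None
    proc = subprocess.run(
        yosys_command(verilog_path, top),
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return proc.returncode, proc.stdout + proc.stderr
```

A non-zero return code together with the log text (for example, a syntax error message) is the kind of signal that could be passed back to the generator on the next attempt.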
Agent Features
Planning
Tool Use
Frameworks
Is Agentic
Yes
Architectures
Collaboration
Optimization Features
Token Efficiency
System Optimization
Training Optimization
Inference Optimization
Risks & Boundaries
Limitations
Accuracy and PPA gains are reported only for the evaluated datasets and the SkyWater 130nm flow.
PPA-aware tuning can trade accuracy for area/power on some tasks.
When Not To Use
If you already have a heavily fine-tuned RTL model and cannot afford any extra inference tokens.
When commercial PDKs or proprietary EDA tools are required but are not available for integration.
Failure Modes
The controller may add token cost and latency compared with a single prompt chain.
PPA optimization can increase power or slightly reduce pass@10 for some benchmarks.

