Overview
Production Readiness
0.6
Novelty Score
0.5
Cost Impact Score
0.7
Citation Count
0
Why It Matters For Business
You can improve RTL code generation accuracy and tune for hardware metrics without expensive model re-training. That cuts data and compute cost and lets engineering teams iterate quickly with existing LLMs.
Summary TLDR
VeriMaAS is a multi-agent controller that adaptively composes prompting operators (CoT, ReAct, Self-Refine, Debate, etc.) and uses synthesis/verification logs (Yosys, OpenSTA) as feedback. On standard RTL benchmarks it raises pass@k accuracy versus single-agent prompting and some fine-tuned baselines, while needing only a few hundred examples to tune the controller instead of tens of thousands for full fine-tuning.
Problem Statement
HDL/RTL code generation suffers from scarce public data and high fine-tuning costs. Existing approaches either require large amounts of supervision (fine-tuned models) or incur high inference cost (single-agent prompting). The paper asks: can automated multi-agent workflows that read EDA tool feedback find good Verilog without heavy fine-tuning?
Main Contribution
VeriMaAS: a cascading multi-agent controller that adaptively selects prompting operators and uses synthesis/verification logs to guide generation.
A lightweight tuning procedure for the controller that needs only a few hundred examples to set per-stage thresholds (versus tens of thousands for full fine-tuning).
Demonstration on VerilogEval and VeriThoughts benchmarks showing improved pass@k and optional PPA-aware optimization (area/power/delay) using the same controller framework.
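The cascading idea can be illustrated with a minimal sketch. It assumes each stage samples a few candidates, checks them with a tool-based checker, and escalates to the next operator only when the failing fraction exceeds that stage's threshold. The names `Stage`, `run_cascade`, `generate`, and `check` are hypothetical, and the paper's actual controller may differ in its details.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    name: str                                  # label only, e.g. "CoT" or "ReAct"
    generate: Callable[[str, int], List[str]]  # (task, n) -> n candidate designs
    threshold: float                           # escalate if failure fraction exceeds this


def run_cascade(task: str, stages: List[Stage],
                check: Callable[[str], bool], n: int = 4) -> dict:
    """Try stages in order; stop at the first whose failure fraction is acceptable."""
    result = {"stage": None, "candidates": [], "failure_fraction": 1.0}
    for stage in stages:
        candidates = stage.generate(task, n)
        if not candidates:
            continue  # nothing produced: fall through to the next stage
        passed = [c for c in candidates if check(c)]
        failure_fraction = 1.0 - len(passed) / len(candidates)
        result = {"stage": stage.name,
                  "candidates": passed or candidates,
                  "failure_fraction": failure_fraction}
        if failure_fraction <= stage.threshold:
            break  # good enough: skip the more expensive stages
    return result
```

In this sketch `check` stands in for whatever synthesis or verification step decides that a candidate is acceptable, and the per-stage `threshold` values are the quantities a tuning procedure would set.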
Key Findings
VeriMaAS increases top-1 syntactic/functional accuracy (pass@1) on the evaluated RTL benchmarks relative to single-agent prompting and some fine-tuned baselines.
Controller tuning needs only a few hundred examples instead of tens of thousands required for fine-tuning.
The controller can be re-optimized for hardware metrics (PPA) and yield large area/delay reductions, at the cost of higher power or slightly lower pass@10 on some benchmarks.
Results
VeriThoughts pass@1
PPA area reduction
Who Should Care
What To Try In 7 Days
Run a VeriMaAS prototype on a small internal set (≈500 tasks) and tune the per-stage thresholds.
Plug Yosys and OpenSTA into the generator loop to collect synthesis and timing logs as feedback.
Test PPA-aware tuning on a few high-value kernels to see area/delay trade-offs before broad rollout.
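As a starting point for the feedback loop, the sketch below wraps a Yosys run in Python and returns the exit status plus the tail of the log, which could then be appended to the next prompt. It uses only the standard Yosys commands `read_verilog`, `synth -top`, and `stat`; the helper names `yosys_command` and `synthesis_feedback`, the 2000-character log cap, and the timeout are illustrative assumptions, not part of VeriMaAS. An OpenSTA run could presumably be wrapped the same way.

```python
import subprocess
import tempfile
from pathlib import Path


def yosys_command(verilog_path: str, top: str) -> list:
    """Build a Yosys invocation that reads, synthesizes, and reports cell stats."""
    script = f"read_verilog {verilog_path}; synth -top {top}; stat"
    return ["yosys", "-p", script]


def synthesis_feedback(verilog_src: str, top: str, timeout_s: int = 60) -> dict:
    """Run Yosys on a candidate design and return {'ok': bool, 'log': str}."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "design.v"
        path.write_text(verilog_src)
        try:
            proc = subprocess.run(yosys_command(str(path), top),
                                  capture_output=True, text=True,
                                  timeout=timeout_s)
        except FileNotFoundError:
            return {"ok": False, "log": "yosys not found on PATH"}
        except subprocess.TimeoutExpired:
            return {"ok": False, "log": "yosys timed out"}
    log = proc.stdout + proc.stderr
    # Keep only the tail of the log (arbitrary cap) to limit prompt tokens.
    return {"ok": proc.returncode == 0, "log": log[-2000:]}
```

A generator loop would call `synthesis_feedback` on each candidate and feed the returned log back to the model whenever `ok` is false.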
Agent Features
Planning
- stage-wise cascade planning
Tool Use
- Yosys
- OpenSTA
- Skywater PDK
Frameworks
- VeriMaAS
Is Agentic
true
Architectures
- cascading controller
- multi-agent operator sequences
Collaboration
- multi-agent coordination via controller
Optimization Features
Token Efficiency
- moderate token overhead vs single CoT; lower than iterative Self-Refine in many cases
System Optimization
- PPA-aware controller objective to reduce area/delay
Training Optimization
- threshold tuning using a few hundred examples
Inference Optimization
- adaptive stopping based on failure percentage
- controller trades tokens vs utility (λ=1e-3)
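One way to picture threshold tuning under a token penalty is a small grid search. The sketch assumes the objective is mean utility minus λ times tokens, with λ = 1e-3 taken from the value quoted above; the exact form of the objective, the two-stage setup, the record fields, and the grid values are all assumptions made for illustration.

```python
LAMBDA = 1e-3  # token penalty weight quoted in the summary; objective form is assumed


def objective(records, threshold, lam=LAMBDA):
    """Mean (utility - lam * tokens) when escalating only above the threshold."""
    total = 0.0
    for r in records:
        if r["cheap_fail_frac"] <= threshold:
            # Cheap stage accepted: pay only its tokens.
            utility, tokens = r["cheap_pass"], r["cheap_tokens"]
        else:
            # Escalated: pay for both stages, score the stronger one.
            utility = r["strong_pass"]
            tokens = r["cheap_tokens"] + r["strong_tokens"]
        total += utility - lam * tokens
    return total / len(records)


def tune_threshold(records, grid=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Pick the grid value with the best penalized score on the tuning set."""
    return max(grid, key=lambda t: objective(records, t))
```

Each record here describes one tuning task: the failure fraction observed at the cheap stage, whether each stage's output passed (0 or 1), and the tokens each stage consumed. With only a handful of thresholds to set, a few hundred such records are plausibly enough, which matches the low data requirement described above.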
Reproducibility
Code Urls
Code Available
Data Available
Open Source Status
- partial
Risks & Boundaries
Limitations
- Benchmarks and PPA gains are limited to the evaluated datasets and the Skywater 130nm flow.
- PPA-aware tuning can trade accuracy for area/power on some tasks.
- Some Verilog tasks (simple gates) have little room for PPA gains, so optimization impact varies by task.
When Not To Use
- If you already have a heavily fine-tuned RTL model and cannot afford any extra inference tokens.
- When commercial PDKs or proprietary EDA tools are required and not available to integrate.
- For extremely small-scale tasks where controller overhead outweighs gains (e.g., trivial gate tasks).
Failure Modes
- Controller may add token cost and latency compared to a single prompt chain.
- PPA optimization can increase power or slightly reduce pass@10 for some benchmarks.
- Controller thresholds tuned on one dataset may not transfer perfectly; retuning may be required.
Core Entities
Models
- GPT-4o-mini
- o4-mini
- Qwen2.5-7B
- Qwen2.5-14B
- Qwen3-8B
- Qwen3-14B
- RTLCoder-7B
- RTLCoder-DeepSeek-7B
- VeriThoughts-14B
- DeepSeek-R1-Qwen-14B
Metrics
- pass@1
- pass@10
- tokens per query
- Area (post-synthesis)
- Power (static)
- Delay
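For reference, pass@k is commonly computed with the unbiased estimator 1 - C(n-c, k) / C(n, k), where n samples are drawn per task and c of them pass. Whether the paper uses exactly this estimator is an assumption; the function name `pass_at_k` is illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for n samples per task, c of which are correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples are incorrect,
    # so the estimate becomes 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

The benchmark-level score is the mean of this value over all tasks.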
Datasets
- VeriThoughts
- VerilogEval
- MetRex (synthesis benchmark)
Benchmarks
- VeriThoughts
- VerilogEval
- MetRex

