Overview
Production Readiness
0.6
Novelty Score
0.6
Cost Impact Score
0.6
Citation Count
6
Why It Matters For Business
Text2BIM lets designers describe early-stage buildings in plain language and get editable BIM models, reducing manual modeling effort and speeding concept-to-BIM workflows while preserving the ability to refine results in standard BIM tools.
Summary TLDR
Text2BIM is a multi-agent system that converts natural-language design requests into executable Python code that calls high-level BIM tool functions. The agents (Instruction Enhancer, Architect, Programmer, Reviewer) cooperate, run the code in a sandboxed interpreter, and use Solibri rule-checking feedback to iteratively fix problems. On 25 test prompts (3 runs each, 534 IFCs), modern LLMs produced editable BIM models with internal layout and semantics and high automated quality (most Solibri rule pass rates ≳0.95). The system is a feasibility prototype for early-stage, editable BIM generation, not a finished production tool (limited toolset, partial architectural/structural rules).
Problem Statement
Creating editable, semantically rich BIM models from plain text is hard because Text-to-3D methods produce surface geometry without BIM semantics. Designers must still learn complex authoring tools. The paper asks: can LLMs be orchestrated to generate executable modeling code and iterate with deterministic rule checks to produce native BIM models aligned with user intent?
Main Contribution
A code-centric multi-agent framework (Instruction Enhancer, Architect, Programmer, Reviewer) that converts text into Python code invoking high-level BIM tool functions to create native BIM models.
A rule-based model-checking loop (Solibri with 30 rules) that feeds deterministic issue reports back to the agents so that models are fixed iteratively.
A Vectorworks prototype integrating the framework, and a 25-prompt evaluation comparing three LLMs (o4-mini-high, Claude-sonnet-4, Gemini-2.5-pro) that yielded 534 IFC models.
Key Findings
The framework produced editable IFC/BIM models for 25 diverse prompts, 534 generated models in total.
Automated quality (30-rule Solibri pass rate) was high: most average pass rates exceed 0.95 across prompts and LLMs.
Iterative quality optimization resolved roughly 70% of detected issues for the better LLMs.
Generated modeling code closely matches the reference logic, as measured by CodeBERTScore (F1).
Results
Number of generated IFC models
Final model checking pass rate (median/most prompts)
Issue resolution via optimization loop
Code generation quality (CodeBERTScore F1)
Who Should Care
What To Try In 7 Days
Run the Text2BIM prototype on 5 typical early-stage briefs to gauge quality and editability.
Define a small set of high-level tool functions (create_wall, add_door, create_slab) and test LLM-generated code execution in a sandbox.
Integrate a rule-checker (e.g., Solibri) and validate a fix-loop: generate → check → patch code → re-run.
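The generate → check → patch → re-run loop above can be tried without any BIM software. The sketch below is a toy stand-in, not the prototype's code: the `Model` class, the single door-in-wall rule, and the `patch` callback (where an LLM Reviewer would sit) are all assumptions made for illustration. Only the function names `create_wall` and `add_door` come from the list above, and their signatures here are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    """Toy in-memory stand-in for a BIM model (hypothetical data layout)."""
    walls: dict = field(default_factory=dict)
    doors: list = field(default_factory=list)


def make_tools(model):
    """Build the high-level functions that generated code may call."""
    def create_wall(name, start, end, height=3.0):
        model.walls[name] = {"start": start, "end": end, "height": height}

    def add_door(wall, offset, width=0.9):
        model.doors.append({"wall": wall, "offset": offset, "width": width})

    return {"create_wall": create_wall, "add_door": add_door}


def check(model):
    """Toy deterministic rule: each door must lie fully within its host wall."""
    issues = []
    for door in model.doors:
        wall = model.walls.get(door["wall"])
        if wall is None:
            issues.append(f"door references unknown wall {door['wall']}")
            continue
        (x0, y0), (x1, y1) = wall["start"], wall["end"]
        length = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
        if door["offset"] < 0 or door["offset"] + door["width"] > length:
            issues.append(f"door on {door['wall']} extends past the wall")
    return issues


def fix_loop(code, patch, max_rounds=3):
    """Rebuild the model from code each round; patch until clean (max_rounds >= 1)."""
    for _ in range(max_rounds):
        model = Model()
        # Empty builtins keep the demo small; this is NOT a security boundary.
        exec(code, {"__builtins__": {}}, make_tools(model))
        issues = check(model)
        if not issues:
            break
        code = patch(code, issues)  # an LLM call would go here
    return model, issues, code
```

Rebuilding the model from scratch on every round keeps each check independent of earlier failed attempts, at the cost of re-running all modeling code. In a real setup `check` would be replaced by the rule-checker's report and `patch` by a model call that receives the issue list.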
Agent Features
Memory
- local loop-internal memory for optimization
- global memory for chat/code history
Planning
- self-reflection loop (code error correction)
- quality-optimization loop (model checker feedback)
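As a rough illustration of the self-reflection loop, the sketch below executes a code string, and on an exception hands the traceback to a repair callback before retrying. The names `run_with_reflection`, `repair`, and `namespace_factory` are hypothetical; in the described system the repair step would be an LLM agent, and the `history` list plays the role of loop-internal memory.

```python
import traceback


def run_with_reflection(code, repair, namespace_factory, max_attempts=3):
    """Execute code; on failure, pass the traceback to `repair` and retry."""
    history = []  # loop-internal memory of failed attempts
    for _ in range(max_attempts):
        namespace = namespace_factory()  # fresh state for every attempt
        try:
            exec(code, namespace)
            return namespace, history
        except Exception:
            error = traceback.format_exc(limit=1)
            history.append({"code": code, "error": error})
            code = repair(code, error)  # an LLM call would go here
    raise RuntimeError(f"code still failing after {max_attempts} attempts")
```

This loop only handles errors that surface as Python exceptions. Geometric or semantic problems in a model that runs without error are the job of the separate quality-optimization loop driven by the model checker.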
Tool Use
- function calling for the Architect agent
- program-generated Python invoking high-level tool functions
Frameworks
- Reflexion-inspired looped feedback
Is Agentic
true
Architectures
- multi-agent LLM pipeline
Collaboration
- Instruction Enhancer ↔ Architect (spec refinement)
- Programmer (code writer)
- Reviewer (issue solver)
Optimization Features
Token Efficiency
- Few-shot example used for Architect; minimal examples for Programmer
Infra Optimization
- Sandboxed AST Python interpreter to execute code and maintain state
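The summary does not describe how the sandboxed interpreter is built. One simple approximation, shown below as an assumption rather than the prototype's design, is to validate the parsed AST against a small allowed subset and then execute it in a persistent namespace so that state survives between code snippets. This is a validate-then-exec variant, not a node-by-node AST interpreter, and the class and function names are hypothetical.

```python
import ast

# Small set of builtins exposed to generated code (an illustrative choice).
ALLOWED_BUILTINS = {"range": range, "len": len, "min": min, "max": max, "abs": abs}


class UnsafeCodeError(Exception):
    """Raised when generated code uses a construct outside the allowed subset."""


def validate(source, allowed_calls):
    """Parse source and reject imports, private attributes, and unknown calls."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            raise UnsafeCodeError("imports are not allowed")
        if isinstance(node, ast.Attribute) and node.attr.startswith("_"):
            raise UnsafeCodeError(f"private attribute access: {node.attr}")
        if isinstance(node, ast.Call):
            # Only direct calls to whitelisted names pass; method calls are rejected.
            if not isinstance(node.func, ast.Name) or node.func.id not in allowed_calls:
                raise UnsafeCodeError("call to a function outside the whitelist")
    return tree


class Sandbox:
    """Runs validated snippets in one namespace that persists across calls."""

    def __init__(self, tools):
        self.tools = dict(tools)
        self.state = {"__builtins__": ALLOWED_BUILTINS, **self.tools}

    def run(self, source):
        tree = validate(source, set(self.tools) | set(ALLOWED_BUILTINS))
        exec(compile(tree, "<generated>", "exec"), self.state)
        return self.state
```

The whitelist is deliberately strict: generated code can call only the registered tool functions and the listed builtins. A static check like this reduces risk but should not be treated as a complete security boundary for untrusted code.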
System Optimization
- Rule-based feedback loop to direct code fixes
- Tool abstraction (26 high-level functions) reduces code verbosity
Training Optimization
- No fine-tuning; uses prompt engineering and few-shot for Architect
Inference Optimization
- Zero-shot/few-shot prompting and function calling rather than model fine-tuning
Reproducibility
Code Urls
Code Available
Data Available
Open Source Status
- partial
Risks & Boundaries
Limitations
- Prototype focused on early-stage regular (non-curved) models; cannot generate advanced engineering elements (stairs, beams, columns).
- Limited toolset (26 functions) constrains shape complexity and LOD.
- Rule-checker covers basic geometric/semantic rules but not full building codes or structural checks.
- Some manual steps remain (IFC export requires minor user intervention).
When Not To Use
- For final construction documents or structural design requiring code compliance.
- When irregular/curved geometry or high LOD (>LOD200) is required.
- For safety-critical automated changes without human review.
Failure Modes
- Hallucinations in long or complex prompts leading to missing or wrong coordinates.
- Reviewer fixes introducing new collisions (saw-tooth issue counts) in complex models.
- Cascading errors when Architect produces incorrect base geometry that downstream agents repeat.
Core Entities
Models
- o4-mini-high
- Claude-sonnet-4
- Gemini-2.5-pro
- Whisper
Metrics
- Solibri 30-rule pass rate
- Issue amount (count of problems reported by Solibri)
- CodeBERTScore (F1)
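The first two metrics can be computed from a per-rule issue report. The sketch below assumes that the pass rate is the fraction of rules with zero reported issues and that the issue amount is the total count across rules; the summary does not spell out either definition, and the report format (a dict mapping rule name to issue count) is hypothetical. CodeBERTScore is not covered here.

```python
from statistics import mean


def pass_rate(issues_per_rule):
    """Fraction of rules with zero reported issues (assumed definition)."""
    if not issues_per_rule:
        raise ValueError("report contains no rules")
    passed = sum(1 for count in issues_per_rule.values() if count == 0)
    return passed / len(issues_per_rule)


def issue_amount(issues_per_rule):
    """Total number of problems reported across all rules."""
    return sum(issues_per_rule.values())


def average_pass_rate(reports):
    """Mean pass rate over several runs of the same prompt."""
    return mean(pass_rate(report) for report in reports)
```

Under this definition, a 30-rule report with one failing rule scores 29/30 ≈ 0.967 regardless of how many individual issues that rule raised, which is why the issue amount is worth tracking alongside the pass rate.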
Datasets
- 25-prompt evaluation set (custom)
- Vectorworks usage logs (1 day, ~25M records) used to design toolset
Context Entities
Models
- DreamFusion
- Magic3D
- 3D-GPT (related work)

