Generate editable BIM models from plain language by orchestrating LLM agents that write modeling code

August 15, 2024 · 8 min

Overview

Production Readiness

0.6

Novelty Score

0.6

Cost Impact Score

0.6

Citation Count

6

Authors

Changyu Du, Sebastian Esser, Stavros Nousias, André Borrmann

Why It Matters For Business

Text2BIM lets designers describe early-stage buildings in plain language and get editable BIM models, reducing manual modeling effort and speeding concept-to-BIM workflows while preserving the ability to refine results in standard BIM tools.

Summary TLDR

Text2BIM is a multi-agent system that converts natural-language design requests into executable Python code that calls high-level BIM tool functions. The agents (Instruction Enhancer, Architect, Programmer, Reviewer) cooperate, run the code in a sandboxed interpreter, and use Solibri rule-checking feedback to iteratively fix problems. On 25 test prompts (3 runs each, 534 IFCs), modern LLMs produced editable BIM models with internal layout and semantics and high automated quality (most Solibri rule pass rates ≳0.95). The system is a feasibility prototype for early-stage, editable BIM generation, not a finished production tool (limited toolset, partial architectural/structural rules).

Problem Statement

Creating editable, semantically rich BIM models from plain text is hard because Text-to-3D methods produce surface geometry without BIM semantics. Designers must still learn complex authoring tools. The paper asks: can LLMs be orchestrated to generate executable modeling code and iterate with deterministic rule checks to produce native BIM models aligned with user intent?

Main Contribution

A code-centric multi-agent framework (Instruction Enhancer, Architect, Programmer, Reviewer) that converts text into Python code invoking high-level BIM tool functions to create native BIM models.

A rule-based model-checking loop (Solibri with 30 rules) that feeds back deterministic issues to agents so models are iteratively fixed.

A Vectorworks prototype integrating the framework, and a 25-prompt evaluation comparing three LLMs (o4-mini-high, Claude-sonnet-4, Gemini-2.5-pro) across 534 runs.

Key Findings

The framework produced editable IFC/BIM models for all 25 diverse prompts, generating 534 IFC files in total (intermediate results included).

Numbers: 534 IFC models generated (25 prompts × 3 LLMs × 3 repeats, incl. intermediate runs)

Automated quality (30-rule Solibri pass rate) was high: most average pass rates exceed 0.95 across prompts and LLMs.

Numbers: Most prompt-LLM averages ≳0.95 (Table 4)
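To make the metric concrete, here is a minimal sketch of how a per-model pass rate and a per-prompt average could be computed. It assumes the pass rate is simply the share of checked rules that report no issue; the summary does not spell out the exact definition, and the rule names and results below are invented for illustration.

```python
from statistics import mean


def pass_rate(rule_results: dict[str, bool]) -> float:
    """Fraction of checked rules that passed (assumed definition)."""
    if not rule_results:
        raise ValueError("no rules were checked")
    return sum(rule_results.values()) / len(rule_results)


# Hypothetical checker output for one model: 30 rules, two of them failing.
results = {f"rule_{i:02d}": True for i in range(1, 31)}
results["rule_07"] = False
results["rule_19"] = False

rate = pass_rate(results)

# Hypothetical pass rates of three repeated runs of the same prompt.
runs = [rate, 1.0, 29 / 30]
average = mean(runs)

print(f"single run: {rate:.3f}, average over runs: {average:.3f}")
```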

Iterative quality optimization resolved roughly 70% of detected issues for the better LLMs.

Numbers: ≈70% of issues fixed by Claude-sonnet-4 and Gemini-2.5-pro after optimization passes

Generated modeling code matches reference logic with high similarity by CodeBERTScore.

Numbers: CodeBERTScore mean per prompt roughly 0.85–0.95 (Table 7)

Results

Number of generated IFC models

Value: 534

Final model checking pass rate (median/most prompts)

Value: Most averages ≳0.95

Issue resolution via optimization loop

Value: ≈70% of issues fixed (Claude & Gemini)

Code generation quality (CodeBERTScore F1)

Value: ≈0.85–0.95 mean

Who Should Care

What To Try In 7 Days

Run the Text2BIM prototype on 5 typical early-stage briefs to gauge quality and editability.

Define a small set of high-level tool functions (create_wall, add_door, create_slab) and test LLM-generated code execution in a sandbox.

Integrate a rule-checker (e.g., Solibri) and validate a fix-loop: generate → check → patch code → re-run.
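The second and third steps above can be tried without any BIM software. The sketch below is a toy stand-in, not the Text2BIM toolset: `Model`, `make_tools`, `run_generated`, and `check` are hypothetical names, the function signatures are assumptions, and the single geometric rule replaces a real checker such as Solibri. The "generated" script is hard-coded where an LLM call would normally sit.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Model:
    """Toy in-memory stand-in for a BIM document."""
    walls: dict = field(default_factory=dict)
    doors: list = field(default_factory=list)
    slabs: list = field(default_factory=list)


def make_tools(model):
    """Bind hypothetical high-level tool functions to one model instance."""
    def create_wall(name, start, end, height=3.0):
        model.walls[name] = {"start": start, "end": end, "height": height}

    def add_door(wall, offset, width=0.9):
        model.doors.append({"wall": wall, "offset": offset, "width": width})

    def create_slab(outline, thickness=0.2):
        model.slabs.append({"outline": outline, "thickness": thickness})

    return {"create_wall": create_wall, "add_door": add_door,
            "create_slab": create_slab}


def run_generated(code):
    """Execute generated code with only the tool functions in scope.

    Emptying __builtins__ limits accidents; it is not a security boundary.
    """
    model = Model()
    namespace = {"__builtins__": {}, **make_tools(model)}
    exec(code, namespace)
    return model


def check(model):
    """One illustrative rule: a door must fit on an existing wall."""
    issues = []
    for door in model.doors:
        wall = model.walls.get(door["wall"])
        if wall is None:
            issues.append(f"door on unknown wall {door['wall']!r}")
        elif door["offset"] + door["width"] > math.dist(wall["start"], wall["end"]):
            issues.append(f"door overruns wall {door['wall']!r}")
    return issues


# Stand-in for LLM output: the door is placed too close to the wall end.
draft = '''
create_slab([(0, 0), (4, 0), (4, 3), (0, 3)])
create_wall("south", (0, 0), (4, 0))
add_door("south", offset=3.5)
'''
# Stand-in for the patch a reviewing agent would propose.
patched = draft.replace("offset=3.5", "offset=1.5")

draft_issues = check(run_generated(draft))
patched_issues = check(run_generated(patched))
print(draft_issues)
print(patched_issues)
```

Keeping the tool functions coarse (one call per wall, door, or slab) means the checker's messages map directly onto single lines of generated code, which is what makes the patch step tractable.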

Agent Features

Memory

  • local loop-internal memory for optimization
  • global memory for chat/code history

Planning

  • self-reflection loop (code error correction)
  • quality-optimization loop (model checker feedback)
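The two loops listed above, together with the local and global memory scopes, might be wired as follows. This is a control-flow sketch only: every callable (`write_code`, `execute`, `check`, `review`) is a deterministic stub standing in for an agent or tool, and the names, retry limits, and toy "room width" rule are all assumptions made for illustration.

```python
def self_reflection(write_code, execute, task, max_attempts=3):
    """Inner loop: regenerate code until it executes without an exception."""
    feedback = None
    for _ in range(max_attempts):
        code = write_code(task, feedback)
        try:
            return code, execute(code)
        except Exception as exc:  # feed the error text back to the code writer
            feedback = f"{type(exc).__name__}: {exc}"
    raise RuntimeError("code still failing after retries")


def quality_optimization(write_code, execute, check, review, task, max_rounds=3):
    """Outer loop: re-plan from checker issues until the model is clean."""
    global_memory = []  # code history that outlives the loop
    local_memory = []   # issue lists seen inside this loop only
    model, issues = None, []
    for _ in range(max_rounds):
        code, model = self_reflection(write_code, execute, task)
        global_memory.append(code)
        issues = check(model)
        local_memory.append(issues)
        if not issues:
            break
        task = review(task, issues, local_memory)
    return model, issues, global_memory


# --- deterministic stubs standing in for the LLM agents and the checker ---
def write_code(task, feedback):
    if feedback is None:
        return "model = {'room_width': width}"  # first draft has a NameError
    return f"model = {{'room_width': {task['room_width']}}}"


def execute(code):
    namespace = {}
    exec(code, namespace)
    return namespace["model"]


def check(model):
    return ["room narrower than 2.0 m"] if model["room_width"] < 2.0 else []


def review(task, issues, local_memory):
    return {**task, "room_width": 2.4}  # stub fix: widen the room


final_model, issues, history = quality_optimization(
    write_code, execute, check, review, {"room_width": 1.5}
)
print(final_model, issues, len(history))
```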

Tool Use

  • function calling to Architect
  • program-generated Python invoking high-level tool functions

Frameworks

  • Reflexion-inspired looped feedback

Is Agentic

true

Architectures

  • multi-agent LLM pipeline

Collaboration

  • Instruction Enhancer ↔ Architect (spec refinement)
  • Programmer (code writer)
  • Reviewer (issue solver)

Optimization Features

Token Efficiency

  • Few-shot example used for Architect; minimal examples for Programmer

Infra Optimization

  • Sandboxed AST Python interpreter to execute code and maintain state
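As a rough illustration of the idea, the sketch below parses generated code with Python's `ast` module, rejects any syntax outside a small whitelist, and keeps variables alive between calls. It is a simplified stand-in rather than the paper's interpreter: it validates the tree and then hands it to `exec` instead of evaluating node by node, and the `Sandbox` class, the allowed-node list, and the `create_wall` tool are assumptions.

```python
import ast

# Syntax the sandbox accepts; anything else (imports, attributes, loops) is rejected.
ALLOWED_NODES = (
    ast.Module, ast.Expr, ast.Assign, ast.Name, ast.Load, ast.Store,
    ast.Call, ast.keyword, ast.Constant, ast.Tuple, ast.List,
    ast.BinOp, ast.Add, ast.Sub, ast.Mult, ast.Div, ast.UnaryOp, ast.USub,
)


class Sandbox:
    def __init__(self, tools):
        self.tools = dict(tools)  # whitelisted callables
        self.state = {}           # variables kept between run() calls

    def validate(self, tree):
        for node in ast.walk(tree):
            if not isinstance(node, ALLOWED_NODES):
                raise ValueError(f"disallowed syntax: {type(node).__name__}")
            if isinstance(node, ast.Call) and not (
                isinstance(node.func, ast.Name) and node.func.id in self.tools
            ):
                raise ValueError("only whitelisted tool functions may be called")
            if isinstance(node, ast.Name) and node.id.startswith("__"):
                raise ValueError("dunder names are not allowed")

    def run(self, source):
        tree = ast.parse(source)
        self.validate(tree)
        env = {"__builtins__": {}, **self.state, **self.tools}
        exec(compile(tree, "<generated>", "exec"), env)
        # Persist user variables only; drop builtins and the tool functions.
        self.state = {
            k: v for k, v in env.items()
            if k != "__builtins__" and k not in self.tools
        }


walls = []


def create_wall(start, end):
    """Hypothetical tool function: record a wall and return its index."""
    walls.append((start, end))
    return len(walls) - 1


sandbox = Sandbox({"create_wall": create_wall})
sandbox.run("w0 = create_wall((0, 0), (4, 0))")
sandbox.run("w1 = create_wall((4, 0), (4, 3))")
sandbox.run("total = w0 + w1 + 1")  # variables from earlier calls are still visible

try:
    sandbox.run("import os")
    blocked = False
except ValueError:
    blocked = True

print(sandbox.state, blocked)
```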

System Optimization

  • Rule-based feedback loop to direct code fixes
  • Tool abstraction (26 high-level functions) reduces code verbosity

Training Optimization

  • No fine-tuning; uses prompt engineering and few-shot for Architect

Inference Optimization

  • Zero-shot/few-shot prompting and function calling rather than model fine-tuning

Reproducibility

Code Available

Data Available

Open Source Status

  • partial

Risks & Boundaries

Limitations

  • Prototype focused on early-stage regular (non-curved) models; cannot generate advanced engineering elements (stairs, beams, columns).
  • Limited toolset (26 functions) constrains shape complexity and LOD.
  • Rule-checker covers basic geometric/semantic rules but not full building codes or structural checks.
  • Some manual steps remain (IFC export requires minor user intervention).

When Not To Use

  • For final construction documents or structural design requiring code compliance.
  • When irregular/curved geometry or high LOD (>LOD200) is required.
  • For safety-critical automated changes without human review.

Failure Modes

  • Hallucinations in long or complex prompts leading to missing or wrong coordinates.
  • Reviewer fixes introducing new collisions (saw-tooth issue counts) in complex models.
  • Cascading errors when Architect produces incorrect base geometry that downstream agents repeat.

Core Entities

Models

  • o4-mini-high
  • Claude-sonnet-4
  • Gemini-2.5-pro
  • Whisper

Metrics

  • Solibri 30-rule pass rate
  • Issue amount (count of problems reported by Solibri)
  • CodeBERTScore (F1)
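For readers unfamiliar with the last metric, the BERTScore-style F1 that CodeBERTScore builds on can be written as below. This is a simplified statement under assumptions: $x_1,\dots,x_m$ and $\hat{x}_1,\dots,\hat{x}_n$ denote unit-normalized contextual embeddings of the reference and generated code tokens, and details such as the choice of encoder and any token masking are omitted.

```latex
\begin{align}
R &= \frac{1}{m} \sum_{i=1}^{m} \max_{1 \le j \le n} x_i^{\top} \hat{x}_j
  && \text{(recall: each reference token matched to its closest generated token)} \\
P &= \frac{1}{n} \sum_{j=1}^{n} \max_{1 \le i \le m} x_i^{\top} \hat{x}_j
  && \text{(precision: each generated token matched to its closest reference token)} \\
F_1 &= \frac{2\,P\,R}{P + R}
\end{align}
```

Because matching is done on embeddings rather than exact tokens, two scripts that call the same tool functions with differently named variables can still score highly.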

Datasets

  • 25-prompt evaluation set (custom)
  • Vectorworks usage logs (1 day, ~25M records) used to design toolset

Context Entities

Models

  • DreamFusion
  • Magic3D
  • 3D-GPT (related work)