Overview
Production Readiness
0.6
Novelty Score
0.6
Cost Impact Score
0.7
Citation Count
0
Why It Matters For Business
Automating the end-to-end design loop lets teams generate and triage thousands of candidate biologics quickly. This cuts the early discovery cycle time and lets experimental teams focus on a smaller, higher-quality set for wet-lab testing. The system also shows how to map compute cost vs. value by filtering cheaply and
Summary TLDR
StructBioReasoner is a multi-agent pipeline that combines retrieval-augmented LLM planning, structure prediction, molecular dynamics, and iterative binder design to target intrinsically disordered proteins (IDPs). Two case studies produced large pools of in silico validated binders: for Der f 21, 787 validated designs, of which 50.98% outperformed a literature reference by MM-PBSA; for NMNAT-2, 97,066 validated binders and three identified binding modes, including NMNAT-2:p53. The system runs at scale on the Aurora supercomputer, with MD sampling of ≈26.6 µs/hour, more than 4,000 MM-PBSA calculations/hour, and design generation of ~15,000 peptides/hour. Key I/
Problem Statement
Intrinsically disordered proteins (IDPs) lack a single stable 3D structure, so conventional design methods that assume one fixed target conformation perform poorly on them. Practitioners need an autonomous, scalable way to choose tools, reason across conformational ensembles, and run expensive simulations to produce candidate biologics at scale.
Main Contribution
A tournament-style multi-agent architecture (StructBioReasoner) that lets specialized agents compete and refine binder hypotheses in parallel.
An integrated stack combining retrieval-augmented literature search (HiPerRAG), LLM-driven planning, structure prediction, molecular dynamics, MM-PBSA scoring, and iterative binder design.
Scaling demonstration and empirical case studies: Der f 21 (787 validated designs; >50% beat reference in silico) and NMNAT-2 (97,066 validated binders; three binding modes discovered), run on Aurora with measured agent throughput and I/O bottleneck analysis.
Key Findings
Der f 21: 50.98% of 787 in-silico validated designs had more favorable MM-PBSA binding free energy than the literature reference.
NMNAT-2: 97,066 binders passed sequence and structural QC; analysis revealed three major binding modes, including an NMNAT-2:p53 interface.
Scaling: the MD agent achieved ~26.6 µs of aggregate sampling per hour with 80.4% parallel efficiency at 256 nodes; the MM-PBSA agent processed >4,000 calculations/hour at 64 nodes; the design agent produced ~15,000 peptides/hour with 84.4% efficiency at 256 nodes.
HiPerRAG vector store: literature corpus built from ~1,520 NMNAT-2 papers and ~38 Der f 21 papers to ground LLM reasoning and construct a shared knowledge graph.
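The parallel-efficiency figures above can be read as measured speedup divided by ideal linear speedup. The summary does not state the baseline node count or the exact formula used, so the sketch below assumes the common definition, and all inputs are hypothetical numbers chosen for illustration.

```python
def parallel_efficiency(base_nodes, base_rate, nodes, rate):
    """Measured speedup divided by ideal (linear) speedup.

    base_rate and rate are throughputs (e.g. tasks/hour) measured at
    base_nodes and nodes. This is an assumed definition; the source
    does not say how its efficiency values were computed.
    """
    if base_nodes <= 0 or nodes <= 0 or base_rate <= 0:
        raise ValueError("node counts and baseline rate must be positive")
    speedup = rate / base_rate
    ideal_speedup = nodes / base_nodes
    return speedup / ideal_speedup


# Hypothetical inputs, not values from the paper.
eff = parallel_efficiency(base_nodes=1, base_rate=100.0, nodes=256, rate=20480.0)
print(f"parallel efficiency: {eff:.1%}")
```

An efficiency well below 1.0 at higher node counts is the usual sign that something other than compute, such as file-system contention, is limiting throughput.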
Results
Der f 21 reference binding energy (MM-PBSA)
Validated designs for Der f 21
787
Fraction outperforming reference (Der f 21)
50.98%
NMNAT-2 validated binders
97,066
MD throughput (aggregate sampling)
~26.6 µs/hour
MM-PBSA throughput
>4,000 calculations/hour (at 64 nodes)
Binder design throughput
~15,000 peptides/hour
Who Should Care
What To Try In 7 Days
Build a small domain vector store (10–100 papers) and attach a RAG layer to an LLM to ground planning for a single target.
Prototype a short agent loop: design 100 binders with an existing design tool, run short MD (10 ns) on each, and compute cheap interaction energies to triage the top 10.
Benchmark a key analysis stage (MM-PBSA or surrogate) on available hardware to find I/O vs compute limits before scaling.
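For the benchmarking step, one simple approach is to time the read phase and the compute phase of each task separately. The sketch below is a minimal illustration under stated assumptions: `time_stage`, `read_input`, and `compute` are hypothetical names, and the demo uses small temporary files and a trivial checksum as stand-ins for trajectory reads and an MM-PBSA (or surrogate) calculation.

```python
import os
import tempfile
import time


def time_stage(read_input, compute, n_tasks):
    """Accumulate wall-clock time spent reading inputs vs computing."""
    io_s = 0.0
    compute_s = 0.0
    for i in range(n_tasks):
        t0 = time.perf_counter()
        data = read_input(i)
        t1 = time.perf_counter()
        compute(data)
        t2 = time.perf_counter()
        io_s += t1 - t0
        compute_s += t2 - t1
    total = io_s + compute_s
    io_fraction = io_s / total if total > 0 else 0.0
    return {"io_s": io_s, "compute_s": compute_s, "io_fraction": io_fraction}


def demo(n_files=8, size_bytes=1 << 16):
    """Toy run: random files stand in for per-task simulation output."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i in range(n_files):
            path = os.path.join(tmp, f"frame_{i}.bin")
            with open(path, "wb") as fh:
                fh.write(os.urandom(size_bytes))
            paths.append(path)

        def read_input(i):
            with open(paths[i], "rb") as fh:
                return fh.read()

        def compute(data):
            return sum(data)  # placeholder for the real analysis

        return time_stage(read_input, compute, len(paths))


report = demo()
print(report)
```

Repeating the measurement at several node counts shows whether `io_fraction` grows with scale, which would point to the file system rather than compute as the limit.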
Agent Features
Memory
- short-term memory trimming for context management
- long-term memory for persistent findings
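The two memory features can be pictured as a bounded window of recent messages plus a separate store that is never trimmed. The source does not describe the implementation, so the following is only a sketch: `AgentMemory` and its methods are hypothetical, and a character budget stands in for a token budget.

```python
from collections import deque


class AgentMemory:
    """Hypothetical two-tier memory: trimmed window plus persistent findings."""

    def __init__(self, max_chars):
        self.max_chars = max_chars   # stand-in for a token budget
        self.short_term = deque()    # recent messages, oldest first
        self.long_term = []          # findings that are never trimmed

    def add_message(self, text):
        self.short_term.append(text)
        # Drop the oldest messages until the window fits the budget,
        # always keeping at least the newest message.
        while len(self.short_term) > 1 and self._size() > self.max_chars:
            self.short_term.popleft()

    def record_finding(self, text):
        self.long_term.append(text)

    def _size(self):
        return sum(len(m) for m in self.short_term)

    def context(self):
        """Build the prompt context: findings first, then recent messages."""
        findings = [f"[finding] {f}" for f in self.long_term]
        return "\n".join(findings + list(self.short_term))


mem = AgentMemory(max_chars=20)
for msg in ("a" * 10, "b" * 10, "c" * 10):
    mem.add_message(msg)
mem.record_finding("example finding")
print(mem.context())
```

Keeping findings outside the trimmed window means a result survives even after the messages that produced it have been dropped.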
Planning
- LLM planning with structured state summaries
- cross-hypothesis learning to form soft constraints
Tool Use
- on-demand tool invocation (structure prediction, MD, MM-PBSA, design)
- human-in-the-loop checkpoints
Frameworks
- Academy (execution layer)
- HiPerRAG (RAG layer)
Is Agentic
true
Architectures
- tournament-based multi-agent
- LLM-driven planner-reasoner
Collaboration
- agents compete in tournaments and share a knowledge graph
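One way to picture tournament-style competition is a round-robin in which every pair of hypotheses is compared and wins are tallied. The summary does not specify the match rule or ranking scheme, so this is an assumed illustration: `round_robin` and `judge` are hypothetical names, the scores are made-up values, and "lower score wins" mimics a more favorable binding free energy.

```python
from collections import Counter
from itertools import combinations


def round_robin(names, judge):
    """Play every pair once; return (name, wins) sorted by wins, descending."""
    wins = Counter({name: 0 for name in names})
    for a, b in combinations(names, 2):
        wins[judge(a, b)] += 1
    return wins.most_common()


# Made-up scores for illustration only (lower is better).
scores = {"h1": -30.0, "h2": -45.5, "h3": -12.0}


def judge(a, b):
    """Toy judge: the hypothesis with the lower score wins the match."""
    return a if scores[a] <= scores[b] else b


ranking = round_robin(list(scores), judge)
print(ranking)
```

In a real system the judge could be an LLM comparison or a simulation-based score; swapping the `judge` function leaves the tournament loop unchanged.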
Optimization Features
Infra Optimization
- Parsl and Globus Compute for federated execution; node scaling tuned to avoid I/O saturation
System Optimization
- node-level parallelism; data-staging considerations are noted
Training Optimization
- direct preference optimization (DPO) for fine-tuning generative policy
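For reference, the standard DPO objective for one preference pair is the negative log-sigmoid of a scaled difference in log-probability ratios between the trained policy and a frozen reference. The sketch below implements that published formula for scalar log-probabilities; how this system builds its preference pairs (for example, from MM-PBSA rankings) is not stated in the summary, and the default `beta` is an arbitrary choice.

```python
import math


def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for a single (chosen, rejected) pair.

    Inputs are sequence log-probabilities under the trained policy
    (logp_*) and under the frozen reference model (ref_logp_*).
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)) written as a numerically stable softplus(-margin).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# With identical policy and reference, the loss is log(2).
neutral = dpo_loss(0.0, 0.0, 0.0, 0.0)
# Policy favors the chosen sample more than the reference does: lower loss.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
print(neutral, improved)
```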
Inference Optimization
- multi-stage filtering to avoid expensive MM-PBSA on all candidates
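The filtering idea can be sketched as a two-stage funnel: rank everything with a cheap score, then spend the expensive calculation only on the top fraction. The function names, the keep fraction, and the toy scoring functions below are assumptions for illustration and are not taken from the paper.

```python
def multistage_filter(candidates, cheap_score, expensive_score, keep_fraction=0.1):
    """Rank by a cheap score, then run the expensive score on the shortlist.

    Lower scores are treated as better in both stages.
    Returns (expensive_score, candidate) pairs sorted best first.
    """
    ranked = sorted(candidates, key=cheap_score)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    shortlist = ranked[:n_keep]
    return sorted((expensive_score(c), c) for c in shortlist)


expensive_calls = []  # records which candidates reached the costly stage


def toy_cheap(c):
    """Stand-in for a fast filter such as a quick interaction energy."""
    return abs(c - 50)


def toy_expensive(c):
    """Stand-in for a costly calculation such as MM-PBSA."""
    expensive_calls.append(c)
    return (c - 48) ** 2


results = multistage_filter(range(100), toy_cheap, toy_expensive, keep_fraction=0.1)
print(len(expensive_calls), results[0])
```

In this toy run only 10 of 100 candidates reach the expensive stage. The trade-off is that a good candidate with a poor cheap score is never rescored, so the keep fraction should be tuned against how well the cheap score tracks the expensive one.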
Reproducibility
Code URLs
Code Available
Open Source Status
- partial
Risks & Boundaries
Limitations
- All biological efficacy claims are in silico (MM-PBSA approximations). Experimental validation is required before any therapeutic claims.
- MM-PBSA is approximate and sensitive to simulation length and force-field choices.
- I/O bottlenecks limit scaling of MM-PBSA beyond ~64 nodes; file-system contention affects throughput.
- LLM reasoning remains dependent on curated literature; residual hallucinations possible without careful RAG curation.
When Not To Use
- When you need immediate wet-lab validated candidates without further experiments.
- If you lack access to large HPC resources or cannot tolerate high I/O demands.
- For targets where single-structure methods suffice (ordered proteins) and simpler pipelines are cheaper.
Failure Modes
- MD crashes due to bad inputs (NaN coordinates, segmentation faults) requiring diagnostic agents.
- Silent file-format corruption during CIF→PDB conversion causing downstream failures.
- I/O saturation on parallel filesystem that negates compute scaling.
- LLM hallucinations if the RAG corpus is incomplete or noisy.
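Some of these failures can be caught before a simulation starts. As one hedged example, the sketch below scans PDB text for ATOM/HETATM records whose coordinates are missing, unparseable, or non-finite, using the fixed coordinate columns of the PDB format (columns 31 to 54). `find_bad_atoms` and `atom_line` are hypothetical helpers, not part of StructBioReasoner, and this check does not cover every form of conversion damage.

```python
import math


def find_bad_atoms(pdb_text):
    """Return (line_number, reason) for atom records with unusable coordinates."""
    bad = []
    for lineno, line in enumerate(pdb_text.splitlines(), start=1):
        if not line.startswith(("ATOM", "HETATM")):
            continue
        try:
            # PDB fixed-width columns: x = 31-38, y = 39-46, z = 47-54.
            xyz = [float(line[30:38]), float(line[38:46]), float(line[46:54])]
        except ValueError:
            bad.append((lineno, "unparseable coordinates"))
            continue
        if not all(math.isfinite(v) for v in xyz):
            bad.append((lineno, "non-finite coordinates"))
    return bad


def atom_line(serial, x, y, z):
    """Build a minimal, column-aligned ATOM record for the demo."""
    return (f"ATOM  {serial:5d}  CA  ALA A{serial:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}  1.00  0.00           C")


sample = "\n".join([
    atom_line(1, 1.0, 2.0, 3.0),
    atom_line(2, float("nan"), 0.0, 0.0),
    "ATOM  short",  # truncated record, as a corrupted conversion might produce
])
problems = find_bad_atoms(sample)
print(problems)
```

Running a check like this after each format conversion turns a silent corruption into an early, attributable error instead of an MD crash several stages later.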
Core Entities
Models
- RFDiffusion
- BindCraft
- AlphaFold 3
- AlphaFold-Multimer
- Chai-1
- Boltz-2x
- BioMNI
- GPT-OSS-120B
- ESM-2 650M
- ProteinMPNN
Metrics
- Binding free energy (MM-PBSA, kcal/mol)
- RMSD / RMSF (stability)
- Aggregate MD sampling (µs/hour)
- Agent parallel efficiency (%)
- Design throughput (peptides/hour)
Datasets
- Custom HiPerRAG vector store (≈1,520 NMNAT-2 + ≈38 Der f 21 papers)
- PDB
- DisProt (mentioned)
Benchmarks
- Der f 21
- NMNAT-2
Context Entities
Models
- OpenFold
- xTrimo-PGLM
- GenSLM
Metrics
- MM-PBSA std. dev. (kcal/mol)
- I/O parallel efficiency
Datasets
- Europe PMC
- OpenAlex
- Crossref
- Unpaywall
Benchmarks
- DisProt

