A multi-agent system uses LLM planning, retrieval, and large-scale simulation to design peptide/protein binders for disordered proteins on a supercomputer

December 17, 2025 · 10 min

Overview

Decision Snapshot: Needs Validation

The system is a functioning end-to-end prototype with concrete scaling and case-study results. Evidence is strong for in-silico performance and HPC scaling. Wet-lab validation and public release of orchestration code/data remain limited, reducing immediate production readiness.

Citations: 0

Evidence Strength: 0.75

Confidence: 0.80

Risk Signals: 11

Trust Signals

Findings with numeric evidence: 4/4

Findings with evidence refs: 4/4

Results with explicit delta: 6/7

Reproducibility

Status: Partial assets available

Open source: Partial

At A Glance

Cost impact: 70%

Production readiness: 60%

Novelty: 60%

Authors

Matthew Sinclair, Moeen Meigooni, Archit Vasan, Ozan Gokdemir, Xinran Lian, Heng Ma, Yadu Babuji, Alexander Brace, Khalid Hossain, Carlo Siebenschuh, Thomas Brettin, Kyle Chard, Christopher Henry, Venkatram Vishwanath, Rick L. Stevens, Ian T. Foster, Arvind Ramanathan


Why It Matters For Business

Automating the end-to-end design loop lets teams generate and triage thousands of candidate biologics quickly. This shortens the early discovery cycle and lets experimental teams focus on a smaller, higher-quality set for wet-lab testing. The system also shows how to weigh compute cost against value: filter cheaply first, then run expensive MM-PBSA scoring only on the candidates that survive.


Summary (TL;DR)

StructBioReasoner is a multi-agent pipeline that combines retrieval-augmented LLM planning, structure prediction, molecular dynamics, and iterative binder design to target intrinsically disordered proteins (IDPs). On two case studies it produced large pools of in silico-validated binders: for Der f 21, 787 validated designs with 50.98% outperforming a literature reference by MM-PBSA; for NMNAT-2 it produced 97,066 validated binders and identified three binding modes including NMNAT-2:p53. The system runs at scale on the Aurora supercomputer, with MD sampling ≈26.6 µs/hour and multi-agent throughput measured in thousands of MM-PBSA calculations and ~15k peptides/hour design generation. Key I/

Problem Statement

Intrinsically disordered proteins (IDPs) lack a single stable 3D structure. Conventional structure-based design methods assume one fixed target conformation, so they fail on IDPs. Practitioners need an autonomous, scalable way to choose tools, reason across conformational ensembles, and run expensive simulations to produce candidate biologics at scale.

Main Contribution

A tournament-style multi-agent architecture (StructBioReasoner) that lets specialized agents compete and refine binder hypotheses in parallel.

An integrated stack combining retrieval-augmented literature search (HiPerRAG), LLM-driven planning, structure prediction, molecular dynamics, MM-PBSA scoring, and iterative binder design.
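The tournament idea can be illustrated with a toy single-elimination bracket. This is a sketch under stated assumptions, not the StructBioReasoner implementation:

- The `Hypothesis` fields, the random pairing, and the use of one scalar score (lower is better, as with binding free energies) are all hypothetical.
- The paper's agents also refine hypotheses between rounds, which this sketch omits.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    agent: str    # which specialist agent proposed it (hypothetical field)
    design: str   # binder design identifier (hypothetical field)
    score: float  # lower is better, e.g. a predicted binding free energy


def run_tournament(pool: List[Hypothesis], seed: int = 0) -> Hypothesis:
    """Single-elimination bracket: random pairings, the better-scoring entry advances."""
    if not pool:
        raise ValueError("pool must not be empty")
    rng = random.Random(seed)
    contenders = list(pool)
    while len(contenders) > 1:
        rng.shuffle(contenders)
        advancing = []
        # Pair entries (0,1), (2,3), ...; zip drops the unpaired last entry.
        for a, b in zip(contenders[0::2], contenders[1::2]):
            advancing.append(a if a.score <= b.score else b)
        if len(contenders) % 2 == 1:
            advancing.append(contenders[-1])  # odd entry gets a bye
        contenders = advancing
    return contenders[0]


pool = [
    Hypothesis("agent_a", "d1", -120.0),
    Hypothesis("agent_b", "d2", -150.0),
    Hypothesis("agent_c", "d3", -135.0),
    Hypothesis("agent_a", "d4", -101.0),
    Hypothesis("agent_b", "d5", -142.0),
]
winner = run_tournament(pool)
print(winner.design)  # d2
```

With an exact scalar score, a bracket simply finds the minimum. The pairwise structure becomes useful when comparisons are noisy or made by a judge (for example, an LLM comparing two hypotheses), where a global sort is not available.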

Key Findings

Der f 21: 50.98% of 787 in-silico validated designs had more favorable MM-PBSA binding free energy than the literature reference.

Numbers: 50.98% of 787 designs; 'more favorable' = mean ≤ -145.25 kcal/mol, i.e. the reference mean (-135.00 kcal/mol) minus its 10.25 kcal/mol standard deviation

Practical Use: You can use an agentic pipeline to generate hundreds of high-quality candidates quickly. Expect a substantial fraction to beat an existing in-silico reference, but follow up with wet-lab validation because MM-PBSA is an approximation.

Evidence Ref: Section 4.1; Figure 3B-C
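The -145.25 kcal/mol cut-off is the reference mean shifted by one standard deviation in the favorable (more negative) direction. The minimal sketch below reproduces that arithmetic and shows how a "fraction beating the reference" would be counted. The function names and the four example design means are hypothetical, not data from the paper.

```python
from typing import Sequence


def favorable_threshold(ref_mean: float, ref_sd: float) -> float:
    """Cut-off one standard deviation beyond the reference mean.

    MM-PBSA binding free energies are negative for favorable binding,
    so "better" means more negative: subtract the SD from the mean.
    """
    return ref_mean - ref_sd


def fraction_beating(design_means: Sequence[float], threshold: float) -> float:
    """Fraction of designs whose mean binding free energy is at or below the cut-off."""
    if not design_means:
        raise ValueError("need at least one design")
    return sum(dg <= threshold for dg in design_means) / len(design_means)


threshold = favorable_threshold(-135.00, 10.25)
print(threshold)  # -145.25
example = [-150.0, -140.0, -146.0, -120.0]  # hypothetical design means, kcal/mol
print(fraction_beating(example, threshold))  # 0.5
```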

NMNAT-2: 97,066 binders passed sequence and structural QC; analysis revealed three major binding modes, including an NMNAT-2:p53 interface.

Numbers: 97,066 validated binders (out of 266,606 generated); three binding modes

Practical Use: Agentic, interactome-driven searches can find biologically relevant interfaces (e.g., p53) and produce large candidate sets for downstream screening and experimental follow-up.

Evidence Ref: Section 4.2; Figure 4A-D

Results

Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref
Der f 21 reference binding energy (MM-PBSA) | -135.00 ± 10.25 kcal/mol | n/a | n/a | Der f 21; reference binder | Replicate MD, 600 ns total simulation | Section 4.1
Validated designs for Der f 21 | 787 designs passed QC and structural checks | 842 total designed | 93.47% pass rate | Der f 21 design campaign | Two design cycles; sequence + structure QC | Section 4.1
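As a quick consistency check, the 93.47% pass rate follows from 787 / 842. The same arithmetic gives the QC survival rate implied by the NMNAT-2 counts in Key Findings (97,066 of 266,606). The helper name is hypothetical. The two campaigns may apply different QC criteria, so the two rates are not necessarily comparable.

```python
def pass_rate(passed: int, total: int) -> float:
    """Percentage of generated designs that survived QC."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * passed / total


campaigns = {
    "Der f 21": (787, 842),
    "NMNAT-2": (97_066, 266_606),
}
for name, (passed, total) in campaigns.items():
    print(f"{name}: {pass_rate(passed, total):.2f}%")
# Der f 21: 93.47%
# NMNAT-2: 36.41%
```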

What To Try In 7 Days

Build a small domain vector store (10–100 papers) and attach a RAG layer to an LLM to ground planning for a single target.

Prototype a short agent loop: design 100 binders with an existing design tool, run quick MD (10 ns) and compute cheap interaction energies to triage top 10.

Benchmark a key analysis stage (MM-PBSA or surrogate) on available hardware to find I/O vs compute limits before scaling.
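The second step above (rank many designs cheaply, spend expensive scoring on a few) is a two-stage funnel. Below is a minimal sketch under stated assumptions:

- `triage`, `fake_mmpbsa`, and the design IDs are hypothetical.
- The random numbers are stand-ins for real quick-MD energies and MM-PBSA results.
- Lower scores are treated as better, as with binding free energies.

```python
import heapq
import random
from typing import Callable, List, Sequence, Tuple


def triage(candidates: Sequence[str],
           cheap_score: Callable[[str], float],
           expensive_score: Callable[[str], float],
           keep_after_cheap: int = 10,
           keep_final: int = 3) -> Tuple[List[str], int]:
    """Two-stage funnel (lower score = better binding).

    Stage 1 ranks every candidate with a cheap proxy and keeps the best
    `keep_after_cheap`; stage 2 re-ranks only those with the expensive scorer.
    Returns the final picks and how many candidates reached stage 2.
    """
    shortlist = heapq.nsmallest(keep_after_cheap, candidates, key=cheap_score)
    reranked = sorted(shortlist, key=expensive_score)  # key is evaluated once per item
    return reranked[:keep_final], len(shortlist)


# Hypothetical stand-ins: random numbers in place of real MD / MM-PBSA results.
rng = random.Random(0)
designs = [f"design_{i:03d}" for i in range(100)]
quick_md_energy = {d: rng.uniform(-60.0, -10.0) for d in designs}
expensive_calls = 0


def fake_mmpbsa(design: str) -> float:
    global expensive_calls
    expensive_calls += 1
    return quick_md_energy[design] + rng.gauss(0.0, 5.0)


picks, n_stage2 = triage(designs, quick_md_energy.__getitem__, fake_mmpbsa)
print(picks, n_stage2, expensive_calls)  # the expensive scorer ran only 10 times
```

The design choice is the cost ratio: the expensive scorer runs on `keep_after_cheap` candidates instead of all 100.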

Agent Features

Memory
short-term memory trimming for context management; long-term memory for persistent findings
Planning
LLM planning with structured state summaries; cross-hypothesis learning to form soft constraints
Tool Use
on-demand tool invocation (structure prediction, MD, MM-PBSA, design); human-in-the-loop checkpoints
Frameworks
Academy (execution layer); HiPerRAG (RAG layer)
Is Agentic
Yes
Architectures
tournament-based multi-agent; LLM-driven planner-reasoner
Collaboration
agents compete in tournaments and share a knowledge graph
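The two-tier memory listed above (a trimmed short-term window plus persistent long-term findings) can be sketched as a small class. The names here are hypothetical and do not reflect the Academy API. Counting messages instead of tokens is a deliberate simplification, so a real agent would likely trim by token budget.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AgentMemory:
    """Toy two-tier memory: a bounded recent window plus persistent findings."""
    max_short_term: int = 6  # hypothetical window size, counted in messages
    short_term: List[str] = field(default_factory=list)
    long_term: List[str] = field(default_factory=list)

    def add(self, message: str, persistent: bool = False) -> None:
        self.short_term.append(message)
        if persistent:
            self.long_term.append(message)  # survives trimming
        overflow = len(self.short_term) - self.max_short_term
        if overflow > 0:
            del self.short_term[:overflow]  # drop the oldest entries first

    def context(self) -> List[str]:
        """Prompt context: persistent findings, then recent messages not already listed."""
        recent = [m for m in self.short_term if m not in self.long_term]
        return self.long_term + recent


mem = AgentMemory(max_short_term=3)
mem.add("finding: design_017 beats reference", persistent=True)  # hypothetical finding
for step in range(1, 5):
    mem.add(f"tool log {step}")
print(mem.context())
# ['finding: design_017 beats reference', 'tool log 2', 'tool log 3', 'tool log 4']
```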

Optimization Features

Infra Optimization
Parsl and Globus Compute for federated execution; node scaling tuned to avoid I/O saturation
System Optimization
node-level parallelism; data-staging considerations highlighted
Training Optimization
direct preference optimization (DPO) for fine-tuning generative policy
Inference Optimization
multi-stage filtering to avoid expensive MM-PBSA on all candidates
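The page says only that direct preference optimization (DPO) is used to fine-tune the generative policy. For reference, the standard DPO objective is shown below, where:

- $\pi_\theta$ is the policy being trained.
- $\pi_{\mathrm{ref}}$ is a frozen reference policy.
- $\beta$ controls how far $\pi_\theta$ may drift from $\pi_{\mathrm{ref}}$.
- $\sigma$ is the logistic function.

In this setting, one plausible but unconfirmed reading is that $x$ is a target, $y_w$ is the better-scoring binder design, and $y_l$ is the worse one. How the authors actually build preference pairs is not stated here.

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
    \left[ \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right) \right]
```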

Reproducibility

Code Available: Yes
Data Available: No
Open Source Status: Partial
License: Unknown

Risks & Boundaries

Limitations

All biological efficacy claims are in silico (MM-PBSA approximations). Experimental validation required before therapeutic claims.

MM-PBSA is approximate and sensitive to simulation length and forcefield choices.
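To make "approximate" concrete, the textbook form of the MM-PBSA estimate is shown below. The angle brackets denote averages over MD snapshots, which is why results depend on simulation length. $E_{\mathrm{MM}}$ comes from the force field, which is why results also depend on force-field choice. Whether the authors include the entropy term $-TS$, or use single- versus separate-trajectory averaging, is not stated on this page.

```latex
\Delta G_{\mathrm{bind}} \approx
  \langle G_{\mathrm{complex}} \rangle
  - \langle G_{\mathrm{receptor}} \rangle
  - \langle G_{\mathrm{ligand}} \rangle,
\qquad
G = E_{\mathrm{MM}} + G_{\mathrm{PB}} + G_{\mathrm{SA}} - TS,
\qquad
E_{\mathrm{MM}} = E_{\mathrm{bonded}} + E_{\mathrm{elec}} + E_{\mathrm{vdW}}
```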

When Not To Use

When you need wet-lab-validated candidates immediately, without running further experiments.

If you lack access to large HPC resources or cannot tolerate high I/O demands.

Failure Modes

MD crashes (e.g., segmentation faults) triggered by bad inputs such as NaN coordinates, which require diagnostic agents to resolve.

Silent file-format corruption during CIF→PDB conversion causing downstream failures.
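Both failure modes above can be partly caught by a cheap pre-flight check before launching MD. The sketch below assumes the CIF→PDB step emits standard fixed-column PDB ATOM/HETATM records. It is a hypothetical helper, not the paper's diagnostic agents. It flags NaN/inf and unparseable or truncated coordinate fields. It cannot detect corruption that still yields valid-looking numbers, such as shifted columns.

```python
import math
from typing import List


def check_pdb_coordinates(pdb_text: str) -> List[str]:
    """Return a list of problems found in ATOM/HETATM coordinate fields."""
    problems = []
    n_atoms = 0
    for lineno, line in enumerate(pdb_text.splitlines(), start=1):
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        n_atoms += 1
        try:
            # PDB fixed columns (1-based): x = 31-38, y = 39-46, z = 47-54.
            xyz = [float(line[30:38]), float(line[38:46]), float(line[46:54])]
        except ValueError:
            problems.append(f"line {lineno}: unparseable coordinates")
            continue
        if not all(math.isfinite(v) for v in xyz):
            problems.append(f"line {lineno}: non-finite coordinate")
    if n_atoms == 0:
        problems.append("no ATOM/HETATM records found")
    return problems
```

An empty list means the coordinates parsed cleanly. Any entry is a reason to stop before handing the file to the MD engine.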

Core Entities

Models

RFDiffusion; BindCraft; AlphaFold 3; AlphaFold-Multimer; Chai-1; Boltz-2x; BioMNI; GPT-OSS-120B; ESM-2 650M; ProteinMPNN

Metrics

Binding free energy (MM-PBSA, kcal/mol); RMSD / RMSF (stability); Aggregate MD sampling (µs/hour); Agent parallel efficiency (%); Design throughput (peptides/hour)
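The efficiency metrics listed here (and the I/O parallel efficiency listed under Context Entities) are not defined on this page. A common throughput-based definition is per-node throughput at N nodes divided by per-node throughput at a baseline node count. That definition is used in the sketch below as an assumption. The 80% cut-off and the benchmark numbers are hypothetical. A falling efficiency while per-task compute stays constant is one symptom of I/O or staging contention.

```python
from typing import List, Optional, Tuple


def parallel_efficiency(base_nodes: int, base_tp: float, nodes: int, tp: float) -> float:
    """Per-node throughput at `nodes`, as a percentage of per-node throughput at the baseline."""
    return 100.0 * (tp / nodes) / (base_tp / base_nodes)


def saturation_point(runs: List[Tuple[int, float]],
                     min_efficiency: float = 80.0) -> Optional[Tuple[int, float]]:
    """First node count whose efficiency drops below `min_efficiency`, else None.

    `runs` holds (node_count, throughput) pairs sorted by node count;
    the first entry is the baseline.
    """
    base_nodes, base_tp = runs[0]
    for nodes, tp in runs:
        eff = parallel_efficiency(base_nodes, base_tp, nodes, tp)
        if eff < min_efficiency:
            return nodes, eff
    return None


# Hypothetical benchmark: tasks/hour at increasing node counts.
runs = [(1, 100.0), (64, 5760.0), (256, 15360.0)]
print(saturation_point(runs))  # (256, 60.0)
```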

Datasets

Custom HiPerRAG vector store (≈1,520 NMNAT-2 + ≈38 Der f 21 papers); PDB; DisProt (mentioned)

Benchmarks

Der f 21; NMNAT-2

Context Entities

Models

OpenFold; xTrimo-PGLM; GenSLM

Metrics

MM-PBSA std. dev. (kcal/mol); I/O parallel efficiency

Datasets

Europe PMC; OpenAlex; Crossref; Unpaywall

Benchmarks

DisProt