Overview
Production Readiness
0.7
Novelty Score
0.6
Cost Impact Score
0.6
Citation Count
1
Why It Matters For Business
Tippy shows how to turn multi-agent lab automation from a concept into a deployable platform, enabling more automated DMTA cycles, reproducible deployments, and scalable instrument orchestration.
Summary TLDR
This paper describes the engineering and deployment of Tippy, a production-focused multi-agent system for automating drug-discovery lab workflows. Tippy uses a Supervisor agent plus specialized Molecule, Lab, Analysis, and Report agents, integrates tools through the Model Context Protocol (MCP), and runs on Kubernetes with Helm, CI/CD, an Envoy proxy, and vector databases for retrieval. The authors detail the tool lists, orchestration via the OpenAI Agents SDK, Git-based configuration tracking, observability, and safety guardrails, but provide no quantitative benchmark of scientific outcomes.
Problem Statement
Laboratory automation needs reliable, production-grade software that coordinates heterogeneous instruments, human operators, and multi-step DMTA (Design-Make-Test-Analyze) workflows. Existing lab systems handle data and basic tracking but lack coordinated autonomous orchestration, standardized tool integration, and the deployment practices required for scalable, repeatable automation.
Main Contribution
A production multi-agent microservices architecture with a Supervisor plus specialized Molecule, Lab, Analysis, and Report agents.
Integration pattern using the Model Context Protocol (MCP) to expose lab tools and instrument controls to agents.
Deployment and ops design: Kubernetes with Helm, Docker containers, HPA autoscaling, Envoy proxy, and Git-based configuration & CI/CD.
Retrieval-augmented generation using vector databases for persistent memory across campaigns.
A safety guardrail based on content moderation and prompt-injection detection, plus observability via tracing.
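The supervisor-plus-specialists pattern listed above can be illustrated with a minimal sketch. This is not Tippy's implementation: the keyword table, the `guardrail_ok` check, and the handler functions are hypothetical stand-ins for the LLM-driven routing and OpenAI-based guardrail the paper describes.

```python
from typing import Callable, Dict

# Hypothetical handlers standing in for the specialized agents.
def molecule_agent(task: str) -> str:
    return f"[molecule] {task}"

def lab_agent(task: str) -> str:
    return f"[lab] {task}"

def analysis_agent(task: str) -> str:
    return f"[analysis] {task}"

def report_agent(task: str) -> str:
    return f"[report] {task}"

AGENTS: Dict[str, Callable[[str], str]] = {
    "molecule": molecule_agent,
    "lab": lab_agent,
    "analysis": analysis_agent,
    "report": report_agent,
}

# Assumed keyword table; a real supervisor would let an LLM pick the agent.
ROUTES = {
    "generate": "molecule",
    "synthesize": "lab",
    "analyze": "analysis",
    "summarize": "report",
}

# Toy blocklist; real prompt-injection detection is far more involved.
BLOCKED_PHRASES = ("ignore previous instructions",)

def guardrail_ok(task: str) -> bool:
    """Return False when the task contains a blocked phrase."""
    lowered = task.lower()
    return not any(phrase in lowered for phrase in BLOCKED_PHRASES)

def supervise(task: str) -> str:
    """Run the guardrail first, then hand the task to the first matching agent."""
    if not guardrail_ok(task):
        return "[guardrail] rejected"
    lowered = task.lower()
    for keyword, agent_name in ROUTES.items():
        if keyword in lowered:
            return AGENTS[agent_name](task)
    return "[supervisor] no agent matched"
```

Running the guardrail before routing means a rejected input never reaches an agent that could trigger a lab action.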
Key Findings
Tippy uses five specialized agents (Supervisor, Molecule, Lab, Analysis, Report) plus a Safety Guardrail.
The Lab Agent exposes 13 MCP tools covering job creation, instrument control, lookup, and execution.
Molecule, Analysis, and Report agents provide 4, 10, and 8 MCP tools respectively.
The system is deployed on Kubernetes with Helm, Docker, HPA autoscaling, blue-green/rolling updates, and Git-based config.
Retrieval-augmented generation is implemented via integrated vector databases, which provide persistent memory across campaigns.
Safety is handled by a Guardrail Agent that uses OpenAI content filters and prompt-injection detection rather than external MCP tools.
The MolMIM model is used for property-guided molecule generation with GPU acceleration.
Results
Specialized agents
5 (plus a Safety Guardrail)
Lab Agent tool count
13
Molecule / Analysis / Report tool counts
4 / 10 / 8
Deployment stack
Kubernetes, Helm, Docker, HPA autoscaling
Who Should Care
What To Try In 7 Days
Map key lab capabilities to MCP-like tool APIs (job start, status, instrument control).
Containerize one agent and deploy it with Helm on a local Kubernetes cluster.
Add Git-based config/versioning for agent prompts and tool configs to enable rollbacks.
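As a starting point for the first step above, lab capabilities can be mapped to named tools in a small registry. The sketch below is plain Python and does not use any MCP SDK; the `ToolRegistry` class, the tool names `start_job` and `get_job_status`, and the in-memory job store are all assumptions made for illustration.

```python
import itertools
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

@dataclass
class Tool:
    name: str
    description: str
    handler: Callable[..., Dict[str, Any]]

class ToolRegistry:
    """Hypothetical MCP-like registry: tools are looked up and called by name."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, name: str, description: str):
        def decorator(fn: Callable[..., Dict[str, Any]]):
            self._tools[name] = Tool(name, description, fn)
            return fn
        return decorator

    def list_tools(self) -> List[str]:
        return sorted(self._tools)

    def call(self, name: str, **kwargs: Any) -> Dict[str, Any]:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name].handler(**kwargs)

registry = ToolRegistry()
_jobs: Dict[str, str] = {}          # in-memory stand-in for a real job store
_job_ids = itertools.count(1)

@registry.register("start_job", "Create and start a lab job")
def start_job(protocol: str) -> Dict[str, Any]:
    job_id = f"job-{next(_job_ids)}"
    _jobs[job_id] = "running"
    return {"job_id": job_id, "protocol": protocol, "status": "running"}

@registry.register("get_job_status", "Look up the status of a lab job")
def get_job_status(job_id: str) -> Dict[str, Any]:
    return {"job_id": job_id, "status": _jobs.get(job_id, "unknown")}
```

Keeping each capability behind a name and a description is what lets an agent discover and select tools at run time; a real deployment would expose the same handlers through an MCP server instead of this local class.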
Agent Features
Memory
- retrieval memory (vector DB)
- context window management
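The idea behind retrieval memory can be shown with a toy in-process store that ranks stored notes by cosine similarity. This is only a sketch: the `VectorMemory` class is hypothetical, the embeddings are hand-written two-dimensional vectors, and a real system would use an embedding model and a vector database.

```python
import math
from typing import List, Sequence, Tuple

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

class VectorMemory:
    """Toy stand-in for a vector database (linear scan, no index)."""

    def __init__(self) -> None:
        self._items: List[Tuple[List[float], str]] = []

    def add(self, embedding: Sequence[float], text: str) -> None:
        self._items.append((list(embedding), text))

    def search(self, query: Sequence[float], k: int = 1) -> List[str]:
        ranked = sorted(
            self._items,
            key=lambda item: cosine(query, item[0]),
            reverse=True,
        )
        return [text for _, text in ranked[:k]]

memory = VectorMemory()
memory.add([1.0, 0.0], "Campaign A: compound 7 failed solubility")
memory.add([0.0, 1.0], "Campaign B: HPLC method validated")
closest = memory.search([0.9, 0.1], k=1)
```

Here `closest` holds the Campaign A note, because the query vector points almost entirely along the first axis.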
Planning
- dynamic agent routing
- context sharing for task handoff
Tool Use
- Model Context Protocol (MCP) tool integration
- function calling for tool selection
- RAG-backed retrieval for memory
Frameworks
- OpenAI Agents SDK
- OpenAI Responses API
- MCP Federation
Is Agentic
true
Architectures
- microservices
- supervisor-agent pattern
- distributed multi-agent
Collaboration
- handoff via OpenAI Agents SDK
- asynchronous communication patterns
Optimization Features
Infra Optimization
- Kubernetes + Helm
- Docker containerization
- Envoy reverse proxy
- CI/CD with GitHub Actions
Model Optimization
- GPU acceleration for MolMIM
System Optimization
- Horizontal Pod Autoscaling
- blue-green and rolling updates
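For intuition on Horizontal Pod Autoscaling, the core scaling rule documented by Kubernetes is desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). The sketch below implements only that rule plus min/max clamping; the function name and default bounds are assumptions, and the real controller also applies a tolerance band and stabilization windows that are omitted here.

```python
import math

def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Core HPA rule, clamped to [min_replicas, max_replicas]."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))
```

For example, two pods averaging 90% CPU against a 60% target scale to three pods, while four pods at 30% against the same target scale down to two.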
Inference Optimization
- GPU inference clusters
- configurable resource limits per agent
Reproducibility
Open Source Status
- unknown
Risks & Boundaries
Limitations
- No quantitative benchmarks showing scientific or time-to-result gains.
- Safety guardrail is described at a high level and not evaluated with adversarial tests.
- Code and data are not published with the paper, limiting reproducibility.
When Not To Use
- If you lack Kubernetes/container expertise or GPU resources.
- When provable, auditable safety certification is required before lab actions.
- For small one-off experiments where full production ops are unnecessary.
Failure Modes
- Tool or instrument API failures leading to incorrect job execution.
- Model hallucination or incorrect protocol interpretation causing unsafe actions.
- Prompt injection or malicious inputs bypassing guardrails that are not kept up to date.
- State divergence between MCP federation nodes causing inconsistent tool views.
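One way to detect the last failure mode is to fingerprint each node's tool list and flag nodes that disagree with the majority. The paper does not describe such a check; the sketch below is an assumption-laden illustration, and the node names and tool names in the usage note are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Dict, Iterable, List

def tool_view_fingerprint(tool_names: Iterable[str]) -> str:
    """Order-independent SHA-256 fingerprint of a node's tool list."""
    canonical = "\n".join(sorted(set(tool_names)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def divergent_nodes(views: Dict[str, List[str]]) -> List[str]:
    """Return nodes whose fingerprint differs from the most common one."""
    if not views:
        return []
    fingerprints = {
        node: tool_view_fingerprint(tools) for node, tools in views.items()
    }
    majority, _ = Counter(fingerprints.values()).most_common(1)[0]
    return sorted(node for node, fp in fingerprints.items() if fp != majority)
```

Given three nodes where `node-a` and `node-b` list the same two tools in different order and `node-c` lists only one of them, `divergent_nodes` returns `["node-c"]`. A majority vote is only a heuristic: if most nodes are stale, the up-to-date node is the one that gets flagged.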
Core Entities
Models
- MolMIM
- OpenAI Responses API
- OpenAI Agents SDK
Metrics
- workflow duration
- job status
- actor workload
Context Entities
Models
- foundational LLMs (via OpenAI Responses API)
Metrics
- CPU utilization
- message queue depth

