Tippy: a production-ready multi-agent system that automates drug discovery lab workflows

July 18, 2025 · 7 min

Overview

Decision Snapshot: Needs Validation

The implementation covers end-to-end deployment concerns (Kubernetes, CI/CD, observability) and detailed tool lists, showing strong engineering readiness; however, the paper lacks quantitative benchmarks of lab outcome improvements and public code, so evidence is moderate.

Citations: 1

Evidence Strength: 0.60

Confidence: 0.80

Risk Signals: 10

Trust Signals

Findings with numeric evidence: 7/7

Findings with evidence refs: 7/7

Results with explicit delta: 0/4

Reproducibility

Status: No open assets linked

Open source: Unknown

At A Glance

Cost impact: 60%

Production readiness: 70%

Novelty: 60%

Authors

Yao Fehlis, Charles Crain, Aidan Jensen, Michael Watson, James Juhasz, Paul Mandel, Betty Liu, Shawn Mahon, Daren Wilson, Nick Lynch-Jonely, Ben Leedom, David Fuller

Why It Matters For Business

Tippy shows how to turn multi-agent lab automation from a concept into a deployable platform, enabling more automated DMTA cycles, reproducible deployments, and scalable instrument orchestration.

Summary TLDR

This paper describes the engineering and deployment of Tippy, a production-focused multi-agent system for automating drug-discovery lab workflows. Tippy uses a Supervisor agent plus specialized Molecule, Lab, Analysis, and Report agents, integrates tools through the Model Context Protocol (MCP), and runs on Kubernetes with Helm, CI/CD, an Envoy proxy, and vector DBs for retrieval. The authors detail tool lists, orchestration via OpenAI Agents SDK, Git-based configuration tracking, observability, and safety guardrails, but provide no quantitative benchmark of scientific outcomes.

Problem Statement

Laboratory automation needs reliable, production-grade software that coordinates heterogeneous instruments, human operators, and multi-step DMTA (Design-Make-Test-Analyze) workflows. Existing lab systems handle data and basic tracking but lack coordinated autonomous orchestration, standardized tool integration, and deployment practices required for scalable, repeatable automation.

Main Contribution

A production multi-agent microservices architecture with a Supervisor plus specialized Molecule, Lab, Analysis, and Report agents.

Integration pattern using the Model Context Protocol (MCP) to expose lab tools and instrument controls to agents.
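
The supervisor pattern described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: according to the summary, the real system routes tasks through LLM-driven handoffs in the OpenAI Agents SDK, whereas the keyword table `ROUTES`, the `Agent` class, and the `supervise` function below are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Agent:
    """A specialized agent: a name plus a handler for tasks routed to it."""
    name: str
    handle: Callable[[str], str]


def make_agent(name: str) -> Agent:
    # Stand-in handler; a real agent would call an LLM and its tools.
    return Agent(name=name, handle=lambda task: f"[{name}] handled: {task}")


# The four specialized roles named in the paper.
AGENTS: Dict[str, Agent] = {
    n: make_agent(n) for n in ("Molecule", "Lab", "Analysis", "Report")
}

# Assumption: simple keyword routing replaces the LLM-based routing decision.
ROUTES: Dict[str, str] = {
    "design": "Molecule",
    "synthesize": "Lab",
    "analyze": "Analysis",
    "summarize": "Report",
}


def supervise(task: str) -> str:
    """Supervisor: pick one specialized agent for the task and delegate to it."""
    lowered = task.lower()
    for keyword, agent_name in ROUTES.items():
        if keyword in lowered:
            return AGENTS[agent_name].handle(task)
    raise ValueError(f"no agent registered for task: {task!r}")
```

Keeping the routing decision in one place means a new specialized agent can be added by registering it and extending the routing rule, without touching the existing agents.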

Key Findings

Tippy uses five specialized agents (Supervisor, Molecule, Lab, Analysis, Report) plus a Safety Guardrail.

Numbers: 5 specialized agents

Practical Use: Design services as focused agents; separate responsibilities (design, execution, analysis, reporting, safety) to simplify integration and scaling.

Evidence Ref: Section 2, Figure 1

The Lab Agent exposes 13 MCP tools covering job creation, instrument control, lookup, and execution.

Numbers: Lab Agent: 13 MCP tools

Practical Use: Map each lab capability (scheduling, control, queries) to explicit tool APIs to let agents reason about and safely invoke instruments.

Evidence Ref: Section 2.4
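
Mapping each lab capability to an explicit tool API can be illustrated with a small registry that validates arguments before any call reaches an instrument. The sketch below does not use an MCP SDK and does not reproduce the paper's 13 Lab Agent tools: `ToolSpec`, `register`, `call_tool`, and the two example tools (`create_job`, `job_status`) are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class ToolSpec:
    """Declarative description of one tool an agent may invoke."""
    name: str
    description: str
    params: Dict[str, type]      # parameter name -> expected Python type
    run: Callable[..., Any]


REGISTRY: Dict[str, ToolSpec] = {}


def register(spec: ToolSpec) -> None:
    if spec.name in REGISTRY:
        raise ValueError(f"tool already registered: {spec.name}")
    REGISTRY[spec.name] = spec


def call_tool(name: str, **kwargs: Any) -> Any:
    """Validate the call against the tool's declared parameters, then run it."""
    if name not in REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    spec = REGISTRY[name]
    missing = set(spec.params) - set(kwargs)
    extra = set(kwargs) - set(spec.params)
    if missing or extra:
        raise TypeError(f"{name}: missing={sorted(missing)} unexpected={sorted(extra)}")
    for param, expected in spec.params.items():
        if not isinstance(kwargs[param], expected):
            raise TypeError(f"{name}: {param} must be {expected.__name__}")
    return spec.run(**kwargs)


# Hypothetical in-memory job store standing in for a lab scheduling backend.
_jobs: Dict[str, dict] = {}


def create_job(protocol: str, priority: int) -> dict:
    job_id = f"job-{len(_jobs) + 1}"
    _jobs[job_id] = {"protocol": protocol, "priority": priority, "status": "queued"}
    return {"job_id": job_id, "status": "queued"}


def job_status(job_id: str) -> dict:
    return {"job_id": job_id, "status": _jobs[job_id]["status"]}


register(ToolSpec("create_job", "Queue a lab job", {"protocol": str, "priority": int}, create_job))
register(ToolSpec("job_status", "Look up a job's state", {"job_id": str}, job_status))
```

Because every tool declares its parameters up front, a malformed call produced by a model is rejected at the registry boundary instead of being forwarded to hardware.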

Results

Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref
Specialized agents | 5 (Supervisor + Molecule + Lab + Analysis + Report) | - | - | - | Figure 1; Sections 2–2.7 | Section 2, Figure 1
Lab Agent tool count | 13 MCP tools | - | - | - | Section 2.4 lists 13 tools | Section 2.4

What To Try In 7 Days

Map key lab capabilities to MCP-like tool APIs (job start, status, instrument control).

Containerize one agent and deploy it with Helm on a local Kubernetes cluster.

Add Git-based config/versioning for agent prompts and tool configs to enable rollbacks.
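
The versioning step can be prototyped before wiring up a real repository. The sketch below is not Git and is not the paper's mechanism: it is a hypothetical in-memory `ConfigHistory` class that borrows two ideas from Git, content-hash version IDs and an append-only log, to show what a rollback of an agent prompt or tool config would look like.

```python
import hashlib
import json
from typing import Dict, List


class ConfigHistory:
    """Append-only history of config snapshots, keyed by content hash."""

    def __init__(self) -> None:
        self._snapshots: Dict[str, str] = {}   # version id -> serialized config
        self._log: List[str] = []              # ordered list of active versions

    def commit(self, config: dict) -> str:
        # Sorting keys makes the hash independent of dict insertion order.
        blob = json.dumps(config, sort_keys=True)
        version = hashlib.sha256(blob.encode("utf-8")).hexdigest()[:12]
        self._snapshots[version] = blob
        self._log.append(version)
        return version

    def current(self) -> dict:
        if not self._log:
            raise LookupError("no config committed yet")
        return json.loads(self._snapshots[self._log[-1]])

    def rollback(self, version: str) -> dict:
        if version not in self._snapshots:
            raise KeyError(f"unknown version: {version}")
        # A rollback is recorded as a new log entry; history is never rewritten.
        self._log.append(version)
        return self.current()
```

In a real deployment the same roles would be played by commits and reverts in a Git repository, which also gives review history and authorship.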

Agent Features

Memory
retrieval memory (vector DB), context window management
Planning
dynamic agent routing, context sharing for task handoff
Tool Use
Model Context Protocol (MCP) tool integration, function calling for tool selection, RAG-backed retrieval for memory
Frameworks
OpenAI Agents SDK, OpenAI Responses API, MCP Federation
Is Agentic

Yes

Architectures
microservices, supervisor-agent pattern, distributed multi-agent
Collaboration
handoff via OpenAI Agents SDK, asynchronous communication patterns

Optimization Features

Infra Optimization
Kubernetes + Helm, Docker containerization, Envoy reverse proxy, CI/CD with GitHub Actions
Model Optimization
GPU acceleration for MolMIM
System Optimization
Horizontal Pod Autoscaling, blue-green and rolling updates
Inference Optimization
GPU inference clusters, configurable resource limits per agent
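
Horizontal Pod Autoscaling is listed above without thresholds, so a worked example may help. The function below follows the replica formula documented for the Kubernetes Horizontal Pod Autoscaler, desired = ceil(current replicas × current metric / target metric), including its default tolerance of 0.1. The replica bounds and the sample numbers in the usage note are illustrative assumptions, not values from the paper.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    tolerance: float = 0.1,     # Kubernetes default tolerance
    min_replicas: int = 1,      # assumed lower bound
    max_replicas: int = 10,     # assumed upper bound
) -> int:
    """Compute the HPA replica target for a single metric."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")

    def clamp(n: int) -> int:
        return max(min_replicas, min(max_replicas, n))

    ratio = current_metric / target_metric
    # Within the tolerance band the autoscaler leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return clamp(current_replicas)
    return clamp(math.ceil(current_replicas * ratio))
```

For example, two agent pods averaging 90% CPU against a 60% target scale to three pods, while four pods at 62% stay at four because the ratio falls inside the tolerance band.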

Reproducibility

Code Available: No
Data Available: No
Open Source Status: Unknown
License: Unknown

Risks & Boundaries

Limitations

No quantitative benchmarks showing scientific or time-to-result gains.

Safety guardrail is described at a high level and not evaluated with adversarial tests.

When Not To Use

If you lack Kubernetes/container expertise or GPU resources.

When provable, auditable safety certification is required before lab actions.

Failure Modes

Tool or instrument API failures leading to incorrect job execution.

Model hallucination or incorrect protocol interpretation causing unsafe actions.
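
Both failure modes suggest putting a checkpoint between the agent's decision and the instrument. The sketch below is one hedged way to do that and is not the paper's Safety Guardrail, which is described only at a high level: the allow-list `ALLOWED_ACTIONS`, the limit `MAX_TEMPERATURE_C`, and the functions `check_action` and `guarded_call` are all hypothetical.

```python
import time
from typing import Any, Callable, Optional


class UnsafeActionError(Exception):
    """Raised when a requested action fails the pre-execution check."""


ALLOWED_ACTIONS = {"start_job", "get_status"}   # hypothetical allow-list
MAX_TEMPERATURE_C = 80.0                        # hypothetical safety limit


def check_action(action: str, params: dict) -> None:
    """Reject actions outside the allow-list or beyond the parameter limit."""
    if action not in ALLOWED_ACTIONS:
        raise UnsafeActionError(f"action not allowed: {action}")
    temp = params.get("temperature_c")
    if temp is not None and temp > MAX_TEMPERATURE_C:
        raise UnsafeActionError(f"temperature {temp} C exceeds {MAX_TEMPERATURE_C} C")


def guarded_call(
    action: str,
    params: dict,
    execute: Callable[[str, dict], Any],
    retries: int = 2,
    backoff_s: float = 0.0,
) -> Any:
    """Validate first, then execute with bounded retries on transient errors."""
    check_action(action, params)   # runs before any instrument is touched
    last_error: Optional[Exception] = None
    for attempt in range(retries + 1):
        try:
            return execute(action, params)
        except ConnectionError as exc:   # treated as a transient tool failure
            last_error = exc
            if attempt < retries:
                time.sleep(backoff_s * (2 ** attempt))
    raise RuntimeError(f"{action} failed after {retries + 1} attempts") from last_error
```

Retrying is only safe for idempotent calls such as status lookups; for an action that starts physical work, a retry after an ambiguous failure could run the job twice, so such calls would need a job ID check or no retry at all.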

Core Entities

Models

MolMIM, OpenAI Responses API, OpenAI Agents SDK

Metrics

workflow duration, job status, actor workload

Context Entities

Models

foundational LLMs (via OpenAI Responses API)

Metrics

CPU utilization, message queue depth