Tippy: a production-ready multi-agent system that automates drug discovery lab workflows

July 18, 2025 · 7 min

Overview

Production Readiness

0.7

Novelty Score

0.6

Cost Impact Score

0.6

Citation Count

1

Authors

Yao Fehlis, Charles Crain, Aidan Jensen, Michael Watson, James Juhasz, Paul Mandel, Betty Liu, Shawn Mahon, Daren Wilson, Nick Lynch-Jonely, Ben Leedom, David Fuller

Why It Matters For Business

Tippy shows how to turn multi-agent lab automation from a concept into a deployable platform, enabling more automated DMTA cycles, reproducible deployments, and scalable instrument orchestration.

Summary TLDR

This paper describes the engineering and deployment of Tippy, a production-focused multi-agent system for automating drug-discovery lab workflows. Tippy uses a Supervisor agent plus specialized Molecule, Lab, Analysis, and Report agents, integrates tools through the Model Context Protocol (MCP), and runs on Kubernetes with Helm, CI/CD, an Envoy proxy, and vector DBs for retrieval. The authors detail the tool lists, orchestration via the OpenAI Agents SDK, Git-based configuration tracking, observability, and safety guardrails, but provide no quantitative benchmark of scientific outcomes.

Problem Statement

Laboratory automation needs reliable, production-grade software that coordinates heterogeneous instruments, human operators, and multi-step DMTA (Design-Make-Test-Analyze) workflows. Existing lab systems handle data and basic tracking but lack coordinated autonomous orchestration, standardized tool integration, and the deployment practices required for scalable, repeatable automation.

Main Contribution

A production multi-agent microservices architecture with a Supervisor plus specialized Molecule, Lab, Analysis, and Report agents.

Integration pattern using the Model Context Protocol (MCP) to expose lab tools and instrument controls to agents.

Deployment and ops design: Kubernetes with Helm, Docker containers, HPA autoscaling, Envoy proxy, and Git-based configuration & CI/CD.

Retrieval-augmented generation using vector databases for persistent memory across campaigns.

A safety guardrail based on content moderation and prompt-injection detection, plus observability via tracing.
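The guardrail contribution can be illustrated with a minimal sketch. The paper describes the guardrail only at a high level, so everything below is an assumption for illustration: the regex patterns, the `is_safe` and `guarded` helpers, and the `supervisor` handler are hypothetical, and simple pattern matching stands in for the content filters and injection detection the authors actually use.

```python
import re
from typing import Callable, List

# Hypothetical patterns for illustration only; not the paper's detection rules.
INJECTION_PATTERNS: List[re.Pattern] = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (the |your )?system prompt", re.IGNORECASE),
]


def is_safe(message: str) -> bool:
    """Return False when the message matches any known injection pattern."""
    return not any(p.search(message) for p in INJECTION_PATTERNS)


def guarded(handler: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap an agent entry point so unsafe input never reaches it."""
    def wrapper(message: str) -> str:
        if not is_safe(message):
            return "blocked by guardrail"
        return handler(message)
    return wrapper


@guarded
def supervisor(message: str) -> str:
    # Stand-in for the real Supervisor agent, which would route the request.
    return f"routing: {message}"
```

Placing the check in a wrapper keeps it in front of every request, so no agent can be reached without passing it. A static pattern list like this one is easy to bypass, which is why it should be read as a shape for the check and not as a defense.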

Key Findings

Tippy uses five specialized agents (Supervisor, Molecule, Lab, Analysis, Report) plus a Safety Guardrail.

Numbers: 5 specialized agents

The Lab Agent exposes 13 MCP tools covering job creation, instrument control, lookup, and execution.

Numbers: Lab Agent: 13 MCP tools

Molecule, Analysis, and Report agents provide 4, 10, and 8 MCP tools respectively.

Numbers: Molecule 4; Analysis 10; Report 8

The system is deployed on Kubernetes with Helm, Docker, HPA autoscaling, blue-green/rolling updates, and Git-based config.

Numbers: Uses Helm, HPA, GitOps, CI/CD

Retrieval-augmented generation is implemented via integrated vector databases to provide persistent memory across campaigns.

Numbers: Vector DBs used for RAG

Safety is handled by a Guardrail Agent using OpenAI content filters and prompt-injection detection rather than external MCP tools.

Numbers: Built-in content moderation

The MolMIM model is used for property-guided molecule generation with GPU acceleration.

Numbers: MolMIM on NVIDIA GPUs

Results

Specialized agents

Value: 5 (Supervisor + Molecule + Lab + Analysis + Report)

Lab Agent tool count

Value: 13 MCP tools

Molecule / Analysis / Report tool counts

Value: Molecule 4; Analysis 10; Report 8

Deployment stack

Value: Kubernetes + Helm + Envoy + Docker + GitHub Actions

What To Try In 7 Days

Map key lab capabilities to MCP-like tool APIs (job start, status, instrument control).

Containerize one agent and deploy it with Helm on a local Kubernetes cluster.

Add Git-based config/versioning for agent prompts and tool configs to enable rollbacks.
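As a starting point for the first suggestion, here is a minimal sketch of an MCP-like tool registry in plain Python. It does not use the real MCP SDK. The tool names (`job.start`, `job.status`, `instrument.run`), the in-memory `JOBS` store, and the `call_tool` dispatcher are all hypothetical, chosen only to show how lab capabilities might be exposed as named, callable tools.

```python
from typing import Any, Callable, Dict

TOOLS: Dict[str, Callable[..., Any]] = {}   # tool name -> implementation
JOBS: Dict[int, Dict[str, str]] = {}        # in-memory stand-in for a job database


def tool(name: str):
    """Register a function under a tool name, similar in spirit to an MCP tool."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        TOOLS[name] = fn
        return fn
    return decorator


@tool("job.start")
def start_job(protocol: str) -> int:
    job_id = len(JOBS) + 1
    JOBS[job_id] = {"protocol": protocol, "status": "queued"}
    return job_id


@tool("job.status")
def job_status(job_id: int) -> str:
    return JOBS[job_id]["status"]


@tool("instrument.run")
def run_on_instrument(job_id: int, instrument: str) -> str:
    # A real tool would call the instrument driver here.
    JOBS[job_id]["status"] = f"running on {instrument}"
    return JOBS[job_id]["status"]


def call_tool(name: str, **kwargs: Any) -> Any:
    """Dispatch a tool call by name, as an agent's function-calling layer would."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)
```

For example, `call_tool("job.start", protocol="hplc-purity")` returns a job id that can then be passed to `job.status` or `instrument.run`. Keeping each capability behind a name and keyword arguments makes it straightforward to later swap the registry for a real MCP server.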

Agent Features

Memory

  • retrieval memory (vector DB)
  • context window management
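The retrieval memory item can be sketched as follows. This is a toy under stated assumptions: the bag-of-words `embed` function stands in for a real embedding model, and the in-memory `MemoryStore` stands in for a vector database. None of these names come from the paper.

```python
import math
from typing import List, Tuple


def embed(text: str, vocab: List[str]) -> List[float]:
    """Toy bag-of-words embedding; stands in for a real embedding model."""
    words = text.lower().split()
    return [float(words.count(term)) for term in vocab]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class MemoryStore:
    """In-memory stand-in for a vector database holding campaign notes."""

    def __init__(self, vocab: List[str]) -> None:
        self.vocab = vocab
        self.items: List[Tuple[str, List[float]]] = []

    def add(self, note: str) -> None:
        self.items.append((note, embed(note, self.vocab)))

    def search(self, query: str, k: int = 1) -> List[str]:
        """Return the k stored notes most similar to the query."""
        q = embed(query, self.vocab)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [note for note, _ in ranked[:k]]
```

In a retrieval-augmented setup, the notes returned by `search` would be prepended to the agent's prompt, which is how results from an earlier campaign can inform a later one.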

Planning

  • dynamic agent routing
  • context sharing for task handoff
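The two planning features above can be sketched together: a supervisor picks an agent and passes along a shared context object. This is a hedged illustration that does not use the OpenAI Agents SDK. The keyword table `ROUTES`, the `HandoffContext` dataclass, and the `supervise` function are hypothetical, and keyword matching stands in for the LLM-driven routing the paper describes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class HandoffContext:
    """Shared state passed along when one agent hands a task to another."""
    request: str
    history: List[str] = field(default_factory=list)


# Hypothetical keyword table; a real supervisor would let the LLM choose.
ROUTES: Dict[str, str] = {
    "generate": "molecule",
    "synthesize": "lab",
    "hplc": "analysis",
    "summarize": "report",
}


def make_agent(name: str) -> Callable[[HandoffContext], HandoffContext]:
    """Build a stub agent that records its work in the shared context."""
    def run(ctx: HandoffContext) -> HandoffContext:
        ctx.history.append(f"{name} handled: {ctx.request}")
        return ctx
    return run


AGENTS = {name: make_agent(name) for name in ("molecule", "lab", "analysis", "report")}


def supervise(request: str) -> HandoffContext:
    """Route a request to the first agent whose keyword appears in it."""
    ctx = HandoffContext(request=request)
    lowered = request.lower()
    for keyword, agent_name in ROUTES.items():
        if keyword in lowered:
            ctx.history.append(f"supervisor -> {agent_name}")
            return AGENTS[agent_name](ctx)
    ctx.history.append("supervisor: no matching agent")
    return ctx
```

Because the context carries its own history, the receiving agent sees what was decided before the handoff, and the same record is available afterward for tracing.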

Tool Use

  • Model Context Protocol (MCP) tool integration
  • function calling for tool selection
  • RAG-backed retrieval for memory

Frameworks

  • OpenAI Agents SDK
  • OpenAI Responses API
  • MCP Federation

Is Agentic

true

Architectures

  • microservices
  • supervisor-agent pattern
  • distributed multi-agent

Collaboration

  • handoff via OpenAI Agents SDK
  • asynchronous communication patterns
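One way to picture asynchronous communication between agents is a pair of queues connecting two coroutines. The paper does not specify its messaging layer, so this is a sketch using only Python's standard `asyncio` library. The `lab_agent`, `analysis_agent`, and `run_pipeline` names are hypothetical, and a `None` sentinel is an assumed convention for signaling the end of the work.

```python
import asyncio
from typing import List


async def lab_agent(inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Consume jobs, 'run' them, and forward results downstream."""
    while True:
        job = await inbox.get()
        if job is None:              # sentinel: no more jobs
            await outbox.put(None)   # propagate shutdown to the next agent
            break
        await asyncio.sleep(0)       # stand-in for a slow instrument call
        await outbox.put(f"result for {job}")


async def analysis_agent(inbox: asyncio.Queue, results: List[str]) -> None:
    """Consume lab results as they arrive, without blocking the lab agent."""
    while True:
        item = await inbox.get()
        if item is None:
            break
        results.append(f"analyzed {item}")


async def run_pipeline(jobs: List[str]) -> List[str]:
    to_lab: asyncio.Queue = asyncio.Queue()
    to_analysis: asyncio.Queue = asyncio.Queue()
    results: List[str] = []
    for job in jobs:
        to_lab.put_nowait(job)
    to_lab.put_nowait(None)
    await asyncio.gather(
        lab_agent(to_lab, to_analysis),
        analysis_agent(to_analysis, results),
    )
    return results
```

Running `asyncio.run(run_pipeline(["job-1", "job-2"]))` pushes both jobs through the two stages. Neither agent waits for the other to finish its whole batch, which is the property that matters when instrument runs are slow.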

Optimization Features

Infra Optimization

  • Kubernetes + Helm
  • Docker containerization
  • Envoy reverse proxy
  • CI/CD with GitHub Actions

Model Optimization

  • GPU acceleration for MolMIM

System Optimization

  • Horizontal Pod Autoscaling
  • blue-green and rolling updates
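To make Horizontal Pod Autoscaling concrete, the sketch below implements the scaling rule from the Kubernetes documentation: desired replicas equal the ceiling of current replicas times the ratio of the observed metric to its target, with no change when the ratio is within a tolerance of 1.0 (0.1 by default). The function name and the `min_replicas` and `max_replicas` defaults are illustrative assumptions, not values from the paper.

```python
import math


def desired_replicas(current: int, metric: float, target: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Compute the replica count the HPA rule would request.

    current: replicas running now
    metric:  observed value (for example, average CPU utilization in percent)
    target:  desired value for the same metric
    """
    ratio = metric / target
    if abs(ratio - 1.0) <= tolerance:
        return current                      # close enough to target: do not scale
    desired = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For instance, two replicas at 90% CPU against a 60% target give a ratio of 1.5, so the rule asks for three replicas. The clamp keeps a metric spike from requesting more pods than the configured maximum.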

Inference Optimization

  • GPU inference clusters
  • configurable resource limits per agent

Reproducibility

Open Source Status

  • unknown

Risks & Boundaries

Limitations

  • No quantitative benchmarks showing scientific or time-to-result gains.
  • Safety guardrail is described at a high level and not evaluated with adversarial tests.
  • Code and data are not published with the paper, limiting reproducibility.

When Not To Use

  • If you lack Kubernetes/container expertise or GPU resources.
  • When provable, auditable safety certification is required before lab actions.
  • For small one-off experiments where full production ops are unnecessary.

Failure Modes

  • Tool or instrument API failures leading to incorrect job execution.
  • Model hallucination or incorrect protocol interpretation causing unsafe actions.
  • Prompt injection or malicious inputs bypassing guardrails if detection rules are not kept up to date.
  • State divergence between MCP federation nodes causing inconsistent tool views.
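The first failure mode, tool or instrument API failures, is commonly mitigated with bounded retries and exponential backoff. The sketch below shows that general technique and is not the paper's implementation. `ToolError` and `call_with_retry` are hypothetical names, and the default delay of zero is chosen only so the example runs instantly.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class ToolError(Exception):
    """Raised by a (hypothetical) tool wrapper when an instrument call fails."""


def call_with_retry(fn: Callable[[], T], attempts: int = 3,
                    base_delay: float = 0.0) -> T:
    """Call fn, retrying on ToolError with exponential backoff between tries."""
    last_error: Optional[ToolError] = None
    for attempt in range(attempts):
        try:
            return fn()
        except ToolError as exc:
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(base_delay * (2 ** attempt))  # delay doubles each retry
    raise RuntimeError(f"tool failed after {attempts} attempts") from last_error
```

Retries are only safe for calls that can be repeated without side effects. A status lookup qualifies, but a command that dispenses reagent generally does not, so physical actions need an idempotency key or a status check before any second attempt.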

Core Entities

Models

  • MolMIM
  • OpenAI Responses API
  • OpenAI Agents SDK

Metrics

  • workflow duration
  • job status
  • actor workload

Context Entities

Models

  • foundational LLMs (via OpenAI Responses API)

Metrics

  • CPU utilization
  • message queue depth