An LLM agent that plans CRISPR experiments, designs guides and protocols, and was validated in a wet‑lab knockout

April 27, 2024 · 8 min

Overview

Production Readiness

0.6

Novelty Score

0.7

Cost Impact Score

0.6

Citation Count

9

Authors

Yuanhao Qu, Kaixuan Huang, Ming Yin, Kanghong Zhan, Dyllan Liu, Di Yin, Henry C. Cousins, William A. Johnson, Xiaotong Wang, Mihir Shah, Russ B. Altman, Denny Zhou, Mengdi Wang, Le Cong

Why It Matters For Business

Automating CRISPR design reduces expert time, speeds prototyping, and lowers error risk in early‑stage research; it can cut planning cycles and standardize lab protocols for teams without CRISPR specialists.

Summary TLDR

CRISPR-GPT is an LLM-powered agent that combines a planner, a tool wrapper, and state‑machine task executors to automate CRISPR experiment design. It supports 22 task states across 4 meta‑pipelines, calls tools such as Primer3 and guide libraries, and includes safety filters (for example, it blocks sequences of 20 bp or longer and warns on human targets). Experts rated it higher than ChatGPT 3.5 and ChatGPT 4 on design tasks, and it helped non-experts run a 4‑gene knockout in A375 cells with successful NGS validation. The system is a prototype: useful for design automation, but not a replacement for wet‑lab expertise and not suited to clinical use.

Problem Statement

General LLMs produce confident but sometimes incorrect or incomplete guidance for CRISPR experiments (wrong guides, missing protocol details, unsafe suggestions). Researchers need a domain-aware agent that integrates tools and checks to produce practical, verifiable experimental designs for beginners and non-experts.

Main Contribution

An agent architecture combining an LLM planner, a Tool Provider wrapper, and a state‑machine Task Executor to break CRISPR workflows into subgoals.

Implementation of 22 task states across 4 predefined meta‑pipelines (knockout, base editing, prime editing, activation/repression) and 13 Auto‑Mode tasks.

Integration with external tools and resources (web search, Primer3, CRISPick/gRNA libraries, off‑target tools) via hand‑written textual tool wrappers.

Human expert evaluation (12 CRISPR experts) showing higher design accuracy and more concise guidance than ChatGPT 3.5 and ChatGPT 4.

Wet‑lab demonstration: human+agent collaboration to knock out four genes in A375 cells, with NGS validation.

Built-in safety and privacy mitigations: human‑target checks, moratorium warning flow, and a ≥20 bp sequence filter before calling public LLMs.

Key Findings

Domain‑augmented agent scored higher than general ChatGPT on expert design ratings.

Numbers: 12 experts; 1–5 rating scale; CRISPR‑GPT > ChatGPT 3.5/4 across Accuracy, Reasoning, Completeness, Conciseness

CRISPR‑GPT executed a real knockout workflow and produced validation‑ready results.

Numbers: 4 target genes (TGFBR1, SNAI1, BAX, BCL2L1) in A375 cells; validated by NGS

The system is modular and rule‑based: 22 task states implemented as state machines, 4 meta‑tasks, and an Auto Mode that supports 13 tasks.

Numbers: 22 tasks; 4 meta‑tasks; 13 Auto Mode tasks (Table 1)

Safety and privacy controls are enforced before external LLM calls.

Numbers: Filter blocks sequences ≥ 20 bp; human‑target check triggers moratorium warning
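
The ≥ 20 bp gate described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes a "sequence" means an unbroken run of A/C/G/T/U/N characters, and the names `MIN_BLOCKED_LENGTH`, `find_long_sequences`, and `is_safe_for_external_llm` are hypothetical.

```python
import re

# Assumption: a sequence is an unbroken run of nucleotide letters.
# The real filter may use a different alphabet or matching rule.
MIN_BLOCKED_LENGTH = 20
_SEQUENCE_RUN = re.compile(r"[ACGTUN]{%d,}" % MIN_BLOCKED_LENGTH, re.IGNORECASE)


def find_long_sequences(text: str) -> list[str]:
    """Return every nucleotide run of at least MIN_BLOCKED_LENGTH characters."""
    return _SEQUENCE_RUN.findall(text)


def is_safe_for_external_llm(text: str) -> bool:
    """True when the text contains no nucleotide run of 20 nt or longer."""
    return not find_long_sequences(text)


prompt = "Design a knockout for TGFBR1 in A375 cells"
print(is_safe_for_external_llm(prompt))
```

A run broken up by spaces or line breaks would pass this check, so a stricter version would probably normalize whitespace before matching.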

Results

Accuracy

Value: CRISPR‑GPT scored higher than ChatGPT 3.5 and ChatGPT 4 in expert ratings (1–5 scale)

Baseline: ChatGPT 3.5 / ChatGPT 4

Wet‑lab validation — editing outcome

Value: Consistently high rate of expected edits across 4 targeted genes by NGS

System coverage

Value: Supports 22 task states and 4 meta‑tasks; Auto Mode supports 13 tasks

Who Should Care

What To Try In 7 Days

Run Auto Mode to design an sgRNA knockout for a non‑clinical cell line and compare with your current design workflow

Integrate Primer3 calls into your pipeline to auto‑generate and BLAST‑check PCR primers

Set up the ≥20 bp input filter and human‑target warning flow to test privacy and safety gates

Agent Features

Memory

  • Session interaction history used in prompts
  • No autonomous persistent memory or dynamic task creation

Planning

  • Task decomposition table
  • ReAct chain‑of‑thought prompting
  • Chained state machines per meta‑task
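
One way to picture chained state machines per meta‑task is the sketch below. It is an illustration under assumptions, not the authors' code: `State`, `TaskStateMachine`, `run_meta_task`, the task names, and the placeholder handlers are all hypothetical. Each state declares the user input it needs, and the chain pauses at the first state whose input is missing.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class State:
    name: str
    required_input: str            # context key the user must supply first
    run: Callable[[dict], str]     # turns the collected context into a result


@dataclass
class TaskStateMachine:
    name: str
    states: list[State]

    def execute(self, context: dict, log: list[str]) -> bool:
        """Run states in order; return False if a state is waiting on the user."""
        for state in self.states:
            if state.required_input not in context:
                log.append(f"{self.name}/{state.name}: need '{state.required_input}'")
                return False
            log.append(f"{self.name}/{state.name}: {state.run(context)}")
        return True


def run_meta_task(tasks: list[TaskStateMachine], context: dict) -> list[str]:
    """Chain task state machines, stopping at the first one that needs input."""
    log: list[str] = []
    for task in tasks:
        if not task.execute(context, log):
            break
    return log


# Hypothetical two-task knockout chain; the handlers are placeholders.
knockout = [
    TaskStateMachine("cas_selection", [
        State("choose_system", "goal", lambda c: f"system chosen for {c['goal']}"),
    ]),
    TaskStateMachine("guide_design", [
        State("pick_target", "gene", lambda c: f"guides requested for {c['gene']}"),
    ]),
]

for line in run_meta_task(knockout, {"goal": "knockout"}):
    print(line)
```

Because the context here has no `gene` entry, the second task stops and reports the missing input, which mirrors the agent asking the user for required inputs before it continues.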

Tool Use

  • Web search
  • Primer3 primer design
  • gRNA library retrieval
  • Off‑target prediction tools (CRISPRitz)
  • BLAST checks
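
A tool wrapper layer of this kind can be sketched as a small registry that pairs each tool with a hand‑written text description and a callable. Everything here is an assumption for illustration: `Tool`, `ToolProvider`, and the two stub functions are hypothetical, and the stubs do not call Primer3 or any search backend.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tool:
    name: str
    description: str               # hand-written text shown to the LLM
    call: Callable[[str], str]


class ToolProvider:
    def __init__(self) -> None:
        self._tools: dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def describe(self) -> str:
        """Render the tool catalog as plain text for inclusion in a prompt."""
        return "\n".join(f"{t.name}: {t.description}" for t in self._tools.values())

    def invoke(self, name: str, argument: str) -> str:
        """Call a tool by name; report failures as text instead of raising."""
        tool = self._tools.get(name)
        if tool is None:
            return f"error: unknown tool '{name}'"
        try:
            return tool.call(argument)
        except Exception as exc:
            return f"error: {name} failed ({exc})"


def _stub_primer_design(amplicon: str) -> str:
    # Placeholder only: a real wrapper would call Primer3 here.
    return f"primer design requested for {len(amplicon)} bp amplicon"


def _stub_offline_search(query: str) -> str:
    # Simulates a backend outage so the error path can be exercised.
    raise ConnectionError("search backend unreachable")


provider = ToolProvider()
provider.register(Tool("primer_design", "Design PCR primers for an amplicon.", _stub_primer_design))
provider.register(Tool("web_search", "Search the web for a query.", _stub_offline_search))

print(provider.describe())
print(provider.invoke("web_search", "A375 culture conditions"))
```

Returning failures as text lets the calling agent see that a tool broke and decide what to do next, rather than having one outage abort the whole pipeline.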

Frameworks

  • ReAct prompting
  • Chain‑of‑thought
  • State‑machine orchestration

Is Agentic

true

Architectures

  • LLM planner + LLM Agent
  • state‑machine Task Executor
  • Tool Provider wrapper

Collaboration

  • Human‑in‑the‑loop oversight and manual correction
  • Agent executes steps and asks for required user inputs

Reproducibility

Open Source Status

  • unknown

Risks & Boundaries

Limitations

  • Cannot generate complete DNA constructs or vectors from natural language inputs.
  • Performance degrades on rare or complex biological cases and needs up‑to‑date domain data.
  • Relies on external tools and LLM APIs—errors or hallucinations upstream can still propagate.
  • Safety gates depend on user compliance; bypasses could expose private sequences.

When Not To Use

  • Clinical decision‑making or patient care without expert oversight
  • Designs for human germline or embryo editing (legal/ethical restrictions apply)
  • Scenarios requiring fully audited, certified software for regulated labs

Failure Modes

  • Proposed sgRNA sequences that do not align to the target genome if external checks are skipped
  • Incomplete protocols missing reagent quantities or timing details in edge cases
  • API/tool failures (e.g., Primer3 or web search) breaking automated pipelines
  • False sense of safety if the user ignores moratorium warnings or privacy filters
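
The first failure mode above can be caught with a basic alignment check before a guide is used. The sketch below is a simplified illustration: it assumes an SpCas9-style NGG PAM and exact matching only, and `reverse_complement` and `guide_sites` are hypothetical helper names. It is not a substitute for a genome-wide aligner or an off‑target tool.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def guide_sites(guide: str, target: str) -> list[tuple[str, int]]:
    """Find exact guide matches that are followed by an NGG PAM.

    Returns (strand, start) pairs. For the "-" strand, start is an index
    into the reverse complement of target, not into target itself.
    """
    guide = guide.upper()
    sites = []
    for strand, seq in (("+", target.upper()), ("-", reverse_complement(target))):
        pos = seq.find(guide)
        while pos != -1:
            pam = seq[pos + len(guide): pos + len(guide) + 3]
            if len(pam) == 3 and pam[1:] == "GG":
                sites.append((strand, pos))
            pos = seq.find(guide, pos + 1)
    return sites


# Made-up sequences for illustration only.
guide = "GACGTTACCGGATTCAGCTA"
target = "TT" + guide + "TGG" + "AA"
print(guide_sites(guide, target))
```

An empty result means the proposed guide has no exact, PAM-adjacent match in the supplied sequence and should be rejected or re-checked with a proper alignment tool.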

Core Entities

Models

  • gpt-4-0613
  • ChatGPT 3.5
  • GPT-4

Metrics

  • accuracy (1-5)
  • reasoning (1-5)
  • completeness (1-5)
  • conciseness (1-5)
  • NGS editing rate
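
As a rough illustration of the NGS editing rate metric, the sketch below counts the fraction of amplicon reads that differ from the unedited reference. This definition is an assumption, since the summary does not say how the rate was computed. The function name `editing_rate` and the toy reads are hypothetical, and real analyses use dedicated pipelines that align reads and account for sequencing errors.

```python
def editing_rate(reads: list[str], reference: str) -> float:
    """Fraction of reads that are not identical to the reference amplicon."""
    if not reads:
        raise ValueError("no reads supplied")
    reference = reference.upper()
    edited = sum(1 for read in reads if read.upper() != reference)
    return edited / len(reads)


# Toy data, invented for illustration: two unedited reads,
# one with a deletion, one with an insertion.
reference = "ACGTACGT"
reads = ["ACGTACGT", "ACGACGT", "ACGTTACGT", "ACGTACGT"]
print(editing_rate(reads, reference))
```

Exact string comparison counts any sequencing error as an edit, so this naive version would overestimate the rate on real data.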

Datasets

  • Broad Institute gold-standard gRNA libraries
  • pre-designed multi-species guide RNA database

Context Entities

Models

  • Gemini
  • Claude