Overview
Production Readiness
0.6
Novelty Score
0.6
Cost Impact Score
0.5
Citation Count
3
Why It Matters For Business
ChemVLM reduces manual image-to-structure work and improves multimodal chemistry question answering. It can speed up tasks that mix diagrams and text, but it requires substantial compute.
Summary TLDR
ChemVLM is an open-source multimodal model (vision encoder + chemical LLM) trained and finetuned to read chemical images, answer multimodal chemistry exam questions, and predict molecular properties. It outperforms general-purpose multimodal LLMs on the authors' Chemical OCR, MMCR-Bench, and MMChemBench datasets and matches or surpasses GPT-4V on several chemistry tasks. Specialized OCR tools still give higher pure SMILES accuracy. Code and data links are provided.
Problem Statement
Text-only chemical LLMs cannot read structure and reaction images. Existing image-to-SMILES tools convert an image into a string but do not perform deeper multimodal reasoning. Chemists spend time manually redrawing images; a chemistry-focused multimodal LLM could read images and reason jointly with text.
Main Contribution
ChemVLM: an open-source multimodal chemistry model that connects a ViT-based image encoder to ChemLLM-20B through an MLP projector.
Three new evaluation datasets: ChemOCR (SMILES OCR), MMCR-Bench (exam-style multimodal chemistry Q&A), and MMChemBench (molecule caption + property prediction).
Two-stage training recipe using LoRA and DeepSpeed, with public code and training/test data provided.
Key Findings
ChemVLM outperforms the general-purpose multimodal LLMs it was compared with on chemical OCR (ChemOCR).
ChemVLM produces more exact SMILES matches (Tanimoto@1.0) than general-purpose MLLMs.
ChemVLM edges out GPT-4V on the authors' exam-style multimodal chemistry questions.
Adding the image modality improves molecular property prediction over text-only input.
Results
Avg Tanimoto similarity (ChemOCR)
Tanimoto@1.0 (ChemOCR)
Accuracy
Accuracy
MMChemBench: property prediction
MMChemBench: molecule caption
Who Should Care
What To Try In 7 Days
Clone the repo and run the provided inference on a small ChemOCR sample to compare outputs.
Test ChemVLM on your molecule images to see gains in property prediction vs text-only models.
Use LoRA finetuning on a small in-house dataset to adapt the model to your lab's image styles.
Agent Features
Tool Use
- RDKit for SMILES validation
- DeepSpeed for training
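A minimal sketch of how RDKit could be used to validate predicted SMILES strings. The summary only states that RDKit is used for SMILES validation; the helper names `is_valid_smiles` and `validity_rate`, and the injectable `parser` argument, are assumptions made here for illustration and are not taken from the ChemVLM code. The only RDKit call used is `Chem.MolFromSmiles`, which returns `None` when a string cannot be parsed.

```python
from typing import Callable, Iterable, Optional


def _rdkit_parser(smiles: str) -> Optional[object]:
    """Parse a SMILES string with RDKit; returns None if parsing fails."""
    # Imported lazily so this module can be loaded without RDKit installed.
    from rdkit import Chem

    return Chem.MolFromSmiles(smiles)


def is_valid_smiles(
    smiles: str,
    parser: Callable[[str], Optional[object]] = _rdkit_parser,
) -> bool:
    """Return True if the string parses into a molecule (hypothetical helper)."""
    # An empty string is rejected explicitly, since a parser may accept it
    # as an empty molecule.
    if not smiles or not smiles.strip():
        return False
    return parser(smiles.strip()) is not None


def validity_rate(
    predictions: Iterable[str],
    parser: Callable[[str], Optional[object]] = _rdkit_parser,
) -> float:
    """Fraction of predictions that parse as valid SMILES."""
    results = [is_valid_smiles(p, parser) for p in predictions]
    return sum(results) / len(results) if results else 0.0
```

With RDKit installed, `is_valid_smiles("CCO")` would call the default parser; passing a custom `parser` makes the check usable with another toolkit or in unit tests.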
Frameworks
- LoRA
- DeepSpeed ZeRO-3
Architectures
- ViT-MLP-LLM
- Vision Transformer (InternViT-6B)
- ChemLLM-20B as LLM backbone
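The ViT-MLP-LLM layout listed above can be sketched as a data flow: the vision encoder emits image tokens, an MLP projector maps them to the LLM embedding width, and the projected tokens are concatenated with text embeddings before the LLM runs. The sketch below uses plain Python lists and toy dimensions. The two-layer GELU projector, the image-before-text ordering, and every name and size here are assumptions for illustration; the summary says only that an MLP projector is used.

```python
import math
import random


def gelu(x: float) -> float:
    """Exact GELU activation."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(x, weight, bias):
    """Apply y = W x + b to one token vector (weight has d_out rows of d_in)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def init_linear(d_in, d_out, rng):
    """Small random weights and zero bias for a toy linear layer."""
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(d_out)]
    return weight, [0.0] * d_out


def project_image_tokens(image_tokens, layer1, layer2):
    """Map each vision token to the LLM width: Linear -> GELU -> Linear."""
    projected = []
    for token in image_tokens:
        hidden = [gelu(h) for h in linear(token, *layer1)]
        projected.append(linear(hidden, *layer2))
    return projected


def build_llm_input(image_tokens, text_embeddings, layer1, layer2):
    """Concatenate projected image tokens with text embeddings (assumed order)."""
    return project_image_tokens(image_tokens, layer1, layer2) + text_embeddings


# Toy sizes (hypothetical): 4 image tokens of width 8, 3 text tokens of width 12.
VISION_DIM, LLM_DIM = 8, 12
rng = random.Random(0)
layer1 = init_linear(VISION_DIM, LLM_DIM, rng)
layer2 = init_linear(LLM_DIM, LLM_DIM, rng)
image_tokens = [[rng.random() for _ in range(VISION_DIM)] for _ in range(4)]
text_embeddings = [[rng.random() for _ in range(LLM_DIM)] for _ in range(3)]

sequence = build_llm_input(image_tokens, text_embeddings, layer1, layer2)
print(len(sequence), len(sequence[0]))  # 7 12
```

The point of the sketch is the shape contract: whatever the encoder width is, the projector output must match the LLM embedding width so both modalities can share one input sequence.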
Optimization Features
Token Efficiency
- Context length of 2048 tokens to allow long responses
Infra Optimization
- Trained on 16× A100 (80GB) GPUs
Model Optimization
- LoRA
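To illustrate why LoRA reduces the training footprint: instead of updating a full weight matrix of shape d_out × d_in, LoRA trains two low-rank factors of shapes d_out × r and r × d_in. The short calculation below compares parameter counts. The layer size and rank are hypothetical values chosen for illustration, not the settings used for ChemVLM.

```python
def lora_param_counts(d_in: int, d_out: int, rank: int) -> tuple[int, int]:
    """Return (full_matrix_params, lora_trainable_params) for one linear layer."""
    full = d_in * d_out              # parameters in the frozen weight W
    lora = rank * (d_in + d_out)     # parameters in the factors A (r x d_in) and B (d_out x r)
    return full, lora


# Hypothetical layer: 4096 x 4096 weight with LoRA rank 16.
full, lora = lora_param_counts(d_in=4096, d_out=4096, rank=16)
print(full, lora, lora / full)  # 16777216 131072 0.0078125
```

In this toy case the trainable parameters are under 1% of the full matrix, which is the kind of saving that makes adapting a large model on a small in-house dataset practical.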
System Optimization
- Gradient accumulation and small per-GPU batch sizes
Training Optimization
- Two-stage training: align the projector first, then finetune the full model
- DeepSpeed with bf16 precision and ZeRO-3 for memory efficiency
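A hedged sketch of what a DeepSpeed configuration combining bf16, ZeRO stage 3, gradient accumulation, and a small per-GPU batch might look like, written as a Python dict. The configuration keys are standard DeepSpeed config fields, and the GPU count of 16 comes from the summary above. The micro-batch size and accumulation steps are hypothetical values; the authors' actual settings are not given here.

```python
NUM_GPUS = 16  # from the summary: 16x A100 (80GB)

# Values marked "hypothetical" are illustrative, not the authors' settings.
ds_config = {
    "train_micro_batch_size_per_gpu": 2,   # hypothetical small per-GPU batch
    "gradient_accumulation_steps": 4,      # hypothetical
    "bf16": {"enabled": True},             # bfloat16 mixed precision
    "zero_optimization": {"stage": 3},     # ZeRO-3: shard params, grads, optimizer state
}


def effective_batch_size(config: dict, num_gpus: int) -> int:
    """Global batch size = per-GPU micro batch x accumulation steps x GPUs."""
    return (
        config["train_micro_batch_size_per_gpu"]
        * config["gradient_accumulation_steps"]
        * num_gpus
    )


print(effective_batch_size(ds_config, NUM_GPUS))  # 128
```

Gradient accumulation lets the global batch stay large while each GPU holds only a small micro batch in memory, which complements the parameter sharding done by ZeRO-3.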
Reproducibility
Data Urls
Code Available
Data Available
Open Source Status
- yes
Risks & Boundaries
Limitations
- Only image and text modalities supported; no molecular-graph or time-series inputs.
- Training ran for one epoch on large infrastructure (16× A100 80GB GPUs); generalization beyond the reported datasets is unproven.
- Specialized OCR tools still exceed ChemVLM on pure SMILES extraction accuracy.
When Not To Use
- When you need the highest possible SMILES extraction accuracy (use MolScribe or DECIMER).
- In low-resource settings where running large ViT+LLM models is infeasible.
- When your data includes molecular graphs or time-series inputs, which the model does not support.
Failure Modes
- Hallucinated reasoning or incorrect chemical statements on complex questions.
- Incorrect SMILES output for noisy or stylized images.
- Overfitting to exam-style question templates, which leads to brittle generalization.
Core Entities
Models
- ChemVLM-26B
- ChemLLM-20B
- InternViT-6B
- GPT-4V
- Qwen-VL-Chat
- LLaVA-v1.5-13B
- InternVL-v1.5
- Yi-VL-Plus
- DECIMER
- MolScribe
Metrics
- Avg Tanimoto similarity (%)
- Tanimoto@1.0 (%)
- Accuracy
- Total score (%)
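The two Tanimoto-based metrics can be sketched in a few lines. Tanimoto similarity between two fingerprints, viewed as sets of on-bits, is |A ∩ B| / |A ∪ B|. The sketch below assumes that "Avg Tanimoto similarity" is the mean over the test set and that "Tanimoto@1.0" is the share of predictions whose similarity to the reference equals 1.0. Fingerprints are represented as plain Python sets of bit indices; in practice they would come from a cheminformatics toolkit such as RDKit. Returning 0.0 for two empty fingerprints is a choice made here, not something stated in the source.

```python
def tanimoto(pred_bits: set, ref_bits: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    intersection = len(pred_bits & ref_bits)
    union = len(pred_bits) + len(ref_bits) - intersection
    # Assumption: an empty pair (e.g. an unparsable prediction) scores 0.
    return intersection / union if union else 0.0


def summarize(pairs: list) -> tuple:
    """Return (average Tanimoto %, Tanimoto@1.0 %) over (prediction, reference) pairs."""
    scores = [tanimoto(pred, ref) for pred, ref in pairs]
    avg_pct = 100.0 * sum(scores) / len(scores)
    exact_pct = 100.0 * sum(1 for s in scores if s == 1.0) / len(scores)
    return avg_pct, exact_pct


# Toy fingerprints (hypothetical bit indices): one exact match, one partial match.
pairs = [({1, 2, 3}, {1, 2, 3}), ({1, 2}, {2, 3})]
avg_pct, exact_pct = summarize(pairs)
print(round(avg_pct, 2), exact_pct)  # 66.67 50.0
```

The average rewards partially correct structures, while the exact-match share counts only predictions whose fingerprint is identical to the reference, which is why the two numbers can differ widely.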
Datasets
- ChemOCR
- MMCR-Bench
- MMChemBench
- CMMU
- ScienceQA
- ChemBench
- SciBench
- DECIMER HDM
- PEACE
- USPTO-50K
- OpenDataLab
Benchmarks
- ChemOCR
- MMCR-Bench
- MMChemBench
- CMMU
- ScienceQA
- SciBench

