Overview
Production Readiness: 0.5
Novelty Score: 0.6
Cost Impact Score: 0.6
Citation Count: 15
Why It Matters For Business
GraphGPT enables LLMs to use graph structure with low-cost tuning, improving cross-dataset predictions and saving compute by using compact graph tokens instead of long text prompts.
Summary TLDR
GraphGPT injects graph structure into an LLM by projecting precomputed graph embeddings into special "graph tokens" and instruction-tuning the LLM in two stages: (1) self-supervised graph matching to align graph tokens with text, and (2) task-specific instruction tuning. Freezing the LLM and graph encoder and tuning only a lightweight projector keeps costs low. GraphGPT improves supervised and zero-shot node classification and link prediction on OGB-arxiv, PubMed, and Cora versus standard GNNs and base LLMs, and it uses Chain-of-Thought (CoT) distillation to boost performance on hard tasks.
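The graph-token mechanism can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not GraphGPT's code. The dimensions, the `<graph>` placeholder, and the helper names (`make_projector`, `project`, `splice`) are hypothetical, and the projector is a single linear layer. Only the projector weights would be trained. The node embeddings and text embeddings stand in for frozen graph-encoder and LLM outputs.

```python
import random

# Hypothetical sizes for illustration; real encoders and LLMs are far wider.
GRAPH_DIM = 4                  # width of precomputed graph-encoder node embeddings
LLM_DIM = 6                    # width of the frozen LLM's token embeddings
GRAPH_PLACEHOLDER = "<graph>"  # hypothetical marker token in the instruction text


def make_projector(in_dim, out_dim, seed=0):
    """Create the only trainable weights: a linear map W (out_dim x in_dim) and bias b."""
    rng = random.Random(seed)
    weight = [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def project(node_embeddings, weight, bias):
    """Map each graph-encoder embedding to one LLM-width graph token."""
    return [
        [sum(w * x for w, x in zip(row, emb)) + b for row, b in zip(weight, bias)]
        for emb in node_embeddings
    ]


def splice(text_tokens, text_embeddings, graph_tokens):
    """Expand the placeholder into the graph-token sequence; keep other embeddings."""
    sequence = []
    for token, emb in zip(text_tokens, text_embeddings):
        if token == GRAPH_PLACEHOLDER:
            sequence.extend(graph_tokens)
        else:
            sequence.append(emb)
    return sequence


# Frozen inputs: three node embeddings and four text-token embeddings.
nodes = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
tokens = ["Classify", GRAPH_PLACEHOLDER, "node", "0"]
text_embs = [[0.1] * LLM_DIM for _ in tokens]

W, b = make_projector(GRAPH_DIM, LLM_DIM)
sequence = splice(tokens, text_embs, project(nodes, W, b))
print(len(sequence))  # 6: three text embeddings plus three graph tokens
```

Each node becomes one graph token regardless of how much text it carries. This per-node encoding is where the input-length savings come from.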
Problem Statement
GNNs need labeled data to generalize well. Pure-text prompts for LLMs lose graph structure or become too long. The problem: how to make LLMs understand graph structure so they generalize across graph tasks and transfer zero-shot without large labeled datasets.
Main Contribution
A text-graph grounding scheme that encodes graph structure as compact graph tokens aligned with text embeddings.
A dual-stage graph instruction tuning: (1) self-supervised graph matching to align structure and language; (2) task-specific instruction tuning for node classification/link prediction.
A lightweight graph-text alignment projector enabling tuning with frozen LLM/graph encoder, plus CoT distillation from GPT-3.5 to improve stepwise reasoning.
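The stage-1 graph matching task needs no labels because the answer comes from the graph itself. Below is a minimal sketch, assuming the LLM sees graph tokens in a fixed node order and must reorder shuffled node texts to match. The prompt wording, the answer format, and `build_graph_matching_example` are hypothetical, not the paper's template.

```python
import random


def build_graph_matching_example(node_ids, node_texts, seed=0):
    """Build one self-supervised (instruction, target) pair from an unlabeled subgraph.

    node_ids:   nodes in the order their graph tokens appear in the input.
    node_texts: mapping from node id to raw text, e.g. a paper title.
    """
    rng = random.Random(seed)
    shuffled = list(node_ids)
    rng.shuffle(shuffled)
    options = "\n".join(f"{i + 1}. {node_texts[n]}" for i, n in enumerate(shuffled))
    instruction = (
        f"Given graph tokens <graph> for {len(node_ids)} nodes, list the option "
        f"number of each node's text in graph-token order:\n{options}"
    )
    # The target is derived from the shuffle itself, so no human labels are needed.
    option_of = {n: i + 1 for i, n in enumerate(shuffled)}
    target = " ".join(str(option_of[n]) for n in node_ids)
    return instruction, target


texts = {0: "Graph attention networks", 1: "Semi-supervised GCN", 2: "GraphSAGE"}
instruction, target = build_graph_matching_example([0, 1, 2], texts)
print(target)
```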
Key Findings
GraphGPT improves zero-shot transfer accuracy compared to base LLMs and GNNs on evaluated benchmarks.
Self-supervised graph matching stage materially improves supervised accuracy and zero-shot stability.
Freezing LLM and graph encoder and tuning only the projector cuts tuned parameters by >50× and avoids OOM.
Chain-of-Thought distillation helps on complex, high-class-count datasets (Cora).
Graph tokens drastically reduce token usage versus text-based structure prompts.
Results
Reported metrics: accuracy and link prediction AUC.
Who Should Care
What To Try In 7 Days
Run a small proof: freeze your LLM and graph encoder, then train a linear projector on graph-matching instructions built from your unlabeled subgraphs before instruction-tuning it for a target node classification task.
Compare graph-token inputs against text-based graph prompts to measure token and latency savings.
If the task has many classes or needs multi-step reasoning, add CoT-style instructions distilled from a stronger LLM to your instruction mix.
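The first step can be prototyped before touching a real LLM. The toy below replaces GraphGPT's actual training signal (the frozen LLM's next-token loss on instruction answers) with a simpler stand-in: mean-squared error against fixed target vectors, which are hypothetical. It shows only the parameter split the method relies on. The inputs stay frozen, and gradient descent updates the projector matrix `W` alone.

```python
import random


def train_projector(graph_embs, target_embs, out_dim, lr=0.1, steps=200, seed=0):
    """Fit only W (out_dim x in_dim) by gradient descent; the inputs are never modified."""
    in_dim = len(graph_embs[0])
    n = len(graph_embs)
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]

    def predict(x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in W]

    def mse():
        return sum(
            sum((p - t) ** 2 for p, t in zip(predict(x), y))
            for x, y in zip(graph_embs, target_embs)
        ) / n

    history = [mse()]
    for _ in range(steps):
        # dL/dW[i][j] = (1/n) * sum over samples of 2 * (pred_i - y_i) * x_j
        grad = [[0.0] * in_dim for _ in range(out_dim)]
        for x, y in zip(graph_embs, target_embs):
            pred = predict(x)
            for i in range(out_dim):
                err = 2.0 * (pred[i] - y[i]) / n
                for j in range(in_dim):
                    grad[i][j] += err * x[j]
        for i in range(out_dim):
            for j in range(in_dim):
                W[i][j] -= lr * grad[i][j]
        history.append(mse())
    return W, history


graph_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # frozen encoder output
target_embs = [[0.5, -0.2, 0.1, 0.0], [0.0, 0.3, -0.4, 0.2], [0.1, 0.1, 0.1, -0.3]]
W, history = train_projector(graph_embs, target_embs, out_dim=4)
print(f"loss {history[0]:.4f} -> {history[-1]:.2e}")
```

In a real setup the same split is usually expressed by disabling gradients on the LLM and encoder parameters and passing only the projector's parameters to the optimizer.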
Agent Features
Frameworks
- Dual-stage instruction tuning
Architectures
- LLM + pre-trained GNN encoder
Optimization Features
Token Efficiency
- Graph tokens: 750 vs text prompts: 4,649 tokens for a 103-node subgraph
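The reported counts translate into a compression ratio and a share of input tokens saved:

```python
def token_savings(text_prompt_tokens, graph_prompt_tokens):
    """Return (compression ratio, fraction of input tokens saved)."""
    ratio = text_prompt_tokens / graph_prompt_tokens
    saved = 1 - graph_prompt_tokens / text_prompt_tokens
    return ratio, saved


ratio, saved = token_savings(4649, 750)
print(f"{ratio:.1f}x fewer tokens, {saved:.1%} saved")  # 6.2x fewer tokens, 83.9% saved
```

Actual latency savings depend on the model and serving stack, so they are worth measuring directly rather than inferring from token counts alone.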
Infra Optimization
- Low batch-size training feasible due to small tuned parameter set
Model Optimization
- Freeze large model weights; only tune projector
System Optimization
- Runs on a single 40 GB A100 when the LLM is frozen; tuning the full LLM causes OOM
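A rough memory budget shows why freezing matters on a 40 GB card. The sketch below assumes mixed-precision Adam at about 16 bytes per trainable parameter (fp16 weights and gradients plus fp32 master weights and two optimizer moments) and 2 bytes per frozen fp16 parameter. It ignores activations, and the projector size is hypothetical. Because the graph embeddings are precomputed, the graph encoder is left out.

```python
BYTES_PER_GB = 1e9


def training_memory_gb(total_params, trainable_params):
    """Rough mixed-precision Adam estimate; ignores activations and buffers.

    Every parameter holds fp16 weights (2 bytes). Each trainable parameter also
    needs an fp16 gradient (2 bytes) plus fp32 master weights, momentum and
    variance (4 + 4 + 4 bytes), i.e. 14 extra bytes.
    """
    return (2 * total_params + 14 * trainable_params) / BYTES_PER_GB


LLM_PARAMS = 7e9          # a 7B-parameter base LLM
PROJECTOR_PARAMS = 5e6    # hypothetical projector size, for illustration only
TOTAL = LLM_PARAMS + PROJECTOR_PARAMS

full = training_memory_gb(TOTAL, TOTAL)
frozen = training_memory_gb(TOTAL, PROJECTOR_PARAMS)
print(f"full tuning ~{full:.0f} GB, projector only ~{frozen:.0f} GB")
# full tuning ~112 GB, projector only ~14 GB
```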
Training Optimization
- Self-supervised graph matching builds instructions from unlabeled graphs
- Two-stage tuning reduces overfitting and supports multitask mixing
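One way to organize the two stages and the multitask mix is sketched below. It assumes each instruction is an opaque item and that stage 2 simply shuffles all task sets together. The function name, the example IDs, and the uniform shuffle are hypothetical choices, not the paper's recorded mixing scheme.

```python
import random


def build_stage_schedule(matching, task_sets, seed=0):
    """Return the instruction order for two-stage tuning.

    Stage 1 uses only self-supervised graph-matching instructions; stage 2
    shuffles all task-specific instruction sets into one mixed stream.
    """
    rng = random.Random(seed)
    stage1 = list(matching)
    rng.shuffle(stage1)
    stage2 = [ex for examples in task_sets.values() for ex in examples]
    rng.shuffle(stage2)
    return stage1, stage2


stage1, stage2 = build_stage_schedule(
    matching=["match-1", "match-2"],
    task_sets={
        "node_classification": ["nc-1", "nc-2", "nc-3"],
        "link_prediction": ["lp-1", "lp-2"],
        "cot_distilled": ["cot-1"],  # optional CoT-style examples from a stronger LLM
    },
)
print(stage1, stage2)
```

Keeping the stages separate means the projector is first aligned on label-free data before it sees any task labels.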
Inference Optimization
- Compact graph tokens reduce input length and inference latency
Reproducibility
Code Urls
Data Urls
Code Available
Data Available
Open Source Status
- partial
Risks & Boundaries
Limitations
- Evaluations focus on citation-like graphs (OGB-arxiv, PubMed, Cora); other graph types not tested.
- Method depends on a pre-trained graph encoder; end-to-end learning is not evaluated.
- Base LLM choice affects results; improvements reported for vicuna/baichuan variants only.
- CoT distillation requires access to a stronger closed-source LLM for best effect.
When Not To Use
- If your graph domain is very different from citation/text-attributed graphs and no suitable graph encoder exists.
- If you must fine-tune the LLM weights end-to-end: this method keeps the LLM frozen, and full tuning needs far more GPU memory than the projector-only setup.
- When legal/operational rules forbid using closed-source models for distillation.
Failure Modes
- Overfitting when skipping self-supervised graph matching (worse zero-shot transfer).
- Poor performance on problems with very many classes unless CoT or richer instruction data is added.
- Misalignment if the projector cannot map graph embeddings into the LLM token space for novel graph structures.
Core Entities
Models
- GraphGPT-7B-v1.5
- GraphGPT-7B-v1.1
- vicuna-7B-v1.5
- vicuna-7B-v1.1
- baichuan-7B
- GPT-3.5 (used for CoT distillation)
Metrics
- Accuracy
- Macro-F1
- AUC
- AP
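Accuracy is the share of correct predictions. Macro-F1 and AUC can be computed as shown below. These are the standard definitions in plain Python, not the paper's evaluation script, and AP is omitted for brevity. Macro-F1 weights every class equally, which matters on many-class datasets such as the expanded Cora.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, over classes seen in y_true or y_pred."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        scores.append(2 * tp / (2 * tp + fp + fn))  # never 0: c occurs somewhere
    return sum(scores) / len(scores)


def roc_auc(labels, scores):
    """Chance a random positive outscores a random negative; ties count one half."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(round(macro_f1([0, 0, 1, 1, 2], [0, 1, 1, 1, 2]), 4))  # 0.8222
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```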
Datasets
- OGB-arxiv
- PubMed
- Cora (expanded, 70 classes)
Benchmarks
- Supervised node classification
- Zero-shot node classification
- Link prediction
Context Entities
Models
- GraphSAGE
- GCN
- GAT
- RevGNN
- DGI
- GKD
- GLNN
- NodeFormer
- DIFFormer
- Node2Vec
- MLP

