Overview
Production Readiness
0.7
Novelty Score
0.6
Cost Impact Score
0.7
Citation Count
0
Why It Matters For Business
LoRAP lets teams deploy low-bit GNNs with much of the original accuracy retained while keeping memory and speed gains; it is a small, trainable add-on compatible with existing QAT pipelines.
Summary TLDR
Quantizing Graph Neural Networks (GNNs) to low bit-widths hurts accuracy because feature quantization errors accumulate during neighbor aggregation. LoRAP inserts small, input-dependent low-rank prompts after aggregation (post-aggregation) to directly correct those errors. Across 4 quantization-aware training frameworks and many datasets, LoRAP + node prompts (GPF-LoRAP) consistently recovers accuracy for INT4/low-bit GNNs, sometimes exceeding FP32, while adding little compute and memory when using a fused GPU kernel.
Problem Statement
Low-bit quantization of GNNs saves memory and speed but causes large accuracy drops because quantized node features create biased aggregated messages. Pre-aggregation node prompts cannot reliably fix topology-amplified errors. We need a lightweight, input-aware way to correct aggregation-level quantization error during training.
Main Contribution
Introduce LoRAP: post-aggregation, input-dependent prompts built from a small set of low-rank basis vectors.
Show theoretically that post-aggregation prompts decouple correction from graph operator and allow node-specific bias correction.
Integrate LoRAP into existing QAT pipelines and provide a fused Triton kernel to cut prompt-injection latency.
Empirically validate across 4 QAT methods, 3 GNN architectures, and large/small datasets that LoRAP consistently recovers low-bit performance.
Key Findings
GPF-LoRAP can recover severe INT4 accuracy losses on small benchmarks.
LoRAP sometimes surpasses full-precision accuracy on evaluated tasks.
Fused LoRAP kernel halves prompt injection latency versus naive implementation.
Training overhead of LoRAP is small in practice.
Results
Accuracy
Accuracy
Accuracy
Prompt injection kernel latency
LoRA
Training time (A2Q)
Who Should Care
What To Try In 7 Days
Run a baseline INT4 QAT pipeline on one GNN task and record accuracy/latency.
Add GPF-plus (node prompts) and LoRAP (aggregation prompts) with k≈40, r≈2 and retrain.
Measure accuracy recovery and per-layer latency; enable fused Triton kernel if available for production speedups.
Optimization Features
Infra Optimization
- works with standard GPUs and Triton
Model Optimization
- post-aggregation correction
- low-rank prompt bases
System Optimization
- kernel fusion to reduce DRAM accesses
Training Optimization
- jointly optimize prompts and quantized weights
- small extra training cost
Inference Optimization
- fused Triton kernel to reduce memory traffic
- keep activations/weights low-bit
Reproducibility
Code Available
Data Available
Open Source Status
- partial
Risks & Boundaries
Limitations
- LoRAP uses high-precision prompt generation which adds small FP32 work and storage.
- Approximation limited by prompt rank k; too small k may underfit quantization error.
- Node prompting (pre-aggregation) can be unstable for some aggregations; LoRAP requires implementation changes.
- Best latency requires fused Triton kernel; without it, overhead is larger.
When Not To Use
- When full-precision FP32 models run fine and memory/speed are not constrained.
- When you lack access to retraining or QAT pipeline to jointly optimize prompts.
- When target hardware cannot run fused GPU kernels or support mixed precision.
Failure Modes
- Poor k/r choices lead to under- or over-correction and accuracy drop.
- EdgePrompt+ style additions can hurt performance if combined incorrectly.
- If prompts are not fused, memory traffic can erase speed gains.
Core Entities
Models
- GIN
- GCN
- GAT
- LoRA
- GPF-plus
Metrics
- Accuracy
- Mean Absolute Error (MAE)
- Latency (ms / µs)
- Training time (s)
Datasets
- Cora
- CiteSeer
- PubMed
- ogb-arxiv
- ogb-products
- ogbn-mag
- MNIST (superpixel)
- CIFAR-10 (superpixel)
- REDDIT-BINARY
- ZINC
Benchmarks
- OGB (ogb-arxiv, ogb-products, ogbn-mag)
- REDDIT-BINARY

