Overview
Production Readiness: 0.6
Novelty Score: 0.6
Cost Impact Score: 0.7
Citation Count: 0
Why It Matters For Business
GRASP LoRA reduces both the number of tuning runs and the amount of labeled dev data needed by learning the pruning rate online. This lowers compute and development cost while often improving quality on low-resource language transfer.
Summary TLDR
GRASP LoRA learns a single global prune ratio for merged LoRA adapters using a lightweight GRPO controller that probes candidate sparsities on a tiny micro dev slice. This replaces expensive grid search with one controller pass plus one final fine-tune. On English→Arabic and English→Chinese transfer (XL-Sum summarization and MLQA question answering), it finds fractional prune rates, improves generation and QA metrics over strong baselines, and cuts end-to-end runtime by 3.9×–7.45× relative to an 8-point grid search. It is robust to very small micro dev slices but was validated on only one backbone and two tasks.
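To make the controller idea concrete, here is a minimal sketch of one simplified GRPO-style update. It assumes the controller is a Gaussian policy over the global prune ratio p with a fixed standard deviation, and that `reward_fn` (hypothetical) returns a micro-dev score for a candidate p. The group-normalized advantage is the GRPO ingredient; the Gaussian parameterization, the clipping range, the learning rate, and the omission of GRPO's clipped-ratio and KL terms are illustrative simplifications, not the paper's settings.

```python
import math
import random
from typing import Callable


def grpo_controller_step(
    mu: float,
    sigma: float,
    reward_fn: Callable[[float], float],
    rng: random.Random,
    group_size: int = 16,
    lr: float = 0.002,
    p_max: float = 0.9,
) -> float:
    """One simplified GRPO-style update of the mean of a Gaussian policy over p.

    Assumptions (illustrative, not from the paper): policy N(mu, sigma^2) with a
    fixed sigma, candidates clipped to [0, p_max], and reward_fn(p) returning a
    micro-dev score where higher is better.
    """
    raw = [rng.gauss(mu, sigma) for _ in range(group_size)]
    # Probe each clipped candidate sparsity on the micro dev slice.
    candidates = [min(max(x, 0.0), p_max) for x in raw]
    rewards = [reward_fn(p) for p in candidates]
    # Group-relative advantage: standardize rewards within the sampled group.
    mean_r = sum(rewards) / group_size
    std_r = math.sqrt(sum((r - mean_r) ** 2 for r in rewards) / group_size) or 1.0
    advantages = [(r - mean_r) / std_r for r in rewards]
    # Policy gradient: d/dmu log N(x; mu, sigma^2) = (x - mu) / sigma^2.
    grad_mu = sum(a * (x - mu) / sigma**2 for a, x in zip(advantages, raw)) / group_size
    # Gradient ascent on the mean, kept inside the allowed prune range.
    return min(max(mu + lr * grad_mu, 0.0), p_max)


# Usage with a toy reward peaked at a fractional ratio (p = 0.35).
rng = random.Random(0)
mu = 0.1
for _ in range(300):
    mu = grpo_controller_step(mu, 0.1, lambda p: -(p - 0.35) ** 2, rng)
```

With this toy reward the policy mean drifts toward 0.35, a fractional ratio that a coarse grid would only hit if it happened to include that point.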
Problem Statement
When merging LoRA adapters for cross-lingual transfer, practitioners usually pick a global prune ratio by grid search. Grid search requires many full training runs and large dev sets, misses fractional optima that fall between grid points, and can be brittle. We need a cheap, stable way to learn the right overall sparsity during training.
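The summary does not state how the adapters are merged. As a hedged illustration only, the sketch below assumes a common choice: form each adapter's dense update ΔW = (α/r)·B·A and take a weighted sum. `lora_delta` and `merge_deltas` are hypothetical helpers, and GRASP LoRA's actual merge rule may differ; the learned global prune ratio would then be applied to the merged adapter.

```python
from typing import List

Matrix = List[List[float]]


def lora_delta(B: Matrix, A: Matrix, alpha: float, rank: int) -> Matrix:
    """Dense LoRA update (alpha / rank) * B @ A, with B: d_out x r and A: r x d_in."""
    scale = alpha / rank
    inner, cols = len(A), len(A[0])
    return [
        [scale * sum(row[k] * A[k][j] for k in range(inner)) for j in range(cols)]
        for row in B
    ]


def merge_deltas(deltas: List[Matrix], weights: List[float]) -> Matrix:
    """Weighted element-wise sum of same-shaped deltas (an assumed merge rule)."""
    rows, cols = len(deltas[0]), len(deltas[0][0])
    return [
        [sum(w * d[i][j] for w, d in zip(weights, deltas)) for j in range(cols)]
        for i in range(rows)
    ]


# Usage: a hypothetical source-language adapter and target-language adapter.
source = lora_delta([[1.0], [2.0]], [[3.0, 4.0]], alpha=2.0, rank=1)
target = lora_delta([[1.0], [0.0]], [[1.0, 1.0]], alpha=1.0, rank=1)
merged = merge_deltas([source, target], [0.5, 0.5])
```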
Main Contribution
GRASP LoRA: treat the global prune ratio as a learnable control variable and optimize it online with a GRPO controller.
A training pipeline that interleaves controller probing on a tiny micro dev slice with normal fine-tuning, then performs one final prune+fine-tune at the chosen ratio.
Empirical gains: improved XL-Sum and MLQA metrics on Arabic and Chinese and a 3.90×–7.45× reduction in total runtime versus an 8-point grid search.
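The pipeline in the second contribution can be outlined as an orchestration loop. This is a sketch under stated assumptions: `train_chunk`, `controller_update`, and `final_prune_and_finetune` are hypothetical hooks standing in for ordinary fine-tuning steps, one controller probing round on the micro dev slice, and the single final run. The number of interleaved chunks is an illustrative parameter.

```python
from typing import Callable, List, Tuple


def grasp_style_pipeline(
    train_chunk: Callable[[float], None],
    controller_update: Callable[[float], float],
    final_prune_and_finetune: Callable[[float], float],
    p_init: float = 0.0,
    num_chunks: int = 10,
) -> Tuple[float, float]:
    """Interleave fine-tuning with controller probing, then do one final run.

    Returns (chosen prune ratio, metric reported by the final run). All three
    callables are hypothetical hooks, not an API from the paper's code.
    """
    p = p_init
    for _ in range(num_chunks):
        train_chunk(p)              # ordinary fine-tuning steps at the current ratio
        p = controller_update(p)    # probe candidates on the micro dev, update p
    # One final prune + fine-tune at the ratio the controller settled on.
    return p, final_prune_and_finetune(p)


# Usage with stub hooks: the stub controller moves halfway toward 0.4 each round.
calls: List[float] = []
p_chosen, final_metric = grasp_style_pipeline(
    train_chunk=calls.append,
    controller_update=lambda p: p + 0.5 * (0.4 - p),
    final_prune_and_finetune=lambda p: 1.0 - abs(p - 0.4),
)
```

The point of the structure is that only `final_prune_and_finetune` is a full training run; everything inside the loop piggybacks on one ongoing fine-tune.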
Key Findings
GRASP LoRA improves summarization metrics over best grid-search baseline on XL-Sum.
GRASP LoRA improves extractive QA over best grid baseline on MLQA.
The controller run plus the final run cuts end-to-end runtime compared with an 8-point grid search.
Micro dev size can be very small without breaking result stability.
Controller regularizers stabilize pruning and avoid harmful over-pruning.
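One hedged way to write the regularized controller objective is the micro-dev reward plus an entropy bonus on the policy (discouraging premature collapse) minus an anchor penalty that pulls the policy mean toward a reference ratio. The Gaussian policy, the functional forms, and the coefficients `beta_entropy` and `lambda_anchor` are assumptions for illustration; the summary names these regularizers but not their exact form.

```python
import math


def gaussian_entropy(sigma: float) -> float:
    """Differential entropy of N(mu, sigma^2): 0.5 * ln(2 * pi * e * sigma^2)."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma**2)


def regularized_objective(
    mean_reward: float,
    mu: float,
    sigma: float,
    p_anchor: float,
    beta_entropy: float = 0.01,
    lambda_anchor: float = 1.0,
) -> float:
    """Micro-dev reward + entropy bonus - anchor penalty (assumed functional form)."""
    entropy_bonus = beta_entropy * gaussian_entropy(sigma)
    anchor_penalty = lambda_anchor * (mu - p_anchor) ** 2
    return mean_reward + entropy_bonus - anchor_penalty
```

Setting `beta_entropy` and `lambda_anchor` to zero recovers the bare reward, the configuration that the failure-mode list associates with collapse to a high prune ratio.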
Results
XL-Sum Arabic BERTScore-F1
XL-Sum Chinese BERTScore-F1
MLQA Arabic Exact Match (EM)
MLQA Chinese BERTScore-F1
Who Should Care
What To Try In 7 Days
Run GRASP LoRA with a 16-example micro dev slice and compare end-to-end runtime vs your current grid search.
Apply GRASP LoRA to one existing English→target adapter pair and measure target dev metrics after the single final prune+fine-tune.
Ablate the entropy-bonus and anchor settings to find a stable commit schedule for your data.
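For the first experiment, a fixed and reproducible micro dev slice can be drawn once with a seeded sampler and reused for every controller probe. A minimal sketch, assuming the dev set fits in memory; `make_micro_dev` is a hypothetical helper, and the default size of 16 matches the suggestion above.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def make_micro_dev(dev_set: Sequence[T], size: int = 16, seed: int = 0) -> List[T]:
    """Draw a fixed, reproducible micro dev slice (sampling without replacement)."""
    if size > len(dev_set):
        raise ValueError("micro dev slice cannot be larger than the dev set")
    rng = random.Random(seed)  # same seed -> same slice across runs
    return rng.sample(list(dev_set), size)


# Usage: the same seed yields the same 16 examples every time.
dev = list(range(100))
micro_dev = make_micro_dev(dev)
```

Keeping the slice fixed makes controller rewards comparable across probes; changing the seed is a cheap way to check how sensitive the chosen ratio is to slice composition.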
Optimization Features
Infra Optimization
- runtime savings 3.9×–7.45× on a single A100 setup
Model Optimization
- adapter-level magnitude pruning
- importance-based per-tensor top-k masking
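A minimal sketch of per-tensor top-k masking at a single global ratio p: every tensor prunes the same fraction of its entries and keeps the ones with the largest importance. Here importance is taken to be |w| (magnitude), which is an assumption, since the summary lists both magnitude pruning and importance-based top-k masking without defining the score. Tensors are flat Python lists, and `topk_masks` is a hypothetical helper.

```python
import math
from typing import Dict, List


def topk_masks(tensors: Dict[str, List[float]], p: float) -> Dict[str, List[int]]:
    """Per tensor, prune floor(p * n) entries with the smallest |w|; 1 = keep, 0 = pruned."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("prune ratio must be in [0, 1]")
    masks: Dict[str, List[int]] = {}
    for name, w in tensors.items():
        n_keep = len(w) - math.floor(p * len(w))  # floor rounding is an assumption
        by_importance = sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)
        keep = set(by_importance[:n_keep])
        masks[name] = [1 if i in keep else 0 for i in range(len(w))]
    return masks


# Usage: the same global ratio is applied inside each tensor separately.
masks = topk_masks({"lora_A": [0.1, -0.5, 0.3, 0.05], "lora_B": [2.0, -1.0]}, p=0.5)
```

Applying the ratio per tensor (rather than one global threshold across all tensors) keeps any single small tensor from being pruned away entirely.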
System Optimization
- replaces N grid-search full runs with one controller pass plus one final run
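The runtime claim reduces to simple arithmetic: grid search costs N full runs, while GRASP LoRA costs one controller pass plus one final run. The sketch computes that ratio; the timings in the usage line are hypothetical placeholders, not measurements from the paper, and per-point dev evaluation cost is ignored.

```python
def speedup_vs_grid(
    n_grid_points: int, t_full_run: float, t_controller_run: float, t_final_run: float
) -> float:
    """End-to-end speedup of (controller pass + final run) over an N-point grid search."""
    grid_cost = n_grid_points * t_full_run        # one full training run per grid point
    grasp_cost = t_controller_run + t_final_run   # one controller pass + one final run
    return grid_cost / grasp_cost


# Hypothetical timings in arbitrary units, not numbers reported by the paper.
example = speedup_vs_grid(8, t_full_run=1.0, t_controller_run=1.0, t_final_run=1.0)
```

If the controller pass costs about as much as one full run, an 8-point grid gives a 4× ratio; a cheaper controller pass pushes the ratio higher, which is consistent with the reported 3.90×–7.45× range.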
Training Optimization
- learned global prune ratio optimized online
- controller probes evaluate candidate ratios without gradient updates
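A minimal sketch of a probe that scores a candidate ratio without changing the model: masks are applied to a copy of the adapter weights, the copy is evaluated on the micro dev, and no optimizer step or in-place edit happens. `probe_ratio`, `make_masks`, and `evaluate` are hypothetical hooks; in a PyTorch loop the evaluation would typically also run under `torch.no_grad()`.

```python
from typing import Callable, Dict, List

Weights = Dict[str, List[float]]
Masks = Dict[str, List[int]]


def probe_ratio(
    weights: Weights,
    p: float,
    make_masks: Callable[[Weights, float], Masks],
    evaluate: Callable[[Weights], float],
) -> float:
    """Score candidate ratio p on the micro dev without touching the live weights."""
    masks = make_masks(weights, p)
    # Build masked copies; the live weights stay exactly as they were.
    probe_weights = {
        name: [w * m for w, m in zip(values, masks[name])]
        for name, values in weights.items()
    }
    return evaluate(probe_weights)


# Usage with toy hooks: prune the smaller entry, score by the sum of weights.
live = {"lora_A": [1.0, 2.0]}
score = probe_ratio(
    live,
    0.5,
    make_masks=lambda w, p: {"lora_A": [0, 1]},
    evaluate=lambda w: sum(sum(v) for v in w.values()),
)
```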
Inference Optimization
- single final sparse adapter reduces served parameter count (implied)
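The "(implied)" saving can be estimated with back-of-the-envelope arithmetic. A LoRA adapter on a d_out × d_in weight stores r·(d_in + d_out) parameters, and pruning a fraction p leaves about (1 − p) of them nonzero. The example assumes a 4096 × 4096 projection (Llama 3 8B's hidden size is 4096) with a hypothetical rank of 16 and prune ratio of 0.5. The saving only materializes if the adapter is served separately in a sparse format; merging it into the dense base weights before serving yields no parameter reduction.

```python
import math


def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in one LoRA pair: B is d_out x rank, A is rank x d_in."""
    return rank * (d_in + d_out)


def nonzeros_after_pruning(n_params: int, p: float) -> int:
    """Entries left after pruning floor(p * n_params) of them (rounding is an assumption)."""
    return n_params - math.floor(p * n_params)


dense = lora_param_count(4096, 4096, rank=16)   # hypothetical rank
sparse = nonzeros_after_pruning(dense, 0.5)     # hypothetical prune ratio
```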
Reproducibility
Code Urls
- github: GRASP LoRA (paper indicates a GitHub repository reference)
Data Urls
- XL-Sum (public)
- MLQA (public)
Code Available
Data Available
Open Source Status
- partial
Risks & Boundaries
Limitations
- Single backbone (Llama 3 8B) and one hardware setup only.
- Only two tasks (XL-Sum, MLQA) and two target languages (Arabic, Chinese).
- Micro dev rewards depend on a tiny fixed slice; out-of-distribution micro devs may mislead the controller.
- No deployment metrics reported (latency, memory, energy).
- Dialectal and broader language family behaviors untested.
When Not To Use
- You can afford a full grid search and want explicit control over discrete pruning points.
- You need layer-wise or per-module pruning policies (GRASP learns a single global ratio).
- You must certify worst-case behavior under extreme distribution shift.
Failure Modes
- Over-pruning if the entropy bonus or anchoring is disabled (the controller collapses to a high prune ratio p).
- Mask thrashing if the max-commit step Δmax or the commit rules are misconfigured.
- An unrepresentative micro dev leads to a poorly chosen p and degraded target performance.
- Clearing optimizer state for newly pruned entries may destabilize training if commits are frequent.
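Two of these failure modes suggest simple safeguards. A minimal sketch, assuming the committed ratio moves toward the controller's target by at most Δmax per commit, and that Adam-style first and second moments (`exp_avg`, `exp_avg_sq`, the names PyTorch's Adam uses in its per-parameter state) are zeroed only for entries that a commit newly prunes. Both helpers are hypothetical; the paper's exact commit rules are not given in this summary.

```python
from typing import List


def commit_step(p_committed: float, p_target: float, delta_max: float) -> float:
    """Move the committed ratio toward the target by at most delta_max (assumed rule)."""
    step = max(-delta_max, min(delta_max, p_target - p_committed))
    return p_committed + step


def clear_state_for_newly_pruned(
    old_mask: List[int], new_mask: List[int], exp_avg: List[float], exp_avg_sq: List[float]
) -> None:
    """Zero Adam-style moments only where an entry goes from kept (1) to pruned (0)."""
    for i, (old, new) in enumerate(zip(old_mask, new_mask)):
        if old == 1 and new == 0:
            exp_avg[i] = 0.0
            exp_avg_sq[i] = 0.0


# Usage: capped commits toward a target of 0.40 (roughly 0.15, 0.20, 0.25).
p = 0.10
for _ in range(3):
    p = commit_step(p, 0.40, delta_max=0.05)
```

Capping each commit bounds how many entries flip per step, and clearing state only for newly pruned entries leaves the moments of surviving entries intact.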
Core Entities
Models
- Llama 3 8B
- LoRA
Metrics
- BERTScore-F1
- BLEU-4
- ROUGE-L
- Exact Match
- Token F1
- ROUGE-1
- ROUGE-2
- chrF
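Exact Match and Token F1 are the standard extractive-QA metrics. A simplified SQuAD-style sketch, assuming English normalization and whitespace tokenization. MLQA's official evaluation applies language-specific normalization and segmentation (for example, Chinese is segmented at the character level), so these helpers would not reproduce the official Arabic or Chinese scores and are not a drop-in replacement.

```python
import collections
import re
import string

_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop ASCII punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    return float(normalize_answer(prediction) == normalize_answer(reference))


def token_f1(prediction: str, reference: str) -> float:
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    # Multiset overlap of tokens shared by prediction and reference.
    overlap = sum((collections.Counter(pred_tokens) & collections.Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```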
Datasets
- XL-Sum
- MLQA
Context Entities
Models
- SparseGPT
- LLMPruner
- LoRA
- AMC
- HAQ
Datasets
- mBART/mT5-style multilingual pretraining (cited)

