Overview
The method shows consistent metric gains and large runtime reductions on two tasks and two target languages using one backbone. However, evaluation is limited to a single model, two tasks, and one hardware setup, which limits how far the results generalize.
Citations: 0
Evidence Strength: 0.70
Confidence: 0.78
Risk Signals: 12
Trust Signals
Findings with numeric evidence: 5/5
Findings with evidence refs: 5/5
Results with explicit delta: 4/4
Reproducibility
Status: Code + data available
Open source: Partial
At A Glance
Cost impact: 70%
Production readiness: 60%
Novelty: 60%
Why It Matters For Business
GRASP LoRA reduces the number of tuning runs and the amount of labeled dev data needed by learning a pruning rate online. This lowers compute and development cost while often improving quality on low-resource language transfer.
Summary TLDR
GRASP LoRA learns a single global prune ratio for merged LoRA adapters using a lightweight GRPO controller that probes candidate sparsities on a tiny micro dev slice. This replaces expensive grid search with one controller pass plus one final fine-tune. On English→Arabic/Chinese transfer (XL-Sum summarization and MLQA QA) it finds fractional prune rates, improves generation and QA metrics over strong baselines, and cuts end-to-end runtime by 3.9× to 7.45×. It is robust to very small micro dev slices but was validated on only one backbone and two tasks.
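To make "a single global prune ratio" concrete, here is a minimal sketch of pruning at one ratio pooled across all adapter modules. It assumes magnitude-based pruning, which this summary does not confirm; the module names `q_proj` and `v_proj`, the flat Python lists standing in for weight tensors, and the function `global_magnitude_prune` are all hypothetical.

```python
def global_magnitude_prune(modules, ratio):
    """Zero the smallest-magnitude `ratio` fraction of entries, pooled across modules.

    Assumption: magnitude pruning. The summary only says a global ratio is used.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    # Rank every weight from every module together, so one ratio governs all of them.
    entries = sorted(
        (abs(w), name, i) for name, ws in modules.items() for i, w in enumerate(ws)
    )
    n_prune = int(ratio * len(entries))
    pruned = {name: list(ws) for name, ws in modules.items()}
    for _, name, i in entries[:n_prune]:
        pruned[name][i] = 0.0
    return pruned


# Hypothetical merged adapter weights (flattened for illustration).
merged = {
    "q_proj": [0.9, -0.05, 0.4, 0.01],
    "v_proj": [-0.7, 0.02, 0.3, -0.6],
}
pruned = global_magnitude_prune(merged, ratio=0.375)
```

Because the ranking is global, the number of weights removed per module can differ even though only one ratio is specified. A fractional ratio such as 0.375 is no harder to apply than a grid point such as 0.25 or 0.5.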
Problem Statement
When merging LoRA adapters for cross-lingual transfer, people pick a global prune ratio by grid search. Grid search needs many full training runs and large dev sets, misses fractional optima, and can be brittle. We need a cheap, stable way to learn the right overall sparsity during training.
Main Contribution
GRASP LoRA: treat the global prune ratio as a learnable control variable and optimize it online with a GRPO controller.
A training pipeline that interleaves controller probing on a tiny micro dev slice with regular fine-tuning, then performs one final prune+fine-tune at the chosen ratio.
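The idea of optimizing the prune ratio with a group-relative controller can be sketched on a toy problem. This is not the authors' controller: it assumes a Gaussian policy over the ratio with a fixed standard deviation, uses a hypothetical `toy_reward` in place of a real micro-dev metric, and omits the entropy bonus, anchoring, and commit rules mentioned elsewhere in this summary. Only the group-relative advantage step (rewards standardized within each sampled group) follows the general GRPO recipe.

```python
import random
import statistics


def group_relative_advantages(rewards):
    """Standardize rewards within one sampled group (the group-relative step)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]


def controller_step(mu, sigma, reward_fn, rng, group_size=8, lr=0.002):
    """Update the mean prune ratio from one group of probed candidates."""
    # Probe candidate ratios around the current mean, clipped to [0, 1].
    candidates = [min(1.0, max(0.0, rng.gauss(mu, sigma))) for _ in range(group_size)]
    rewards = [reward_fn(p) for p in candidates]
    advantages = group_relative_advantages(rewards)
    # Score-function gradient of a Gaussian policy with respect to its mean.
    grads = [a * (p - mu) / sigma ** 2 for a, p in zip(advantages, candidates)]
    return min(1.0, max(0.0, mu + lr * statistics.fmean(grads)))


def toy_reward(p):
    """Hypothetical stand-in for a micro-dev metric, peaked at p = 0.37."""
    return -(p - 0.37) ** 2


rng = random.Random(0)
mu = 0.8  # deliberately poor starting ratio
for _ in range(300):
    mu = controller_step(mu, sigma=0.1, reward_fn=toy_reward, rng=rng)
```

In this toy setting `mu` drifts from 0.8 toward the peak of `toy_reward`, which sits at a fractional value that a coarse grid would not contain. In a real pipeline each reward evaluation would be a forward pass on the micro dev slice, interleaved with fine-tuning steps.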
Key Findings
GRASP LoRA improves summarization metrics over the best grid-search baseline on XL-Sum.
GRASP LoRA improves extractive QA metrics over the best grid-search baseline on MLQA.
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| XL-Sum Arabic BERTScore-F1 | 75.84 ± 0.13 | Best grid-searched merge+prune: 74.96 ± 0.25 | +0.88 | XL-Sum test (Arabic) | Table 2 reports GRASP LoRA 75.84 ± 0.13 vs grid 74.96 ± 0.25 | Table 2 |
| XL-Sum Chinese BERTScore-F1 | 33.62 ± 0.16 | Best grid-searched merge+prune: 32.00 ± 0.36 | +1.62 | XL-Sum test (Chinese) | Table 2 reports GRASP LoRA 33.62 ± 0.16 vs grid 32.00 ± 0.36 | Table 2 |
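The Delta column can be checked directly from the reported means. The sketch below also tests whether the ± ranges of the two systems overlap. This summary does not state what the ± values represent (standard deviation, confidence interval, or something else), so the overlap check is only a rough sanity check under the assumption that they are symmetric spreads around the mean.

```python
# (GRASP mean, GRASP spread, grid mean, grid spread), copied from the table above.
rows = {
    "XL-Sum Arabic": (75.84, 0.13, 74.96, 0.25),
    "XL-Sum Chinese": (33.62, 0.16, 32.00, 0.36),
}


def delta(grasp_mean, grid_mean):
    """Difference of means, rounded to the table's two decimals."""
    return round(grasp_mean - grid_mean, 2)


def ranges_overlap(grasp_mean, grasp_spread, grid_mean, grid_spread):
    """True if [mean - spread, mean + spread] of the two systems intersect."""
    return grasp_mean - grasp_spread <= grid_mean + grid_spread


deltas = {name: delta(g, b) for name, (g, _, b, _) in rows.items()}
overlaps = {name: ranges_overlap(*vals) for name, vals in rows.items()}
```

Both computed deltas match the table (+0.88 and +1.62), and in neither row do the ± ranges overlap.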
What To Try In 7 Days
Run GRASP LoRA with a 16-example micro dev slice and compare end-to-end runtime vs your current grid search.
Apply GRASP LoRA to one existing English→target adapter pair and measure target dev metrics after the single final prune+fine-tune.
Ablate entropy/anchor settings to find a stable commit schedule for your data.
Risks & Boundaries
Limitations
Single backbone (Llama 3 8B) and one hardware setup only.
Only two tasks (XL-Sum, MLQA) and two target languages (Arabic, Chinese).
When Not To Use
You can afford a full grid search and want explicit control over discrete pruning points.
You need layer-wise or per-module pruning policies (GRASP learns a single global ratio).
Failure Modes
Over-pruning if the entropy bonus or anchoring is disabled (the controller collapses to a high prune ratio p).
Mask thrashing if max-commit ∆max or commit rules are misconfigured.
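One way to read the mask-thrashing failure mode is that the committed prune ratio moves too far between commits, so the pruning mask changes heavily each time. The sketch below assumes that ∆max caps the absolute change in the committed ratio per commit. That reading is an assumption, as this summary does not define ∆max or the commit rules, and `commit_ratio` is a hypothetical helper.

```python
def commit_ratio(current, proposed, delta_max):
    """Move the committed ratio toward `proposed`, by at most `delta_max` per commit.

    Assumption: delta_max bounds the per-commit change; the source does not define it.
    """
    if delta_max < 0.0:
        raise ValueError("delta_max must be non-negative")
    step = max(-delta_max, min(delta_max, proposed - current))
    return min(1.0, max(0.0, current + step))


# A controller proposal far from the current ratio is applied gradually.
ratio = 0.30
history = []
for proposal in (0.90, 0.90, 0.10):
    ratio = commit_ratio(ratio, proposal, delta_max=0.05)
    history.append(ratio)
```

With a small cap, a noisy proposal changes the committed ratio only slightly, so successive masks stay similar. With a very large cap the committed ratio simply tracks every proposal, which is the thrashing regime.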

