Overview
Production Readiness
0.6
Novelty Score
0.6
Cost Impact Score
0.7
Citation Count
0
Why It Matters For Business
Deployments can silently corrupt weights during conversion or serialization. Recover-LoRA offers a low-cost way to restore accuracy without labeled data or full retraining, saving time and lowering risk for edge and on-device models.
Summary TLDR
Recover-LoRA trains small LoRA adapters using synthetic data and logit distillation to restore accuracy in functionally degraded small language models (SLMs). Of the four SLMs tested, three recovered an average of +5–17% accuracy, with far less data and far fewer trainable parameters than full-model distillation or supervised finetuning. The method can fail on some architectures, so adapter placement and a matching tokenizer for data generation matter.
Problem Statement
Deployment conversions or bad serialization can silently corrupt model weights and drop task accuracy. Full retraining or labeled data may be unavailable. How can we cheaply restore accuracy when weights are degraded, without labeled data?
Main Contribution
Recover-LoRA: a lightweight, data-free method that trains only LoRA adapters to align a degraded model to its full-precision reference via logit distillation.
Empirical study on four small models (1B–2B) across seven downstream tasks showing Recover-LoRA often recovers accuracy while using much less data and fewer trainable parameters than alternatives.
Practical guidance: synthetic hybrid sampling, adapter placement choices (e.g., K/V vs attention+MLP), and trade-offs for deployment.
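The distillation objective named in the first contribution can be illustrated with a minimal sketch. It assumes the loss is a KL divergence between the softmax distributions of the reference (teacher) logits and the degraded (student) logits at one token position. The source only says "logit distillation", so the exact loss form, the `temperature` parameter, and the function names are assumptions for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) for one token position (assumed loss form)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * (math.log(pi) - math.log(qi))
               for pi, qi in zip(p, q) if pi > 0.0)

# Toy logits: the degraded model disagrees with the reference model.
teacher = [2.0, 1.0, 0.1]
degraded = [1.2, 1.5, 0.3]
loss_before = kl_distillation_loss(teacher, degraded)
# If the adapters aligned the student perfectly, the loss would vanish.
loss_matched = kl_distillation_loss(teacher, teacher)
```

In a real run, only the adapter weights would receive gradients from this loss; the base weights of the degraded model stay frozen.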
Key Findings
Recover-LoRA produced a positive accuracy recovery on three of four tested SLMs.
Recover-LoRA failed or reduced accuracy on at least one model architecture (Gemma2 2B).
Recover-LoRA uses much less synthetic data than supervised finetuning while staying parameter-efficient.
Distilling and updating all model parameters (LLM QAT* adaptation) worsened degradation in experiments.
The simulated corruption is small but measurable: the L2 norm of the difference between original and perturbed weights is small yet non-zero.
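The weight-perturbation measure in the last finding can be sketched as follows. The helper name `l2_diff` and the toy weight values are illustrative assumptions; real weights would be flattened tensors rather than short lists.

```python
import math

def l2_diff(original, perturbed):
    """L2 norm of the element-wise difference between two flat weight lists."""
    if len(original) != len(perturbed):
        raise ValueError("weight lists must have the same number of elements")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(original, perturbed)))

# Toy weights: two of four entries drift slightly after a bad serialization.
original = [0.50, -0.25, 0.10, 0.00]
perturbed = [0.50, -0.22, 0.10, 0.04]
drift = l2_diff(original, perturbed)
```

A value of exactly zero would mean the weights survived conversion unchanged.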
Results
[Results table: LoRA; Average AR% (LLM QAT* baseline); Synthetic data used]
Who Should Care
What To Try In 7 Days
Reproduce: generate 100k synthetic samples with the original model's tokenizer and run Recover-LoRA on your degraded SLM.
Adapter search: test LoRA on K/V layers first, then try attention+MLP if recovery is limited.
Baseline check: compare AR% against a small supervised LoRA run (if labeled data exists) to validate the synthetic-data approach.
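The adapter search in the second step can be sketched as a filter over module names. The names `k_proj`, `v_proj`, `gate_proj`, and so on follow a common Llama-style naming convention and are an assumption here; the source does not list module names, and other architectures may name their layers differently. `PLACEMENTS` and `select_adapter_targets` are hypothetical helpers.

```python
# Assumed Llama-style projection names; check your own model's module names.
PLACEMENTS = {
    "kv": ("k_proj", "v_proj"),
    "attention+mlp": ("q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"),
}

def select_adapter_targets(module_names, placement):
    """Return the modules whose final name component matches the placement."""
    suffixes = PLACEMENTS[placement]
    return [name for name in module_names if name.split(".")[-1] in suffixes]

modules = [
    "layers.0.self_attn.q_proj",
    "layers.0.self_attn.k_proj",
    "layers.0.self_attn.v_proj",
    "layers.0.self_attn.o_proj",
    "layers.0.mlp.gate_proj",
    "layers.0.mlp.up_proj",
    "layers.0.mlp.down_proj",
    "layers.0.input_layernorm",
]
kv_targets = select_adapter_targets(modules, "kv")
wide_targets = select_adapter_targets(modules, "attention+mlp")
```

Starting with the narrower K/V set keeps the trainable parameter count low; the wider set is the fallback when recovery stalls.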
Optimization Features
Model Optimization
- LoRA
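As a rough illustration of why LoRA is parameter-efficient, the sketch below counts trainable parameters for one weight matrix and applies the standard low-rank update y = Wx + scale * B(Ax). The layer size 2048 and rank 8 are illustrative assumptions, not values from the source.

```python
def lora_param_count(d_out, d_in, rank):
    """Trainable parameters of a LoRA pair: B is d_out x rank, A is rank x d_in."""
    return rank * (d_out + d_in)

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, scale=1.0):
    """y = W x + scale * B (A x); W stays frozen, only A and B would be trained."""
    delta = matvec(B, matvec(A, x))
    return [wx + scale * d for wx, d in zip(matvec(W, x), delta)]

# Parameter count for one hypothetical 2048 x 2048 projection at rank 8.
full_params = 2048 * 2048
lora_params = lora_param_count(2048, 2048, rank=8)

# Tiny numeric check with a rank-1 adapter on a 2 x 2 identity layer.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
x = [3.0, 4.0]
y_zero_init = lora_forward(W, A, [[0.0], [0.0]], x)  # B = 0: output unchanged
y_adapted = lora_forward(W, A, [[1.0], [0.0]], x)
```

With B initialized to zero, the adapted layer starts out identical to the frozen layer, so training begins from the degraded model's own behavior.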
Training Optimization
- logit distillation with synthetic data
- hybrid sampling (first 3–5 tokens greedy, rest stochastic)
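The hybrid sampling scheme can be sketched as below: the first few generated tokens are chosen greedily and the rest are drawn from the next-token distribution. The `next_token_probs` callback, the `toy_model` stand-in, and the choice of 4 greedy steps (within the 3–5 range above) are assumptions for illustration; a real run would query the reference model.

```python
import random

def hybrid_sample(next_token_probs, prompt, max_new_tokens, greedy_steps=4, seed=0):
    """Generate tokens: greedy for the first `greedy_steps`, then stochastic."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for step in range(max_new_tokens):
        probs = next_token_probs(tokens)  # dict mapping token -> probability
        if step < greedy_steps:
            # Greedy phase: take the most likely token.
            token = max(probs, key=probs.get)
        else:
            # Stochastic phase: sample in proportion to probability.
            choices, weights = zip(*probs.items())
            token = rng.choices(choices, weights=weights, k=1)[0]
        tokens.append(token)
    return tokens

def toy_model(tokens):
    """Hypothetical stand-in for a language model's next-token distribution."""
    return {"a": 0.6, "b": 0.3, "c": 0.1}

sample = hybrid_sample(toy_model, prompt=["<s>"], max_new_tokens=8, greedy_steps=4)
```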
Reproducibility
Open Source Status
- partial
Risks & Boundaries
Limitations
- Works best on tested small models (1B–2B); larger models (7B+) not evaluated.
- Model-dependent: adapter placement (K/V vs attention+MLP) materially affects success.
- Tested corruption is simulated improper serialization; other degradation sources (heavy quantization, pruning) need more study.
- Requires synthetic data generated with a matching tokenizer/vocabulary; mismatch hurts results.
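The tokenizer-matching requirement in the last limitation can be checked before generating data. The sketch below compares two token-to-id mappings; `vocab_mismatches` is a hypothetical helper and the toy vocabularies are illustrative, not taken from any tested model.

```python
def vocab_mismatches(reference_vocab, candidate_vocab):
    """Return tokens that are missing from one vocabulary or mapped to different ids."""
    tokens = set(reference_vocab) | set(candidate_vocab)
    return sorted(t for t in tokens
                  if reference_vocab.get(t) != candidate_vocab.get(t))

reference = {"<s>": 0, "hello": 1, "world": 2}
same = dict(reference)
shifted = {"<s>": 0, "hello": 2, "world": 1, "extra": 3}

ok = vocab_mismatches(reference, same)
bad = vocab_mismatches(reference, shifted)
```

An empty result suggests the synthetic data and the degraded model will agree on token ids; any mismatch is a reason to regenerate data with the correct tokenizer.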
When Not To Use
- If the model architecture or tokenizer prevents matching synthetic-data generation.
- If degradation is structural (missing layers) rather than small weight corruption.
- If supervised labeled data and resources for full finetuning are available and preferred.
Failure Modes
- Negative AR% (method can worsen performance) as observed for Gemma2 2B.
- Overfitting when updating all parameters (LLM QAT*), leading to worse accuracy.
- Sensitivity to synthetic data quality and quantity; too few or mismatched samples reduce recovery.
Core Entities
Models
- SFT
- Llama3.2 1B
- Gemma2 2B
- DeepSeek-R1-Distill-Qwen 1.5B
Metrics
- Accuracy
- L2 norm difference (weight perturbation)
Datasets
- HellaSwag
- MMLU (three subsets: Philosophy, Management, Astronomy)
- ARC Challenge
- WinoGrande
- PiQA
- OpenBookQA
- BoolQ

