Recover lost accuracy in corrupted small LMs by training tiny LoRA adapters with synthetic data and logit distillation

Overview

Decision Snapshot: Needs Validation

Method is simple and pragmatic, with consistent gains on three of four SLMs and clear data/parameter efficiency. Evidence is limited to small models and a simulated corruption setup.

Citations: 0

Evidence Strength: 0.70

Confidence: 0.80

Risk Signals: 10

Trust Signals

Findings with numeric evidence: 5/5

Findings with evidence refs: 5/5

Results with explicit delta: 5/6

Reproducibility

Status: No open assets linked

Open source: Partial

At A Glance

Cost impact: 70%

Production readiness: 60%

Novelty: 60%

Authors

Devleena Das, Rajeev Patwari, Ashish Sirasao

Links

Abstract / PDF

Why It Matters For Business

Deployments can silently corrupt weights during conversion or serialization. Recover-LoRA offers a low-cost way to restore accuracy without labeled data or full retraining, saving time and lowering risk for edge and on-device models.

Who Should Care

CTO, ML Engineer, Engineering Lead, Product Manager

Summary TLDR

Recover-LoRA trains small LoRA adapters using synthetic data and logit distillation to restore accuracy in functionally degraded small language models (SLMs). Across four SLMs, it reached an average accuracy recovery (AR%) of +4.95 to +17.24 on three models and -7.45 on the fourth (Gemma2 2B), while using far less data and far fewer trainable parameters than full-model distillation or supervised finetuning. Because it can fail on some architectures, adapter placement and synthetic-data generation that matches the model's tokenizer both matter.

Problem Statement

Deployment conversions or bad serialization can silently corrupt model weights and drop task accuracy. Full retraining or labeled data may be unavailable. How can we cheaply restore accuracy when weights are degraded, without labeled data?

Main Contribution

Recover-LoRA: a lightweight method that needs no labeled data (it trains on synthetic data) and updates only LoRA adapters to align a degraded model to its full-precision reference via logit distillation.

Empirical study on four small models (1B–2B) across seven downstream tasks showing Recover-LoRA often recovers accuracy while using much less data and fewer trainable parameters than alternatives.
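The two ingredients named above, a low-rank adapter on top of frozen (degraded) weights and a loss that compares logits against the reference model, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the KL-divergence form of the loss, the `temperature` and `scale` parameters, the toy matrices, and all function names are assumptions introduced here.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) for one next-token distribution (assumed loss form)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)

def matvec(matrix, vector):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(frozen_w, lora_a, lora_b, x, scale=1.0):
    """y = W x + scale * B (A x); W stays frozen, only A and B would be trained."""
    base = matvec(frozen_w, x)
    update = matvec(lora_b, matvec(lora_a, x))
    return [b + scale * u for b, u in zip(base, update)]

# Toy example: a 3-token vocabulary and a 2-dimensional hidden state (made-up values).
x = [1.0, -0.5]
teacher_w = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]    # reference weights
degraded_w = [[0.8, 0.1], [0.0, 1.2], [0.5, 0.4]]   # corrupted copy, kept frozen
lora_a = [[0.1, 0.2]]                                # rank-1 adapter, A is 1 x 2
lora_b = [[0.0], [0.0], [0.0]]                       # B is 3 x 1, zero-initialized

teacher_logits = matvec(teacher_w, x)
student_logits = lora_forward(degraded_w, lora_a, lora_b, x)
loss = kl_distillation_loss(teacher_logits, student_logits)
```

With `lora_b` initialized to zero, the student starts out identical to the degraded model, so the initial loss measures how far the corruption moved the output distribution. Training would then adjust only `lora_a` and `lora_b` to reduce that loss.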

Key Findings

Recover-LoRA achieved positive accuracy recovery (AR%) on three of four tested SLMs.

Numbers: AR% = +17.24 (AMD-OLMO-SFT 1B), +13.38 (Llama3.2 1B), +4.95 (DeepSeek-R1 1.5B)

Practical Use: Try Recover-LoRA first for small corrupted models: on three tested models it yielded an AR% of roughly +5 to +17 without labeled data.

Evidence Ref: Table 2

Recover-LoRA failed or reduced accuracy on at least one model architecture (Gemma2 2B).

Numbers: AR% = -7.45 (Gemma2 2B)

Practical Use: Don't assume universal success: validate per-model. If AR% is negative, change adapter placement, increase synthetic data, or revert to supervised finetuning.

Evidence Ref: Table 2

Results

Metric	Value	Baseline	Delta	Split / Dataset	Evidence	Evidence Ref
Avg AR% (Recover-LoRA)	AMD-OLMO-SFT 1B: +17.24%	Degraded vs pretrained	+17.24%	Average across 7 eval datasets	Table 2 reports Avg AR% for Recover-LoRA on AMD-OLMO-SFT	Table 2
Avg AR% (Recover-LoRA)	Llama3.2 1B: +13.38%	Degraded vs pretrained	+13.38%	Average across 7 eval datasets	Table 2 reports Avg AR% for Recover-LoRA on Llama3.2	Table 2

What To Try In 7 Days

Reproduce: generate 100k synthetic samples from the original tokenizer and run Recover-LoRA on your degraded SLM.

Adapter search: test LoRA on K/V layers first, then try attention+MLP if recovery is limited.

Baseline check: compare AR% versus a small supervised LoRA run (if labeled data exists) to validate synthetic-data approach.
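The adapter-search step above can be organized as a small loop that tries placements in order and stops at the first one with positive AR%. The sketch below is a hypothetical driver: `train_and_score` stands in for whatever training and evaluation pipeline you use, the module names (`k_proj`, `v_proj`, and so on) are assumed naming that varies by model implementation, and the stub scores are made-up values, not results from the paper.

```python
# Candidate placements in the order suggested above; module names are assumptions.
PLACEMENTS = [
    ("kv", ["k_proj", "v_proj"]),
    ("attention+mlp", ["q_proj", "k_proj", "v_proj", "o_proj",
                       "gate_proj", "up_proj", "down_proj"]),
]

def search_adapter_placement(train_and_score, placements=PLACEMENTS, min_ar=0.0):
    """Try placements in order; return (name, history) for the first AR% above min_ar.

    train_and_score: hypothetical callable that trains adapters on the given
    modules and returns the resulting AR% as a float.
    Returns (None, history) if no placement clears the threshold.
    """
    history = []
    for name, modules in placements:
        ar_percent = train_and_score(modules)
        history.append((name, ar_percent))
        if ar_percent > min_ar:
            return name, history
    return None, history

def stub_train_and_score(modules):
    """Stand-in for a real run: pretends K/V-only fails and the wider placement helps."""
    return -1.0 if len(modules) == 2 else 2.0

chosen, history = search_adapter_placement(stub_train_and_score)
```

If `chosen` comes back as `None`, every placement produced a non-positive AR%, which is the signal to increase synthetic data or fall back to supervised finetuning when labels exist.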

Optimization Features

Model Optimization

LoRA

Training Optimization

Logit distillation with synthetic data

Hybrid sampling (first 3–5 tokens greedy, rest stochastic)
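Hybrid sampling, as described here, picks the most likely token for the first few positions and samples from the distribution afterwards. The sketch below illustrates that schedule with a toy next-token function. The `greedy_prefix` default of 4 (inside the stated 3–5 range), the toy model, and the function names are assumptions for illustration. Which model generates the data, and any temperature or top-k settings, are not specified in this summary.

```python
import random

def hybrid_sample(next_token_probs, prompt, max_new_tokens, greedy_prefix=4, seed=0):
    """Generate tokens: greedy for the first `greedy_prefix` steps, then stochastic.

    next_token_probs: callable mapping the token list so far to a
    {token: probability} dict (a stand-in for a real language model).
    """
    rng = random.Random(seed)
    tokens = list(prompt)
    for step in range(max_new_tokens):
        probs = next_token_probs(tokens)
        if step < greedy_prefix:
            # Greedy phase: take the highest-probability token.
            token = max(probs, key=probs.get)
        else:
            # Stochastic phase: sample in proportion to the probabilities.
            choices, weights = zip(*probs.items())
            token = rng.choices(choices, weights=weights, k=1)[0]
        tokens.append(token)
    return tokens

def toy_model(tokens):
    """Toy distribution over a 5-token vocabulary, conditioned on the last token."""
    last = tokens[-1]
    return {(last + 1) % 5: 0.6, (last + 2) % 5: 0.3, (last + 3) % 5: 0.1}

sample = hybrid_sample(toy_model, prompt=[0], max_new_tokens=10)
```

The greedy prefix keeps the opening of each sample anchored to the model's most likely continuation, while the stochastic tail adds variety across samples.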

Reproducibility

Code Available: No

Data Available: No

Open Source Status: Partial

License: Unknown

Risks & Boundaries

Limitations

Evaluated only on small models (1B–2B); larger models (7B+) were not tested.

Model-dependent: adapter placement (K/V vs attention+MLP) materially affects success.

When Not To Use

If the model architecture or tokenizer prevents matching synthetic-data generation.

If degradation is structural (missing layers) rather than small weight corruption.

Failure Modes

Negative AR% (method can worsen performance) as observed for Gemma2 2B.

Overfitting when updating all parameters (LLM QAT*), leading to worse accuracy.

Core Entities

Models

AMD-OLMO-SFT 1B, Llama3.2 1B, Gemma2 2B, DeepSeek-R1-Distill-Qwen 1.5B

Metrics

Accuracy, L2 norm difference (weight perturbation)
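As a rough illustration of the weight-perturbation metric, the L2 norm difference between a reference tensor and its corrupted copy can be computed as below. This is a sketch under assumptions: weights are treated as flattened lists of floats, and whether the paper measures this per layer, over the whole model, or relative to the reference norm is not stated in this summary. The function name and example values are hypothetical.

```python
import math

def l2_norm_difference(reference, perturbed):
    """Return ||reference - perturbed||_2 for two equal-length flattened weight lists."""
    if len(reference) != len(perturbed):
        raise ValueError("weight lists must have the same length")
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(reference, perturbed)))

# Made-up values chosen so the result is easy to check by hand (3-4-5 triangle).
reference_weights = [0.0, 0.0]
perturbed_weights = [3.0, 4.0]
distance = l2_norm_difference(reference_weights, perturbed_weights)
```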

Datasets

HellaSwag, MMLU (three subsets: Philosophy, Management, Astronomy), ARC Challenge, WinoGrande, PiQA, OpenBookQA, BoolQ
