Recover lost accuracy in corrupted small LMs by training tiny LoRA adapters with synthetic data and logit distillation

October 6, 2025 · 7 min

Overview

Decision Snapshot: Needs Validation

Method is simple and pragmatic, with consistent gains on three of four SLMs and clear data/parameter efficiency. Evidence is limited to small models and a simulated corruption setup.

Citations: 0

Evidence Strength: 0.70

Confidence: 0.80

Risk Signals: 10

Trust Signals

Findings with numeric evidence: 5/5

Findings with evidence refs: 5/5

Results with explicit delta: 5/6

Reproducibility

Status: No open assets linked

Open source: Partial

At A Glance

Cost impact: 70%

Production readiness: 60%

Novelty: 60%

Authors

Devleena Das, Rajeev Patwari, Ashish Sirasao

Why It Matters For Business

Deployments can silently corrupt weights during conversion or serialization. Recover-LoRA offers a low-cost way to restore accuracy without labeled data or full retraining, saving time and lowering risk for edge and on-device models.

Summary TLDR

Recover-LoRA trains small LoRA adapters using synthetic data and logit distillation to restore accuracy in functionally degraded small language models (SLMs). On four SLMs, Recover-LoRA recovered an average of +5–17% accuracy on three models while using far less data and far fewer trainable parameters than full-model distillation or supervised finetuning. It can fail on some architectures, so adapter placement and matching tokenizer/data generation matter.

Problem Statement

Deployment conversions or bad serialization can silently corrupt model weights and drop task accuracy. Full retraining or labeled data may be unavailable. How can we cheaply restore accuracy when weights are degraded, without labeled data?

Main Contribution

Recover-LoRA: a lightweight method that needs no labeled data. It trains only LoRA adapters on synthetic data to align a degraded model with its full-precision reference via logit distillation.

Empirical study on four small models (1B–2B) across seven downstream tasks showing Recover-LoRA often recovers accuracy while using much less data and fewer trainable parameters than alternatives.
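A minimal sketch of what a logit distillation objective can look like for a single token position, using only the Python standard library. The summary says "logit distillation" without naming the exact loss, so the KL divergence and the `temperature` parameter here are assumptions (a common choice), and the logit values are made up for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) for one token position.

    Assumption: KL divergence with a temperature is a common distillation
    loss; the source does not confirm the exact objective.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

reference = [2.0, 0.5, -1.0]  # illustrative logits from the intact reference model
degraded = [0.3, 1.8, -0.2]   # illustrative logits from the corrupted model

print(distillation_loss(reference, reference))  # 0.0: identical logits
print(distillation_loss(reference, degraded))   # positive: distributions differ
```

In a training loop, this quantity would be averaged over the token positions of the synthetic samples and minimized with respect to the adapter weights only.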

Key Findings

Recover-LoRA achieved positive accuracy recovery (AR%) on three of four tested SLMs.

Numbers: AR% = +17.24 (AMD-OLMO-SFT 1B), +13.38 (Llama3.2 1B), +4.95 (DeepSeek-R1 1.5B)

Practical Use: Try Recover-LoRA first for small corrupted models: in the reported results it recovered roughly 5–17% accuracy on three of four models without labeled data.

Evidence Ref: Table 2

Recover-LoRA failed or reduced accuracy on at least one model architecture (Gemma2 2B).

Numbers: AR% = -7.45 (Gemma2 2B)

Practical Use: Don't assume universal success: validate per model. If AR% is negative, change adapter placement, increase synthetic data, or revert to supervised finetuning.

Evidence Ref: Table 2
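The per-model validation advice can be sketched as a simple gate over the reported AR% values. The numbers are the ones listed in the findings above; the `triage` helper and its zero threshold are hypothetical, not part of the paper.

```python
# Average AR% per model, as listed in the findings above.
AR_PERCENT = {
    "AMD-OLMO-SFT 1B": 17.24,
    "Llama3.2 1B": 13.38,
    "DeepSeek-R1 1.5B": 4.95,
    "Gemma2 2B": -7.45,
}

def triage(ar_percent):
    """Split models into those that recovered and those needing another attempt."""
    recovered = {m: v for m, v in ar_percent.items() if v > 0}
    retry = {m: v for m, v in ar_percent.items() if v <= 0}
    return recovered, retry

recovered, retry = triage(AR_PERCENT)
print(sorted(recovered))  # models where Recover-LoRA helped
print(sorted(retry))      # ['Gemma2 2B']
```

Models in the `retry` group are candidates for a different adapter placement, more synthetic data, or supervised finetuning.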

Results

Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref
LoRA | AMD-OLMO-SFT 1B: +17.24% | Degraded vs pretrained | +17.24% | Average across 7 eval datasets | Table 2 reports Avg AR% for Recover-LoRA on AMD-OLMO-SFT | Table 2
LoRA | Llama3.2 1B: +13.38% | Degraded vs pretrained | +13.38% | Average across 7 eval datasets | Table 2 reports Avg AR% for Recover-LoRA on Llama3.2 | Table 2

What To Try In 7 Days

Reproduce: generate 100k synthetic samples from the original tokenizer and run Recover-LoRA on your degraded SLM.

Adapter search: test LoRA on K/V layers first, then try attention+MLP if recovery is limited.

Baseline check: compare AR% versus a small supervised LoRA run (if labeled data exists) to validate synthetic-data approach.
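For the adapter search, it may help to recall what a LoRA adapter does to a single weight matrix such as a key or value projection. The sketch below uses the standard LoRA formulation, W + (alpha / r) * B A with W frozen, in dependency-free Python. The matrix sizes, `alpha`, and rank are illustrative assumptions, not settings from the paper.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A; W stays frozen, only A and B are trained."""
    r = len(a)            # adapter rank = number of rows in A
    delta = matmul(b, a)  # shape (d_out, d_in), same as W
    scale = alpha / r
    return [[wij + scale * dij for wij, dij in zip(wr, dr)] for wr, dr in zip(w, delta)]

def lora_param_count(d_out, d_in, r):
    """Trainable parameters of one adapter: A is (r, d_in), B is (d_out, r)."""
    return r * (d_in + d_out)

w = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # frozen (possibly degraded) weight, 2x3
a = [[0.1, 0.2, 0.3]]                    # rank-1 A, 1x3
b = [[0.0], [0.0]]                       # B starts at zero, 2x1

# With B at zero the adapter is a no-op, so training starts from the degraded model.
assert lora_effective_weight(w, a, b, alpha=2.0) == w

# Illustrative sizes: a 2048x2048 projection with rank 8.
print(lora_param_count(2048, 2048, 8), "trainable vs", 2048 * 2048, "frozen")
```

Switching from K/V-only to attention+MLP placement multiplies the number of such adapters, which raises the trainable parameter count but keeps it far below full-model updates.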

Optimization Features

Model Optimization: LoRA
Training Optimization: logit distillation with synthetic data; hybrid sampling (first 3–5 tokens greedy, rest stochastic)
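The hybrid sampling scheme can be sketched as follows: pick the most likely token for the first few steps, then sample from the distribution. This is an illustration under stated assumptions: `next_token_probs` is a hypothetical callable standing in for a real language model, `toy_model` is a fixed three-token distribution, and `greedy_steps=4` is one value within the 3–5 range mentioned above.

```python
import random

def hybrid_sample(next_token_probs, prompt, length, greedy_steps=4, seed=0):
    """Generate `length` tokens: greedy for the first `greedy_steps`, then sampled.

    `next_token_probs(tokens)` is a hypothetical callable returning a
    {token: probability} dict for the next position.
    """
    rng = random.Random(seed)
    tokens = list(prompt)
    for step in range(length):
        probs = next_token_probs(tokens)
        if step < greedy_steps:
            token = max(probs, key=probs.get)  # greedy: most likely token
        else:
            choices, weights = zip(*probs.items())
            token = rng.choices(choices, weights=weights, k=1)[0]  # stochastic
        tokens.append(token)
    return tokens[len(prompt):]

def toy_model(tokens):
    """Stand-in for a real LM: a fixed distribution over a 3-token vocabulary."""
    return {"a": 0.6, "b": 0.3, "c": 0.1}

sample = hybrid_sample(toy_model, prompt=["<s>"], length=10, greedy_steps=4)
print(sample)  # first four tokens are always "a"; the rest vary with the seed
```

With a real model, the callable would wrap a forward pass of the reference model over tokens produced by its original tokenizer.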

Reproducibility

Code Available: No
Data Available: No
Open Source Status: Partial
License: Unknown

Risks & Boundaries

Limitations

Evaluated only on small models (1B–2B); larger models (7B+) were not tested.

Model-dependent: adapter placement (K/V vs attention+MLP) materially affects success.

When Not To Use

If the model architecture or tokenizer prevents matching synthetic-data generation.

If degradation is structural (missing layers) rather than small weight corruption.

Failure Modes

Negative AR% (method can worsen performance) as observed for Gemma2 2B.

Overfitting when updating all parameters (LLM QAT*), leading to worse accuracy.

Core Entities

Models

AMD-OLMO-SFT 1B, Llama3.2 1B, Gemma2 2B, DeepSeek-R1-Distill-Qwen 1.5B

Metrics

Accuracy, L2 norm difference (weight perturbation)
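A rough sketch of how an L2 norm difference between reference and degraded weights could be computed per tensor. The tensor names (`k_proj`, `v_proj`), the flat-list representation, and the values are illustrative assumptions; the summary does not say how the paper aggregates this metric.

```python
import math

def l2_weight_difference(reference, degraded):
    """Per-tensor L2 norm of (degraded - reference) for matching tensor names.

    Both arguments map a tensor name to a flat list of floats.
    """
    diffs = {}
    for name, ref_values in reference.items():
        deg_values = degraded[name]
        if len(ref_values) != len(deg_values):
            raise ValueError(f"shape mismatch for {name}")
        diffs[name] = math.sqrt(
            sum((d - r) ** 2 for r, d in zip(ref_values, deg_values))
        )
    return diffs

reference = {"k_proj": [1.0, 2.0, 3.0], "v_proj": [0.5, -0.5]}
degraded = {"k_proj": [1.0, 2.0, 3.0], "v_proj": [3.5, 3.5]}

print(l2_weight_difference(reference, degraded))  # {'k_proj': 0.0, 'v_proj': 5.0}
```

A per-tensor breakdown like this can also hint at which layers were perturbed, which may inform where to place adapters.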

Datasets

HellaSwag, MMLU (three subsets: Philosophy, Management, Astronomy), ARC Challenge, WinoGrande, PiQA, OpenBookQA, BoolQ