Overview
Experiments use a controlled oracle and several datasets/model sizes; findings are robust within the tested T5-family setups but need validation for very large models and different domains.
Citations: 1
Evidence Strength: 0.80
Confidence: 0.88
Risk Signals: 8
Trust Signals
Findings with numeric evidence: 4/4
Findings with evidence refs: 4/4
Results with explicit delta: 0/4
Reproducibility
Status: Partial assets available
Open source: Unknown
At A Glance
Cost impact: 50%
Production readiness: 60%
Novelty: 60%
Why It Matters For Business
If you distill models from imperfect teachers, distilling on a fixed offline dataset can degrade real-world quality; online generation or more diverse data keeps smaller models reliable.
Summary TLDR
The paper defines and tests "teacher hacking": during distillation, a student can learn to exploit the teacher's imperfections and drift away from the true behavior. In a controlled setup with an oracle model, the authors show that teacher hacking appears when distilling from a fixed offline dataset over multiple epochs. Online generation (sampling fresh responses each epoch) prevents it. Two cheaper fixes also work: increasing prompt diversity or pre-generating multiple completions per prompt. The paper also provides a practical diagnostic (proxy-golden curves and deviations from polynomial convergence) that can be measured without access to an oracle.
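One way to read the "deviation from polynomial convergence" diagnostic is as a power-law check on the proxy metric (student-to-teacher distance), which needs no oracle. The sketch below fits a straight line in log-log space to the early part of a training curve and flags later steps that sit well above the extrapolated trend. The function names, the `fit_points` split, and the `threshold` value are illustrative assumptions, not the paper's procedure.

```python
import math
from typing import List, Sequence, Tuple


def fit_power_law(steps: Sequence[float], values: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log(value) = a + b * log(step); returns (a, b)."""
    if len(steps) != len(values) or len(steps) < 2:
        raise ValueError("need at least two (step, value) pairs of equal length")
    xs = [math.log(s) for s in steps]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def steps_above_trend(
    steps: Sequence[float],
    values: Sequence[float],
    fit_points: int,
    threshold: float = 0.1,  # hypothetical tolerance on the log-residual
) -> List[float]:
    """Fit on the first `fit_points` points, then return later steps whose
    log-residual against the extrapolated power law exceeds `threshold`."""
    a, b = fit_power_law(steps[:fit_points], values[:fit_points])
    flagged = []
    for s, v in zip(steps[fit_points:], values[fit_points:]):
        residual = math.log(v) - (a + b * math.log(s))
        if residual > threshold:
            flagged.append(s)
    return flagged


# Illustrative numbers only (not from the paper): the curve follows 1/step
# for four points, then flattens out.
steps = [1, 2, 4, 8, 16, 32]
proxy_distance = [1.0, 0.5, 0.25, 0.125, 0.09, 0.08]
print(steps_above_trend(steps, proxy_distance, fit_points=4))  # [16, 32]
```

A flagged step is only a prompt to look closer; the threshold and the number of points used for the fit would need tuning per setup.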
Problem Statement
Knowledge distillation trains small models to imitate larger teacher LMs, but teachers are imperfect proxies for the true data. Does distillation cause students to overfit to teacher flaws ("teacher hacking")? When does it happen, how can it be detected from training logs, and how can it be prevented in practice?
Main Contribution
Formal definition of "teacher hacking": student moves closer to teacher while moving away from ground truth.
Controlled semi-synthetic experimental framework using an oracle model to measure true distance.
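The definition above can be turned into a simple check when both distances are available, as in the oracle setup. The sketch below takes a per-epoch proxy distance (student to teacher) and golden distance (student to oracle) and returns the epochs where the first falls while the second rises. The function name, the `tol` parameter, and the sample numbers are illustrative assumptions.

```python
from typing import List, Sequence


def teacher_hacking_epochs(
    proxy: Sequence[float],
    golden: Sequence[float],
    tol: float = 0.0,  # hypothetical noise tolerance
) -> List[int]:
    """Return epoch indices i >= 1 where, relative to epoch i - 1, the
    proxy distance decreased and the golden distance increased."""
    if len(proxy) != len(golden):
        raise ValueError("proxy and golden must have the same length")
    flagged = []
    for i in range(1, len(proxy)):
        closer_to_teacher = proxy[i] < proxy[i - 1] - tol
        farther_from_truth = golden[i] > golden[i - 1] + tol
        if closer_to_teacher and farther_from_truth:
            flagged.append(i)
    return flagged


# Illustrative numbers only (not from the paper).
proxy = [1.00, 0.80, 0.65, 0.55, 0.50]
golden = [1.20, 1.00, 0.95, 0.98, 1.05]
print(teacher_hacking_epochs(proxy, golden))  # [3, 4]
```

In practice the curves are noisy, so a nonzero `tol` or smoothing over several epochs would likely be needed before trusting a flag.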
Key Findings
Teacher hacking appears when distilling on a fixed offline dataset and training for many epochs.
Online response generation (fresh samples per epoch) prevents teacher hacking across datasets and model sizes.
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Occurrence of teacher hacking | Observed (yes) for offline fixed datasets with long training | No hacking under online generation | — | XSum; confirmed on WMT-14 en-de and Natural Instructions | Figure 4, Figure 8 | Section 4 |
| Minimum online data fraction needed to stabilize golden metric | ≈10% online student data substantially stabilizes golden metric | 0% online (fully offline) shows hacking | — | XSum (mixture experiment) | Figure 13 | Section A.4 |
What To Try In 7 Days
Add online generation to distillation (even 10% student-generated batches helps).
Increase prompt diversity for your distillation dataset rather than repeating prompts.
If online is impossible, pre-generate several completions per prompt to expand coverage.
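The three suggestions above can be combined in the data pipeline. The following is a minimal sketch, assuming a hypothetical `teacher_sample` callable for offline pre-generation and a hypothetical `online_sample` callable for fresh responses; it pre-generates `k` completions per prompt and, each epoch, draws a configurable fraction of examples online. It covers data selection only, not the distillation loss.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Sampler = Callable[[str], str]  # hypothetical interface: prompt -> response


def pregenerate(prompts: Sequence[str], teacher_sample: Sampler, k: int) -> Dict[str, List[str]]:
    """Build an offline pool with k completions per prompt."""
    return {p: [teacher_sample(p) for _ in range(k)] for p in prompts}


def build_epoch(
    prompts: Sequence[str],
    offline: Dict[str, List[str]],
    online_sample: Sampler,
    online_fraction: float,
    rng: random.Random,
) -> List[Tuple[str, str, str]]:
    """Return (prompt, response, source) triples for one epoch.

    Each prompt gets a fresh online response with probability
    `online_fraction`; otherwise one of its pre-generated completions
    is drawn at random.
    """
    examples = []
    for p in prompts:
        if rng.random() < online_fraction:
            examples.append((p, online_sample(p), "online"))
        else:
            examples.append((p, rng.choice(offline[p]), "offline"))
    return examples


# Stub samplers stand in for real model calls (assumption for illustration).
rng = random.Random(0)
prompts = ["summarize: a", "summarize: b", "summarize: c"]
pool = pregenerate(prompts, lambda p: f"teacher({p})#{rng.randint(0, 999)}", k=4)
epoch = build_epoch(prompts, pool, lambda p: f"fresh({p})", online_fraction=0.1, rng=rng)
for prompt, response, source in epoch:
    print(source, prompt, response)
```

Setting `online_fraction=0.1` mirrors the roughly 10% online share mentioned above; whether that share is enough for a given setup is something to confirm by watching the training curves.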
Risks & Boundaries
Limitations
Semi-synthetic setup uses an oracle model, which may not capture all real-world teacher biases.
Experiments are limited to T5-family models and three datasets; behavior on much larger LMs is untested.
When Not To Use
If you can run only one epoch of distillation (teacher hacking is minimal for 1–3 epochs).
If your prompt pool is tiny and cannot be diversified, pre-generate many completions per prompt instead.
Failure Modes
Standard overfitting (proxy metric increases) can occur and needs different remedies.
Teacher hacking may transfer unsafe or misleading behaviors from teacher to student.

