Overview
SEA is practical: compute the SVD on collected activations offline, then apply inexpensive projections at inference time. Linear SEA is low-risk. Φ-SEA delivers larger bias gains at the cost of some degradation in other skills, so it needs careful testing.
Citations: 1
Evidence Strength: 0.80
Confidence: 0.80
Risk Signals: 9
Trust Signals
Findings with numeric evidence: 5/5
Findings with evidence refs: 5/5
Results with explicit delta: 5/5
Reproducibility
Status: Code + data available
Open source: Partial
At A Glance
Cost impact: 75%
Production readiness: 75%
Novelty: 70%
Why It Matters For Business
SEA gives a low-cost way to reduce hallucinations and bias at inference time, letting teams improve trustworthiness without full model fine-tuning or heavy compute.
Summary TLDR
SEA (Spectral Editing of Activations) is a training-free, inference-time method that uses singular value decomposition (SVD) on cross-covariances of model activations to push activations toward ‘positive’ demonstrations and away from ‘negative’ ones. Linear SEA gives modest but consistent gains in truthfulness and fairness with very low compute overhead; a non-linear variant (Φ-SEA) gives larger bias fixes but can slightly degrade some other skills. SEA works with small demonstration sets (as few as 25 examples) and across multiple open-source LLM families.
Problem Statement
Large language models still produce hallucinations and biased outputs. Existing fixes need expensive fine-tuning or complex decoding tweaks. Can we change model behavior cheaply at inference time by editing internal activations so outputs become more truthful and fair without retraining?
Main Contribution
Introduce SEA: a training-free method that finds linear editing projections by SVD on cross-covariances between neutral, positive and negative activations.
Extend SEA to non-linear editing (Φ-SEA) via invertible feature maps and pseudo-inverses to capture non-linearly separable behaviors.
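The linear variant described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the energy-based rank rule, the choice of left singular vectors, the way the two projections are combined, and the norm rescaling are all assumptions made for this example. The default `keep=0.998` only mirrors the K=99.8% setting listed in the Results table; how the source defines K may differ.

```python
import numpy as np

def cross_covariance(a, b):
    """Cross-covariance (d x d) between two activation matrices of shape (n, d)."""
    a_centered = a - a.mean(axis=0)
    b_centered = b - b.mean(axis=0)
    return a_centered.T @ b_centered / a.shape[0]

def rank_for_energy(singular_values, keep):
    """Smallest k whose top-k singular values hold a `keep` share of the total."""
    share = np.cumsum(singular_values) / singular_values.sum()
    k = int(np.searchsorted(share, keep)) + 1
    return min(k, len(singular_values))

def sea_projections(neutral, positive, negative, keep=0.998):
    """Build two d x d projections from demonstration activations (assumed recipe)."""
    u_pos, s_pos, _ = np.linalg.svd(cross_covariance(neutral, positive))
    u_neg, s_neg, _ = np.linalg.svd(cross_covariance(neutral, negative))
    k_pos = rank_for_energy(s_pos, keep)
    k_neg = rank_for_energy(s_neg, keep)
    # Keep the directions that co-vary most with the positive demonstrations.
    proj_pos = u_pos[:, :k_pos] @ u_pos[:, :k_pos].T
    # Keep only the directions that co-vary least with the negative demonstrations.
    proj_neg = u_neg[:, k_neg:] @ u_neg[:, k_neg:].T
    return proj_pos, proj_neg

def edit_activations(h, proj_pos, proj_neg):
    """Apply both projections to activations of shape (n, d), then restore each row's norm."""
    edited = h @ proj_pos + h @ proj_neg
    norm_in = np.linalg.norm(h, axis=-1, keepdims=True)
    norm_out = np.linalg.norm(edited, axis=-1, keepdims=True)
    return edited * norm_in / np.maximum(norm_out, 1e-12)
```

The SVDs run once, offline, on the collected activations. At inference time the edit is two matrix multiplications and a rescale per edited layer, which is consistent with the low overhead the summary reports.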
Key Findings
Linear SEA raises MC1 accuracy on TruthfulQA for LLaMA-2-Chat-7B from 36.96 (ICL baseline) to 39.41 (+2.45).
Φ-SEA (non-linear) raises accuracy on BBQ for LLaMA-2-Chat-7B from 43.02 (ICL baseline) to 56.17 (+13.15).
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| TruthfulQA MC1 (LLaMA-2-Chat-7B) | 39.41 | 36.96 (ICL) | +2.45 | TruthfulQA multiple-choice | Table 1 (SEA N=2000, K=99.8%, L=21) | Table 1; Section 4.1 |
| BBQ accuracy (LLaMA-2-Chat-7B) | 56.17 | 43.02 (ICL) | +13.15 | BBQ disambiguated evaluation | Table 3 (Φ-SEA, squared-exponential) | Table 3; Section 4.2 |
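The Delta column can be checked directly from the Value and Baseline columns. The short script below reproduces both deltas and also derives a gain relative to the baseline. The relative figure is computed here for convenience and is not reported in the source; the row labels are shorthand chosen for this example.

```python
# Values copied from the Results table as (value, ICL baseline).
rows = {
    "TruthfulQA MC1 (linear SEA)": (39.41, 36.96),
    "BBQ accuracy (Phi-SEA)": (56.17, 43.02),
}

def deltas(value, baseline):
    """Return (absolute delta, relative gain in percent of the baseline)."""
    absolute = round(value - baseline, 2)
    relative_pct = round(100 * (value - baseline) / baseline, 1)
    return absolute, relative_pct

for name, (value, baseline) in rows.items():
    absolute, relative_pct = deltas(value, baseline)
    print(f"{name}: +{absolute} absolute, +{relative_pct}% relative to baseline")
```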
What To Try In 7 Days
Collect 25–200 positive/negative demonstration pairs for your task.
Compute linear SEA projections (SVD on cross-covariances) for the top layers.
Apply edits on the last few MLP outputs and compare output accuracy and latency vs baseline ICL and LoRA-FT on a dev set.
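For the comparison in the last step, a small harness along these lines may be enough to start with. It is a sketch under assumptions: `predict` stands for any hypothetical callable that maps a prompt to an answer string (baseline ICL, a LoRA fine-tune, or the SEA-edited model), and exact string match is used as the accuracy criterion, which may not suit every task.

```python
import time

def evaluate(predict, examples):
    """Return (accuracy, mean latency in seconds per example) for one system."""
    correct = 0
    start = time.perf_counter()
    for prompt, gold in examples:
        if predict(prompt) == gold:
            correct += 1
    elapsed = time.perf_counter() - start
    n = len(examples)
    return correct / n, elapsed / n

def compare(systems, examples):
    """Evaluate every named system on the same dev set."""
    return {name: evaluate(fn, examples) for name, fn in systems.items()}
```

Running every system on the same dev examples keeps the accuracy and latency numbers directly comparable. For stable latency figures it is usually worth repeating the run and discarding the first pass as warm-up.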
Risks & Boundaries
Limitations
Φ-SEA's pseudo-inverse feature transforms are not lossless and can reduce performance on some control tasks.
Requires paired positive and negative demonstrations; quality of these demos strongly affects results.
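The first limitation, the lossy pseudo-inverse, can be illustrated with a toy feature map. The sketch below uses an elementwise `tanh` as a stand-in; this is an assumption for illustration only (the Results table refers to a squared-exponential map), and the function names are hypothetical. The point is that once activations saturate the feature map, the inverse cannot recover them, even before any edit is applied.

```python
import numpy as np

def phi(x):
    """Stand-in feature map (assumption): elementwise tanh."""
    return np.tanh(x)

def phi_pinv(z, eps=1e-6):
    """Approximate inverse: clip into the open interval (-1, 1), then arctanh."""
    z = np.clip(z, -1 + eps, 1 - eps)
    return np.arctanh(z)

def nonlinear_edit(h, proj):
    """Map to feature space, apply a d x d projection, map back."""
    return phi_pinv(phi(h) @ proj)

def round_trip_error(h):
    """Largest absolute error after phi followed by its approximate inverse."""
    return float(np.max(np.abs(phi_pinv(phi(h)) - h)))
```

With this stand-in, small activations survive the round trip almost exactly, while large ones are pulled back toward the clipping bound. That is one concrete way an edit in feature space can perturb behaviors unrelated to the target.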
When Not To Use
You lack reliable positive/negative demonstrations for the target behavior.
Your application cannot tolerate any drop in downstream control tasks (e.g., math or commonsense) from non-linear edits.
Failure Modes
Overfitting to idiosyncratic patterns in the demonstration set, causing brittle edits.
Introducing new biases if demonstrations themselves are biased or unrepresentative.

