Task vectors can tune fairness: scale them to trade accuracy for group parity

May 30, 2025 · 8 min

Overview

Decision Snapshot: Ready For Pilot

Empirical evidence spans three datasets and four model families with bootstrap CIs; theoretical bounds explain observed trends and predict sensitivity to vector norms.

Citations: 0

Evidence Strength: 0.78

Confidence: 0.79

Risk Signals: 9

Trust Signals

Findings with numeric evidence: 4/4

Findings with evidence refs: 4/4

Results with explicit delta: 5/5

Reproducibility

Status: Code + data available

Open source: Yes

At A Glance

Cost impact: 65%

Production readiness: 60%

Novelty: 45%

Authors

Hiroki Naganuma, Kotaro Yoshida, Laura Gomezjurado Gonzalez, Takafumi Horie, Yuji Naraki, Ryotaro Shimizu

Links

Abstract / PDF / Code / Data

Why It Matters For Business

Task-vector editing is a low-cost way to tune subgroup parity without full retraining. It can serve as an operational knob that reduces worst-case demographic gaps while keeping accuracy close to existing adaptation methods such as full fine-tuning and LoRA.

Summary TLDR

This paper measures how "task vectors" (the weight differences between a fine-tuned model and its base model) affect group fairness in binary text and image classification. Across hate-speech, toxicity, and age detection datasets and multiple models (LLaMA2-7B, DistilBERT, Qwen2.5-0.5B, ViT-Base/16), uniformly scaled task-vector merges (a single scalar λ) often preserve accuracy while substantially reducing demographic parity difference (DPD) and equalized odds difference (EOD). Injecting subgroup-specific vectors offers an additional, targeted knob: some subgroup vectors improve parity for certain groups, while others worsen it. The paper also gives a theoretical bound connecting vector scaling and parity.
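The arithmetic behind a scaled merge can be sketched in a few lines. This is a minimal illustration, not the authors' code: weights are plain Python lists standing in for tensors, the names `task_vector` and `merge` are hypothetical, and summing the subgroup vectors before scaling by λ is an assumption about the merge rule.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]  # parameter name -> flattened values (stand-in for tensors)


def task_vector(finetuned: Weights, base: Weights) -> Weights:
    """tau = theta_finetuned - theta_base, computed per parameter."""
    return {
        name: [f - b for f, b in zip(finetuned[name], base[name])]
        for name in base
    }


def merge(base: Weights, vectors: List[Weights], lam: float) -> Weights:
    """theta = theta_base + lam * sum_g tau_g (summing the vectors is an assumption)."""
    merged = {}
    for name, values in base.items():
        total = [sum(v[name][i] for v in vectors) for i in range(len(values))]
        merged[name] = [b + lam * t for b, t in zip(values, total)]
    return merged


# Toy weights, made up for illustration only.
base = {"w": [1.0, 2.0]}
finetuned_a = {"w": [1.5, 2.0]}  # hypothetical model fine-tuned on subgroup A
finetuned_b = {"w": [1.0, 3.0]}  # hypothetical model fine-tuned on subgroup B

vectors = [task_vector(finetuned_a, base), task_vector(finetuned_b, base)]
merged = merge(base, vectors, lam=0.5)
print(merged)  # {'w': [1.25, 2.5]}
```

Setting λ to 0 returns the base weights unchanged, which is what makes λ usable as a single dial.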

Problem Statement

Fine-tuning large models is expensive and can keep or amplify subgroup biases. Task arithmetic (adding/subtracting model-weight differences) is cheap, but its effects on group fairness are not well understood. This paper asks: can task-vector edits match accuracy while reducing group disparities, and can scaling or subgroup-specific vectors be used as practical fairness controls?

Main Contribution

Presents the first systematic empirical study of group fairness for task arithmetic versus full fine-tuning (FFT) and LoRA across text and vision tasks.

Shows that sweeping a single global scaling coefficient λ over merged subgroup task vectors traces a smooth fairness-accuracy frontier: for many λ values, task addition preserves accuracy and reduces DPD/EOD.
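A λ sweep of this kind can be sketched as a loop that records accuracy and worst-case DPD at each λ and then keeps the non-dominated points. The sketch below assumes a hypothetical `evaluate(lam)` callback that merges, runs validation, and returns `(accuracy, worst_dpd)`; the numbers in `toy_evaluate` are invented placeholders, not results from the paper.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float, float]  # (lam, accuracy, worst_dpd)


def sweep(evaluate: Callable[[float], Tuple[float, float]],
          lams: List[float]) -> List[Point]:
    """Evaluate the merged model at each lambda and collect the results."""
    return [(lam, *evaluate(lam)) for lam in lams]


def frontier(points: List[Point]) -> List[Point]:
    """Keep points that no other point beats on both accuracy and DPD."""
    keep = []
    for p in points:
        dominated = any(
            q[1] >= p[1] and q[2] <= p[2] and (q[1] > p[1] or q[2] < p[2])
            for q in points
        )
        if not dominated:
            keep.append(p)
    return keep


def toy_evaluate(lam: float) -> Tuple[float, float]:
    """Placeholder for a real validation run; values are made up."""
    table = {0.0: (0.90, 0.10), 0.3: (0.94, 0.06),
             0.6: (0.94, 0.05), 1.0: (0.95, 0.07)}
    return table[lam]


points = sweep(toy_evaluate, [0.0, 0.3, 0.6, 1.0])
front = frontier(points)
print(front)  # [(0.6, 0.94, 0.05), (1.0, 0.95, 0.07)]
```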

Key Findings

Uniformly scaled task-vector merges can reduce group disparities while keeping accuracy close to FFT/LoRA.

Numbers: Civil Comments (DistilBERT): Task Addition accuracy ≈0.9395; worst-DPD 0.0454; worst-EOD 0.3358 (Table 2)

Practical Use: Try a single λ sweep on validation: many values of λ ≥ 0.3 deliver similar accuracy and materially lower worst-case DPD/EOD versus FFT/LoRA.

Evidence Ref: Tables 2, 3; Figures 2, 5, 7; Section 5.2; Appendix F
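For readers who want to compute the two metrics themselves, here is a minimal sketch for binary predictions. It assumes the common convention of taking the largest between-group gap (the one used by Fairlearn's `demographic_parity_difference` and `equalized_odds_difference`); the paper's exact worst-case definition may differ, and the labels and predictions below are toy data.

```python
from typing import Dict, List, Sequence


def _rate(flags: List[int]) -> float:
    """Fraction of 1s; defined as 0.0 for an empty slice."""
    return sum(flags) / len(flags) if flags else 0.0


def group_rates(y_true: Sequence[int], y_pred: Sequence[int],
                groups: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Per-group selection rate, true-positive rate, and false-positive rate."""
    rates = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        rates[g] = {
            "selection": _rate([y_pred[i] for i in idx]),
            "tpr": _rate([y_pred[i] for i in idx if y_true[i] == 1]),
            "fpr": _rate([y_pred[i] for i in idx if y_true[i] == 0]),
        }
    return rates


def _gap(rates: Dict[str, Dict[str, float]], key: str) -> float:
    values = [r[key] for r in rates.values()]
    return max(values) - min(values)


def dpd(y_true, y_pred, groups) -> float:
    """Demographic parity difference: largest gap in selection rate."""
    return _gap(group_rates(y_true, y_pred, groups), "selection")


def eod(y_true, y_pred, groups) -> float:
    """Equalized odds difference: larger of the TPR gap and the FPR gap."""
    rates = group_rates(y_true, y_pred, groups)
    return max(_gap(rates, "tpr"), _gap(rates, "fpr"))


# Toy data: two groups of four examples each.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

print(dpd(y_true, y_pred, groups))  # 0.5
print(eod(y_true, y_pred, groups))  # 0.5
```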

Injecting subgroup-specific task vectors moves fairness in group-dependent ways: some subgroup vectors improve parity, others hurt it.

Numbers: Example: adding Men or Asian vectors reduced DPD/EOD in some settings; adding the Native American vector increased DPD/EOD (

Practical Use: Before deploying an injected subgroup vector, validate its effect on all subgroups; don’t assume benefits generalize beyond the targeted subgroup.

Evidence Ref: Figures 3–4, 10–11; Section 5.4
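One way to run that check is to compare a per-subgroup disparity metric before and after injection and flag every subgroup that got worse. The sketch below is an assumption about how such a gate might look: the function name `regressions`, the tolerance, the group labels, and the numbers are all hypothetical.

```python
from typing import Dict, List


def regressions(before: Dict[str, float], after: Dict[str, float],
                tol: float = 0.0) -> List[str]:
    """Return subgroups whose disparity metric rose by more than `tol`."""
    return sorted(g for g in before if after[g] - before[g] > tol)


# Made-up per-subgroup DPD values before and after injecting one vector.
before = {"group_a": 0.08, "group_b": 0.06, "group_c": 0.09}
after = {"group_a": 0.05, "group_b": 0.07, "group_c": 0.09}

worse = regressions(before, after, tol=0.005)
print(worse)  # ['group_b']
```

Here the injected vector helps `group_a` but harms `group_b`, so the edit would be held back even though the targeted group improved.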

Results

Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref
Accuracy | 0.9395 (point estimate) | DistilBERT SFT 0.9457–0.9476 | ≈ -0.006 to -0.008 | Civil Comments (Gender) | Table 2 DistilBERT entries | Table 2
Worst-case DPD, Civil Comments (DistilBERT) | 0.0454 (Task Addition) | SFT 0.0887–0.1101; LoRA 0.0735–0.0812 | ≈ -0.043 (vs lower SFT value, 0.0887) | Civil Comments (Gender) | Table 2 DistilBERT entries | Table 2
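The deltas can be recomputed from the reported values. The snippet below assumes each fused baseline figure is a pair of values (for example, SFT accuracy 0.9457 and 0.9476), which is a reading of the extracted text rather than something the source states. Under that reading, the reported DPD delta of about -0.043 matches the lower SFT value (0.0887); against the SFT midpoint the difference is about -0.054.

```python
# Values as reported for Civil Comments (DistilBERT), Table 2.
task_addition_acc = 0.9395
sft_acc = (0.9457, 0.9476)  # assumed to be two separate baseline values

acc_deltas = [round(task_addition_acc - b, 4) for b in sft_acc]
print(acc_deltas)  # [-0.0062, -0.0081]

task_addition_dpd = 0.0454
sft_dpd = (0.0887, 0.1101)  # assumed to be two separate baseline values

dpd_vs_low = round(task_addition_dpd - sft_dpd[0], 4)
dpd_vs_mid = round(task_addition_dpd - sum(sft_dpd) / 2, 4)
print(dpd_vs_low, dpd_vs_mid)  # -0.0433 -0.054
```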

What To Try In 7 Days

Compute subgroup-specific task vectors by fine-tuning small models on subgroup slices and subtracting base weights.

Merge vectors with a single scalar λ and sweep λ on validation to find acceptable accuracy–DPD/EOD trade-offs.

Test injecting worst-performing subgroup vectors into FFT models and measure effects across all subgroups before deployment.
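The third step can be sketched as: rank subgroups by a disparity or error metric, pick the worst one, and add its scaled task vector to the FFT weights. The helper names, group labels, weights, and gap values below are hypothetical, and ranking by a single per-group gap is an assumption about what "worst-performing" means.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]  # parameter name -> flattened values


def worst_subgroup(per_group_gap: Dict[str, float]) -> str:
    """Pick the subgroup with the largest gap (assumed ranking criterion)."""
    return max(per_group_gap, key=per_group_gap.get)


def inject(fft: Weights, vector: Weights, lam: float) -> Weights:
    """theta_edited = theta_fft + lam * tau_subgroup, per parameter."""
    return {
        name: [w + lam * t for w, t in zip(fft[name], vector[name])]
        for name in fft
    }


# Made-up validation gaps and subgroup task vectors.
gaps = {"group_a": 0.04, "group_b": 0.11}
vectors = {"group_a": {"w": [0.2, 0.0]}, "group_b": {"w": [0.0, -0.4]}}
fft_weights = {"w": [1.0, 1.0]}

target = worst_subgroup(gaps)
edited = inject(fft_weights, vectors[target], lam=0.5)
print(target, edited)
```

After this step, every subgroup should be re-evaluated on the edited weights, not only the targeted one.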

Optimization Features

Infra Optimization
Experiments reported with ~30 GPU-hours total (H100); smaller budgets feasible per-vector
Model Optimization
Use of task-vector arithmetic to edit behavior without retraining
System Optimization
One-dimensional λ control reduces tuning complexity
Training Optimization
Compute subgroup fine-tuned models separately; store differences
Inference Optimization
Merged weights applied at load time avoid per-input compute overhead
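The storage and load-time pattern described above can be sketched as follows: each subgroup's task vector is saved once as a delta, and the merged weights are built a single time when the model is loaded, so inference runs on ordinary weights. JSON files and list-valued weights are stand-ins chosen to keep the example dependency-free; a real setup would use the framework's own checkpoint format, and all names here are hypothetical.

```python
import json
import os
import tempfile
from typing import Dict, List

Weights = Dict[str, List[float]]


def save_vector(path: str, vector: Weights) -> None:
    """Persist one subgroup task vector (delta from base) to disk."""
    with open(path, "w") as f:
        json.dump(vector, f)


def load_merged(base: Weights, vector_paths: List[str], lam: float) -> Weights:
    """Build merged weights once at load time: base + lam * sum of stored deltas."""
    merged = {name: list(values) for name, values in base.items()}
    for path in vector_paths:
        with open(path) as f:
            vector = json.load(f)
        for name, delta in vector.items():
            merged[name] = [m + lam * d for m, d in zip(merged[name], delta)]
    return merged


base = {"w": [1.0, 2.0]}  # toy base weights
stored = {"group_a": {"w": [0.5, 0.0]}, "group_b": {"w": [0.0, 1.0]}}

with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, vector in stored.items():
        path = os.path.join(tmp, f"{name}.json")
        save_vector(path, vector)
        paths.append(path)
    merged = load_merged(base, paths, lam=0.25)

print(merged)  # {'w': [1.125, 2.25]}
```

Changing λ only requires rebuilding the merged weights from the stored deltas; no subgroup model has to be retrained.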

Risks & Boundaries

Limitations

Scope limited to open-weight 0.5–7B models; no evaluation on large proprietary API-only models.

Single global λ cannot express richer, intersectional fairness constraints.

When Not To Use

When you only have API access and cannot edit model weights.

When fairness goals require intersectional or counterfactual guarantees beyond DPD/EOD.

Failure Modes

Negative transfer: merging vectors that benefit one group may degrade others.

Overfitting to validation: selecting λ to maximize validation accuracy can increase worst-group disparity.
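One way to avoid the second failure mode is to select λ under an accuracy floor instead of maximizing accuracy outright. The sketch below is an assumed selection rule, not one taken from the paper: among λ values within `max_acc_drop` of the best validation accuracy, it picks the one with the lowest worst-case DPD. The records are invented.

```python
from typing import List, Tuple

Record = Tuple[float, float, float]  # (lam, val_accuracy, val_worst_dpd)


def select_lambda(records: List[Record], max_acc_drop: float) -> Record:
    """Lowest worst-case DPD among lambdas close enough to the best accuracy."""
    best_acc = max(r[1] for r in records)
    eligible = [r for r in records if r[1] >= best_acc - max_acc_drop]
    # Tie-break on higher accuracy when two lambdas share the same DPD.
    return min(eligible, key=lambda r: (r[2], -r[1]))


# Made-up validation results for three lambda values.
records = [(0.2, 0.950, 0.09), (0.5, 0.945, 0.05), (0.8, 0.930, 0.04)]

by_accuracy = max(records, key=lambda r: r[1])
chosen = select_lambda(records, max_acc_drop=0.01)
print(by_accuracy)  # (0.2, 0.95, 0.09)
print(chosen)       # (0.5, 0.945, 0.05)
```

In this toy case, picking by accuracy alone lands on the λ with the largest disparity, while the constrained rule gives up half a point of accuracy for a much smaller gap. The chosen λ should still be confirmed on held-out data.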

Core Entities

Models

LLaMA2-7B, DistilBERT, Qwen2.5-0.5B, ViT-Base/16

Metrics

Demographic Parity Difference (DPD), Equalized Odds Difference (EOD), Accuracy

Datasets

Berkeley D-Lab Hate Speech, Civil Comments, UTKFace