Task vectors can tune fairness: scale them to trade accuracy for group parity

Overview

Decision Snapshot: Ready For Pilot

Empirical evidence spans three datasets and four model families with bootstrap CIs; theoretical bounds explain observed trends and predict sensitivity to vector norms.

Citations: 0

Evidence Strength: 0.78

Confidence: 0.79

Risk Signals: 9

Trust Signals

Findings with numeric evidence: 4/4

Findings with evidence refs: 4/4

Results with explicit delta: 5/5

Reproducibility

Status: Code + data available

Open source: Yes

At A Glance

Cost impact: 65%

Production readiness: 60%

Novelty: 45%

Authors

Hiroki Naganuma, Kotaro Yoshida, Laura Gomezjurado Gonzalez, Takafumi Horie, Yuji Naraki, Ryotaro Shimizu

Links

Abstract / PDF / Code / Data

Why It Matters For Business

Task-vector editing is a low-cost way to tune subgroup parity without full retraining. It can serve as an operational knob for reducing worst-case demographic gaps while keeping accuracy close to existing adaptation methods (full fine-tuning and LoRA).

Who Should Care

CTO, Product Manager, ML Engineer, Data Scientist

Summary TLDR

This paper measures how "task vectors" (the weight differences between a fine-tuned model and its base model) affect group fairness in binary text and image classification. The study covers hate-speech, toxicity, and age-detection datasets and several models (LLaMA2-7B, DistilBERT, Qwen2.5, ViT-Base). Across these, uniformly scaled task-vector merges (a single scalar λ) often preserve accuracy while substantially reducing demographic parity difference (DPD) and equalized odds difference (EOD). Injecting subgroup-specific vectors offers an additional, targeted knob: some subgroup vectors improve parity for certain groups, while others worsen it. The paper also gives a theoretical bound connecting vector scaling to parity.
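The task-vector arithmetic described above can be sketched in a few lines. This is a minimal illustration under three assumptions. First, weights are held as dicts of NumPy arrays, standing in for a converted model state dict. Second, subgroup vectors are summed before a single λ is applied. Third, the helper names `task_vector` and `merge_uniform` are hypothetical. The paper's exact merge rule (sum vs. average, which layers are included) may differ.

```python
import numpy as np

# Hypothetical representation: parameter name -> float array.
Weights = dict[str, np.ndarray]

def task_vector(finetuned: Weights, base: Weights) -> Weights:
    """tau = theta_finetuned - theta_base, computed parameter by parameter."""
    return {name: finetuned[name] - base[name] for name in base}

def merge_uniform(base: Weights, vectors: list[Weights], lam: float) -> Weights:
    """theta_merged = theta_base + lam * sum_i tau_i, with one scalar for all vectors.
    The base weights are copied, so the original dict is left unchanged."""
    merged = {name: w.copy() for name, w in base.items()}
    for tau in vectors:
        for name, delta in tau.items():
            merged[name] += lam * delta
    return merged

# Toy example with one 3-element parameter and two "subgroup" fine-tunes.
base = {"w": np.zeros(3)}
ft_a = {"w": np.array([1.0, 0.0, 0.0])}
ft_b = {"w": np.array([0.0, 2.0, 0.0])}
taus = [task_vector(ft_a, base), task_vector(ft_b, base)]
merged = merge_uniform(base, taus, lam=0.5)
print(merged["w"].tolist())  # [0.5, 1.0, 0.0]
```

Because the merge happens once on the weights, the merged model runs at the same inference cost as the base model.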

Problem Statement

Fine-tuning large models is expensive and can keep or amplify subgroup biases. Task arithmetic (adding/subtracting model-weight differences) is cheap, but its effects on group fairness are not well understood. This paper asks: can task-vector edits match accuracy while reducing group disparities, and can scaling or subgroup-specific vectors be used as practical fairness controls?

Main Contribution

First systematic empirical study of group fairness for task arithmetic vs full fine-tuning (FFT) and LoRA across text and vision tasks.

Shows that sweeping a single global scaling coefficient λ over merged subgroup task vectors traces a smooth fairness-accuracy frontier: for many λ values, task addition preserves accuracy and reduces DPD/EOD.

Key Findings

Uniformly scaled task-vector merges can reduce group disparities while keeping accuracy close to FFT/LoRA.

Numbers: Civil Comments (DistilBERT): Task Addition accuracy ≈ 0.9395; worst-DPD 0.0454; worst-EOD 0.3358 (Table 2)

Practical Use: Try a single λ sweep on validation. Many values of λ ≥ 0.3 deliver similar accuracy and materially lower worst-case DPD/EOD than FFT/LoRA.

Evidence Ref: Tables 2, 3; Figures 2, 5, 7; Section 5.2; Appendix F

Injecting subgroup-specific task vectors moves fairness in group-dependent ways: some subgroup vectors improve parity, others hurt it.

Numbers: Adding the Men or Asian vectors reduced DPD/EOD in some settings, while adding the Native American vector increased DPD/EOD.

Practical Use: Before deploying an injected subgroup vector, validate its effect on all subgroups; don't assume benefits generalize beyond the targeted subgroup.

Evidence Ref: Figures 3–4, 10–11; Section 5.4
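The "validate on all subgroups" check can be sketched as a before/after comparison of per-group parity gaps. This is a minimal illustration. It uses a one-vs-rest positive-rate gap as the per-group measure, which is our own choice and not necessarily the paper's per-subgroup metric. The helpers `group_gaps` and `regressions` and the toy predictions are hypothetical.

```python
import numpy as np

def group_gaps(preds: np.ndarray, groups: np.ndarray) -> dict:
    """One-vs-rest parity gap per group: |P(yhat=1 | g) - P(yhat=1 | not g)|.
    Assumes at least two groups, so the 'rest' slice is never empty."""
    gaps = {}
    for g in np.unique(groups).tolist():
        in_g = groups == g
        gaps[g] = float(abs(preds[in_g].mean() - preds[~in_g].mean()))
    return gaps

def regressions(preds_before, preds_after, groups, tol=0.0):
    """Groups whose parity gap grows by more than tol after injecting a vector."""
    before = group_gaps(preds_before, groups)
    after = group_gaps(preds_after, groups)
    return {g: after[g] - before[g] for g in before if after[g] - before[g] > tol}

# Toy predictions before/after injecting a hypothetical subgroup vector.
groups = np.array(["a", "a", "b", "b", "c", "c"])
before = np.array([1, 1, 0, 0, 1, 0])
after = np.array([1, 0, 0, 0, 1, 1])
print(regressions(before, after, groups))  # {'c': 0.75}
```

In this toy case the injection closes group "a"'s gap entirely but opens a new one for group "c". This is the kind of cross-group side effect the finding warns about.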

Results

Metric	Value	Baseline	Delta	Split / Dataset	Evidence	Evidence Ref
Accuracy (DistilBERT)	0.9395 (Task Addition, point estimate)	SFT 0.9457–0.9476	≈ -0.006 to -0.008 vs SFT	Civil Comments (Gender)	Table 2 DistilBERT entries	Table 2
Worst-case DPD (DistilBERT)	0.0454 (Task Addition)	SFT 0.0887–0.1101; LoRA 0.0735–0.0812	≈ -0.043 to -0.065 vs SFT; ≈ -0.028 to -0.036 vs LoRA	Civil Comments (Gender)	Table 2 DistilBERT entries	Table 2
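The deltas in the table can be rechecked from the reported values with simple arithmetic. The short sketch below subtracts each end of a baseline range from the Task Addition value, using values copied from the table rows above. The helper name `delta_range` is hypothetical.

```python
# Reported values from the Results table (Civil Comments, gender split, DistilBERT).
TASK_ADDITION_ACC = 0.9395
TASK_ADDITION_DPD = 0.0454
SFT_ACC = (0.9457, 0.9476)
SFT_DPD = (0.0887, 0.1101)
LORA_DPD = (0.0735, 0.0812)

def delta_range(value, baseline):
    """Deltas of value against the high and low ends of a (low, high) baseline range."""
    low, high = baseline
    return round(value - high, 4), round(value - low, 4)

print("acc vs SFT:", delta_range(TASK_ADDITION_ACC, SFT_ACC))    # (-0.0081, -0.0062)
print("DPD vs SFT:", delta_range(TASK_ADDITION_DPD, SFT_DPD))    # (-0.0647, -0.0433)
print("DPD vs LoRA:", delta_range(TASK_ADDITION_DPD, LORA_DPD))  # (-0.0358, -0.0281)
```

Against the SFT midpoint (0.0994), the DPD delta is about -0.054.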

What To Try In 7 Days

Compute subgroup-specific task vectors by fine-tuning small models on subgroup slices and subtracting base weights.

Merge vectors with a single scalar λ and sweep λ on validation to find acceptable accuracy–DPD/EOD trade-offs.

Test injecting worst-performing subgroup vectors into FFT models and measure effects across all subgroups before deployment.
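The λ sweep in the second step can be sketched on a toy problem. The sketch makes three assumptions. A linear classifier stands in for the network, and a single vector `tau` stands in for the merged subgroup vectors. DPD is taken as the max-minus-min positive-prediction rate across groups. All helper names (`predict`, `dpd`, `sweep_lambda`) and all data are hypothetical and synthetic.

```python
import numpy as np

def predict(w: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Binary predictions from a linear score (stand-in for a fine-tuned network)."""
    return (X @ w > 0).astype(int)

def dpd(preds: np.ndarray, groups: np.ndarray) -> float:
    """Max-minus-min positive-prediction rate across groups."""
    rates = [preds[groups == g].mean() for g in np.unique(groups)]
    return float(max(rates) - min(rates))

def sweep_lambda(base_w, tau, X, y, groups, lams):
    """Evaluate theta_base + lam * tau on validation data for each lam."""
    rows = []
    for lam in lams:
        preds = predict(base_w + lam * tau, X)
        rows.append((lam, float((preds == y).mean()), dpd(preds, groups)))
    return rows

# Synthetic validation set: two groups; label depends on feature 0 and group.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
groups = rng.integers(0, 2, size=200)
y = (X[:, 0] + 0.5 * groups > 0).astype(int)

base_w = np.array([0.0, 1.0])  # hypothetical "base model" weights
tau = np.array([1.0, -1.0])    # hypothetical merged task vector
lams = [0.0, 0.25, 0.5, 0.75, 1.0]

for lam, acc, gap in sweep_lambda(base_w, tau, X, y, groups, lams):
    print(f"lambda={lam:.2f}  acc={acc:.3f}  DPD={gap:.3f}")
```

Each λ needs only one weight update and one validation pass, which is why a one-dimensional sweep is cheap compared with retraining.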

Optimization Features

Infra Optimization

Experiments are reported at ~30 GPU-hours total (H100); smaller budgets are feasible per vector.

Model Optimization

Use of task-vector arithmetic to edit behavior without retraining

System Optimization

One-dimensional λ control reduces tuning complexity

Training Optimization

Compute subgroup fine-tuned models separately; store differences

Inference Optimization

Merged weights applied at load time avoid per-input compute overhead

Reproducibility

Code Available: Yes

Data Available: Yes

Open Source Status: Yes

License: Unknown

Code URLs

https://github.com/LauraGomezjurado/fairness_task_vector_deploy

Data URLs

https://huggingface.co/datasets/ucberkeley-dlab/measuring-hate-speech (Berkeley D-Lab Hate Speech)
https://github.com/oar-medium/civilcomments (Civil Comments public dataset)
https://susanqq.github.io/UTKFace/ (UTKFace)

Risks & Boundaries

Limitations

Scope is limited to open-weight models of at most 7B parameters (from DistilBERT and ViT-Base up to LLaMA2-7B); there is no evaluation on large proprietary API-only models.

Single global λ cannot express richer, intersectional fairness constraints.

When Not To Use

When you only have API access and cannot edit model weights.

When fairness goals require intersectional or counterfactual guarantees beyond DPD/EOD.

Failure Modes

Negative transfer: merging vectors that benefit one group may degrade others.

Overfitting to validation: selecting λ to maximize validation accuracy can increase worst-group disparity.
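One way to guard against this failure mode is to select λ under an accuracy floor rather than by maximum accuracy. The sketch below shows one such rule. The frontier values are made-up placeholders, not results from the paper, and `select_lambda` is a hypothetical helper.

```python
def select_lambda(frontier, acc_floor):
    """Pick the lambda with the smallest worst-case DPD among those whose
    validation accuracy is at least acc_floor (ties broken by higher accuracy).

    frontier: list of (lam, accuracy, worst_dpd) tuples from a validation sweep.
    """
    feasible = [row for row in frontier if row[1] >= acc_floor]
    if not feasible:
        raise ValueError("no lambda meets the accuracy floor")
    return min(feasible, key=lambda row: (row[2], -row[1]))[0]

# Illustrative placeholder frontier (NOT numbers from the paper).
frontier = [
    (0.1, 0.930, 0.090),
    (0.3, 0.938, 0.050),
    (0.5, 0.940, 0.070),
    (0.7, 0.941, 0.110),
]

best_by_acc = max(frontier, key=lambda row: row[1])[0]
best_constrained = select_lambda(frontier, acc_floor=0.935)
print(best_by_acc, best_constrained)  # 0.7 0.3
```

Maximizing accuracy picks λ = 0.7, which has the worst disparity in this toy frontier. The constrained rule gives up 0.003 accuracy to roughly halve worst-case DPD.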

Core Entities

Models

LLaMA2-7B, DistilBERT, Qwen2.5-0.5B, ViT-Base/16

Metrics

Demographic Parity Difference (DPD), Equalized Odds Difference (EOD), Accuracy
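The two fairness metrics can be computed as below. This sketch assumes the common between-groups definitions, as used for example by Fairlearn's `demographic_parity_difference` and `equalized_odds_difference`. DPD is the max-minus-min positive-prediction rate across groups. EOD is the larger of the between-group gaps in true-positive rate and false-positive rate. The paper's "worst-case" variants may aggregate over subgroups differently.

```python
import numpy as np

def demographic_parity_difference(y_pred, groups):
    """max_g P(yhat=1 | g) - min_g P(yhat=1 | g)."""
    rates = [y_pred[groups == g].mean() for g in np.unique(groups)]
    return float(max(rates) - min(rates))

def equalized_odds_difference(y_true, y_pred, groups):
    """Larger of the between-group gaps in TPR and in FPR.
    Assumes every group has both positive and negative labels."""
    tprs, fprs = [], []
    for g in np.unique(groups):
        m = groups == g
        tprs.append(y_pred[m & (y_true == 1)].mean())  # TPR within group g
        fprs.append(y_pred[m & (y_true == 0)].mean())  # FPR within group g
    return float(max(max(tprs) - min(tprs), max(fprs) - min(fprs)))

y_true = np.array([1, 0, 1, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 1, 0, 0, 0])
groups = np.array(["a"] * 4 + ["b"] * 4)
dpd_val = demographic_parity_difference(y_pred, groups)
eod_val = equalized_odds_difference(y_true, y_pred, groups)
print(dpd_val, eod_val)  # 0.25 0.5
```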

Datasets

Berkeley D-Lab Hate Speech, Civil Comments, UTKFace

