Task vectors can tune fairness: scale them to trade accuracy for group parity

May 30, 2025 · 8 min

Overview

Production Readiness

0.6

Novelty Score

0.45

Cost Impact Score

0.65

Citation Count

0

Authors

Hiroki Naganuma, Kotaro Yoshida, Laura Gomezjurado Gonzalez, Takafumi Horie, Yuji Naraki, Ryotaro Shimizu

Links

Abstract / PDF

Why It Matters For Business

Task-vector editing is a low-cost way to tune subgroup parity without full retraining. It can serve as an operational knob that reduces worst-case demographic gaps while keeping accuracy close to existing adaptation methods (full fine-tuning and LoRA).

Summary TLDR

This paper measures how "task vectors" (the weight differences between a fine-tuned model and its base model) affect group fairness in binary text and image classification. Across hate-speech, toxicity, and age detection datasets and multiple models (LLaMA2-7B, DistilBERT, Qwen2.5, ViT-Base), uniformly scaled task-vector merges (a single scalar λ) often preserve accuracy while substantially reducing demographic parity difference (DPD) and equalized odds difference (EOD). Injecting subgroup-specific vectors offers an additional, targeted knob: some subgroup vectors improve parity for certain groups, others worsen it. The paper also gives a theoretical bound linking vector scaling to DPD and EOD.
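The construction summarized above can be written compactly. This is a sketch in assumed notation, which may differ from the paper's: θ_0 denotes the base weights, θ_g the weights fine-tuned on subgroup g, τ_g the resulting task vector, and λ_g a per-subgroup scaling coefficient.

```latex
\tau_g = \theta_g - \theta_0,
\qquad
\theta_{\text{merged}} = \theta_0 + \sum_{g \in G} \lambda_g \, \tau_g,
\qquad
\text{uniform case: } \lambda_g = \lambda \ \text{for all } g \in G .
```

With a single shared λ, the whole merge is controlled by one scalar, which is what makes a one-dimensional sweep over the accuracy and fairness trade-off possible.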

Problem Statement

Fine-tuning large models is expensive and can keep or amplify subgroup biases. Task arithmetic (adding/subtracting model-weight differences) is cheap, but its effects on group fairness are not well understood. This paper asks: can task-vector edits match accuracy while reducing group disparities, and can scaling or subgroup-specific vectors be used as practical fairness controls?

Main Contribution

First systematic empirical study of group fairness for task arithmetic vs full fine-tuning (FFT) and LoRA across text and vision tasks.

Show that sweeping a single global scaling coefficient λ over merged subgroup task vectors traces a smooth fairness-accuracy frontier: for many λ values, task addition preserves accuracy and reduces DPD/EOD.

Demonstrate subgroup-targeted edits: injecting specific subgroup task vectors into an FFT model can selectively improve or worsen fairness for particular groups.

Provide a theoretical upper bound linking deviations in task-vector scaling to increases in DPD and EOD, explaining empirical sensitivity to vector norms.

Key Findings

Uniformly scaled task-vector merges can reduce group disparities while keeping accuracy close to FFT/LoRA.

Numbers: Civil Comments (DistilBERT): Task Addition accuracy ≈ 0.9395; worst-DPD 0.0454; worst-EOD 0.3358 (Table 2)

Injecting subgroup-specific task vectors moves fairness in group-dependent ways: some subgroup vectors improve parity, others hurt it.

Numbers: Example: adding the Men or Asian vectors reduced DPD/EOD in some settings; adding the Native American vector increased DPD/EOD.

A theoretical bound shows fairness gaps scale with differences in scaling λ and with the norms of subgroup task vectors.

Numbers: Upper bounds in Section 5.1 / Appendix C show DPD/EOD ≤ U(λ), where U(λ) → 0 as λ_g → 1 and sensitivity scales with ‖Δθ_g‖₂

Task arithmetic can recover much of LoRA’s fairness gains while closing some of LoRA’s accuracy loss on vision and text tasks.

Numbers: UTKFace (ViT): SFT acc 0.8463–0.8599, LoRA acc 0.6389–0.6572, Task Addition acc 0.7227–0.7384 with DPD/EOD close to LoRA

Results

Civil Comments (DistilBERT) Accuracy

Value: 0.9395 (Task Addition, point estimate)

Baseline: DistilBERT SFT 0.9457–0.9476

Civil Comments (DistilBERT) Worst-case DPD

Value: 0.0454 (Task Addition)

Baseline: SFT 0.0887–0.1101; LoRA 0.0735–0.0812

Qwen Accuracy

Value: 0.810–0.820

Baseline: Qwen SFT 0.884–0.886; LoRA 0.774–0.790

UTKFace (ViT) Accuracy

Value: 0.7227–0.7384

Baseline: SFT 0.8463–0.8599; LoRA 0.6389–0.6572

Typical fairness reductions on Civil Comments (midpoint comparisons)

Value: DPD drops ≈ 41–58%; EOD drops ≈ 34–73% (varies by model/attribute)

Baseline: SFT/LoRA comparisons reported in Appendix F

Who Should Care

What To Try In 7 Days

Compute subgroup-specific task vectors by fine-tuning small models on subgroup slices and subtracting base weights.

Merge vectors with a single scalar λ and sweep λ on validation to find acceptable accuracy–DPD/EOD trade-offs.

Test injecting worst-performing subgroup vectors into FFT models and measure effects across all subgroups before deployment.
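The first two steps above can be sketched in a few lines. This is a minimal illustration, not the paper's code: weights are plain Python lists standing in for real tensors (for example, the values of a framework state dict), and the names `task_vector`, `merge`, `ft_a`, and `ft_b` are hypothetical.

```python
def task_vector(finetuned, base):
    """Per-parameter difference: fine-tuned weights minus base weights."""
    return {
        name: [f - b for f, b in zip(finetuned[name], base[name])]
        for name in base
    }


def merge(base, vectors, lam):
    """Add the summed task vectors to the base weights, scaled by one scalar lam."""
    merged = {}
    for name, weights in base.items():
        # Sum the subgroup vectors element-wise for this parameter.
        summed = [sum(v[name][i] for v in vectors) for i in range(len(weights))]
        merged[name] = [b + lam * s for b, s in zip(weights, summed)]
    return merged


# Toy weights (assumed values): one base model and two subgroup fine-tunes.
base = {"layer.weight": [0.0, 1.0]}
ft_a = {"layer.weight": [0.5, 1.0]}
ft_b = {"layer.weight": [0.0, 2.0]}

vectors = [task_vector(ft_a, base), task_vector(ft_b, base)]

# One merged candidate per lambda; each would then be scored on validation data.
candidates = {lam: merge(base, vectors, lam) for lam in (0.0, 0.5, 1.0)}
```

Setting λ = 0 returns the base weights unchanged, so the sweep always includes the unedited model as a reference point.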

Optimization Features

Infra Optimization

  • Experiments reported with ~30 GPU-hours total (H100); smaller budgets feasible per-vector

Model Optimization

  • Use of task-vector arithmetic to edit behavior without retraining

System Optimization

  • One-dimensional λ control reduces tuning complexity

Training Optimization

  • Compute subgroup fine-tuned models separately; store differences

Inference Optimization

  • Merged weights applied at load time avoid per-input compute overhead

Reproducibility

Code Available

Data Available

Open Source Status

  • yes

Risks & Boundaries

Limitations

  • Scope limited to open-weight 0.5–7B models; no evaluation on large proprietary API-only models.
  • Single global λ cannot express richer, intersectional fairness constraints.
  • Binary prediction tasks only; multi-label or generative settings not evaluated.

When Not To Use

  • When you only have API access and cannot edit model weights.
  • When fairness goals require intersectional or counterfactual guarantees beyond DPD/EOD.
  • When subgroup task vectors have very large norms and even small λ changes cause large parity swings.

Failure Modes

  • Negative transfer: merging vectors that benefit one group may degrade others.
  • Overfitting to validation: selecting λ to maximize validation accuracy can increase worst-group disparity.
  • Unstable effects on rare subgroups due to small subgroup sample sizes.
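One possible guard against the validation-overfitting failure mode is to pick λ by minimizing worst-group disparity subject to an accuracy floor, rather than maximizing accuracy. The sketch below assumes this selection rule; it is not taken from the paper. The function name `select_lambda`, the 0.01 tolerance, and the sweep numbers are all made up for illustration.

```python
def select_lambda(results, max_acc_drop=0.01):
    """Pick the sweep entry with the lowest worst-group DPD among those whose
    accuracy is within max_acc_drop of the best accuracy in the sweep.

    results: list of dicts with keys 'lam', 'acc', 'worst_dpd',
             all measured on held-out validation data.
    """
    best_acc = max(r["acc"] for r in results)
    eligible = [r for r in results if r["acc"] >= best_acc - max_acc_drop]
    # Ties on disparity are broken in favor of higher accuracy.
    return min(eligible, key=lambda r: (r["worst_dpd"], -r["acc"]))


# Illustrative (invented) validation results for three lambda values.
sweep = [
    {"lam": 0.2, "acc": 0.90, "worst_dpd": 0.03},
    {"lam": 0.6, "acc": 0.935, "worst_dpd": 0.05},
    {"lam": 1.0, "acc": 0.94, "worst_dpd": 0.09},
]

chosen = select_lambda(sweep)
```

In this toy sweep, the λ with the highest accuracy also has the largest gap, so the rule settles on a slightly less accurate setting with lower disparity. Scoring the chosen λ on a separate test split helps check that the result is not itself a validation artifact.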

Core Entities

Models

  • LLaMA2-7B
  • DistilBERT
  • Qwen2.5-0.5B
  • ViT-Base/16

Metrics

  • Demographic Parity Difference (DPD)
  • Equalized Odds Difference (EOD)
  • Accuracy
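For reference, the two fairness metrics can be computed as below. This sketch uses the common max-minus-min definitions for binary predictions (DPD as the largest gap in positive-prediction rate across groups, EOD as the larger of the TPR and FPR gaps); the paper's exact formulation is assumed to match but is not confirmed here, and the function names are illustrative.

```python
from collections import defaultdict


def _rate_gap(pairs):
    """Gap between the highest and lowest positive-prediction rate across groups.

    pairs: iterable of (prediction, group) with prediction in {0, 1}.
    """
    by_group = defaultdict(list)
    for pred, group in pairs:
        by_group[group].append(pred)
    rates = [sum(preds) / len(preds) for preds in by_group.values()]
    return max(rates) - min(rates) if rates else 0.0


def demographic_parity_difference(y_pred, groups):
    """Largest between-group gap in the rate of positive predictions."""
    return _rate_gap(zip(y_pred, groups))


def equalized_odds_difference(y_true, y_pred, groups):
    """Larger of the between-group gaps in TPR (label 1) and FPR (label 0).

    Groups with no examples of a given label are skipped for that rate.
    """
    gaps = []
    for label in (1, 0):
        pairs = [(p, g) for t, p, g in zip(y_true, y_pred, groups) if t == label]
        gaps.append(_rate_gap(pairs))
    return max(gaps)


# Toy data: both groups receive positive predictions at the same rate,
# but the errors are distributed differently between them.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

dpd = demographic_parity_difference(y_pred, groups)
eod = equalized_odds_difference(y_true, y_pred, groups)
```

The toy data has equal positive-prediction rates in both groups but unequal true-positive and false-positive rates, which is why the two metrics are reported side by side rather than treated as interchangeable.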

Datasets

  • Berkeley D-Lab Hate Speech
  • Civil Comments
  • UTKFace