Train task-focused supervised fine-tuning and preference alignment in parallel, then sparsify and merge adapters to avoid alignment tax.

June 25, 2024 · 7 min

Overview

Decision Snapshot: Ready For Pilot

Results are robust across two base models and two alignment algorithms, but rely on public benchmarks and GPT-4 judgments; dataset quality (UltraChat/UltraFeedback) and merging stability are open risks.

Citations: 4

Evidence Strength: 0.80

Confidence: 0.80

Risk Signals: 10

Trust Signals

Findings with numeric evidence: 4/4

Findings with evidence refs: 4/4

Results with explicit delta: 4/5

Reproducibility

Status: Partial assets available

Open source: Partial

At A Glance

Cost impact: 60%

Production readiness: 70%

Novelty: 60%

Authors

Shiva Kumar Pentyala, Zhichao Wang, Bin Bi, Kiran Ramnath, Xiang-Bo Mao, Regunathan Radhakrishnan, Sitaram Asur, Na Cheng

Links

Abstract / PDF / Data

Why It Matters For Business

PAFT can preserve both task accuracy and alignment without retraining large models end-to-end; companies can run SFT and alignment in parallel, sparsify adapters, and merge them to ship stronger, aligned models faster.

Who Should Care

Summary TLDR

PAFT trains supervised fine-tuning (SFT) and preference alignment (DPO/ORPO) in parallel on the same pre-trained model, makes the SFT adapter sparse via an L1 penalty, and then merges the two adapters into a single model. Sparsifying SFT adapters (over 90% sparsity reported) reduces parameter interference during merging and yields stronger merged models. On public benchmarks PAFT-ed models top the HuggingFace Open LLM Leaderboard for the tested size classes and improve AlpacaEval performance versus many baselines.

Problem Statement

Sequentially applying SFT and then preference alignment often incurs an 'alignment tax': the aligned model loses or degrades capabilities learned during SFT. The paper asks whether training SFT and alignment in parallel, plus sparsifying adapters, reduces that tax and yields a stronger merged model.

Main Contribution

Introduce PAFT: learn SFT and preference-alignment adapters in parallel on the same base model and fuse them by weight merging.

Show SFT adapters are dense while alignment adapters are naturally sparse; add L1 during SFT to push sparsity and reduce interference.
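As a rough illustration of the L1 idea above, the sketch below adds an L1 penalty on an adapter's weight delta to a task loss and reports how sparse that delta is. It is a framework-free toy, not the paper's training code: the function names, the threshold `eps`, and the choice to penalize a flattened delta vector are all assumptions made for illustration.

```python
from typing import Sequence


def l1_penalty(delta: Sequence[float]) -> float:
    """Sum of absolute values of the adapter's weight delta."""
    return sum(abs(w) for w in delta)


def regularized_loss(task_loss: float, delta: Sequence[float], lam: float) -> float:
    """Task loss plus lam times the L1 norm of the delta (assumed form of the sparsity term)."""
    return task_loss + lam * l1_penalty(delta)


def sparsity(delta: Sequence[float], eps: float = 1e-8) -> float:
    """Fraction of entries whose magnitude is at most eps (hypothetical threshold)."""
    if not delta:
        raise ValueError("delta must be non-empty")
    return sum(1 for w in delta if abs(w) <= eps) / len(delta)


# Toy flattened adapter delta: 8 of 10 entries are exactly zero.
toy_delta = [0.0, 0.5, 0.0, -0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(regularized_loss(2.0, toy_delta, lam=1e-3))
print(sparsity(toy_delta))
```

In a real training loop the penalty would be computed on the trainable adapter tensors and added to the SFT loss before backpropagation; the larger `lam` is, the harder weights are pushed toward zero.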

Key Findings

Parallel training (PAFT) plus L1-sparsified SFT improves merged-model scores versus sequential or standalone training on the 6-task Open LLM suite.

Numbers: PAFT (SFTsparse + DPO) avg=0.65243 vs DPO-alone 0.6333 (Mistral-7B)

Practical Use: Run SFT and preference alignment concurrently and sparsify SFT adapters to raise merged-model accuracy on broad benchmarks.

Evidence Ref: Table 1 (Mistral-7B, TIES)

Inducing sparsity in the SFT adapter greatly reduces merging interference and can yield large gains for some merge methods.

Numbers: TIES: PAFT 0.65243 vs Parallel SFT+DPO 0.58928 (delta +0.06315)

Practical Use: If you plan to merge adapters, add an L1 term during SFT; merging non-sparse SFT adapters risks big performance drops.

Evidence Ref: Table 1 (Mistral-7B, TIES)
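To make the merge step in these findings more concrete, here is a minimal sketch of a TIES-style merge as it is commonly described: trim small-magnitude entries, elect a sign per parameter, then average only the entries that agree with that sign. It works on flattened Python lists and is not the paper's or any library's implementation; `trim`, `ties_merge`, and `keep_fraction` are hypothetical names.

```python
from typing import List, Sequence


def trim(delta: Sequence[float], keep_fraction: float) -> List[float]:
    """Step 1 (trim): keep the largest-magnitude entries and zero the rest."""
    k = max(1, int(len(delta) * keep_fraction))
    by_magnitude = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    keep = set(by_magnitude[:k])
    return [w if i in keep else 0.0 for i, w in enumerate(delta)]


def ties_merge(deltas: Sequence[Sequence[float]], keep_fraction: float = 0.2) -> List[float]:
    """Merge flattened adapter deltas: trim, elect sign, disjoint mean."""
    trimmed = [trim(d, keep_fraction) for d in deltas]
    merged: List[float] = []
    for values in zip(*trimmed):
        # Step 2 (elect sign): use the sign of the summed trimmed values.
        total = sum(values)
        if total == 0:
            merged.append(0.0)
            continue
        # Step 3 (disjoint merge): average only entries agreeing with the elected sign.
        agreeing = [v for v in values if v * total > 0]
        merged.append(sum(agreeing) / len(agreeing))
    return merged


# Toy deltas standing in for an SFT adapter and a DPO adapter.
sft_delta = [0.9, 0.0, -0.4, 0.05]
dpo_delta = [0.5, -0.6, 0.8, 0.0]
print(ties_merge([sft_delta, dpo_delta], keep_fraction=0.5))
```

In the toy example the two deltas disagree in sign at the third position, and only the entry matching the elected sign survives. A sparser SFT delta leaves fewer positions where such conflicts can occur, which is consistent with the interference argument above.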

Results

| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
| --- | --- | --- | --- | --- | --- | --- |
| Avg score on 6-task Open LLM suite (Mistral-7B, TIES merge) | 0.65243 | DPO-alone 0.6333 | +0.01913 | Open LLM Leaderboard (ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, GSM8K) | Table 1 (Mistral-7B, PAFT SFTsparse+DPO TIES) | Table 1 |
| TIES merge gap: sparse vs non-sparse (Mistral-7B) | PAFT 0.65243 vs Parallel SFT+DPO 0.58928 | Parallel SFT+DPO 0.58928 | +0.06315 | Open LLM Leaderboard (6-task avg) | Table 1 (TIES rows) | Table 1 |
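The deltas reported above can be re-derived from the listed scores. The short check below uses Python's `decimal` module to avoid floating-point rounding noise; the variable names are illustrative only.

```python
from decimal import Decimal

# Scores as reported for Mistral-7B with TIES merging.
paft_sparse_ties = Decimal("0.65243")
dpo_alone = Decimal("0.6333")
parallel_dense_ties = Decimal("0.58928")

delta_vs_dpo = paft_sparse_ties - dpo_alone
delta_vs_dense = paft_sparse_ties - parallel_dense_ties
print(delta_vs_dpo, delta_vs_dense)  # prints: 0.01913 0.06315
```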

What To Try In 7 Days

Train SFT and DPO adapters in parallel on your base model using LoRA.

Add small L1 regularization (λ≈1e-4 or 1e-3) to SFT to induce sparsity.

Experiment with simple merging (TIES, Task Arithmetic or linear) and evaluate merged model on your core metrics.
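For the simplest of the merges listed above, a linear (task-arithmetic style) merge can be sketched as adding weighted adapter deltas to the base weights. This is a toy on flattened lists under stated assumptions: `task_arithmetic_merge` and the per-adapter `weights` are hypothetical names, and a real run would operate on the model's tensors through your training framework.

```python
from typing import List, Sequence


def task_arithmetic_merge(
    base: Sequence[float],
    deltas: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Return base + sum_i weights[i] * deltas[i], element by element."""
    if len(deltas) != len(weights):
        raise ValueError("need exactly one weight per delta")
    merged = list(base)
    for delta, weight in zip(deltas, weights):
        if len(delta) != len(base):
            raise ValueError("each delta must match the base length")
        for i, d in enumerate(delta):
            merged[i] += weight * d
    return merged


# Toy base weights plus two sparse, non-overlapping adapter deltas.
base = [1.0, 2.0, 3.0]
sft_delta = [0.5, 0.0, 0.0]
dpo_delta = [0.0, -0.25, 0.0]
print(task_arithmetic_merge(base, [sft_delta, dpo_delta], weights=[1.0, 1.0]))
```

When the two deltas touch disjoint positions, as in this toy, the linear merge preserves both updates exactly. The merge weights are a tuning knob to sweep against your own core metrics.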

Optimization Features

Infra Optimization: LoRA
Model Optimization: Merge sparse adapters into base weights; use TIES/Task Arithmetic/SLERP merges
System Optimization: Avoid retraining full model by merging adapters
Training Optimization: SFT, LoRA
Inference Optimization: Merged single model for inference (no extra runtime adapters)

Reproducibility

Code Available: No
Data Available: Yes
Open Source Status: Partial
License: Unknown

Data URLs

UltraChat (Zephyr/UltraChat dataset referenced)

UltraFeedback (Zephyr/UltraFeedback dataset referenced)

Risks & Boundaries

Limitations

No causal explanation of why DPO adapters are naturally sparse and SFT adapters are dense.

Scalability and the operational workflow for iterative merges in production are underexplored.

When Not To Use

You cannot merge adapters reliably due to incompatible architectures or runtime constraints.

Your SFT data is not similar to dialogue or is highly out-of-domain relative to alignment data.

Failure Modes

Merged model still suffers from parameter interference if SFT sparsity is insufficient.

Retraining the merged model can induce catastrophic forgetting of earlier traits.
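One way to watch for the first failure mode before merging is to measure how often two adapter deltas touch the same parameter and how often they disagree in sign there. The diagnostic below is a hypothetical helper on flattened lists, not something described in the paper; `overlap_stats` and the threshold `eps` are assumptions.

```python
from typing import Dict, Sequence


def overlap_stats(
    delta_a: Sequence[float], delta_b: Sequence[float], eps: float = 1e-8
) -> Dict[str, float]:
    """Fraction of positions where both deltas are nonzero, and where their signs conflict."""
    if len(delta_a) != len(delta_b) or not delta_a:
        raise ValueError("deltas must be non-empty and of equal length")
    overlap = 0
    conflicts = 0
    for a, b in zip(delta_a, delta_b):
        if abs(a) > eps and abs(b) > eps:
            overlap += 1
            if a * b < 0:
                conflicts += 1
    n = len(delta_a)
    return {"overlap": overlap / n, "sign_conflict": conflicts / n}


# Toy deltas: both are active at positions 0 and 2, and disagree in sign at position 2.
sft_delta = [0.9, 0.0, -0.4, 0.0]
dpo_delta = [0.5, -0.6, 0.8, 0.0]
print(overlap_stats(sft_delta, dpo_delta))
```

A high sign-conflict fraction suggests the SFT adapter is not sparse enough for a clean merge, in which case increasing the L1 strength is one lever to try.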

Core Entities

Models

Mistral-7B, Llama-3-8B, Neurotic-7B, MoMo70B, Ein-70B, PAFT-ed 7B, PAFT-ed 70B

Metrics

Average over ARC/HellaSwag/MMLU/TruthfulQA/Winogrande/GSM8K

AlpacaEval pairwise win-rate vs GPT-4

Datasets

UltraChat, UltraFeedback

Benchmarks

HuggingFace Open LLM Leaderboard (6-task suite), AlpacaEval