Overview
Model merging is production-ready when the source models share a pretrained base and their tasks are moderately related. Advanced methods (sparsification, routing, search) address the cases where simple merging fails, but they add complexity and compute.
Citations: 0
Evidence Strength: 0.70
Confidence: 0.85
Risk Signals: 14
Trust Signals
Findings with numeric evidence: 6/6
Findings with evidence refs: 6/6
Results with explicit delta: 4/5
Reproducibility
Status: Partial assets available
Open source: Partial
At A Glance
Cost impact: 78%
Production readiness: 70%
Novelty: 65%
Why It Matters For Business
Model merging lets you cheaply combine specialist LLMs into one deployable model, saving training cost and inference overhead while enabling rapid capability composition.
Summary TLDR
This survey organizes model merging — combining multiple trained models into one — around a four-part FUSE taxonomy: Foundations (why merging works), Unification strategies (how to merge), Scenarios (where merging helps), and Ecosystem (tools and gaps). It reviews weight averaging, task-vector arithmetic, sparsification (e.g., TIES, DARE), manifold-aware interpolation (SLERP), MoE-style expert routing, and automated search. It summarizes empirical strengths, common failure modes (interference, sign conflicts, permutation symmetry), practical toolkits (mergekit, Model Soups, LoRA-based hubs), and open problems in scalability, evaluation, and cross-architecture merging.
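As an illustration of the manifold-aware interpolation mentioned above, here is a minimal SLERP sketch. It assumes each model has been flattened into a plain list of floats; the function name `slerp` and the `eps` fallback threshold are illustrative choices, not taken from the survey or from any particular toolkit.

```python
import math

def slerp(w_a, w_b, t, eps=1e-8):
    """Spherical linear interpolation between two flat weight vectors.

    Assumption: w_a and w_b are equal-length lists of floats. Real
    checkpoints would be tensors, usually interpolated layer by layer.
    """
    norm_a = math.sqrt(sum(x * x for x in w_a))
    norm_b = math.sqrt(sum(x * x for x in w_b))
    # Cosine of the angle between the vectors, clamped against rounding error.
    cos_omega = sum(a * b for a, b in zip(w_a, w_b)) / (norm_a * norm_b + eps)
    cos_omega = max(-1.0, min(1.0, cos_omega))
    omega = math.acos(cos_omega)
    if math.sin(omega) < eps:
        # Nearly parallel vectors: fall back to plain linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(w_a, w_b)]
    s_a = math.sin((1 - t) * omega) / math.sin(omega)
    s_b = math.sin(t * omega) / math.sin(omega)
    return [s_a * a + s_b * b for a, b in zip(w_a, w_b)]

# Halfway between two orthogonal unit vectors stays on the unit circle,
# whereas a plain average would shrink the norm to about 0.707.
midpoint = slerp([1.0, 0.0], [0.0, 1.0], 0.5)
```

Unlike linear averaging, this keeps the interpolated vector's norm close to that of the endpoints, which is the usual motivation for using it on weights.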
Problem Statement
Fine-tuned LLMs are proliferating, but training a new multi-task model or serving an ensemble of specialists is costly. Practitioners need ways to combine existing specialized models into one unified model cheaply and reliably. The core challenges are weight-space symmetries (permutation), parameter interference (sign and magnitude conflicts), the need for a shared initialization, architectural mismatch, and the lack of standardized evaluation.
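The permutation-symmetry and sign-conflict challenges can be seen in a toy example. The sketch below uses a hypothetical two-hidden-unit ReLU network (the names `mlp` and `average` are illustrative): two weight sets compute the same function because one is a hidden-unit permutation of the other, yet their naive average collapses to a different function.

```python
def mlp(x, w_in, w_out):
    """Toy network: scalar input, ReLU hidden units, scalar output."""
    hidden = [max(0.0, w * x) for w in w_in]
    return sum(h * v for h, v in zip(hidden, w_out))

def average(u, v):
    """Element-wise mean of two equal-length weight lists."""
    return [(a + b) / 2 for a, b in zip(u, v)]

# Model A, and model B obtained by swapping A's two hidden units.
w_in_a, w_out_a = [1.0, -1.0], [2.0, 3.0]
w_in_b, w_out_b = [-1.0, 1.0], [3.0, 2.0]

# Naive averaging pairs up units that play different roles, so the
# opposite-signed input weights cancel exactly.
w_in_avg = average(w_in_a, w_in_b)     # [0.0, 0.0]
w_out_avg = average(w_out_a, w_out_b)  # [2.5, 2.5]

for x in (2.0, -2.0):
    # A and B agree on every input; the averaged model outputs 0.0.
    print(x, mlp(x, w_in_a, w_out_a), mlp(x, w_in_b, w_out_b),
          mlp(x, w_in_avg, w_out_avg))
```

Models fine-tuned from the same pretrained base tend to stay in compatible unit orderings, which is one intuition for why shared initialization matters.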
Main Contribution
Proposes the FUSE taxonomy (Foundations, Unification, Scenarios, Ecosystem) to structure model merging research.
Systematically reviews algorithmic families: weight averaging, task-vector arithmetic, sparsification, geometric interpolation, MoE-style routing, and search-based merging.
Key Findings
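Merging can preserve most task performance when models share a pretrained initialization: reported accuracy is typically within 2–3% of the individual fine-tuned models on the evaluated benchmarks.
Sparsification methods reduce interference and enable larger multi-model merges: DARE is reported to retain >90% of task performance when merging up to six LLMs.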
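To make the sparsification idea concrete, here is a minimal sketch of a DARE-style drop-and-rescale step, assuming a task vector (fine-tuned minus base weights) stored as a flat list of floats. The function name `dare_sparsify` and its parameters are illustrative and are not the API of any library.

```python
import random

def dare_sparsify(task_vector, drop_rate, rng):
    """Randomly zero out entries of a task vector and rescale the rest.

    Each entry is dropped with probability `drop_rate`; survivors are
    multiplied by 1 / (1 - drop_rate) so the expected value of every
    entry is unchanged. `rng` is a random.Random instance, passed in
    so that runs are reproducible.
    """
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    scale = 1.0 / (1.0 - drop_rate)
    return [0.0 if rng.random() < drop_rate else d * scale
            for d in task_vector]

# Hypothetical task vector; roughly half the entries are zeroed and the
# remaining ones are doubled.
delta = [0.2, -0.1, 0.4, 0.05, -0.3, 0.1]
sparse_delta = dare_sparsify(delta, drop_rate=0.5, rng=random.Random(0))
```

Because each sparsified task vector touches fewer parameters, several of them can be summed onto the same base with less overlap between tasks.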
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Accuracy | within 2–3% of baseline on evaluated benchmarks | individual fine-tuned models | −2% to −3% typical | multi-task NLP benchmarks (reported experiments) | Section 6.1 (Ilharco et al., 2023; survey synthesis) | Section 6.1 |
| DARE retention | >90% task performance when merging up to six LLMs | individual task models | — | multi-task merging experiments (text) | Section 6.1 (Yu et al., 2023; text) | Section 6.1 |
What To Try In 7 Days
Run Model Soups: average a few compatible fine-tuned checkpoints and validate on a held-out set.
Extract a task vector (fine-tuned weights minus base weights), then add it to the base to strengthen a capability or subtract it to suppress one.
Apply TIES trimming on two task vectors to test sparsification and compare retention ratios.
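The three experiments above can be sketched on toy data as follows. This assumes every checkpoint is a flat list of floats with identical layout; the function names are illustrative, and `ties_merge` is a simplified rendering of the trim, sign-election, and agreeing-mean steps rather than a reference implementation.

```python
def uniform_soup(checkpoints):
    """Model Soups style uniform average of compatible checkpoints."""
    n = len(checkpoints)
    return [sum(vals) / n for vals in zip(*checkpoints)]

def task_vector(finetuned, base):
    """Task vector: fine-tuned weights minus base weights."""
    return [f - b for f, b in zip(finetuned, base)]

def apply_task_vector(base, tv, scale=1.0):
    """Add (scale > 0) or subtract (scale < 0) a task vector."""
    return [b + scale * d for b, d in zip(base, tv)]

def trim(tv, keep_fraction):
    """Keep the largest-magnitude entries and zero out the rest.

    Entries tied with the threshold are all kept, so slightly more
    than the requested fraction may survive.
    """
    k = max(1, round(keep_fraction * len(tv)))
    threshold = sorted((abs(d) for d in tv), reverse=True)[k - 1]
    return [d if abs(d) >= threshold else 0.0 for d in tv]

def ties_merge(task_vectors, keep_fraction):
    """Trim each vector, elect a sign per parameter, and average only
    the entries that agree with the elected sign."""
    trimmed = [trim(tv, keep_fraction) for tv in task_vectors]
    merged = []
    for vals in zip(*trimmed):
        sign = 1.0 if sum(vals) >= 0 else -1.0
        agreeing = [v for v in vals if v * sign > 0]
        merged.append(sum(agreeing) / len(agreeing) if agreeing else 0.0)
    return merged

# Hypothetical 4-parameter base and two fine-tuned variants.
base = [0.0, 0.0, 0.0, 0.0]
tv_a = task_vector([1.0, 0.1, -2.0, 0.0], base)
tv_b = task_vector([-0.5, 0.2, -1.0, 3.0], base)
merged_tv = ties_merge([tv_a, tv_b], keep_fraction=0.5)
merged_model = apply_task_vector(base, merged_tv)
```

Each resulting model would then be scored on a held-out set and compared against the individual fine-tuned models.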
Risks & Boundaries
Limitations
Requires architectural identity or adapter-based alternatives for cross-model merges.
Shared pretrained initialization is often necessary for reliable linear combinations.
When Not To Use
Source models trained from different random initializations without alignment tools.
Strongly conflicting task specializations without interference mitigation.
Failure Modes
Negative transfer: merged model performs worse than each constituent on its task.
Emergent unsafe behavior or amplified backdoors from constituent models.
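A simple guard against negative transfer is to compare per-task scores before and after merging. The sketch below assumes higher-is-better scores keyed by task name; the function name `retention_report`, the example scores, and the 0.9 default threshold are illustrative assumptions, not values prescribed by the survey.

```python
def retention_report(individual_scores, merged_scores, min_retention=0.9):
    """Return {task: (retention_ratio, passed)} for every task.

    retention_ratio is the merged model's score divided by the score of
    the individual model specialised for that task.
    """
    report = {}
    for task, solo in individual_scores.items():
        if solo <= 0:
            raise ValueError(f"non-positive baseline score for {task!r}")
        ratio = merged_scores[task] / solo
        report[task] = (ratio, ratio >= min_retention)
    return report

# Hypothetical scores, for illustration only.
individual = {"math": 0.80, "code": 0.50}
merged = {"math": 0.76, "code": 0.40}
report = retention_report(individual, merged)
failing = [task for task, (_, ok) in report.items() if not ok]
```

A check like this covers task regressions only; unsafe behavior or backdoors inherited from constituent models require separate safety evaluation.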

