Use off-the-shelf LLMs plus arithmetic coding to losslessly compress gradients

September 26, 2024 · 7 min

Overview

Decision Snapshot: Needs Validation

The method shows consistent gains across models and datasets, but low throughput and implementation bottlenecks limit its short-term readiness for production.

Citations: 0

Evidence Strength: 0.80

Confidence: 0.85

Risk Signals: 12

Trust Signals

Findings with numeric evidence: 4/4

Findings with evidence refs: 4/4

Results with explicit delta: 3/4

Reproducibility

Status: Code + data available

Open source: Yes

At A Glance

Cost impact: 60%

Production readiness: 40%

Novelty: 80%

Authors

Hui-Po Wang, Mario Fritz

Why It Matters For Business

LM-GC can cut gradient bytes by ~6%–17% losslessly, lowering network costs in federated or distributed training, but current runtime is slow and needs systems work before production use.

Summary TLDR

The authors introduce LM-GC: convert 32-bit gradients into grouped hexadecimal text, feed that text to a frozen pre-trained LLM to get token probabilities, and apply arithmetic coding to compress losslessly. Proper serialization (hex + separators) yields up to 38× token savings and improves lossless compression over general-purpose codecs by about 10%–17.2% on evaluated image-model gradients. LM-GC also combines with quantization and sparsification but is currently slow (≈4 hours to compress 28 MB). Code is available.
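The coding step in the summary above can be illustrated with a toy arithmetic coder. This is a minimal sketch, not the authors' implementation: `next_symbol_probs` is a hypothetical stand-in that uses adaptive character counts where LM-GC would query a frozen LLM, and exact `Fraction` arithmetic replaces the fixed-precision integer arithmetic a practical coder would use. It only demonstrates that any predictor shared by encoder and decoder yields a lossless round trip.

```python
import math
from fractions import Fraction

# Assumed alphabet: lowercase hex digits plus a space separator.
ALPHABET = "0123456789abcdef "

def next_symbol_probs(context):
    """Hypothetical stand-in for the LLM: add-one smoothed symbol frequencies."""
    counts = {s: 1 for s in ALPHABET}
    for s in context:
        counts[s] += 1
    total = sum(counts.values())
    return {s: Fraction(c, total) for s, c in counts.items()}

def encode(text):
    """Narrow [low, low + width) once per symbol, then emit a short binary fraction."""
    low, width = Fraction(0), Fraction(1)
    for i, sym in enumerate(text):
        probs = next_symbol_probs(text[:i])
        for s in ALPHABET:
            if s == sym:
                break
            low += width * probs[s]   # skip the sub-intervals of earlier symbols
        width *= probs[sym]
    nbits = 0
    while Fraction(1, 2 ** nbits) > width:
        nbits += 1                    # smallest n with 2**-n <= width
    k = math.ceil(low * 2 ** nbits)   # k / 2**nbits lies inside the final interval
    return format(k, "b").zfill(nbits) if nbits else ""

def decode(bits, length):
    """Replay the same predictions to recover `length` symbols from the bit string."""
    value = Fraction(int(bits, 2), 2 ** len(bits)) if bits else Fraction(0)
    low, width = Fraction(0), Fraction(1)
    out = ""
    for _ in range(length):
        probs = next_symbol_probs(out)
        for s in ALPHABET:
            span = width * probs[s]
            if value < low + span:
                out += s
                width = span
                break
            low += span
    return out

text = "3f 80 00 00 bf 00 00 00"
bits = encode(text)
print(len(bits), "bits for", len(text), "characters")
print(decode(bits, len(text)) == text)
```

The better the predictor matches the data, the wider the final interval and the fewer bits are needed, which is why a stronger prior translates into smaller output.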

Problem Statement

Gradient arrays are high-dimensional and structured, but existing lossless compressors lack a strong statistical prior tailored to gradients. Training a gradient-specific generative model is costly. The paper asks: can off-the-shelf LLMs act as zero-shot priors to enable practical lossless gradient compression?

Main Contribution

LM-GC: a pipeline that serializes 32-bit floats into grouped hexadecimal text, queries a frozen LLM for token probabilities, and uses arithmetic coding for lossless compression.

Showed serialization matters: grouped hex with separators dramatically improves token efficiency and compression compared to raw or ISO encodings.
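The serialization step described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the paper's code: the function names, the big-endian byte order, and the default of one byte per group are all choices made here for the example.

```python
import struct

def serialize_gradients(values, group_bytes=1, sep=" "):
    """Render floats as float32 bytes in grouped hexadecimal text.

    Assumptions (not from the source): big-endian byte order, so the sign and
    exponent bits come first, and `group_bytes` bytes per group.
    """
    raw = struct.pack(f">{len(values)}f", *values)
    hex_text = raw.hex()
    step = 2 * group_bytes  # two hex digits per byte
    groups = [hex_text[i:i + step] for i in range(0, len(hex_text), step)]
    return sep.join(groups)

def deserialize_gradients(text, sep=" "):
    """Invert serialize_gradients: strip separators and unpack float32 values."""
    raw = bytes.fromhex(text.replace(sep, ""))
    return list(struct.unpack(f">{len(raw) // 4}f", raw))

grads = [1.0, -0.5]
text = serialize_gradients(grads)
print(text)                         # 3f 80 00 00 bf 00 00 00
print(deserialize_gradients(text))  # [1.0, -0.5]
```

Because the text is a direct rendering of the float32 bit pattern, the round trip is exact for any value that is already a 32-bit float, which is what keeps the overall pipeline lossless.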

Key Findings

LM-GC improves lossless compression vs. best baseline on evaluated datasets.

Numbers: 17.2% improvement (TinyImageNet vs FPZIP); 5.9% (CIFAR-10); 8.8% (MNIST)

Practical Use: Expect ~6%–17% smaller gradient payloads in distributed training on similar image-model workloads by replacing generic codecs with LM-GC.

Evidence Ref: Table 3

Serializing floats as grouped hexadecimal tokens with separators gives large token savings and affects compression strongly.

Numbers: ≈38× token efficiency; serialization choices caused up to ~70% compression difference (ISO vs Hs)

Practical Use: Always convert floats to grouped hex with clear separators before using LLM priors; poor serialization can increase size instead of reducing it.

Evidence Ref: Sec. 4, Table 1

Results

Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref
Compression improvement over best baseline | 17.2% (TinyImageNet) | FPZIP | 17.2% | TinyImageNet (ConvNet) | Table 3 shows LM-GC (Hs) = 71.90±0.0 vs FPZIP = 86.88±0.1 | Table 3
Compression improvement over best baseline | 5.9% | FPZIP | 5.9% | CIFAR-10 (ConvNet) | Table 3: LM-GC = 38.83±0.4 vs FPZIP = 41.26±0.8 | Table 3
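The deltas in the rows above follow from the quoted Table 3 values if those values are read as compression rates (compressed size as a percentage of the original, lower is better) and the delta as the relative reduction versus the baseline. That reading is an assumption here; the short check below, with a hypothetical helper name, reproduces both figures.

```python
def relative_improvement(baseline_rate, method_rate):
    """Percentage reduction of the method's compression rate relative to the baseline."""
    return 100.0 * (baseline_rate - method_rate) / baseline_rate

print(round(relative_improvement(86.88, 71.90), 1))  # 17.2 (TinyImageNet)
print(round(relative_improvement(41.26, 38.83), 1))  # 5.9 (CIFAR-10)
```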

What To Try In 7 Days

Serialize a small gradient checkpoint to grouped hex with spaces and run LM-GC on a Tinyllama model to measure compression vs your current codec.

Profile the pipeline to find the bottleneck (LLM inference vs arithmetic coding); try a quantized LLM or a faster arithmetic coder.

Combine LM-GC with your existing quantization/sparsification to see additive bandwidth savings on a small training run.
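For the profiling step in the list above, a small per-stage timer is usually enough to see whether model inference or the coder dominates. The sketch below is generic: `profile_stages` is a hypothetical helper, and the two lambda stages are placeholders for whatever serialization, inference, and coding functions your pipeline actually exposes.

```python
import time

def profile_stages(stages, payload):
    """Run (name, function) stages in order, passing each output to the next.

    Returns the final output and a dict of wall-clock seconds per stage.
    """
    timings = {}
    for name, fn in stages:
        start = time.perf_counter()
        payload = fn(payload)
        timings[name] = time.perf_counter() - start
    return payload, timings

# Placeholder stages; swap in the real pipeline functions.
stages = [
    ("serialize", lambda values: " ".join(f"{v:.3f}" for v in values)),
    ("count_chars", lambda text: len(text)),
]
result, timings = profile_stages(stages, [0.1, 0.2, 0.3])
for name, seconds in timings.items():
    print(f"{name}: {seconds:.6f} s")
```

When the LLM runs on a GPU, make sure each timed stage waits for the device to finish before returning, otherwise asynchronous execution can shift time into the wrong stage.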

Optimization Features

Token Efficiency: Grouped hexadecimal serialization with separators; byte grouping aligned to float fields (sign/exponent/mantissa)
Infra Optimization: Balance context window size vs HW memory; use A100-like GPUs or optimized inference stacks
Model Optimization: Use larger LLMs for better priors
System Optimization: Move arithmetic coding to optimized C++ or a faster CPU; parallelize token probability computation
Training Optimization: Combine with quantization and sparsification
Inference Optimization: Quantize LLMs, faster attention, KV-cache optimizations
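As an illustration of the "combine with sparsification" item above, the sketch below applies magnitude-based top-k sparsification before any lossless stage. Top-k is chosen here only as a common example; the source does not say which sparsification scheme was evaluated, and both function names are hypothetical. Sparsification itself is lossy: the lossless compressor would then operate on the surviving index/value pairs.

```python
def top_k_sparsify(values, k):
    """Keep the k largest-magnitude entries as (index, value) pairs, sorted by index."""
    order = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    kept = sorted(order[:k])
    return [(i, values[i]) for i in kept]

def densify(pairs, length):
    """Rebuild a dense list, filling dropped positions with zeros."""
    dense = [0.0] * length
    for i, v in pairs:
        dense[i] = v
    return dense

grads = [0.1, -2.0, 0.5, 3.0]
pairs = top_k_sparsify(grads, 2)
print(pairs)                       # [(1, -2.0), (3, 3.0)]
print(densify(pairs, len(grads)))  # [0.0, -2.0, 0.0, 3.0]
```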

Reproducibility

Code Available: Yes
Data Available: Yes
Open Source Status: Yes
License: Unknown

Data URLs

MNIST: http://yann.lecun.com/exdb/mnist
CIFAR-10: https://www.cs.toronto.edu/~kriz/cifar.html
TinyImageNet: https://tiny-imagenet.herokuapp.com/

Risks & Boundaries

Limitations

Current implementation is slow: ~4 hours to compress 28 MB.

Experiments limited to image-model gradients (ConvNet, VGG, ResNet, ViT).
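To put the reported runtime in perspective, the figure of roughly 4 hours for 28 MB can be converted into a throughput and extrapolated to other payload sizes. The calculation below is back-of-the-envelope only: it assumes 1 MB = 10^6 bytes and that runtime scales linearly with size, neither of which the source states.

```python
def throughput_kb_per_s(megabytes, hours):
    """Average throughput in kB/s, taking 1 MB = 1000 kB."""
    return megabytes * 1000.0 / (hours * 3600.0)

def hours_needed(megabytes, kb_per_s):
    """Hours to process a payload at a fixed throughput (linear-scaling assumption)."""
    return megabytes * 1000.0 / kb_per_s / 3600.0

rate = throughput_kb_per_s(28, 4)
print(round(rate, 2))                     # 1.94 kB/s
print(round(hours_needed(100, rate), 1))  # 14.3 hours for a 100 MB payload
```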

When Not To Use

When you need low-latency gradient transfer or real-time training checkpoints.

When you lack GPU/LLM inference infrastructure or want minimal CPU overhead.

Failure Modes

Poor serialization (e.g., ISO or no separators) increases compressed size.

Small LLM or short context window fails to model dependencies, losing gains.

Core Entities

Models

Tinyllama 1.1B, Openllama 3B, LLAMA 2 7B

Metrics

compression rate (%), token efficiency (×), bytes compressed, throughput (time per MB)

Datasets

MNIST, CIFAR-10, TinyImageNet

Context Entities

Models

PNG, FLAC, GZIP, LZMA, FPZIP, Run-length encoding (RLE)