PETALS: run and fine-tune 50B+ LLMs by pooling unreliable consumer GPUs over the Internet
PETALS lets teams pool idle consumer GPUs to run and fine-tune 50B+ models interactively. This reduces the need for expensive multi‑GPU servers and gives lower inference latency than RAM offloading. The tradeoff is privacy and trust: inputs are processed on other participants' machines.
Key finding
The distributed approach (PETALS) is substantially faster for interactive use than offloading on a single GPU.
Numbers: ≥10× faster than offloading for autoregressive generation (paper claim)
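To see where a gap like this can come from, here is a rough back-of-envelope sketch. It is not the paper's model or measurement. It assumes offloading re-streams all model weights from RAM to the GPU for every generated token, while a distributed pipeline keeps weights on remote GPUs and pays only per-hop network latency plus compute. All function names and input values are hypothetical placeholders.

```python
def offload_seconds_per_token(model_bytes: float, bus_bytes_per_s: float) -> float:
    """Simplified offloading cost: every token streams all weights over the host-to-GPU bus."""
    return model_bytes / bus_bytes_per_s


def pipeline_seconds_per_token(num_hops: int, hop_latency_s: float, compute_s: float) -> float:
    """Simplified pipeline cost: weights stay remote, only activations cross the network."""
    return num_hops * hop_latency_s + compute_s


def speedup(offload_s: float, pipeline_s: float) -> float:
    """How many times faster the pipeline is than offloading, per token."""
    return offload_s / pipeline_s


if __name__ == "__main__":
    # Placeholder inputs chosen for easy arithmetic, NOT figures from the paper.
    offload = offload_seconds_per_token(model_bytes=100e9, bus_bytes_per_s=20e9)
    pipeline = pipeline_seconds_per_token(num_hops=5, hop_latency_s=0.1, compute_s=0.5)
    print(f"offloading: {offload:.2f} s/token")
    print(f"pipeline:   {pipeline:.2f} s/token")
    print(f"speedup:    {speedup(offload, pipeline):.1f}x")
```

The printed ratio only reflects the placeholder inputs. The point of the sketch is the structure: offloading cost grows with model size, while pipeline cost grows with the number of network hops.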

