QLoRA drastically lowers hardware cost and complexity for finetuning large LLMs, enabling teams to build custom chatbots and models on single consumer or pro GPUs and therefore speed development, lower cloud spend, and protect data privacy.
Key finding
QLoRA reduces the memory needed to finetune a 65B model from more than 780 GB to under 48 GB
Numbers: >780 GB -> <48 GB

