Pick subsets of open-source LLMs per query to improve quality while cutting inference cost
Selecting a subset of models per query, instead of running every model in the ensemble, cuts ensemble inference cost by roughly 4× while improving automatic quality scores, which makes LLM deployment cheaper and more scalable for high-throughput services.
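To make the idea of per-query subset selection concrete, here is a minimal sketch. It is not the method from the paper: it assumes you already have a predicted quality score and a cost for each model on a given query (the names `pred_quality`, `cost`, `budget`, and the model names are all hypothetical), and it simply brute-forces the subset with the highest total predicted quality that fits the budget.

```python
from itertools import combinations


def pick_subset(pred_quality, cost, budget):
    """Return the model subset with the highest total predicted quality
    whose total cost stays within `budget`.

    Illustrative only: brute force over all subsets, so it is practical
    just for small model pools. Returns an empty tuple if nothing fits.
    """
    names = list(pred_quality)
    best, best_score = (), float("-inf")
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            total_cost = sum(cost[m] for m in subset)
            if total_cost > budget:
                continue  # skip subsets that exceed the budget
            score = sum(pred_quality[m] for m in subset)
            if score > best_score:
                best, best_score = subset, score
    return best


# Hypothetical per-query predictions and costs (made-up values).
pred_quality = {"model_a": 0.9, "model_b": 0.6, "model_c": 0.5}
cost = {"model_a": 3, "model_b": 1, "model_c": 1}

chosen = pick_subset(pred_quality, cost, budget=3)
print(chosen)
```

In this toy example the two cheap models together beat the single expensive one under the same budget, which is the kind of trade-off a per-query selector is meant to exploit.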
Key finding
MODI achieves higher automatic quality scores than the prior ensembling approach, LLM-BLENDER, on MixInstruct.
Numbers (BARTScore, higher is better): MODI −2.14 vs. LLM-BLENDER −2.77 (Δ +0.63)
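The reported difference can be checked with a one-line calculation. The snippet below uses only the two scores quoted above; the variable names are just for illustration. BARTScore values are log-likelihoods, so they are negative and a less negative value is better.

```python
# BARTScore values quoted in the summary above.
modi = -2.14
llm_blender = -2.77

# Positive delta means MODI scores higher (less negative) than LLM-BLENDER.
delta = round(modi - llm_blender, 2)
print(delta)  # 0.63
```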

