Question-aware prompt compression that speeds up LLMs and often improves accuracy on very long contexts
If you run LLMs over long documents, compressing each prompt with respect to the specific question cuts API cost and latency and often improves answer quality, so you can serve more queries on the same budget.
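To make "compressing per question" concrete, here is a toy sketch under loudly stated assumptions. It is not the compression method evaluated in the paper. It swaps in a deliberately simple heuristic: score each document by the fraction of question words it contains, then keep the highest-scoring documents until a token budget is used up. The names `tokenize`, `relevance`, `compress_for_question`, and `token_budget` are hypothetical, and "tokens" here are lowercase word tokens, not model tokens.

```python
import re


def tokenize(text: str) -> list[str]:
    # Hypothetical helper: lowercase alphanumeric word tokens.
    # A crude stand-in for a real model tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def relevance(doc: str, question: str) -> float:
    # Toy question-aware score (an assumption, not the paper's scoring):
    # the fraction of distinct question terms that also appear in the document.
    q_terms = set(tokenize(question))
    if not q_terms:
        return 0.0
    return len(q_terms & set(tokenize(doc))) / len(q_terms)


def compress_for_question(docs: list[str], question: str, token_budget: int) -> list[str]:
    # Rank documents by relevance to *this* question, most relevant first.
    # sorted() is stable, so ties keep their original order.
    ranked = sorted(docs, key=lambda d: relevance(d, question), reverse=True)
    kept, used = [], 0
    for doc in ranked:
        n = len(tokenize(doc))
        # Greedily keep whole documents that still fit in the budget.
        if used + n <= token_budget:
            kept.append(doc)
            used += n
    return kept


if __name__ == "__main__":
    docs = [
        "The Eiffel Tower is in Paris and was completed in 1889.",
        "Bananas are rich in potassium.",
        "Paris is the capital of France.",
    ]
    question = "When was the Eiffel Tower completed?"
    print(compress_for_question(docs, question, token_budget=12))
```

With a 12-token budget only the Eiffel Tower document survives; a different question over the same documents would keep a different subset, which is the point of compressing per question rather than once per document set.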
Key finding
Compressed prompts can improve accuracy compared with the original long prompts on multi-document QA.
Numbers: up to +21.4% on NaturalQuestions (Abstract; Table 1)

