PodParley
Inference Scaling for Long-Context RAG

EPISODE · Oct 20, 2024 · 12 MIN

Inference Scaling for Long-Context RAG

from LlamaCast · host Shahriar Shariati

Inference Scaling for Long-Context Retrieval-Augmented Generation

This research paper explores the effectiveness of inference scaling for retrieval-augmented generation (RAG), a technique that enhances large language models (LLMs) by incorporating external knowledge. The authors introduce two strategies for effectively scaling inference computation: demonstration-based RAG (DRAG), which spends more test-time compute by packing additional retrieved documents and in-context demonstrations into a single long-context prompt, and iterative demonstration-based RAG (IterDRAG), which interleaves retrieval and generation over multiple steps. They demonstrate that increasing inference computation, when optimally allocated, yields nearly linear gains in RAG performance. They also develop a computation allocation model that predicts the optimal test-time compute allocation for various tasks and scenarios, and show that its predictions align with experimental results.

📎 Link to paper
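To make the DRAG idea concrete, here is a minimal sketch of how inference compute can be scaled by packing more retrieved documents and in-context demonstrations into one long-context prompt. All names (`build_drag_prompt`, the knob parameters `max_docs` and `max_demos`) are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical DRAG-style prompt assembly: test-time compute grows with
# the number of documents and demonstrations packed into the prompt.
def build_drag_prompt(question, documents, demonstrations, max_docs, max_demos):
    """Assemble a long-context RAG prompt. Raising max_docs/max_demos
    spends more inference compute per query (longer context)."""
    parts = []
    # In-context demonstrations (question/answer pairs) come first.
    for demo in demonstrations[:max_demos]:
        parts.append(f"Q: {demo['question']}\nA: {demo['answer']}")
    # Then the retrieved documents, labeled for reference.
    for i, doc in enumerate(documents[:max_docs]):
        parts.append(f"[Doc {i + 1}] {doc}")
    # Finally the actual question the LLM should answer.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

docs = ["Paris is the capital of France.", "France is in Europe."]
demos = [{"question": "Capital of Italy?", "answer": "Rome"}]
prompt = build_drag_prompt("Capital of France?", docs, demos,
                           max_docs=2, max_demos=1)
print(prompt.count("[Doc"))  # → 2 documents packed into the context
```

The `max_docs`/`max_demos` knobs stand in for the "effective context length" budget the paper scales; the computation allocation model's job is to pick such knob settings optimally for a given task and compute budget.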

