EPISODE · Oct 20, 2024 · 12 MIN
Inference Scaling for Long-Context RAG
from LlamaCast · host Shahriar Shariati
This research paper explores inference scaling for retrieval-augmented generation (RAG), a technique that enhances large language models (LLMs) by incorporating external knowledge. The authors introduce two strategies for effectively scaling inference computation: demonstration-based RAG (DRAG) and iterative demonstration-based RAG (IterDRAG). They demonstrate that increasing inference computation, when optimally allocated, yields nearly linear gains in RAG performance. They further develop a computation allocation model that predicts the optimal test-time compute allocation across tasks and scenarios, and show that its predictions align with experimental results.

📎 Link to paper
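The IterDRAG idea of interleaving retrieval and generation can be sketched roughly as follows. This is a minimal illustration of the loop structure only, not the paper's implementation: `retrieve` and `generate` here are hypothetical stand-ins for a real retriever and LLM call, and all names and signatures are assumptions.

```python
def retrieve(query, k=5):
    # Stand-in retriever: returns k placeholder documents for the query.
    return [f"doc[{query}][{i}]" for i in range(k)]

def generate(prompt):
    # Stand-in LLM: either emits a sub-query to decompose the question,
    # or commits to a final answer once enough context has been gathered.
    # A real model would be prompted with in-context demonstrations.
    if "step" not in prompt:
        return "SUBQUERY: step"
    return "ANSWER: final"

def iter_drag(question, max_iterations=5):
    """Iteratively interleave retrieval and generation: each iteration the
    model either produces a sub-query (triggering another retrieval that
    expands the context) or returns a final answer. The iteration budget is
    one knob for scaling test-time compute."""
    context = retrieve(question)
    for _ in range(max_iterations):
        output = generate(f"{context} {question}")
        if output.startswith("SUBQUERY:"):
            sub_query = output.split(":", 1)[1].strip()
            context += retrieve(sub_query)  # grow context with new documents
        else:
            return output.split(":", 1)[1].strip()
    return "no answer within budget"
```

Under this framing, DRAG corresponds to a single retrieve-then-generate pass, while IterDRAG spends additional inference compute on further retrieval/generation rounds.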