LONGREPS: Reasoning Path Supervision for Long-Context Language Models
An episode of the Build Wiz AI Show podcast, hosted by Build Wiz AI, titled "LONGREPS: Reasoning Path Supervision for Long-Context Language Models" was published on March 17, 2025 and runs 17 minutes.
March 17, 2025 ·17m · Build Wiz AI Show
Summary
The provided paper, "Chain-of-Thought Matters: Improving Long-Context Language Models with Reasoning Path Supervision," investigates the effectiveness of Chain-of-Thought (CoT) prompting for large language models dealing with long-context tasks, finding that CoT's benefits generally extend and amplify with longer contexts. To enhance performance in these scenarios, the authors introduce LONGREPS, a novel process-supervised framework that trains models to generate high-quality reasoning paths. This framework employs self-sampling of reasoning paths and a specific quality assessment protocol tailored for long contexts, evaluating both answer correctness and process reliability through source faithfulness and intrinsic consistency. Experimental results demonstrate that LONGREPS significantly improves long-context question answering and generalization capabilities compared to standard outcome supervision.
Episode Description
The provided paper, "Chain-of-Thought Matters: Improving Long-Context Language Models with Reasoning Path Supervision," investigates the effectiveness of Chain-of-Thought (CoT) prompting for large language models dealing with long-context tasks, finding that CoT's benefits generally extend and amplify with longer contexts. To enhance performance in these scenarios, the authors introduce LONGREPS, a novel process-supervised framework that trains models to generate high-quality reasoning paths. This framework employs self-sampling of reasoning paths and a specific quality assessment protocol tailored for long contexts, evaluating both answer correctness and process reliability through source faithfulness and intrinsic consistency. Experimental results demonstrate that LONGREPS significantly improves long-context question answering and generalization capabilities compared to standard outcome supervision.
Similar Episodes
Jan 15, 2016 ·18m
Dec 23, 2015 ·40m
Dec 18, 2015 ·9m
Dec 7, 2015 ·16m
Nov 11, 2015 ·10m