Arash Ahmadian on Rethinking RLHF episode artwork

EPISODE · Mar 25, 2024 · 33 MIN

Arash Ahmadian on Rethinking RLHF

from TalkRL: The Reinforcement Learning Podcast · host Robin Ranjit Singh Chauhan

Arash Ahmadian is a Researcher at Cohere and Cohere For AI focussed on Preference Training of large language models. He’s also a researcher at the Vector Institute of AI.Featured ReferenceBack to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMsArash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, Ahmet Üstün, Sara HookerAdditional ReferencesSelf-Rewarding Language Models, Yuan et al 2024 Reinforcement Learning: An Introduction, Sutton and Barto 1992Learning from Delayed Rewards, Chris Watkins 1989Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, Williams 1992

Arash Ahmadian is a Researcher at Cohere and Cohere For AI focussed on Preference Training of large language models. He’s also a researcher at the Vector Institute of AI.Featured ReferenceBack to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMsArash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, Ahmet Üstün, Sara HookerAdditional ReferencesSelf-Rewarding Language Models, Yuan et al 2024 Reinforcement Learning: An Introduction, Sutton and Barto 1992Learning from Delayed Rewards, Chris Watkins 1989Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, Williams 1992

NOW PLAYING

Arash Ahmadian on Rethinking RLHF

0:00 33:30

No transcript for this episode yet

We transcribe on demand. Request one and we'll notify you when it's ready — usually under 10 minutes.

Frequently Asked Questions

How long is this episode of TalkRL: The Reinforcement Learning Podcast?

This episode is 33 minutes long.

When was this TalkRL: The Reinforcement Learning Podcast episode published?

This episode was published on March 25, 2024.

What is this episode about?

Arash Ahmadian is a Researcher at Cohere and Cohere For AI focussed on Preference Training of large language models. He’s also a researcher at the Vector Institute of AI.Featured ReferenceBack to Basics: Revisiting REINFORCE Style Optimization for...

Is there a transcript available for this episode?

Yes, a full transcript is available for this episode. You can read the complete transcript on the episode page.

Can I download this TalkRL: The Reinforcement Learning Podcast episode?

Yes, you can download this episode by clicking the download button on the episode player, or subscribe to the podcast in your preferred podcast app for automatic downloads.
URL copied to clipboard!