Natasha Jaques 2 episode artwork

EPISODE · Mar 14, 2023 · 46 MIN

Natasha Jaques 2

from TalkRL: The Reinforcement Learning Podcast · host Robin Ranjit Singh Chauhan

Hear about why OpenAI cites her work in RLHF and dialog models, approaches to rewards in RLHF, ChatGPT, Industry vs Academia, PsiPhi-Learning, AGI and more!  Dr Natasha Jaques is a Senior Research Scientist at Google Brain. Featured References Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Gu, Rosalind Picard  Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control Natasha Jaques, Shixiang Gu, Dzmitry Bahdanau, José Miguel Hernández-Lobato, Richard E. Turner, Douglas Eck  PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning Angelos Filos, Clare Lyle, Yarin Gal, Sergey Levine, Natasha Jaques, Gregory Farquhar  Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience Marwa Abdulhai, Natasha Jaques, Sergey Levine  Additional References  Fine-Tuning Language Models from Human Preferences, Daniel M. Ziegler et al 2019  Learning to summarize from human feedback, Nisan Stiennon et al 2020  Training language models to follow instructions with human feedback, Long Ouyang et al 2022  

Hear about why OpenAI cites her work in RLHF and dialog models, approaches to rewards in RLHF, ChatGPT, Industry vs Academia, PsiPhi-Learning, AGI and more!  Dr Natasha Jaques is a Senior Research Scientist at Google Brain. Featured References Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Gu, Rosalind Picard  Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control Natasha Jaques, Shixiang Gu, Dzmitry Bahdanau, José Miguel Hernández-Lobato, Richard E. Turner, Douglas Eck  PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning Angelos Filos, Clare Lyle, Yarin Gal, Sergey Levine, Natasha Jaques, Gregory Farquhar  Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience Marwa Abdulhai, Natasha Jaques, Sergey Levine  Additional References  Fine-Tuning Language Models from Human Preferences, Daniel M. Ziegler et al 2019  Learning to summarize from human feedback, Nisan Stiennon et al 2020  Training language models to follow instructions with human feedback, Long Ouyang et al 2022

NOW PLAYING

Natasha Jaques 2

0:00 46:02

No transcript for this episode yet

We transcribe on demand. Request one and we'll notify you when it's ready — usually under 10 minutes.

Frequently Asked Questions

How long is this episode of TalkRL: The Reinforcement Learning Podcast?

This episode is 46 minutes long.

When was this TalkRL: The Reinforcement Learning Podcast episode published?

This episode was published on March 14, 2023.

What is this episode about?

Hear about why OpenAI cites her work in RLHF and dialog models, approaches to rewards in RLHF, ChatGPT, Industry vs Academia, PsiPhi-Learning, AGI and more!  Dr Natasha Jaques is a Senior Research Scientist at Google Brain. Featured References Way...

Is there a transcript available for this episode?

Yes, a full transcript is available for this episode. You can read the complete transcript on the episode page.

Can I download this TalkRL: The Reinforcement Learning Podcast episode?

Yes, you can download this episode by clicking the download button on the episode player, or subscribe to the podcast in your preferred podcast app for automatic downloads.
URL copied to clipboard!