PODCAST · technology

ProductLore

by Manikanta

Craft Stories. Build Products.

2

#ProductLore - EP 02 - #DeepSeek R1: Reasoning via Reinforcement Learning

This episode explores the research behind DeepSeek-R1, an open-source reasoning model trained with large-scale reinforcement learning. It explains the key difference between DeepSeek-R1 and DeepSeek-R1-Zero: DeepSeek-R1-Zero is trained with reinforcement learning alone, without supervised fine-tuning.

Key topics covered include:

• Group Relative Policy Optimization (GRPO), the reinforcement learning method used by DeepSeek, paired with rule-based accuracy and format rewards.
• The self-evolution process of DeepSeek-R1-Zero, in which the model learns to allocate more thinking time to reasoning tasks.
• The "Aha moment" phenomenon, in which DeepSeek-R1-Zero reevaluates and corrects its own reasoning.
• The multi-stage training pipeline of DeepSeek-R1: cold start, reasoning reinforcement learning, rejection sampling, and diverse reinforcement learning.
• How DeepSeek-R1 addresses the readability issues and language inconsistencies found in DeepSeek-R1-Zero.
• The performance of DeepSeek-R1, which is comparable to or surpasses OpenAI's o1 model on various benchmarks.
• The distillation of DeepSeek-R1 into smaller models that retain strong reasoning capabilities.

The episode also touches on the DeepSeek team's unsuccessful attempts with process reward models and Monte Carlo Tree Search, and how those experiences helped refine their current approach. Finally, it underscores the importance of reinforcement learning in enhancing model reasoning capabilities, presenting DeepSeek-R1 as a significant advancement in the field.

Content Source:
www.github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf
www.aipapersacademy.com/deepseek-r1/

Tech Help:
notebooklm.google.com
wavve.co
wondercraft.ai

#podcast #deepseek #openai #generativeai #aiproductmanagement #wavve #wondercraftai #notebooklm #gemini #artificialintelligence #machinelearning #businessanalyst #productmanagement #businessnews #trending #aitools #trendingtopic #nvidia #llm
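For listeners who want a concrete picture of the accuracy and format rewards and the group-relative scoring mentioned in the episode description, here is a minimal sketch in Python. It is an illustration under stated assumptions, not DeepSeek's implementation: the function names, the 0/1 reward values, the exact-match answer check, and the use of the population standard deviation are all hypothetical choices made for this example. The only details taken from the topic itself are that the rewards are rule-based and that each sampled answer is scored relative to the other answers in its group.

```python
import re
from statistics import mean, pstdev

# Assumed format rule: reasoning inside <think>...</think>, then the final answer.
THINK_PATTERN = re.compile(r"^<think>.+?</think>\s*(?P<answer>.+)$", re.DOTALL)


def format_reward(completion: str) -> float:
    """Return 1.0 if the completion follows the assumed <think> format, else 0.0."""
    return 1.0 if THINK_PATTERN.match(completion.strip()) else 0.0


def accuracy_reward(completion: str, reference: str) -> float:
    """Return 1.0 if the text after </think> exactly matches the reference answer."""
    match = THINK_PATTERN.match(completion.strip())
    if match is None:
        return 0.0
    return 1.0 if match.group("answer").strip() == reference.strip() else 0.0


def total_reward(completion: str, reference: str) -> float:
    """Sum of the two rule-based rewards (equal weighting is an assumption)."""
    return format_reward(completion) + accuracy_reward(completion, reference)


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Score each reward against its own group: (r - group mean) / group std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps guards against a zero spread
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # A hypothetical group of sampled answers to the prompt "What is 2 + 2?"
    group = [
        "<think>2 + 2 is 4</think> 4",   # right format, right answer
        "<think>just guessing</think> 5",  # right format, wrong answer
        "4",                               # no reasoning block at all
    ]
    rewards = [total_reward(c, "4") for c in group]
    for completion, adv in zip(group, group_relative_advantages(rewards)):
        print(f"{adv:+.3f}  {completion!r}")
```

Because each answer is compared with the average of its own group, this style of scoring needs no separately trained value model: answers that beat the group average receive a positive advantage, and those below it receive a negative one.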

Feb 2, 2025

18m
1

Product Lore - EP 1 - AI Product Management, Model Training Costs, and Export Controls

A comprehensive look at how AI is reshaping product management, highlighting the need for a new kind of PM with a blend of technical, strategic, and adaptable skills. The podcast underscores AI's transformative role across all stages of the product lifecycle, while acknowledging the challenges and ethical considerations involved in adopting AI technologies.

Content Sources:
https://www.researchgate.net/publication/379393439_AI_AND_PRODUCT_MANAGEMENT_A_THEORETICAL_OVERVIEW_FROM_IDEA_TO_MARKET
https://www.deeplearning.ai/the-batch/issue-284/

Jan 17, 2025

18m

ABOUT THIS SHOW

Craft Stories. Build Products.

HOSTED BY

Manikanta

Frequently Asked Questions

How many episodes does ProductLore have?

ProductLore currently has 2 episodes available on PodParley. New episodes are automatically indexed when they're published to the podcast feed.

What is ProductLore about?

Craft Stories. Build Products.

How often does ProductLore release new episodes?

ProductLore has 2 episodes. Check the episode list to see recent publication dates and frequency.

Where can I listen to ProductLore?

You can listen to ProductLore on PodParley by clicking any episode. We provide an embedded audio player for direct listening, and you can also subscribe via your preferred podcast app using the RSS feed.

Who hosts ProductLore?

ProductLore is created and hosted by Manikanta.
