
EPISODE · Jan 9, 2025 · 1H 26M

Francois Chollet - ARC reflections - NeurIPS 2024

from Machine Learning Street Talk (MLST)

François Chollet discusses the outcomes of the ARC-AGI (Abstraction and Reasoning Corpus) Prize competition in 2024, where accuracy rose from 33% to 55.5% on a private evaluation set.

SPONSOR MESSAGES:
***
CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. https://centml.ai/pricing/

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events? They are hosting an event in Zurich on January 9th with the ARChitects. Join if you can. Go to https://tufalabs.ai/
***

Read about the recent result on o3 with ARC here (Chollet knew about it at the time of the interview but wasn't allowed to say): https://arcprize.org/blog/oai-o3-pub-breakthrough

TOC:

1. Introduction and Opening
[00:00:00] 1.1 Deep Learning vs. Symbolic Reasoning: François’s Long-Standing Hybrid View
[00:00:48] 1.2 “Why Do They Call You a Symbolist?” – Addressing Misconceptions
[00:01:31] 1.3 Defining Reasoning

3. ARC Competition 2024 Results and Evolution
[00:07:26] 3.1 ARC Prize 2024: Reflecting on the Narrative Shift Toward System 2
[00:10:29] 3.2 Comparing Private Leaderboard vs. Public Leaderboard Solutions
[00:13:17] 3.3 Two Winning Approaches: Deep Learning–Guided Program Synthesis and Test-Time Training

4. Transduction vs. Induction in ARC
[00:16:04] 4.1 Test-Time Training, Overfitting Concerns, and Developer-Aware Generalization
[00:19:35] 4.2 Gradient Descent Adaptation vs. Discrete Program Search

5. ARC-2 Development and Future Directions
[00:23:51] 5.1 Ensemble Methods, Benchmark Flaws, and the Need for ARC-2
[00:25:35] 5.2 Human-Level Performance Metrics and Private Test Sets
[00:29:44] 5.3 Task Diversity, Redundancy Issues, and Expanded Evaluation Methodology

6. Program Synthesis Approaches
[00:30:18] 6.1 Induction vs. Transduction
[00:32:11] 6.2 Challenges of Writing Algorithms for Perceptual vs. Algorithmic Tasks
[00:34:23] 6.3 Combining Induction and Transduction
[00:37:05] 6.4 Multi-View Insight and Overfitting Regulation

7. Latent Space and Graph-Based Synthesis
[00:38:17] 7.1 Clément Bonnet’s Latent Program Search Approach
[00:40:10] 7.2 Decoding to Symbolic Form and Local Discrete Search
[00:41:15] 7.3 Graph of Operators vs. Token-by-Token Code Generation
[00:45:50] 7.4 Iterative Program Graph Modifications and Reusable Functions

8. Compute Efficiency and Lifelong Learning
[00:48:05] 8.1 Symbolic Process for Architecture Generation
[00:50:33] 8.2 Logarithmic Relationship of Compute and Accuracy
[00:52:20] 8.3 Learning New Building Blocks for Future Tasks

9. AI Reasoning and Future Development
[00:53:15] 9.1 Consciousness as a Self-Consistency Mechanism in Iterative Reasoning
[00:56:30] 9.2 Reconciling Symbolic and Connectionist Views
[01:00:13] 9.3 System 2 Reasoning - Awareness and Consistency
[01:03:05] 9.4 Novel Problem Solving, Abstraction, and Reusability

10. Program Synthesis and Research Lab
[01:05:53] 10.1 François Leaving Google to Focus on Program Synthesis
[01:09:55] 10.2 Democratizing Programming and Natural Language Instruction

11. Frontier Models and o1 Architecture
[01:14:38] 11.1 Search-Based Chain of Thought vs. Standard Forward Pass
[01:16:55] 11.2 o1’s Natural Language Program Generation and Test-Time Compute Scaling
[01:19:35] 11.3 Logarithmic Gains with Deeper Search

12. ARC Evaluation and Human Intelligence
[01:22:55] 12.1 LLMs as Guessing Machines and Agent Reliability Issues
[01:25:02] 12.2 ARC-2 Human Testing and Correlation with g-Factor
[01:26:16] 12.3 Closing Remarks and Future Directions

SHOWNOTES PDF: https://www.dropbox.com/scl/fi/ujaai0ewpdnsosc5mc30k/CholletNeurips.pdf?rlkey=s68dp432vefpj2z0dp5wmzqz6&st=hazphyx5&dl=0



