Deep Learning is Not So Mysterious or Different - Prof. Andrew Gordon Wilson (NYU) episode artwork

EPISODE · Sep 19, 2025 · 2H 3M

Deep Learning is Not So Mysterious or Different - Prof. Andrew Gordon Wilson (NYU)

from Machine Learning Street Talk (MLST)

Professor Andrew Wilson from NYU explains why many common-sense ideas in artificial intelligence might be wrong. For decades, the rule of thumb in machine learning has been to fear complexity. The thinking goes: if your model has too many parameters (is "too complex") for the amount of data you have, it will "overfit" by essentially memorizing the data instead of learning the underlying patterns. This leads to poor performance on new, unseen data. This is known as the classic "bias-variance trade-off" i.e. a balancing act between a model that's too simple and one that's too complex.**SPONSOR MESSAGES**—Tufa AI Labs is an AI research lab based in Zurich. **They are hiring ML research engineers!** This is a once in a lifetime opportunity to work with one of the best labs in EuropeContact Benjamin Crouzier - https://tufalabs.ai/ —Take the Prolific human data survey - https://www.prolific.com/humandatasurvey?utm_source=mlst and be the first to see the results and benchmark their practices against the wider community!—cyber•Fund https://cyber.fund/?utm_source=mlst is a founder-led investment firm accelerating the cybernetic economyOct SF conference - https://dagihouse.com/?utm_source=mlst - Joscha Bach keynoting(!) + OAI, Anthropic, NVDA,++Hiring a SF VC Principal: https://talent.cyber.fund/companies/cyber-fund-2/jobs/57674170-ai-investment-principal#content?utm_source=mlstSubmit investment deck: https://cyber.fund/contact?utm_source=mlst— Description Continued:Professor Wilson challenges this fundamental belief (fearing complexity). He makes a few surprising points:**Bigger Can Be Better**: massive models don't just get more flexible; they also develop a stronger "simplicity bias". So, if your model is overfitting, the solution might paradoxically be to make it even bigger.**The "Bias-Variance Trade-off" is a Misnomer**: Wilson claims you don't actually have to trade one for the other. You can have a model that is incredibly expressive and flexible while also being strongly biased toward simple solutions. He points to the "double descent" phenomenon, where performance first gets worse as models get more complex, but then surprisingly starts getting better again.**Honest Beliefs and Bayesian Thinking**: His core philosophy is that we should build models that honestly represent our beliefs about the world. We believe the world is complex, so our models should be expressive. But we also believe in Occam's razor—that the simplest explanation is often the best. He champions Bayesian methods, which naturally balance these two ideas through a process called marginalization, which he describes as an automatic Occam's razor.TOC:[00:00:00] Introduction and Thesis[00:04:19] Challenging Conventional Wisdom[00:11:17] The Philosophy of a Scientist-Engineer[00:16:47] Expressiveness, Overfitting, and Bias[00:28:15] Understanding, Compression, and Kolmogorov Complexity[01:05:06] The Surprising Power of Generalization[01:13:21] The Elegance of Bayesian Inference[01:33:02] The Geometry of Learning[01:46:28] Practical Advice and The Future of AIProf. Andrew Gordon Wilson:https://x.com/andrewgwilshttps://cims.nyu.edu/~andrewgw/https://scholar.google.com/citations?user=twWX2LIAAAAJ&hl=en https://www.youtube.com/watch?v=Aja0kZeWRy4 https://www.youtube.com/watch?v=HEp4TOrkwV4 TRANSCRIPT:https://app.rescript.info/public/share/H4Io1Y7Rr54MM05FuZgAv4yphoukCfkqokyzSYJwCK8Hosts:Dr. Tim Scarfe / Dr. Keith Duggar (MIT Ph.D)REFS:Deep Learning is Not So Mysterious or Different [Andrew Gordon Wilson]https://arxiv.org/abs/2503.02113Bayesian Deep Learning and a Probabilistic Perspective of Generalization [Andrew Gordon Wilson, Pavel Izmailov]https://arxiv.org/abs/2002.08791Compute-Optimal LLMs Provably Generalize Better With Scale [Marc Finzi, Sanyam Kapoor, Diego Granziol, Anming Gu, Christopher De Sa, J. Zico Kolter, Andrew Gordon Wilson]https://arxiv.org/abs/2504.15208

Professor Andrew Wilson from NYU explains why many common-sense ideas in artificial intelligence might be wrong. For decades, the rule of thumb in machine learning has been to fear complexity. The thinking goes: if your model has too many parameters (is "too complex") for the amount of data you have, it will "overfit" by essentially memorizing the data instead of learning the underlying patterns. This leads to poor performance on new, unseen data. This is known as the classic "bias-variance trade-off" i.e. a balancing act between a model that's too simple and one that's too complex.**SPONSOR MESSAGES**—Tufa AI Labs is an AI research lab based in Zurich. **They are hiring ML research engineers!** This is a once in a lifetime opportunity to work with one of the best labs in EuropeContact Benjamin Crouzier - https://tufalabs.ai/ —Take the Prolific human data survey - https://www.prolific.com/humandatasurvey?utm_source=mlst and be the first to see the results and benchmark their practices against the wider community!—cyber•Fund https://cyber.fund/?utm_source=mlst is a founder-led investment firm accelerating the cybernetic economyOct SF conference - https://dagihouse.com/?utm_source=mlst - Joscha Bach keynoting(!) + OAI, Anthropic, NVDA,++Hiring a SF VC Principal: https://talent.cyber.fund/companies/cyber-fund-2/jobs/57674170-ai-investment-principal#content?utm_source=mlstSubmit investment deck: https://cyber.fund/contact?utm_source=mlst— Description Continued:Professor Wilson challenges this fundamental belief (fearing complexity). He makes a few surprising points:**Bigger Can Be Better**: massive models don't just get more flexible; they also develop a stronger "simplicity bias". So, if your model is overfitting, the solution might paradoxically be to make it even bigger.**The "Bias-Variance Trade-off" is a Misnomer**: Wilson claims you don't actually have to trade one for the other. You can have a model that is incredibly expressive and flexible while also being strongly biased toward simple solutions. He points to the "double descent" phenomenon, where performance first gets worse as models get more complex, but then surprisingly starts getting better again.**Honest Beliefs and Bayesian Thinking**: His core philosophy is that we should build models that honestly represent our beliefs about the world. We believe the world is complex, so our models should be expressive. But we also believe in Occam's razor—that the simplest explanation is often the best. He champions Bayesian methods, which naturally balance these two ideas through a process called marginalization, which he describes as an automatic Occam's razor.TOC:[00:00:00] Introduction and Thesis[00:04:19] Challenging Conventional Wisdom[00:11:17] The Philosophy of a Scientist-Engineer[00:16:47] Expressiveness, Overfitting, and Bias[00:28:15] Understanding, Compression, and Kolmogorov Complexity[01:05:06] The Surprising Power of Generalization[01:13:21] The Elegance of Bayesian Inference[01:33:02] The Geometry of Learning[01:46:28] Practical Advice and The Future of AIProf. Andrew Gordon Wilson:https://x.com/andrewgwilshttps://cims.nyu.edu/~andrewgw/https://scholar.google.com/citations?user=twWX2LIAAAAJ&hl=en https://www.youtube.com/watch?v=Aja0kZeWRy4 https://www.youtube.com/watch?v=HEp4TOrkwV4 TRANSCRIPT:https://app.rescript.info/public/share/H4Io1Y7Rr54MM05FuZgAv4yphoukCfkqokyzSYJwCK8Hosts:Dr. Tim Scarfe / Dr. Keith Duggar (MIT Ph.D)REFS:Deep Learning is Not So Mysterious or Different [Andrew Gordon Wilson]https://arxiv.org/abs/2503.02113Bayesian Deep Learning and a Probabilistic Perspective of Generalization [Andrew Gordon Wilson, Pavel Izmailov]https://arxiv.org/abs/2002.08791Compute-Optimal LLMs Provably Generalize Better With Scale [Marc Finzi, Sanyam Kapoor, Diego Granziol, Anming Gu, Christopher De Sa, J. Zico Kolter, Andrew Gordon Wilson]https://arxiv.org/abs/2504.15208

NOW PLAYING

Deep Learning is Not So Mysterious or Different - Prof. Andrew Gordon Wilson (NYU)

0:00 2:03:48

No transcript for this episode yet

We transcribe on demand. Request one and we'll notify you when it's ready — usually under 10 minutes.

French Your Way Jessica: Native French teacher founder of French Your Way Boost your French listening skills and test your comprehension with this one of a kind series of podcasts. Get the chance to listen to a real conversation between native speakers talking at normal speed AND customise your learning experience through carefully designed sets of questions (2 levels of difficulty) available for download at www.frenchvoicespodcast.com. All interviews also come with the transcript. French teacher Jessica interviews native speakers of French from around the world who share a bit of their life and passion. Where else would you meet in one same place a French yoga teacher based in Melbourne, a soap manufacturer from Provence, or a couple cycling around the world? Kaizen Blueprint Aldo Chandra "Kaizen" is a Japanese term for continuous improvement. This podcast provides a blueprint to learn about health, wealth, relationships and everything else in between. Through our podcast, we strive to inspire, educate, and motivate our audience to cultivate a mindset of lifelong learning, productivity, and personal development. By sharing insights, strategies, and practical tips, we aim to guide listeners on their journey towards realizing their fullest potential, fostering success, and creating lasting positive change. One Man Went To Row PepperDawesMedia Follow the journey, from training to finish line, of a man from Derby, UK who is going from having only ever rowed on a machine to rowing 3000 miles solo across the Atlantic...just after his 70th birthday! Humanizing Change Tremendousness Join us each episode as we talk with innovators in their respective fields about their unique journeys and how they humanize change in their own work, right here, on Humanizing Change.

Frequently Asked Questions

How long is this episode of Machine Learning Street Talk (MLST)?

This episode is 2 hours and 3 minutes long.

When was this Machine Learning Street Talk (MLST) episode published?

This episode was published on September 19, 2025.

What is this episode about?

Professor Andrew Wilson from NYU explains why many common-sense ideas in artificial intelligence might be wrong. For decades, the rule of thumb in machine learning has been to fear complexity. The thinking goes: if your model has too many parameters...

Can I download this Machine Learning Street Talk (MLST) episode?

Yes, you can download this episode by clicking the download button on the episode player, or subscribe to the podcast in your preferred podcast app for automatic downloads.
URL copied to clipboard!