EPISODE · Nov 17, 2025 · 36 MIN

Computer Vision | Theory: How Computers See in the Real World

from Big Ideas Only · host Montanus

In this episode of Big Ideas Only, host Mikkel Svold takes a theoretical deep dive into how computers “see” with Andreas Møgelmose (Associate Professor of AI, Aalborg University; Visual Analysis & Perception Lab).

We unpack the neural-network ideas behind modern vision, why 2012 was a turning point, how convolutional networks work, the difference between training, fine-tuning and adding context, plus explainability, bias traps, multimodality, and what still needs solving.

In this episode, you’ll learn about:

1. How a 2012 vision breakthrough reshaped speech and language research
2. Neural networks explained simply: how they learn patterns from data
3. CNNs: how computers spot shapes and textures in images
4. Training, fine-tuning, and adding context to make models smarter
5. From hand-crafted features to fully data-driven learning
6. Explainability: the “ruler in skin-cancer photos” bias trap and what it teaches us
7. Multimodal systems: models combining text, images, and tools
8. Depth sensing with stereo, lidar, radar, and time-of-flight, and when 3D is essential
9. Privacy and governance: why real risk lies in implementation, not vision itself
10. Open challenges: fine-grained recognition, explainability, and machine unlearning
11. The pace of progress: steady research with headline-making leaps

Episode Content

01:09 How computer vision differs from other AI fields
01:16 The 2012 breakthrough: neural networks in vision that spread to speech and text
04:05 Neural networks 101: neurons, weights, and simple math scaled up to complex decisions
07:06 Training at scale: millions of images, pretraining, and fine-tuning for specific tasks
10:39 Fine-tuning vs. adding context in large language models; backpropagation explained
16:52 Layered learning: from edges to shapes, faces, and full objects
18:22 Before deep learning: feature engineering and why it hit its limits
20:44 How it’s built: data collection, architecture design, training loops, and learning plateaus
22:54 Bias pitfalls: the “ruler in skin-cancer photos” example and why explainability matters
25:23 Regulation and trust: high-risk uses and the demand for transparency
26:13 Connecting vision to action: from black-box outputs to robots with “vision in the loop”
27:41 Ensemble systems: language models coordinating other models (e.g., text-to-image)
29:03 True multimodality: training models jointly on text and images
30:17 AGI reflections: embodiment, experience, and the limits of data
32:44 Human vision vs. computer vision: depth of field, aperture, and why machines see everything in focus
34:40 Is progress slowing or steady? Research milestones versus quiet, continuous work
36:43 Public perception: many versions, but most still see “just ChatGPT”
37:41 Why the research pace feels natural: more people means faster progress

This podcast is produced by Montanus.

Frequently Asked Questions

How long is this episode of Big Ideas Only?

This episode is 36 minutes long.

When was this Big Ideas Only episode published?

This episode was published on November 17, 2025.

What is this episode about?

In this episode of Big Ideas Only, host Mikkel Svold takes a theoretical deep dive into how computers “see” with Andreas Møgelmose (Associate Professor of AI, Aalborg University; Visual Analysis & Perception Lab). We unpack the neural-network ideas...

Is there a transcript available for this episode?

Yes, a full transcript is available for this episode. You can read the complete transcript on the episode page.

Can I download this Big Ideas Only episode?

Yes, you can download this episode by clicking the download button on the episode player, or subscribe to the podcast in your preferred podcast app for automatic downloads.