Episode 175 - Gemini: A First Look
Episode 175 of the Two Voice Devs podcast, hosted by Mark and Allen, is titled "Episode 175 - Gemini: A First Look." It was published on December 15, 2023, and runs 41 minutes.
December 15, 2023 · 41m · Two Voice Devs
Episode Description
In this in-depth chat, Allen Firstenberg and Linda Lawton dive into the functionality and potential of Google's newly released Gemini model. Moving from their first experiences to possibilities for the future, they discuss the Gemini Pro and Gemini Pro Vision models, how to #BuildWithGemini, the models' focus on both text and images, and their faster, more cohesive responses compared to older models. They also explore Gemini's potential for multimodal support, its reasoning capabilities, and the challenges they have encountered so far. The conversation closes with ideas on how Gemini could evolve in the future.
00:04 Introduction and Welcome
00:23 Discussing the New Gemini Model
01:33 Comparing Gemini and Bison Models
02:07 Exploring Gemini's Vision Model
03:03 Gemini's Response Quality and Speed
03:53 Gemini's Token Length and Context Window
05:05 Gemini's Pricing and Google AI Studio
05:33 Upcoming Projects and Previews
06:16 Gemini's Role in Code Generation
07:54 Gemini's Model Variants and Limitations
12:01 Creating a Python Desktop App with Gemini
14:07 Gemini's Potential for Assisting the Visually Impaired
18:35 Gemini's Ability to Reason and Count
20:15 Gemini's Multi-Step Reasoning
20:33 Testing Gemini with Multiple Images
21:52 Exploring Image Recognition Capabilities
22:13 Discussing the Limitations of 3D Object Recognition
23:53 Testing Image Recognition with Personal Photos
24:52 Potential Applications of Image Recognition
25:45 Exploring the Multimodal Capabilities of the AI
26:41 Discussing the Challenges of Using the AI in Europe
27:26 Exploring the AQA Model and Its Potential
33:37 Discussing the Future of AI and Image Recognition
37:12 Wishlist for Future AI Capabilities
40:11 Wrapping Up and Looking Forward