EPISODE · Dec 15, 2023 · 41 MIN
Episode 175 - Gemini: A First Look
from Two Voice Devs · host Mark and Allen
In this in-depth chat between Allen Firstenberg and Linda Lawton, they dive into the functionalities and potential of Google's newly released Gemini model. From their initial experiences to exciting possibilities for the future, they discuss the Gemini Pro and Gemini Pro Vision models, how to #BuildWithGemini, its focus on both text and images, and speedier and more cohesive responses compared to older models. They also delve into its potential for multi-modal support, unique reasoning capabilities, and the challenges they've encountered. The conversation draws interesting insights and sparks exciting ideas on how Gemini could evolve in the future.

00:04 Introduction and Welcome
00:23 Discussing the New Gemini Model
01:33 Comparing Gemini and Bison Models
02:07 Exploring Gemini's Vision Model
03:03 Gemini's Response Quality and Speed
03:53 Gemini's Token Length and Context Window
05:05 Gemini's Pricing and Google AI Studio
05:33 Upcoming Projects and Previews
06:16 Gemini's Role in Code Generation
07:54 Gemini's Model Variants and Limitations
12:01 Creating a Python Desktop App with Gemini
14:07 Gemini's Potential for Assisting the Visually Impaired
18:35 Gemini's Ability to Reason and Count
20:15 Gemini's Multi-Step Reasoning
20:33 Testing Gemini with Multiple Images
21:52 Exploring Image Recognition Capabilities
22:13 Discussing the Limitations of 3D Object Recognition
23:53 Testing Image Recognition with Personal Photos
24:52 Potential Applications of Image Recognition
25:45 Exploring the Multimodal Capabilities of the AI
26:41 Discussing the Challenges of Using the AI in Europe
27:26 Exploring the AQA Model and Its Potential
33:37 Discussing the Future of AI and Image Recognition
37:12 Wishlist for Future AI Capabilities
40:11 Wrapping Up and Looking Forward