
EPISODE · Feb 14, 2025 · 38 MIN

Episode 227 - LLM Evaluation: Choosing the RIGHT Model

from Two Voice Devs · host Mark and Allen

Are you overwhelmed by the sheer number of Large Language Models (LLMs) available? Choosing the right LLM for your project isn't about picking the most popular one – it's about understanding your specific needs and rigorously evaluating your options.

In this episode of Two Voice Devs, Allen Firstenberg and guest host Brad Nemer, a seasoned product manager, dive deep into the world of LLM evaluation. They go beyond the marketing buzz and explore practical tools and strategies for making informed decisions.

Whether you're a developer, a product manager, or just curious about the practical applications of LLMs, this episode provides invaluable insights into making the right choices for your projects. Don't get caught up in the hype – learn how to evaluate LLMs effectively!

More Info:
https://www.udacity.com/blog/2025/01/how-to-choose-the-right-ai-model-for-your-product.html

[00:00:00] Introduction: Meet Brad Niemer
[00:00:38] Brad's Journey to Product Management & AI
[00:03:12] Collaboration with Noble Ackerson and the LLM Evaluation Challenge
[00:05:23] The Role of a Product Manager.
[00:07:43] Product manager relation to engineering.
[00:13:46] Exploring Evaluation Tools: Hugging Face
[00:16:58] Exploring Evaluation Tools: Chatbot Arena (Human Evaluation)
[00:20:30] Chatbot Arena: Code Generation Evaluation
[00:24:43] Evaluating LLMs: Beyond Chatbots and Truth
[00:26:11] Exploring Evaluation Tools: Artificial Analysis (Quality, Speed, Price)
[00:28:47] Exploring Evaluation Tools: Galileo (Hallucination Report)
[00:31:16] Case Study: DeepSeek and the Importance of Contextual Evaluation
[00:34:53] The Future of LLM Testing and Quality Assurance
[00:37:49] Wrap Up contact information.

#LLM #LargeLanguageModels #AIEvaluation #ProductManagement #TechTalk #TwoVoiceDevs #HuggingFace #GenAI #GenerativeAI #ChatbotArena #ArtificialAnalysis #Galileo #DeepSeek #ChatGPT #Gemini #Mistral #Claude #ModelSelection #AIdevelopment #SoftwareDevelopment #Testing #QA #RAG #MachineLearning #NLP #Coding #TechPodcast #YouTubeTech #Developers





Frequently Asked Questions

How long is this episode of Two Voice Devs?

This episode is 38 minutes long.

When was this Two Voice Devs episode published?

This episode was published on February 14, 2025.

