How to Systematically Test and Evaluate Your LLMs Apps // Gideon Mendels // #269

EPISODE · Oct 18, 2024 · 1H 1M

from MLOps.community · host Demetrios

How to Systematically Test and Evaluate Your LLMs Apps // MLOps Podcast #269 with Gideon Mendels, CEO of Comet.

// Abstract
When building LLM applications, developers need to take a hybrid approach that draws on both ML and software engineering best practices. They need to define eval metrics and track all of their experiments to see what is and is not working. They also need to define comprehensive unit tests for their particular use case so they can confidently check whether their LLM app is ready to be deployed.

// Bio
Gideon Mendels is the CEO and co-founder of Comet, the leading solution for managing machine learning workflows from experimentation to production. He is a computer scientist, ML researcher, and entrepreneur at his core. Before Comet, Gideon co-founded GroupWize, where they trained and deployed NLP models processing billions of chats. His journey with NLP and speech recognition models began at Columbia University and Google, where he worked on hate speech and deception detection.

// MLOps Swag/Merch
https://mlops-community.myshopify.com/

// Related Links
Website: https://www.comet.com/site/
All the Hard Stuff with LLMs in Product Development // Phillip Carter // MLOps Podcast #170: https://youtu.be/DZgXln3v85s
Opik by Comet: https://www.comet.com/site/products/opik/

✌️ Connect With Us ✌️
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletters, and more: https://mlops.community/
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Gideon on LinkedIn: https://www.linkedin.com/in/gideon-mendels/

Timestamps:
[00:00] Gideon's preferred coffee
[00:17] Takeaways
[01:50] A huge shout-out to Comet ML for sponsoring this episode!
[02:09] Please like, share, leave a review, and subscribe to our MLOps channels!
[03:30] Evaluation metrics in AI
[06:55] LLM Evaluation in Practice
[10:57] LLM testing methodologies
[16:56] LLM as a judge
[18:53] Opik track function overview
[20:33] Tracking user response value
[26:32] Exploring AI metrics integration
[29:05] Experiment tracking and LLMs
[34:27] Micro Macro collaboration in AI
[38:20] RAG Pipeline Reproducibility Snapshot
[40:15] Collaborative experiment tracking
[45:29] Feature flags in CI/CD
[48:55] Labeling challenges and solutions
[54:31] LLM output quality alerts
[56:32] Anomaly detection in model outputs
[1:01:07] Wrap up

