Holistic Evaluation of Generative AI Systems // Jineet Doshi // #280

EPISODE · Dec 23, 2024 · 57 MIN

from MLOps.community · host Demetrios

Jineet Doshi is an award-winning Scientist, Machine Learning Engineer, and Leader at Intuit with over 7 years of experience. He has a proven track record of leading successful AI projects and building machine-learning models from design to production across various domains, which have impacted 100 million customers and significantly improved business metrics, leading to millions of dollars of impact.

Holistic Evaluation of Generative AI Systems // MLOps Podcast #280 with Jineet Doshi, Staff AI Scientist or AI Lead at Intuit.

// Abstract
Evaluating LLMs is essential in establishing trust before deploying them to production. Even post-deployment, evaluation is essential to ensure LLM outputs meet expectations, making it a foundational part of LLMOps. However, evaluating LLMs remains an open problem. Unlike traditional machine learning models, LLMs can perform a wide variety of tasks, such as writing poems, Q&A, summarization, etc. This leads to the question of how to evaluate a system with such broad intelligence capabilities. This talk covers the various approaches for evaluating LLMs, such as classic NLP techniques, red teaming, and newer ones like using LLMs as a judge, along with the pros and cons of each. The talk includes an evaluation of complex GenAI systems like RAG and Agents. It also covers evaluating LLMs for safety and security, and the need to have a holistic approach for evaluating these very capable models.

// Bio
Jineet Doshi is an award-winning AI Lead and Engineer with over 7 years of experience. He has a proven track record of leading successful AI projects and building machine learning models from design to production across various domains, which have impacted millions of customers and have significantly improved business metrics, leading to millions of dollars of impact.

He is currently an AI Lead at Intuit, where he is one of the architects and developers of their Generative AI platform, which is serving Generative AI experiences for more than 100 million customers around the world. Jineet is also a guest lecturer at Stanford University as part of their Building LLM Applications class. He is on the Advisory Board of the University of San Francisco’s AI Program. He holds multiple patents in the field, is on the steering committee of MLOps World Conference, and has also co-chaired workshops at top AI conferences like KDD. He holds a Master's degree from Carnegie Mellon University.

// MLOps Swag/Merch
https://shop.mlops.community/

// Related Links
Website: https://www.intuit.com/

--------------- ✌️Connect With Us ✌️ -------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletters, and more: https://mlops.community/
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Jineet on LinkedIn: https://www.linkedin.com/in/jineetdoshi/

Timestamps:
[00:00] Jineet's preferred coffee
[00:20] Takeaways
[01:24] Please like, share, leave a review, and subscribe to our MLOps channels!
[01:36] LLM evaluation at scale
[03:13] Challenges in GenAI evaluation
[08:09] Eval products vs platforms
[09:28] Evaluation methods for models
[14:03] NLP evaluation techniques
[25:06] LLM as a judge/jury
[31:56] LLMs and pizza brainstorming
[34:07] Cost per answer breakdown
[38:29] Evaluating RAG systems
[44:00] Testing with LLMs and humans
[49:23] Evaluating AI use cases
[54:19] AI workflow stress testing
[55:40] Wrap up

Frequently Asked Questions

How long is this episode of MLOps.community?

This episode is 57 minutes long.

When was this MLOps.community episode published?

This episode was published on December 23, 2024.

What is this episode about?

Jineet Doshi is an award-winning Scientist, Machine Learning Engineer, and Leader at Intuit with over 7 years of experience. He has a proven track record of leading successful AI projects and building machine-learning models from design to...

Can I download this MLOps.community episode?

Yes, you can download this episode by clicking the download button on the episode player, or subscribe to the podcast in your preferred podcast app for automatic downloads.