Serving LLMs in Production: Performance, Cost & Scale // CAST AI Roundtable

EPISODE · Feb 19, 2026 · 1H 5M

Serving LLMs in Production: Performance, Cost & Scale // CAST AI Roundtable

from MLOps.community · host Demetrios

Roundtable CAST AI episode: Serving LLMs in Production: Performance, Cost & Scale. Join the Community: https://go.mlops.community/YTJoinInGet the newsletter: https://go.mlops.community/YTNewsletterMLOps GPU Guide: https://go.mlops.community/gpuguide// AbstractExperimenting with LLMs is easy. Running them reliably and cost-effectively in production is where things break. Most AI teams never make it past demos and proofs of concept. A smaller group is pushing real workloads to production—and running into very real challenges around infrastructure efficiency, runaway cloud costs, and reliability at scale.This session is for engineers and platform teams moving beyond experimentation and building AI systems that actually hold up in production.// BioIoana ApetreiIoana is a Senior Product Manager at CAST AI, leading the AI Enabler product, an AI Gateway platform for cost-effective LLM infrastructure deployment. She brings 12 years of experience building B2C and B2B products reaching over 10 million users. Outside of work, she enjoys assembling puzzles and LEGOs and watching motorsports.Igor ŠušićIgor is a founding Machine Learning Engineer at CAST AI’s AI Enabler, where he focuses on optimizing inference and training at scale. With a strong background in Natural Language Processing (NLP) and Recommender Systems, Igor has been tackling the challenges of large-scale model optimization long before transformers became mainstream. Prior to CAST AI, he worked at industry leaders like Bloomreach and Infobip, where he contributed to the development and deployment of large-scale AI and personalization systems from the early days of the field.// Related LinksWebsite: https://cast.ai/~~~~~~~~ ✌️Connect With Us ✌️ ~~~~~~~Catch all episodes, blogs, newsletters, and more: https://go.mlops.community/TYExploreJoin our Slack community [https://go.mlops.community/slack]Follow us on X/Twitter [@mlopscommunity](https://x.com/mlopscommunity) or [LinkedIn](https://go.mlops.community/linkedin)] Sign up for the next meetup: [https://go.mlops.community/register]MLOps Swag/Merch: [https://shop.mlops.community/]Connect with Demetrios on LinkedIn: /dpbrinkmConnect with Ioana on LinkedIn: /ioanaapetrei/Connect with Igor on LinkedIn: /igor-%C5%A1u%C5%A1i%C4%87/

NOW PLAYING

Serving LLMs in Production: Performance, Cost & Scale // CAST AI Roundtable

0:00 1:05:55

No transcript for this episode yet

We transcribe on demand. Request one and we'll notify you when it's ready — usually under 10 minutes.

Photo Breakdown Scott Wyden Kivowitz Photo Breakdown is a podcast in which we explore the world of photography with a trusted guide, host Scott Wyden Kivowitz. His expertise and passion bring the industry to life as we explore the stories, trends, and ideas shaping it today. Join us as we dissect everything from incredible photographs and creative techniques to the latest gear releases and hot topics in the photography community.In each episode, we break down what’s happening behind the scenes - whether it’s making a powerful image, a candid discussion on industry trends, or a reflection on the tools and technology changing how we make photographs. You’ll get insights, expert opinions, and a fresh perspective on what’s top of mind for photographers right now.Anticipate short, engaging episodes brimming with ideas and inspiration. Be part of the conversation by sharing your thoughts, voice notes, and comments. Your participation is what makes our community vibrant and dynamic.It’s more than just photography - everyth Popup Chinese Popup Chinese Fresh from Beijing, PopupChinese teaches Chinese as it is actually spoken. Start with our basic Chinese lessons, and in no time you'll be speaking like a Beijinger. Our free daily podcasts, vibrant community, and love for the real China make us the most powerful and personal way to learn mandarin. Linux Game Cast on Odysee Linux Game Cast Helping the Linux community with gaming, podcasting, live streaming, and audio & video production since 2010. [LinuxGameCast Webzone](https://linuxgamecast.com/) She’s a Hazard to Herself She’s a Hazard Hi there, I’m Mallory, and I’d like to invite you into our world with “She’s a Hazard to Herself!” Join us as we navigate life with Multiple Sclerosis from the seat of my power wheelchair. Discover stories of resilience, family, and the community we’ve built around chronic illness. Whether you’re impacted by MS or want to learn from our journey, there’s something here for you. So why wait? Subscribe to “She’s a Hazard to Herself” on your favorite podcast app and be part of our journey today. Let’s lift each other up, one episode at a time!
URL copied to clipboard!