EPISODE · Jan 24, 2021 · 56 MIN

Serving ML Models at a High Scale with Low Latency // Manoj Agarwal // MLOps Meetup #48

from MLOps.community · host Demetrios

MLOps community meetup #48! Last Wednesday, we talked to Manoj Agarwal, Software Architect at Salesforce.

Join the Community: https://go.mlops.community/YTJoinIn
Get the newsletter: https://go.mlops.community/YTNewsletter

// Abstract:
Serving machine learning models is a scalability challenge at many companies. Most applications require a small number of machine learning models (often < 100) to serve predictions. On the other hand, cloud platforms that support model serving, though they support hundreds of thousands of models, provision separate hardware for different customers. Salesforce has a unique challenge that only very few companies deal with: Salesforce needs to run hundreds of thousands of models sharing the underlying infrastructure for multiple tenants for cost-effectiveness.

// Takeaways:
This talk explains how Salesforce hosts hundreds of thousands of models on a multi-tenant infrastructure to support low-latency predictions.

// Bio:
Manoj Agarwal is a Software Architect in the Einstein Platform team at Salesforce. Salesforce Einstein was released back in 2016, integrated with all the major Salesforce clouds. Fast forward to today, and Einstein is delivering 80+ billion predictions per day across Sales, Service, Marketing & Commerce Clouds.

// Relevant Links
https://engineering.salesforce.com/flow-scheduling-for-the-einstein-ml-platform-b11ec4f74f97
https://engineering.salesforce.com/ml-lake-building-salesforces-data-platform-for-machine-learning-228c30e21f16

----------- Connect With Us ✌️-------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Manoj on LinkedIn: https://www.linkedin.com/in/agarwalmk/

Timestamps:
[00:00] Happy birthday Manoj!
[00:41] Salesforce blog post about Einstein and ML Infrastructure
[02:55] Intro to Serving Large Number of Models with Low Latency
[03:34] Manoj's background
[04:22] Machine Learning Engineering: 99% engineering + 1% machine learning - Alexey Grigorev on Twitter
[04:37] Salesforce Einstein
[06:42] Machine Learning: Big Picture
[07:05] Feature Engineering
[07:30] Model Training
[08:53] Model Serving Requirements
[13:01] Do you standardize on how models are packaged in order to be served, and if so, what standards does Salesforce require and enforce from model packaging?
[14:29] Support Multiple Frameworks
[16:16] Is it easy to just throw a software library in there?
[27:06] Along with that metadata, can you break down how that goes?
[28:27] Low Latency
[32:30] Model Sharding with Replication
[33:58] What would you do to speed up the transformation code run before scoring?
[35:55] Model Serving Scaling
[37:06] Noisy Neighbor: Shuffle Sharding
[39:29] If all the Salesforce models can be categorized into different model types, based on what they provide, what would some of the big categories be, and what's the biggest?
[46:27] Retraining of the Model: Is that handled by your team, or is it distributed out, so that your team deals mainly with the engineering and another team deals with the machine learning side of it?
[50:13] How do you ensure that different models created by different teams for data scientists expose the same data in order to be analyzed?
[52:08] Are you using Kubernetes, or is it another orchestration engine?
[53:03] How is it ensured that different models expose the same information?
