PODCAST · education
Snacks Weekly on Data Science
by Pan Wu
This podcast is about making data science and machine learning knowledge accessible and less intimidating. Every week, I handpick one industry tech blog and break it down. We discuss key data science concepts and machine learning algorithms, and how they are applied in real-world applications. Subscribe to the channel and enjoy Snacks Weekly on Data Science!
-
139
Hybrid Search for Improved Content Discovery [OLX]
In this episode, we explore how OLX improved discovery by combining keyword search and vector search instead of forcing a choice between the two. Keyword systems remain excellent for precision, while vector systems add semantic understanding. Together, they create a smarter and more user-friendly marketplace experience. For more details, you can refer to their published tech blog, linked here for your reference: https://tech.olx.com/hybrid-search-where-keywords-meet-vectors-enabling-classifieds-discovery-b7c383fe4fc4
-
138
Localization-Led Generative AI Product [Udemy]
In this episode, we explore how Udemy built a multilingual AI platform to bring its generative AI features to learners around the world. The team approached localization across three levels: a translation-first approach for broad and fast coverage, a fully native multilingual system for markets where fluency and cultural precision are essential, and a hybrid solution in between that intelligently routes between the two depending on the situation. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/udemy-engineering/from-zero-to-hero-localization-led-generative-ai-at-udemy-a422e4f968d4
-
137
Ladder of Evidence to Understand Product Effectiveness [Meta]
In this episode, we explore how Meta uses the “Ladder of Evidence” framework to evaluate the effectiveness of new product features. Instead of relying on a single analytical method, this framework helps teams choose the right type of evidence based on real-world constraints, leading to better and more informed product decisions. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/@AnalyticsAtMeta/ladder-of-evidence-in-understanding-effectiveness-of-new-products-part-i-ad8dee70906c
-
136
Customized AI System for Subtitle Translation [Vimeo]
In this episode, we explore how Vimeo built a customized AI system for subtitle translation, one that goes beyond basic text translation to tackle the much more challenging problem of synchronizing language with timing. We discuss how the team designed a split-brain architecture to separate translation quality from timing constraints, and how they implemented fallback mechanisms to ensure the system remains reliable in real-world scenarios. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/vimeo-engineering-blog/how-we-built-ai-powered-subtitles-at-vimeo-ff11f1d64b2a
-
135
Scaling Unit Test Coverage with AI Tools [NYTimes]
In this episode, we explore how the New York Times engineering team used AI agents to scale unit test coverage across their News site. They accomplished this by building a custom coverage measurement tool, designing a two-loop human–AI workflow, and investing heavily in prompt engineering, including strict guardrails to prevent the agent from cheating or drifting. The key takeaway is that AI works best when it is tightly constrained, carefully monitored, and used to amplify human judgment. For more details, you can refer to their published tech blog, linked here for your reference: https://open.nytimes.com/how-the-new-york-times-is-scaling-unit-test-coverage-using-ai-tools-fa796bf9b8d2
-
134
Product Classification Evolution [Shopify]
In this episode, we explore how Shopify evolved its product classification system across three major stages: from a traditional logistic regression model with TF-IDF features, to a multi-modal approach combining text and images, and finally to Vision Language Models built on top of a standardized and evolving product taxonomy. We also look at how architectural design and inference optimization are just as important as model accuracy in real-world machine learning systems. For more details, you can refer to their published tech blog, linked here for your reference: https://shopify.engineering/evolution-product-classification
-
133
Building an Ads System from Scratch [Faire]
In this episode, we explore how Faire built its ads system from scratch. On the business side, we discuss why ads matter for a growing marketplace: enabling brand discovery, creating a new revenue stream, and strengthening the overall ecosystem. On the technical side, we break down the three core components (Ads Delivery, Ads Manager, and Ads Foundation) and examine key considerations such as optimizing for long-term brand–retailer relationships and shipping a complex system within just six months. For more details, you can refer to their published tech blog, which is linked in the episode description: https://craft.faire.com/building-faires-ads-system-from-scratch-5c24fc916995
-
132
Optimize SQL Stored Procedures with LLM [Agoda]
In this episode, we explore how Agoda tackled a costly engineering bottleneck by integrating GPT into their CI/CD pipeline to analyze failing SQL stored procedures automatically and suggest optimizations, complete with rewritten queries, index recommendations, and side-by-side performance comparisons. The result is a human-in-the-loop system where AI handles the heavy lifting and engineers make the final call, leading to significant improvements in engineering productivity. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/agoda-engineering/how-agoda-uses-gpt-to-optimize-sql-stored-procedures-in-ci-cd-29caf730c46c
-
131
LLM-Empowered Job Search [LinkedIn]
In this episode, we explore how LinkedIn is reimagining job search with AI and large language models, evolving from rigid, keyword-based systems to flexible, intent-aware experiences that feel more conversational and personalized. For more details, you can refer to their published tech blog, linked here for your reference: https://www.linkedin.com/blog/engineering/ai/building-the-next-generation-of-job-search-at-linkedin
-
130
Personalized CRM with Bandit Algorithm [Uber]
In this episode, we explore how Uber tackled the challenge of personalizing CRM communications at scale through contextual bandit strategies enhanced with generative AI embeddings, lightweight and powerful models like LinUCB and XGBoost, and smart decision augmentation with SquareCB. This work shows how data science can take a core business need, delivering relevant user communications, and build systems that adapt in near real time to maximize impact. For more details, you can refer to their published tech blog, linked here for your reference: https://www.uber.com/blog/enhancing-personalized-crm
-
129
Enhanced Evaluation for Analytics AI Agent [Thomson Reuters Labs]
In this episode, we explore how seemingly perfect-looking SQL generated by AI agents can be “lying” when essential logic is missing. The Thomson Reuters Labs team highlights the need for deeper evaluation beyond simple syntax checks, and shows how tools like TruLens and AgentBench help expose hidden errors and better align agent outputs with real business intent. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/tr-labs-ml-engineering-blog/is-your-ai-agent-lying-with-perfect-sql-3a6a7d69bccf
-
128
Measure Listing Lifetime Value [Airbnb]
In this episode, we explore how Airbnb measures Listing Lifetime Value by separating it into baseline LTV, incremental LTV, and marketing-induced incremental LTV, and how this framework helps address challenges like measuring true incrementality and handling uncertainty about the future. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/airbnb-engineering/how-airbnb-measures-listing-lifetime-value-a603bf05142c
-
127
RankNet and LambdaRank for Enhanced Ranking Models [OLX]
In this episode, we explore how OLX evolved its ranking algorithms, from the pairwise logic of RankNet to the metric-optimized power of LambdaRank, to ensure users find exactly what they’re looking for. We discuss how moving from simple classification to "Learning to Rank" helps businesses prioritize user attention where it matters most. For more details, you can refer to their published tech blog, linked here for your reference: https://tech.olx.com/from-ranknet-to-lambdamart-leveraging-xgboost-for-enhanced-ranking-models-cf21f33350fb
-
126
Evolving User Intent Understanding Prediction [Udemy]
In this episode, we explore how Udemy tackled the tricky challenge of understanding learner intent in their AI Assistant, from a simple similarity-based embedding model, through experiments with larger models and fine-tuning, to a hybrid system that intelligently leverages both embeddings and large language model classification. This evolution demonstrates how real-world ML systems often require balancing accuracy, cost, latency, and user experience, especially in AI features that directly interact with users. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/udemy-engineering/evolution-of-the-udemy-ai-assistant-intent-understanding-system-ec3ee0039364
-
125
Framework for Navigating Product Strategy as Data Leaders [Meta]
In this episode, we explore how Meta’s data scientists approach product strategy using a structured framework that adapts to different data and problem scenarios. We walk through the distinct analytical approaches used across different problem spaces, defined by whether data availability is high or low and whether problem clarity is broad or concrete. Each scenario requires a different mix of thinking, collaboration, and analytics to drive meaningful product value. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/@AnalyticsAtMeta/data-scientists-framework-for-navigating-product-strategy-as-data-leaders-2eb62b20f505
-
124
Estimating Incremental Lift in Customer Value Using Synthetic Control [PayPal]
In this episode, we explore how PayPal estimates incremental lift in customer value using synthetic control methods. This causal inference–based approach provides a principled way to construct a counterfactual and isolate causal effects when traditional experiments aren’t sufficient, helping teams measure true impact in a complex, noisy, real-world environment and make more informed decisions. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/paypal-tech/estimating-incremental-lift-in-customer-value-delta-cv-using-synthetic-control-522be5e3da3a
-
123
Predicting User Session Intent with Multi-Task Learning [Netflix]
In this episode, we explore how Netflix tackles the challenge of predicting user session intent by extending the capabilities of its foundation model with a hierarchical multi-task learning architecture. This approach helps Netflix better understand what users want in the moment and personalize the experience in real time, ultimately improving its recommendation system at scale. For more details, you can refer to their published tech blog, linked here for your reference: https://netflixtechblog.com/fm-intent-predicting-user-session-intent-with-hierarchical-multi-task-learning-94c75e18f4b8
-
122
Product Recommendations with LLMs and Word2Vec [CVS Health]
In this episode, we explore how CVS Health builds its product recommendation system to deliver relevant, timely suggestions across millions of customers and thousands of products. We look at the business motivation behind personalization at CVS, and then walk through how the team uses Word2Vec, Euclidean distance, LLM-generated product summaries, and iterative refinement to improve the system step by step. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/cvs-health-tech-blog/enhancing-you-may-also-like-ymal-systems-using-llms-and-word2vec-0340280019d2
-
121
Building AI Agents at Airtable [Airtable]
In this episode, we explore how Airtable built AI Agents, a system that lets users automate workflows using natural language. We examine the business motivation behind making automation more accessible and break down the technical architecture that ensures these agents are safe, reliable, and tightly integrated into Airtable’s platform. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/airtable-eng/how-we-built-ai-agents-at-airtable-70838d73cc43
-
120
Quick Thoughts and Reflections at the End of 2025
In this episode, I share a few key observations and reflections drawn from the tech blogs I read throughout 2025. The themes include the rise of real-world LLM applications, a move toward deeply customized machine learning solutions, and the evolving skill sets in data and AI, with continuous learning becoming more important than ever. I’d also like to express my sincere appreciation to everyone who has listened, read, engaged with, or shared my posts and podcasts this year. Thank you for making this journey so rewarding and fun. I wish you a restful holiday season and an inspiring start to the new year.
-
119
Real-time Spatial and Temporal Forecasting [Lyft]
In this episode, we explore how Lyft identified the right algorithmic approach for building a real-time spatial-temporal forecasting system. The team evaluated two major model families for this task: classical time-series models and deep neural networks. This study highlights the balance between accuracy and practicality, and serves as a valuable guide for choosing machine learning solutions that truly meet business needs. For more details, you can refer to their published tech blog, linked here for your reference: https://eng.lyft.com/real-time-spatial-temporal-forecasting-lyft-fa90b3f3ec24
-
118
GenAI Solution for Invoice Document Processing [Uber]
In this episode, we explore how Uber tackled the challenge of processing an enormous volume of invoices that vary widely in layout, language, and quality. We break down how generative AI plays a central role in helping them build a more flexible and scalable document-processing system. By combining OCR, LLM-based extraction, and a thoughtful human-in-the-loop workflow, Uber created a platform that’s faster, more accurate, and far easier to maintain than traditional rule-based automation. For more details, you can refer to their published tech blog, linked here for your reference: https://www.uber.com/blog/advancing-invoice-document-processing-using-genai
-
117
Optimize Web Performance [Walmart]
In this episode, we will explore how Walmart's Engineering team tackled the challenge of optimizing web performance at scale: they set top-line targets, moved from server-centric metrics to user-centric ones like Core Web Vitals, integrated these measures into their experimentation framework, and ultimately drove measurable business impact through improved engagement and organic traffic. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/walmartglobaltech/walmart-journey-to-optimize-web-performance-and-drive-business-growth-c3bec8d7780b
-
116
Understanding Metric Movement with Root Cause Analysis [Pinterest]
In this episode, we explore how Pinterest tackled one of the toughest challenges in large-scale analytics: understanding why metrics move. We discuss how their engineering team built a root cause analysis platform that combines Slice and Dice, General Similarity, and Experiment Effects, with each component addressing a different part of the problem. This system brings together analytics, statistics, and engineering into an actionable workflow, empowering teams to respond faster and with greater confidence. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/pinterest-engineering/the-quest-to-understand-metric-movements-8ab12ae97cda
-
115
Improving Search Ranking for Maps [Airbnb]
In this episode, we explore how Airbnb improved search ranking for its map interface, a challenge that sits at the intersection of user behavior, design, and data science. From assuming uniform attention to modeling tiered and spatial attention, Airbnb’s team systematically refined how users interact with map results. This work shows how aligning user attention with booking likelihood can drive real business impact: improving bookings, enhancing customer satisfaction, and increasing overall platform efficiency. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/airbnb-engineering/improving-search-ranking-for-maps-13b03f2c2cca
-
114
Out-of-Stock Product Recommendations with Machine Learning [Instacart]
In this episode, we explore how Instacart leverages machine learning to suggest smart replacements for out-of-stock products, a challenge that’s central to the grocery delivery experience. We dive into Instacart’s two-model approach, where a deep learning model uncovers general product relationships across the catalog, and an engagement model learns from customer behavior to personalize those recommendations. Together, they power a system that makes replacements more accurate, relevant, and efficient at scale. For more details, you can refer to their published tech blog, linked here for your reference: https://tech.instacart.com/how-instacart-uses-machine-learning-to-suggest-replacements-for-out-of-stock-products-8f80d03bb5af
-
113
Covariate Selection in Causal Inference [Booking.com]
In this episode, we explore the importance of covariate selection in causal inference and how different types of variables can influence the results. The discussion highlights why careful covariate selection is essential for generating reliable insights and enabling smarter, evidence-based business decisions. For more details, you can refer to their published tech blog, linked here for your reference: https://booking.ai/covariate-selection-in-causal-inference-good-and-bad-controls-5f56126a984a
-
112
Personalizing Marketing with Uplift Modeling [Klaviyo]
In this episode, we explore how Klaviyo used counterfactual learning and uplift modeling to move beyond the question of which treatment works to the deeper question of for whom it works. We’ll see how the team combined randomized experiments, causal inference techniques, and uplift modeling to power a product that helps marketers deliver smarter, more personalized messages. For more details, you can refer to their published tech blog, linked here for your reference: https://klaviyo.tech/the-stats-that-tell-you-what-could-have-been-counterfactual-learning-and-uplift-modeling-e95d3b712d8a
-
111
Quick History and Fun Facts About Halloween: Pumpkins, Candies, and Costumes
In this Halloween special episode, we explore some fun facts and surprising data behind these festive favorites: Did you know Illinois is the top pumpkin-producing state, harvesting nearly 40% of all pumpkins in the U.S.? Or that Reese’s Peanut Butter Cups consistently rank as America’s most popular Halloween candy? And that at least 20% of pet owners now dress up their pets for Halloween? Now, let’s dive into these facts and the history behind the holiday. Enjoy!
-
110
Feed Ranking: From Batch Inference to Online Inference [Whatnot]
In this episode, we explore how Whatnot improved its feed ranking system by moving from batch predictions to online inference, enabling the platform to scale effectively while capturing real-time marketplace dynamics. This evolution reflects a broader shift in recommendation systems toward more adaptive, real-time personalization. For more details, check out the full tech blog from the Whatnot engineering team: https://medium.com/whatnot-engineering/evolving-feed-ranking-at-whatnot-25adb116aeb6
-
109
Self-serve Experimentation Tool for Marketing [Tripadvisor]
In this episode, we explore Tripadvisor’s self-serve experimentation platform for marketing. On the business side, the challenge was measuring campaign effectiveness in a messy, external environment where clean randomization isn’t always possible. On the technical side, the Tripadvisor team developed a system that applies causal inference techniques, particularly the difference-in-differences method, to deliver reliable estimates of campaign impact. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/tripadvisor/introducing-baldur-tripadvisors-self-serve-experimentation-tool-for-marketing-7fc9933b25cc
-
108
Global Feature Importance with Collective Wisdom [Meta]
In this episode, we look at how Meta addressed the challenge of feature selection at scale through Global Feature Importance, a system that aggregates insights across models to surface the most valuable features. This approach not only streamlines model development but also enables machine learning engineers to iterate more effectively and build models that deliver stronger business impact. For more details, check out Meta’s published tech blog here: https://medium.com/@AnalyticsAtMeta/collective-wisdom-of-models-advanced-feature-importance-techniques-at-meta-1a7a8d2f9e27
-
107
Evaluating Retrieval Capabilities of Language Models [Microsoft]
In this episode, we explore how to evaluate the retrieval-augmented generation (RAG) capabilities of small language models. On the business side, we discuss why RAG, long context windows, and small language models are critical for building scalable and reliable AI systems. On the technical side, we walk through the Needle-in-a-Haystack methodology and discuss key findings about retrieval performance across different models. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/data-science-at-microsoft/evaluating-rag-capabilities-of-small-language-models-e7531b3a5061
-
106
Personalized Recommendation with Foundation Models [Netflix]
In this episode, we explore how Netflix enhanced recommendation personalization using foundation models. These models can process massive user histories through tokenization and attention mechanisms, while also addressing the cold-start problem with hybrid embeddings. The work highlights how principles from large language models can be adapted to build more effective recommendation systems at scale. For more details, you can refer to their published tech blog, linked here for your reference: https://netflixtechblog.com/foundation-model-for-personalized-recommendation-1a0bd8e02d39
-
105
A/B Testing vs. Multi-Armed Bandits: A Simulated Study [Vanguard]
In this episode, we explore how Vanguard evaluated standard A/B testing against multi-armed bandits for digital experimentation. Their simulated study showed that A/B testing is often the better choice when dealing with a small number of variations, while bandit strategies, such as Thompson Sampling, become more effective as the number of variations increases. The broader lesson is that experimentation design should always be context-aware, balancing simplicity, speed, and interpretability based on your business needs. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/vanguard-technology/smarter-web-wins-a-b-testing-vs-multi-armed-bandits-unpacked-7f5032358513
-
104
Catalog Attribute Extraction with Multi-Modal LLMs [Instacart]
In this episode, we explore how Instacart tackled the challenge of extracting accurate product attributes at scale. We discuss different solutions: starting with SQL rules, moving to text-based ML models, and finally, Instacart’s multi-modal LLM platform, PARSE. By blending text and image data and enabling rapid configuration, PARSE demonstrates how modern AI tools can streamline data pipelines, reduce engineering overhead, and deliver better user experiences. For more details, you can refer to their published tech blog, linked here for your reference: https://tech.instacart.com/multi-modal-catalog-attribute-extraction-platform-at-instacart-b9228754a527
-
103
Segmenting Supply with a Data-Driven Methodology [Airbnb]
In this episode, we explore how Airbnb developed a structured framework that combines unsupervised clustering and supervised modeling to classify listings into meaningful supply personas based on availability patterns. This data-driven approach helps Airbnb enhance personalization, improve experimentation, and gain deeper insights into its global supply base. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/airbnb-engineering/from-data-to-insights-segmenting-airbnbs-supply-c88aa2bb9399
-
102
Causal Inference with Bayesian Structural Time Series Model [Walmart]
In this episode, we explore the Bayesian Structural Time Series model as a causal inference methodology and walk through a real-world example of how Walmart leveraged it to measure the impact of a simple yet meaningful product taxonomy change. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/walmartglobaltech/decoding-causal-incrementality-in-e-commerce-leveraging-bayesian-structural-time-series-model-with-f7eaf7267d69
-
101
Advancements in Embedding-Based Retrieval [Pinterest]
In this episode, we delve into how Pinterest has enhanced its embedding-based retrieval system to provide a more personalized, relevant, and dynamic Homefeed experience. By scaling their models with richer feature interactions, refreshing the content corpus with trending Pins, and leveraging cutting-edge machine learning techniques, Pinterest is able to serve better content, faster and more accurately, to hundreds of millions of users. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/pinterest-engineering/advancements-in-embedding-based-retrieval-at-pinterest-homefeed-d7d7971a409e
-
100
How Data Scientists Lead and Drive Impact [Meta]
In this episode, we dive into what it’s like to be a data scientist at Meta. Grounded in product leadership, data scientists at Meta apply deep analytical expertise to drive measurement, navigate complex product ecosystems, and shape key decisions, ultimately delivering meaningful impact on product outcomes. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/@AnalyticsAtMeta/how-data-scientists-lead-and-drive-impact-at-meta-6b5b896821b2
-
99
Building Scalable Risk Management Platform [Revolut]
In this episode, we explore how Revolut is reimagining risk management. By developing a modular, scientifically grounded, and explainable platform, the team has enabled faster, more accurate, and more transparent risk decisions, spanning diverse products and global markets. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/revolut/reinventing-risk-at-revolut-77e63c552503
-
98
Tackling Interference Bias with Marketplace Marginal Values [Lyft]
In this episode, we explore how Lyft tackles interference bias in marketplace experiments using Marketplace Marginal Values (MMVs). We break down why interference is a natural challenge in two-sided platforms like Lyft, and how their team uses optimization, simulation, and advanced metrics to measure causal effects more reliably. For more details, check out the original tech blog linked here: https://eng.lyft.com/using-marketplace-marginal-values-to-address-interference-bias-a11aff6e670f
-
97
Causal Inference with Double Machine Learning [Microsoft]
In this episode, we explore how causal inference helps companies like Microsoft answer high-stakes product and business questions when A/B testing isn’t possible. We dive into Double Machine Learning, a technique that leverages ML models to control for confounding variables and isolate true causal effects. The result is a flexible, rigorous framework that every data scientist should have in their toolkit. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/data-science-at-microsoft/introduction-to-causal-inference-using-double-machine-learning-5daa642321f3
-
96
Scalable and Blendable Feed Construction [Whatnot]
In this episode, we explore how Whatnot tackled the challenge of scaling feed recommendation systems across a rapidly growing platform. We dive into WhataMix, a DAG-based framework that enables teams to build, test, and deploy feed logic using reusable, modular components. It’s a great example of how thoughtful system design can accelerate development while maintaining high standards in machine learning infrastructure. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/whatnot-engineering/whatamix-blendable-feed-construction-2c94c21f6635
-
95
Using Generative and Traditional AI to Enhance Travel Experience [Expedia]
In this episode, we explore how Expedia is integrating both generative and traditional AI to enhance the travel experience. The company’s approach leverages generative models for open-ended, natural language tasks, and relies on traditional models for structured, mission-critical problems. By playing to the strengths of each, Expedia is able to build smarter, more adaptable AI systems without overcomplicating things or compromising on performance. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/expedia-group-tech/elevating-travel-experiences-with-ai-acdb2cf2ec13
-
94
Ensuring Data Quality at Petabyte Scale [Glassdoor]
In this episode, we dive into how Glassdoor addresses the challenge of maintaining data quality at a petabyte scale. By treating data as a product, the engineering team built a centralized, scalable platform that enables proactive validation, continuous monitoring, and cross-team collaboration. From data contracts and static code analysis to LLM-based logic checks and anomaly detection, we unpack the key practices behind their approach. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/glassdoor-engineering/data-quality-at-petabyte-scale-building-trust-in-the-data-lifecycle-7052361307a4
-
93
Building a Travel Assistant with LLMs [Agoda]
In this episode, we explore how Agoda used large language models (LLMs) to improve user experience through building a conversational AI product. By focusing on prompt engineering, grounding data, and smart evaluation, the team built a scalable assistant that adds real value to the user journey. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/agoda-engineering/how-we-built-agodas-property-ama-bot-to-simplify-travel-decisions-b861c7ec7ff1
-
92
Setting Goals at Scale with the Goal Map [Meta]
In this episode, we explore how Meta tackles the complex challenge of setting aligned, measurable, and high-impact goals across a vast organization. Whether you’re in data science, analytics, or product leadership, this episode offers practical insights into building a more effective goal-setting system. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/@AnalyticsAtMeta/how-facebook-sets-goals-94cee1c7f44f
-
91
Predicting User Actions with Transformer-Based Models [Hike]
In this episode, we will explore how Hike applied transformer-based models to predict user behavior in their Rush Gaming Universe. We will look at the business motivation and break down the technical solution, from input features to prediction and evaluation. This case is a good example of how modern deep learning techniques can drive real impact in improving user experience. For more details, you can refer to their published tech blog, linked here for your reference: https://blog.hike.in/predicting-user-actions-with-transformer-based-models-ffd1c6b68e99
-
90
Quantization Techniques for Language Models [EsperantoTech]
In this episode, we will explore quantization techniques for language models. We will look at the business motivation—making large language models more efficient—and unpack the technical solutions that make this possible. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/@EsperantoTech/quantization-and-mixed-mode-techniques-for-small-language-models-b3366dbad554