PODCAST · education

Snacks Weekly on Data Science

by Pan Wu

This podcast is about making data science and machine learning knowledge accessible and less intimidating. Every week, I handpick one industry tech blog and break it down. We discuss key data science concepts and machine learning algorithms, and how they are applied in real-world applications. Subscribe to the channel and enjoy Snacks Weekly on Data Science!


139

Hybrid Search for Improved Content Discovery [OLX]

In this episode, we explore how OLX improved discovery by combining keyword search and vector search instead of forcing a choice between the two. Keyword systems remain excellent for precision, while vector systems add semantic understanding. Together, they create a smarter and more user-friendly marketplace experience. For more details, you can refer to their published tech blog, linked here for your reference: https://tech.olx.com/hybrid-search-where-keywords-meet-vectors-enabling-classifieds-discovery-b7c383fe4fc4

May 11, 2026

7m
138

Localization-Led Generative AI Product [Udemy]

In this episode, we explore how Udemy built a multilingual AI platform to bring its generative AI features to learners around the world. The team approached localization across three levels: a translation-first approach for broad and fast coverage, a fully native multilingual system for markets where fluency and cultural precision are essential, and a hybrid solution in between that intelligently routes between the two depending on the situation. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/udemy-engineering/from-zero-to-hero-localization-led-generative-ai-at-udemy-a422e4f968d4

May 4, 2026

8m
137

Ladder of Evidence to Understand Product Effectiveness [Meta]

In this episode, we explore how Meta uses the “Ladder of Evidence” framework to evaluate the effectiveness of new product features. Instead of relying on a single analytical method, this framework helps teams choose the right type of evidence based on real-world constraints, leading to better and more informed product decisions. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/@AnalyticsAtMeta/ladder-of-evidence-in-understanding-effectiveness-of-new-products-part-i-ad8dee70906c

Apr 27, 2026

9m
136

Customized AI System for Subtitle Translation [Vimeo]

In this episode, we explore how Vimeo built a customized AI system for subtitle translation, one that goes beyond basic text translation to tackle the much more challenging problem of synchronizing language with timing. We discuss how the team designed a split-brain architecture to separate translation quality from timing constraints, and how they implemented fallback mechanisms to ensure the system remains reliable in real-world scenarios. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/vimeo-engineering-blog/how-we-built-ai-powered-subtitles-at-vimeo-ff11f1d64b2a

Apr 20, 2026

9m
135

Scaling Unit Test Coverage with AI Tools [NYTimes]

In this episode, we explore how the New York Times engineering team used AI agents to scale unit test coverage across their News site. They accomplished this by building a custom coverage measurement tool, designing a two-loop human–AI workflow, and investing heavily in prompt engineering, including strict guardrails to prevent the agent from cheating or drifting. The key takeaway is that AI works best when it is tightly constrained, carefully monitored, and used to amplify human judgment. For more details, you can refer to their published tech blog, linked here for your reference: https://open.nytimes.com/how-the-new-york-times-is-scaling-unit-test-coverage-using-ai-tools-fa796bf9b8d2

Apr 13, 2026

8m
134

Product classification evolution [Shopify]

In this episode, we explore how Shopify evolved its product classification system across three major stages: from a traditional logistic regression model with TF-IDF features, to a multi-modal approach combining text and images, and finally to Vision Language Models built on top of a standardized and evolving product taxonomy. We also look at how architectural design and inference optimization are just as important as model accuracy in real-world machine learning systems. For more details, you can refer to their published tech blog, linked here for your reference: https://shopify.engineering/evolution-product-classification

Apr 6, 2026

8m
133

Building an Ads System from Scratch [Faire]

In this episode, we explore how Faire built its ads system from scratch. On the business side, we discuss why ads matter for a growing marketplace: enabling brand discovery, creating a new revenue stream, and strengthening the overall ecosystem. On the technical side, we break down the three core components (Ads Delivery, Ads Manager, and Ads Foundation) and examine key considerations such as optimizing for long-term brand–retailer relationships and shipping a complex system within just six months. For more details, you can refer to their published tech blog, which is linked in the episode description: https://craft.faire.com/building-faires-ads-system-from-scratch-5c24fc916995

Mar 30, 2026

10m
132

Optimize SQL Stored Procedures with LLM [Agoda]

In this episode, we explore how Agoda tackled a costly engineering bottleneck by integrating GPT into their CI/CD pipeline to analyze failing SQL stored procedures automatically and suggest optimizations, complete with rewritten queries, index recommendations, and side-by-side performance comparisons. The result is a human-in-the-loop system where AI handles the heavy lifting and engineers make the final call, leading to significant improvements in engineering productivity. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/agoda-engineering/how-agoda-uses-gpt-to-optimize-sql-stored-procedures-in-ci-cd-29caf730c46c

Mar 23, 2026

7m
131

LLM-Empowered Job Search [LinkedIn]

In this episode, we explore how LinkedIn is reimagining job search with AI and large language models, evolving from rigid, keyword-based systems to flexible, intent-aware experiences that feel more conversational and personalized. For more details, you can refer to their published tech blog, linked here for your reference: https://www.linkedin.com/blog/engineering/ai/building-the-next-generation-of-job-search-at-linkedin

Mar 16, 2026

8m
130

Personalized CRM with Bandit algorithm [Uber]

In this episode, we explore how Uber tackled the challenge of personalizing CRM communications at scale through contextual bandit strategies enhanced with generative AI embeddings, lightweight and powerful models like LinUCB and XGBoost, and smart decision augmentation with SquareCB. This work shows how data science can take a core business need, delivering relevant user communications, and build systems that adapt in near real time to maximize impact. For more details, you can refer to their published tech blog, linked here for your reference: https://www.uber.com/blog/enhancing-personalized-crm

Mar 9, 2026

9m
129

Enhanced Evaluation for Analytics AI Agent [Thomson Reuters Labs]

In this episode, we explore how seemingly perfect-looking SQL generated by AI agents can be “lying” when essential logic is missing. The Thomson Reuters Labs team highlights the need for deeper evaluation beyond simple syntax checks, and shows how tools like TruLens and AgentBench help expose hidden errors and better align agent outputs with real business intent. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/tr-labs-ml-engineering-blog/is-your-ai-agent-lying-with-perfect-sql-3a6a7d69bccf

Mar 2, 2026

10m
128

Measure Listing Lifetime value [Airbnb]

In this episode, we explore how Airbnb measures Listing Lifetime Value by separating it into baseline LTV, incremental LTV, and marketing-induced incremental LTV, and how this framework helps address challenges like measuring true incrementality and handling uncertainty about the future. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/airbnb-engineering/how-airbnb-measures-listing-lifetime-value-a603bf05142c

Feb 23, 2026

10m
127

RankNet and LambdaRank for Enhanced Ranking Models [OLX]

In this episode, we explore how OLX evolved its ranking algorithms, from the pairwise logic of RankNet to the metric-optimized power of LambdaRank, to ensure users find exactly what they’re looking for. We discuss how moving from simple classification to "Learning to Rank" helps businesses prioritize user attention where it matters most. For more details, you can refer to their published tech blog, linked here for your reference: https://tech.olx.com/from-ranknet-to-lambdamart-leveraging-xgboost-for-enhanced-ranking-models-cf21f33350fb

Feb 16, 2026

9m
126

Evolving user intent understanding prediction [Udemy]

In this episode, we explore how Udemy tackled the tricky challenge of understanding learner intent in their AI Assistant: from a simple similarity-based embedding model, through experiments with larger models and fine-tuning, to a hybrid system that intelligently leverages both embeddings and large language model classification. This evolution demonstrates how real-world ML systems often require balancing accuracy, cost, latency, and user experience, especially in AI features that directly interact with users. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/udemy-engineering/evolution-of-the-udemy-ai-assistant-intent-understanding-system-ec3ee0039364

Feb 9, 2026

11m
125

Framework for Navigating Product Strategy as Data Leaders [Meta]

In this episode, we explore how Meta’s data scientists approach product strategy using a structured framework that adapts to different data and problem scenarios. We walk through the distinct analytical approaches used across different problem spaces, defined by whether data availability is high or low and whether problem clarity is broad or concrete. Each scenario requires a different mix of thinking, collaboration, and analytics to drive meaningful product value. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/@AnalyticsAtMeta/data-scientists-framework-for-navigating-product-strategy-as-data-leaders-2eb62b20f505

Feb 2, 2026

10m
124

Estimating Incremental Lift in Customer Value Using Synthetic Control [PayPal]

In this episode, we explore how PayPal estimates incremental lift in customer value using synthetic control methods. This causal inference–based approach provides a principled way to construct a counterfactual and isolate causal effects when traditional experiments aren’t sufficient, helping teams measure true impact in a complex, noisy, real-world environment and make more informed decisions. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/paypal-tech/estimating-incremental-lift-in-customer-value-delta-cv-using-synthetic-control-522be5e3da3a

Jan 26, 2026

10m
123

Predicting User Session Intent with Multi-Task Learning [Netflix]

In this episode, we explore how Netflix tackles the challenge of predicting user session intent by extending the capabilities of its foundation model with a hierarchical multi-task learning architecture. This approach helps Netflix better understand what users want in the moment and personalize the experience in real time, ultimately improving its recommendation system at scale. For more details, you can refer to their published tech blog, linked here for your reference: https://netflixtechblog.com/fm-intent-predicting-user-session-intent-with-hierarchical-multi-task-learning-94c75e18f4b8

Jan 19, 2026

10m
122

Product Recommendations with LLMs and Word2Vec [CVS Health]

In this episode, we explore how CVS Health builds its product recommendation system to deliver relevant, timely suggestions across millions of customers and thousands of products. We look at the business motivation behind personalization at CVS, and then walk through how the team uses Word2Vec, Euclidean distance, LLM-generated product summaries, and iterative refinement to improve the system step by step. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/cvs-health-tech-blog/enhancing-you-may-also-like-ymal-systems-using-llms-and-word2vec-0340280019d2

Jan 12, 2026

8m
121

Building AI Agents at Airtable [Airtable]

In this episode, we explore how Airtable built AI Agents, a system that lets users automate workflows using natural language. We examine the business motivation behind making automation more accessible and break down the technical architecture that ensures these agents are safe, reliable, and tightly integrated into Airtable’s platform. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/airtable-eng/how-we-built-ai-agents-at-airtable-70838d73cc43

Jan 5, 2026

10m
120

Quick Thoughts and Reflections at the End of 2025

In this episode, I share a few key observations and reflections drawn from the tech blogs I read throughout 2025. The themes include the rise of real-world LLM applications, a move toward deeply customized machine learning solutions, and the evolving skill sets in data and AI, with continuous learning becoming more important than ever. I’d also like to express my sincere appreciation to everyone who has listened, read, engaged with, or shared my posts and podcasts this year. Thank you for making this journey so rewarding and fun. I wish you a restful holiday season and an inspiring start to the new year.

Dec 29, 2025

8m
119

Real-time Spatial and Temporal Forecasting [Lyft]

In this episode, we explore how Lyft identified the right algorithmic approach for building a real-time spatial-temporal forecasting system. The team evaluated two major model families for this task: classical time-series models and deep neural networks. This study highlights the balance between accuracy and practicality, and serves as a valuable guide for choosing machine learning solutions that truly meet business needs. For more details, you can refer to their published tech blog, linked here for your reference: https://eng.lyft.com/real-time-spatial-temporal-forecasting-lyft-fa90b3f3ec24

Dec 22, 2025

11m
118

GenAI Solution for Invoice Document Processing [Uber]

In this episode, we explore how Uber tackled the challenge of processing an enormous volume of invoices that vary widely in layout, language, and quality. We break down how generative AI plays a central role in helping them build a more flexible and scalable document-processing system. By combining OCR, LLM-based extraction, and a thoughtful human-in-the-loop workflow, Uber created a platform that’s faster, more accurate, and far easier to maintain than traditional rule-based automation. For more details, you can refer to their published tech blog, linked here for your reference: https://www.uber.com/blog/advancing-invoice-document-processing-using-genai

Dec 15, 2025

10m
117

Optimize Web Performance [Walmart]

In this episode, we will explore how Walmart's Engineering team tackled the challenge of optimizing web performance at scale: they set top-line targets, moved from server-centric metrics to user-centric ones like Core Web Vitals, integrated these measures into their experimentation framework, and ultimately drove measurable business impact through improved engagement and organic traffic. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/walmartglobaltech/walmart-journey-to-optimize-web-performance-and-drive-business-growth-c3bec8d7780b

Dec 8, 2025

9m
116

Understanding Metric Movement with Root Cause Analysis [Pinterest]

In this episode, we explore how Pinterest tackled one of the toughest challenges in large-scale analytics: understanding why metrics move. We discuss how their engineering team built a root cause analysis platform that combines Slice and Dice, General Similarity, and Experiment Effects, with each component addressing a different part of the problem. This system brings together analytics, statistics, and engineering into an actionable workflow, empowering teams to respond faster and with greater confidence. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/pinterest-engineering/the-quest-to-understand-metric-movements-8ab12ae97cda

Dec 1, 2025

11m
115

Improving Search Ranking for Maps [Airbnb]

In this episode, we explore how Airbnb improved search ranking for its map interface, a challenge that sits at the intersection of user behavior, design, and data science. From assuming uniform attention to modeling tiered and spatial attention, Airbnb’s team systematically refined its model of how users interact with map results. This work shows how aligning user attention with booking likelihood can drive real business impact: improving bookings, enhancing customer satisfaction, and increasing overall platform efficiency. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/airbnb-engineering/improving-search-ranking-for-maps-13b03f2c2cca

Nov 24, 2025

9m
114

Out-of-Stock Product Recommendations with Machine Learning [Instacart]

In this episode, we explore how Instacart leverages machine learning to suggest smart replacements for out-of-stock products, a challenge that’s central to the grocery delivery experience. We dive into Instacart’s two-model approach, where a deep learning model uncovers general product relationships across the catalog, and an engagement model learns from customer behavior to personalize those recommendations. Together, they power a system that makes replacements more accurate, relevant, and efficient at scale. For more details, you can refer to their published tech blog, linked here for your reference: https://tech.instacart.com/how-instacart-uses-machine-learning-to-suggest-replacements-for-out-of-stock-products-8f80d03bb5af

Nov 17, 2025

10m
113

Covariate Selection in Causal Inference [Booking.com]

In this episode, we explore the importance of covariate selection in causal inference and how different types of variables can influence the results. The discussion highlights why careful covariate selection is essential for generating reliable insights and enabling smarter, evidence-based business decisions. For more details, you can refer to their published tech blog, linked here for your reference: https://booking.ai/covariate-selection-in-causal-inference-good-and-bad-controls-5f56126a984a

Nov 10, 2025

12m
112

Personalizing Marketing with Uplift Modeling [Klaviyo]

In this episode, we explore how Klaviyo used counterfactual learning and uplift modeling to move beyond the question of which treatment works to the deeper question of for whom it works. We’ll see how the team combined randomized experiments, causal inference techniques, and uplift modeling to power a product that helps marketers deliver smarter, more personalized messages. For more details, you can refer to their published tech blog, linked here for your reference: https://klaviyo.tech/the-stats-that-tell-you-what-could-have-been-counterfactual-learning-and-uplift-modeling-e95d3b712d8a

Nov 3, 2025

9m
111

Quick History and Fun Facts About Halloween: Pumpkins, Candies, and Costumes

In this Halloween special episode, we explore some fun facts and surprising data behind these festive favorites: Did you know Illinois is the top pumpkin-producing state, harvesting nearly 40% of all pumpkins in the U.S.? Or that Reese’s Peanut Butter Cups consistently rank as America’s most popular Halloween candy? And that at least 20% of pet owners now dress up their pets for Halloween? Now, let’s dive into these facts and the history behind the holiday. Enjoy!

Oct 27, 2025

7m
110

Feed Ranking: From Batch Inference to Online Inference [Whatnot]

In this episode, we explore how Whatnot improved its feed ranking system by moving from batch predictions to online inference, enabling the platform to scale effectively while capturing real-time marketplace dynamics. This evolution reflects a broader shift in recommendation systems toward more adaptive, real-time personalization. For more details, check out the full tech blog from the Whatnot engineering team: https://medium.com/whatnot-engineering/evolving-feed-ranking-at-whatnot-25adb116aeb6

Oct 20, 2025

7m
109

Self-serve Experimentation Tool for Marketing [Tripadvisor]

In this episode, we explore Tripadvisor’s self-serve experimentation platform for marketing. On the business side, the challenge was measuring campaign effectiveness in a messy, external environment where clean randomization isn’t always possible. On the technical side, the Tripadvisor team developed a system that applies causal inference techniques, particularly the difference-in-differences method, to deliver reliable estimates of campaign impact. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/tripadvisor/introducing-baldur-tripadvisors-self-serve-experimentation-tool-for-marketing-7fc9933b25cc

Oct 13, 2025

10m
108

Global Feature Importance with Collective Wisdom [Meta]

In this episode, we look at how Meta addressed the challenge of feature selection at scale through Global Feature Importance, a system that aggregates insights across models to surface the most valuable features. This approach not only streamlines model development but also enables machine learning engineers to iterate more effectively and build models that deliver stronger business impact. For more details, check out Meta’s published tech blog here: https://medium.com/@AnalyticsAtMeta/collective-wisdom-of-models-advanced-feature-importance-techniques-at-meta-1a7a8d2f9e27

Oct 6, 2025

8m
107

Evaluating Retrieval Capabilities of Language Models [Microsoft]

In this episode, we explore how to evaluate the retrieval-augmented generation (RAG) capabilities of small language models. On the business side, we discuss why RAG, long context windows, and small language models are critical for building scalable and reliable AI systems. On the technical side, we walk through the Needle-in-a-Haystack methodology and discuss key findings about retrieval performance across different models. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/data-science-at-microsoft/evaluating-rag-capabilities-of-small-language-models-e7531b3a5061

Sep 29, 2025

10m
106

Personalized Recommendation with Foundation Models [Netflix]

In this episode, we explore how Netflix enhanced recommendation personalization using foundation models. These models can process massive user histories through tokenization and attention mechanisms, while also addressing the cold-start problem with hybrid embeddings. The work highlights how principles from large language models can be adapted to build more effective recommendation systems at scale. For more details, you can refer to their published tech blog, linked here for your reference: https://netflixtechblog.com/foundation-model-for-personalized-recommendation-1a0bd8e02d39

Sep 22, 2025

11m
105

A/B Testing vs. Multi-Armed Bandits: A Simulated Study [Vanguard]

In this episode, we explore how Vanguard evaluated standard A/B testing against multi-armed bandits for digital experimentation. Their simulated study showed that A/B testing is often the better choice when dealing with a small number of variations, while bandit strategies, such as Thompson Sampling, become more effective as the number of variations increases. The broader lesson is that experimentation design should always be context-aware, balancing simplicity, speed, and interpretability based on your business needs. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/vanguard-technology/smarter-web-wins-a-b-testing-vs-multi-armed-bandits-unpacked-7f5032358513

Sep 15, 2025

10m
104

Catalog Attribute Extraction with Multi-Modal LLMs [Instacart]

In this episode, we explore how Instacart tackled the challenge of extracting accurate product attributes at scale. We discuss different solutions: starting with SQL rules, moving to text-based ML models, and finally, Instacart’s multi-modal LLM platform, PARSE. By blending text and image data and enabling rapid configuration, PARSE demonstrates how modern AI tools can streamline data pipelines, reduce engineering overhead, and deliver better user experiences. For more details, you can refer to their published tech blog, linked here for your reference: https://tech.instacart.com/multi-modal-catalog-attribute-extraction-platform-at-instacart-b9228754a527

Sep 8, 2025

10m
103

Segmenting Supply with a Data-Driven Methodology [Airbnb]

In this episode, we explore how Airbnb developed a structured framework that combines unsupervised clustering and supervised modeling to classify listings into meaningful supply personas based on availability patterns. This data-driven approach helps Airbnb enhance personalization, improve experimentation, and gain deeper insights into its global supply base. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/airbnb-engineering/from-data-to-insights-segmenting-airbnbs-supply-c88aa2bb9399

Sep 1, 2025

8m
102

Causal Inference with Bayesian Structural Time Series Model [Walmart]

In this episode, we explore the Bayesian Structural Time Series model as a causal inference methodology and walk through a real-world example of how Walmart leveraged it to measure the impact of a simple yet meaningful product taxonomy change. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/walmartglobaltech/decoding-causal-incrementality-in-e-commerce-leveraging-bayesian-structural-time-series-model-with-f7eaf7267d69

Aug 25, 2025

8m
101

Advancements in Embedding-Based Retrieval [Pinterest]

In this episode, we delve into how Pinterest has enhanced its embedding-based retrieval system to provide a more personalized, relevant, and dynamic Homefeed experience. By scaling their models with richer feature interactions, refreshing the content corpus with trending Pins, and leveraging cutting-edge machine learning techniques, Pinterest is able to serve better content, faster and more accurately, to hundreds of millions of users. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/pinterest-engineering/advancements-in-embedding-based-retrieval-at-pinterest-homefeed-d7d7971a409e

Aug 18, 2025

10m
100

How Data Scientists Lead and Drive Impact [Meta]

In this episode, we dive into what it’s like to be a data scientist at Meta. Grounded in product leadership, data scientists at Meta apply deep analytical expertise to drive measurement, navigate complex product ecosystems, and shape key decisions, ultimately delivering meaningful impact on product outcomes. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/@AnalyticsAtMeta/how-data-scientists-lead-and-drive-impact-at-meta-6b5b896821b2

Aug 11, 2025

10m
99

Building Scalable Risk Management Platform [Revolut]

In this episode, we explore how Revolut is reimagining risk management. By developing a modular, scientifically grounded, and explainable platform, the team has enabled faster, more accurate, and more transparent risk decisions spanning diverse products and global markets. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/revolut/reinventing-risk-at-revolut-77e63c552503

Aug 4, 2025

10m
98

Tackling Interference Bias with Marketplace Marginal Values [Lyft]

In this episode, we explore how Lyft tackles interference bias in marketplace experiments using Marketplace Marginal Values (MMVs). We break down why interference is a natural challenge in two-sided platforms like Lyft, and how their team uses optimization, simulation, and advanced metrics to measure causal effects more reliably. For more details, check out the original tech blog linked here: https://eng.lyft.com/using-marketplace-marginal-values-to-address-interference-bias-a11aff6e670f

Jul 28, 2025

9m
97

Causal Inference with Double Machine Learning [Microsoft]

In this episode, we explore how causal inference helps companies like Microsoft answer high-stakes product and business questions when A/B testing isn’t possible. We dive into Double Machine Learning, a technique that leverages ML models to control for confounding variables and isolate true causal effects. The result is a flexible, rigorous framework that every data scientist should have in their toolkit. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/data-science-at-microsoft/introduction-to-causal-inference-using-double-machine-learning-5daa642321f3

Jul 21, 2025

8m
96

Scalable and Blendable Feed Construction [Whatnot]

In this episode, we explore how Whatnot tackled the challenge of scaling feed recommendation systems across a rapidly growing platform. We dive into WhataMix, a DAG-based framework that enables teams to build, test, and deploy feed logic using reusable, modular components. It’s a great example of how thoughtful system design can accelerate development while maintaining high standards in machine learning infrastructure. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/whatnot-engineering/whatamix-blendable-feed-construction-2c94c21f6635

Jul 14, 2025

8m
95

Using Generative and Traditional AI to Enhance Travel Experience [Expedia]

In this episode, we explore how Expedia is integrating both generative and traditional AI to enhance the travel experience. The company’s approach leverages generative models for open-ended, natural language tasks, and relies on traditional models for structured, mission-critical problems. By playing to the strengths of each, Expedia is able to build smarter, more adaptable AI systems without overcomplicating things or compromising on performance. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/expedia-group-tech/elevating-travel-experiences-with-ai-acdb2cf2ec13

Jul 7, 2025

9m
94

Ensuring Data Quality at Petabyte Scale [Glassdoor]

In this episode, we dive into how Glassdoor addresses the challenge of maintaining data quality at petabyte scale. By treating data as a product, the engineering team built a centralized, scalable platform that enables proactive validation, continuous monitoring, and cross-team collaboration. From data contracts and static code analysis to LLM-based logic checks and anomaly detection, we unpack the key practices behind their approach. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/glassdoor-engineering/data-quality-at-petabyte-scale-building-trust-in-the-data-lifecycle-7052361307a4

Jun 30, 2025

11m
93

Building a Travel Assistant with LLMs [Agoda]

In this episode, we explore how Agoda used large language models (LLMs) to improve user experience through building a conversational AI product. By focusing on prompt engineering, grounding data, and smart evaluation, the team built a scalable assistant that adds real value to the user journey. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/agoda-engineering/how-we-built-agodas-property-ama-bot-to-simplify-travel-decisions-b861c7ec7ff1

Jun 23, 2025

8m
92

Setting Goals at Scale with the Goal Map [Meta]

In this episode, we explore how Meta tackles the complex challenge of setting aligned, measurable, and high-impact goals across a vast organization. Whether you’re in data science, analytics, or product leadership, this episode offers practical insights into building a more effective goal-setting system. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/@AnalyticsAtMeta/how-facebook-sets-goals-94cee1c7f44f

Jun 16, 2025

8m
91

Predicting user actions with transformer-based models [Hike]

In this episode, we will explore how Hike applied transformer-based models to predict user behavior in their Rush Gaming Universe. We will look at the business motivation and break down the technical solution, from input features to prediction and evaluation. This case is a good example of how modern deep learning techniques can drive real impact in improving user experience. For more details, you can refer to their published tech blog, linked here for your reference: https://blog.hike.in/predicting-user-actions-with-transformer-based-models-ffd1c6b68e99

Jun 9, 2025

6m
90

Quantization Techniques for Language Model [EsperantoTech]

In this episode, we will explore quantization techniques for language models. We will look at the business motivation—making large language models more efficient—and unpack the technical solutions that make this possible. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/@EsperantoTech/quantization-and-mixed-mode-techniques-for-small-language-models-b3366dbad554

Jun 2, 2025

10m

View all 139 episodes →

ABOUT THIS SHOW

HOSTED BY

Pan Wu

Frequently Asked Questions

How many episodes does Snacks Weekly on Data Science have?

Snacks Weekly on Data Science currently has 50 episodes available on PodParley. New episodes are automatically indexed when they're published to the podcast feed.

What is Snacks Weekly on Data Science about?

How often does Snacks Weekly on Data Science release new episodes?

Snacks Weekly on Data Science has 50 episodes. Check the episode list to see recent publication dates and frequency.

Where can I listen to Snacks Weekly on Data Science?

You can listen to Snacks Weekly on Data Science on PodParley by clicking any episode. We provide an embedded audio player for direct listening, and you can also subscribe via your preferred podcast app using the RSS feed.

Who hosts Snacks Weekly on Data Science?

Snacks Weekly on Data Science is created and hosted by Pan Wu.
