The Data Science Podcast with Fexingo: Analytics, Machine Learning, and Data-Driven Conversations

by Fexingo

Lucas and Luna sit at a data-science workstation, two thin laptops open to scatter plots and clustering visualizations, and ask: what can we actually learn from the numbers? Each episode of The Data Science Podcast with Fexingo is a grounded, specific conversation about a single analytics problem or machine-learning method — from regularization in regression to the bias-variance trade-off in random forests. Lucas leads with a journalistic eye for how models are built and tested in the real world, citing actual case studies like how Netflix used matrix factorization for recommendations or how healthcare researchers apply survival analysis to clinical trials. Luna keeps the discussion honest, asking about data quality, feature engineering pitfalls, and whether a model’s accuracy actually translates to business value. They never resort to buzzwords: instead, they walk through the workflow from data collection to deployment, discussing trade-offs like interpretability versus performance. T



How a Data Scientist Found Causal Links Without A-B Tests

Episode 13 of The Data Science Podcast with Fexingo dives into causal inference: specifically, how data scientists can estimate cause-and-effect relationships from observational data when A/B testing isn't possible. Lucas and Luna walk through a real-world case: how a health-tech startup used double machine learning (DML) to determine whether its app's push notifications actually reduced hospital readmissions, without running a randomized trial. They break down the core challenge, confounding variables, and explain how DML uses machine learning to model both the treatment and the outcome, then estimates the causal effect from what those two models leave unexplained. The conversation covers the 'honest' approach of sample splitting to avoid overfitting bias, and why this method is gaining traction in fields like economics, epidemiology, and marketing. By the end, listeners will understand the difference between correlation and causation in a practical, code-adjacent way, and know one concrete technique to try when an A/B test is off the table. #CausalInference #DoubleMachineLearning #DataScience #ObservationalData #ConfoundingVariables #A-BTesting #CausalEffect #MachineLearning #HealthcareAnalytics #ReadmissionRates #PushNotifications #TreatmentEffect #SampleSplitting #Econometrics #Technology #FexingoBusiness #BusinessPodcast #DataDriven Keep every episode free: buymeacoffee.com/fexingo

May 26, 2026

8m

How Bayesian A-B Testing Avoids False Positives

Episode 12 of The Data Science Podcast dives into why traditional frequentist A/B testing can lead to false positives and how a Bayesian approach fixes it. Lucas and Luna walk through a concrete example: an e-commerce team testing a new checkout flow that looked like a winner at 5,000 visitors but collapsed at 10,000. They explain p-hacking, the peek problem, and how Bayesian methods with prior distributions, posterior probabilities, and expected loss give you more reliable decisions with smaller sample sizes. No math PhD required — just practical intuition for data scientists and product managers who run tests every week. This episode also covers the free, ad-free mission of the podcast and how listeners can support it. #BayesianA-BTesting #FrequentistStatistics #p-value #p-hacking #DataScience #A-BTesting #BayesianInference #PriorDistribution #PosteriorProbability #ExpectedLoss #FalsePositive #EcommerceTesting #ProductManagement #Experimentation #MachineLearning #StatisticalMethods #FexingoBusiness #BusinessPodcast Keep every episode free: buymeacoffee.com/fexingo
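
The posterior probability and expected loss mentioned in this description can be sketched in a few lines. This is a minimal illustration, not the method used in the episode: it assumes independent Beta(1, 1) priors on each conversion rate, and the function name `bayesian_ab` and all visitor counts are hypothetical.

```python
import random

def bayesian_ab(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(B > A) and the expected loss of shipping B.

    Assumes independent Beta(1, 1) priors (an assumption of this sketch).
    """
    rng = random.Random(seed)
    wins_b = 0
    loss_b = 0.0
    for _ in range(draws):
        # With a Beta(1, 1) prior, the posterior for a conversion rate is
        # Beta(1 + conversions, 1 + non-conversions).
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        if rate_b > rate_a:
            wins_b += 1
        # Loss from choosing B: how much worse B is than A, when it is worse.
        loss_b += max(rate_a - rate_b, 0.0)
    return wins_b / draws, loss_b / draws

# Hypothetical counts, not taken from the episode.
prob_b_better, expected_loss_b = bayesian_ab(
    conv_a=120, n_a=2500, conv_b=150, n_b=2500
)
print(f"P(B > A) = {prob_b_better:.3f}")
print(f"Expected loss if B ships = {expected_loss_b:.5f}")
```

A team might agree in advance to ship B only once the expected loss falls below a tolerance it chooses; that threshold is a business decision rather than something the code can supply.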

May 26, 2026

13m

How Imbalanced Data Ruins Classification Models

Episode 11 of The Data Science Podcast tackles the hidden danger of imbalanced datasets. Lucas and Luna walk through a real-world example: a fraud detection model trained on 99.9 percent legitimate transactions and 0.1 percent frauds. The model achieved 99.9 percent accuracy yet caught zero frauds. They explain why accuracy is a terrible metric on imbalanced data, introduce precision-recall curves and F1-score as better alternatives, and discuss resampling techniques like SMOTE and cost-sensitive learning. Listeners will learn how to spot imbalance traps in their own projects and why some problems require rethinking the loss function entirely. The conversation stays practical and code-adjacent without getting lost in syntax. If you have ever trained a classifier on skewed data and felt something was off, this episode will give you the diagnostic tools to fix it. #ImbalancedData #Classification #FraudDetection #PrecisionRecall #F1Score #SMOTE #CostSensitiveLearning #DataScience #MachineLearning #ModelEvaluation #AccuracyTrap #Resampling #ClassImbalance #Technology #BusinessPodcast #FexingoBusiness #TheDataSciencePodcast #Fexingo Keep every episode free: buymeacoffee.com/fexingo
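
The arithmetic behind the accuracy trap is easy to reproduce. The sketch below assumes a hypothetical batch of 10,000 transactions at the 0.1 percent fraud rate quoted above and a degenerate model that labels everything legitimate; the helper `classification_metrics` is illustrative, and reporting precision as 0.0 when nothing is flagged is a convention chosen for this example.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = fraud)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # Precision is undefined when nothing is flagged; report 0.0 by convention.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# 10 frauds among 10,000 transactions: a 0.1 percent fraud rate.
y_true = [1] * 10 + [0] * 9990
# A model that always predicts "legitimate".
y_pred = [0] * 10000

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Accuracy comes out at 0.999 while recall and F1 are both zero, which is why those metrics expose the failure that accuracy hides.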

May 25, 2026

8m

Why Your Chatbot Hallucinates and How to Fix It

In this episode, Lucas and Luna tackle one of the most frustrating problems in modern AI: hallucination in large language models. They break down the specific mechanisms that cause models to confidently generate false information, using the example of a customer support chatbot that invented a refund policy. Lucas explains how retrieval-augmented generation (RAG) and grounding techniques can reduce hallucination rates from over 20 percent to under 5 percent, citing a 2025 paper from Google DeepMind. They also discuss trade-offs with latency and cost, and why no approach is perfect yet. The conversation stays grounded in real numbers and concrete engineering decisions, giving listeners a clear framework for diagnosing and mitigating hallucinations in their own applications. #Hallucination #LargeLanguageModels #RAG #RetrievalAugmentedGeneration #AIModels #MachineLearning #NLP #Chatbot #PromptEngineering #Grounding #GoogleDeepMind #AIAccuracy #DataScience #Technology #FexingoBusiness #BusinessPodcast #ModelReliability #AISafety Keep every episode free: buymeacoffee.com/fexingo

May 25, 2026

8m

How Interpretable Machine Learning Found a Hidden Cancer Signal

In this episode, Lucas and Luna explore how interpretability tools like SHAP and LIME uncovered a hidden signal in a hospital's cancer diagnosis model. They walk through a real case where the model was accurate but biased by a spurious correlation with patient age, and how a data scientist used local explanations to catch it before deployment. The conversation covers the difference between global and local interpretability, why accuracy metrics can mask dangerous blind spots, and how one hospital saved lives by asking 'why' instead of just 'how good'. Lucas and Luna also touch on the trade-off between model complexity and explainability, and why regulators are starting to demand interpretable models in healthcare. Listeners will come away with a concrete example of why interpretability isn't just a nice-to-have but a critical safety check in high-stakes machine learning. #InterpretableML #SHAP #LIME #HealthcareAI #CancerDiagnosis #ModelBias #DataScience #MachineLearning #Explainability #FeatureImportance #ModelValidation #AIEthics #ClinicalAI #DataDriven #Podcast #Technology #FexingoBusiness #BusinessPodcast Keep every episode free: buymeacoffee.com/fexingo

May 24, 2026

8m

How A-B Testing Can Mislead You in Data Science

In this episode, Lucas and Luna dig into a specific pitfall of A/B testing that tripped up a real fintech company: Simpson's Paradox. They walk through the exact scenario where a landing page variant showed higher conversion overall but lost on every customer segment, and explain why a proper stratified analysis would have caught the trap. Lucas brings a concrete example from a 2025 A/B test at a mid-sized payments startup where the naive read of the numbers nearly shipped an inferior product. Luna pushes back on whether sample size alone is ever enough protection. The conversation leaves listeners with a single decision rule to sanity-check any experiment where segment sizes differ between control and treatment. No abstract theory — just one sharp case you can use Monday morning. #ABTesting #SimpsonsParadox #DataScience #Statistics #Experimentation #ProductAnalytics #ConversionRate #CausalInference #DataDriven #Technology #Business #Analytics #MachineLearning #StratifiedAnalysis #Fintech #UserTesting #FexingoBusiness #BusinessPodcast Keep every episode free: buymeacoffee.com/fexingo
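
A small numeric example shows how a variant can win overall yet lose in every segment. The counts below are hypothetical, not the fintech's data, and the re-weighting in `stratified_rate` (standardizing both arms to the pooled segment mix) is one possible stratified analysis, not necessarily the decision rule given in the episode.

```python
# Hypothetical (visitors, conversions) per arm and segment.
data = {
    "control": {"returning": (200, 60), "new": (800, 80)},
    "variant": {"returning": (800, 224), "new": (200, 16)},
}

def overall_rate(arm):
    """Naive conversion rate: pool all segments within one arm."""
    visitors = sum(v for v, _ in data[arm].values())
    conversions = sum(c for _, c in data[arm].values())
    return conversions / visitors

def segment_rate(arm, segment):
    visitors, conversions = data[arm][segment]
    return conversions / visitors

def stratified_rate(arm):
    """Segment rates re-weighted to the pooled segment mix of both arms."""
    total = sum(v for a in data.values() for v, _ in a.values())
    rate = 0.0
    for segment in data[arm]:
        segment_total = sum(data[a][segment][0] for a in data)
        rate += (segment_total / total) * segment_rate(arm, segment)
    return rate

for arm in data:
    print(arm,
          f"overall={overall_rate(arm):.2f}",
          f"returning={segment_rate(arm, 'returning'):.2f}",
          f"new={segment_rate(arm, 'new'):.2f}",
          f"stratified={stratified_rate(arm):.2f}")
```

The reversal appears because the variant received far more of the high-converting returning visitors. Once both arms are compared on the same segment mix, the control comes out ahead.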

May 24, 2026

6m

When Training Data and Real Data Diverge

Episode 7 of The Data Science Podcast tackles the critical concept of distribution shift — what happens when the data your model sees in production differs significantly from its training data. Lucas and Luna walk through a concrete example from a ride-hailing app that saw its demand prediction model fail during a holiday surge. They explain covariate shift, prior probability shift, and concept drift using that real case, and discuss practical detection methods including statistical tests like the Kolmogorov-Smirnov test and population stability index. The episode also covers monitoring strategies and retraining triggers, giving listeners actionable takeaways for building robust ML systems. No ads — just clear, specific data science conversation. #DataScience #MachineLearning #DistributionShift #CovariateShift #ConceptDrift #ModelMonitoring #MLEngineering #RideHailing #DemandPrediction #KolmogorovSmirnov #PopulationStabilityIndex #DataDrift #ProductionML #ModelRetraining #TechPodcast #FexingoBusiness #BusinessPodcast #Technology Keep every episode free: buymeacoffee.com/fexingo
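
The population stability index named above can be computed from two samples of a single feature. This is a rough sketch under stated assumptions: equal-width bins taken from the training sample, a small floor `eps` to avoid taking the log of zero, and synthetic data standing in for the ride-hailing case. The function name and parameters are illustrative.

```python
import math

def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """PSI between a training sample (expected) and a live sample (actual)."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins

    def bin_fractions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width) if width > 0 else 0
            # Values outside the training range are clamped into the edge bins.
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    e = bin_fractions(expected)
    a = bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Synthetic feature values: production is shifted upward by 40 units.
training = [i % 100 for i in range(1000)]
production_same = list(training)
production_shifted = [v + 40 for v in training]

psi_same = population_stability_index(training, production_same)
psi_shifted = population_stability_index(training, production_shifted)
print(f"PSI (no shift) = {psi_same:.3f}, PSI (shifted) = {psi_shifted:.3f}")
```

A commonly quoted rule of thumb treats a PSI above roughly 0.25 as a major shift, though the right alert threshold depends on the feature and how the bins are chosen.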

May 23, 2026

8m

How Data Drift Makes Models Go Stale

Machine learning models don't break the way software does. They rot slowly, like fruit left on the counter. In this episode, Lucas and Luna explore a real-world case from a fintech lending company that deployed a fraud detection model in late 2024. By February 2026, the model's precision had dropped from 92% to 61% — not because of a bug, but because borrower behavior shifted. This is data drift: the gap between training data and live data. Lucas explains the two types — covariate shift and concept drift — and walks through the fintech's post-mortem. They discuss detection methods, monitoring dashboards, and the hard decision to retrain or rebuild. Luna asks the crucial question: if drift is inevitable, why don't more teams bake monitoring into their MLOps pipeline from day one? By the end, listeners understand why drift is the silent killer of production models — and how to spot it before it costs real money. #DataDrift #ModelMonitoring #MLOps #MachineLearning #DataScience #FraudDetection #Fintech #CovariateShift #ConceptDrift #ModelDegradation #ProductionML #ModelRetraining #DataQuality #MLInfrastructure #FexingoBusiness #BusinessPodcast #Technology #Analytics Keep every episode free: buymeacoffee.com/fexingo

May 23, 2026

5m

How Recommendation Engines Trap You in a Filter Bubble

In this episode of The Data Science Podcast, Lucas and Luna explore how recommendation algorithms create filter bubbles that trap users in narrowing loops. Using a real example from a major social media platform's newsfeed algorithm in early 2026, they break down the mechanics behind collaborative filtering, the feedback loop that causes category convergence, and the one metric engineers use to detect overspecialization. Lucas explains the mathematical shift from maximizing engagement to measuring content diversity, and Luna pushes back on whether platforms actually want to fix the problem. The conversation covers edge-case exposure, the cold-start problem for new creators, and why recommending 'boring but diverse' content is harder than optimizing for clicks. A must-hear for anyone working on recommendation systems or thinking about the ethics of personalization. #FilterBubble #RecommendationAlgorithms #CollaborativeFiltering #ContentDiversity #MachineLearning #DataScience #EngagementMetrics #FeedbackLoop #ColdStartProblem #EdgeCases #NewsfeedAlgorithm #PlatformEthics #MLInProduction #TechEthics #DataSciencePodcast #FexingoBusiness #BusinessPodcast #Technology Keep every episode free: buymeacoffee.com/fexingo

May 22, 2026

7m

How a Hedge Fund Built a Better Model with Feature Engineering

In this episode of The Data Science Podcast, Lucas and Luna dive into the art of feature engineering: the process of transforming raw data into inputs that make machine learning models actually work. They anchor the discussion around a specific case: a mid-sized hedge fund that improved its equity factor model's Sharpe ratio from 0.7 to 1.4, not by changing algorithms, but by redesigning feature creation. Lucas explains how the fund derived rolling volatility regime indicators, time-decayed correlation features, and synthetic interaction terms from trading data, and why these had more impact than switching from XGBoost to a neural net. Luna challenges the reproducibility of such gains and asks about feature selection pitfalls. They also touch on the broader lesson: that in data science, domain expertise often matters more than model architecture. The episode includes a mid-show acknowledgment that listener support via buymeacoffee.com/fexingo keeps the show ad-free and accessible. #FeatureEngineering #MachineLearning #DataScience #HedgeFund #EquityFactorModel #SharpeRatio #RollingVolatility #InteractionTerms #DomainExpertise #ModelPerformance #FeatureSelection #QuantitativeFinance #XGBoost #NeuralNetworks #AlphaGeneration #Finance #Technology #FexingoBusiness Keep every episode free: buymeacoffee.com/fexingo

May 22, 2026

12m

How a Midwest Bank Built a Better Credit Model with Ensemble Methods

Episode 3 of The Data Science Podcast dives into a real-world case study: how a $12 billion regional bank in the Midwest overhauled its consumer credit scoring model using ensemble methods. Lucas walks through the specific problem — a legacy logistic regression that was rejecting too many thin-file applicants — and how a gradient-boosted tree ensemble, combined with a neural network meta-learner, lifted default prediction accuracy by 18 percent while approving 14 percent more borrowers. Luna presses on the operational trade-offs: model explainability, regulatory compliance under the Equal Credit Opportunity Act, and the compute cost of maintaining two models in production. The episode closes with a data-ethics question: when a better model widens credit access, who decides what 'better' means? #CreditScoring #EnsembleMethods #GradientBoosting #XGBoost #MachineLearning #LogisticRegression #Explainability #SHAP #ECOA #RegulatoryCompliance #ModelDeployment #DataEthics #Finance #Banking #ThinFile #PredictiveModeling #FexingoBusiness #BusinessPodcast Keep every episode free: buymeacoffee.com/fexingo

May 21, 2026

11m

How Data Leakage Inflates Model Performance

In this episode of The Data Science Podcast, Lucas and Luna dive into one of the most common yet overlooked pitfalls in machine learning: data leakage. They break down how a seemingly innocent preprocessing misstep can cause models to appear 20% more accurate in training only to collapse in production. Using the real-world example of a medical diagnosis model that flagged patient IDs as a top predictor, they explain the three main types of leakage—target, time-series, and preprocessing—and share concrete techniques to prevent each. Lucas also introduces the concept of 'target leakage via feature engineering' where future information sneaks into training data, and Luna challenges the audience to audit their own pipelines. By the end, listeners learn to spot leakage symptoms like suspiciously high AUC scores and zero-error features, and walk away with a simple five-minute audit checklist to safeguard their models. #DataLeakage #MachineLearning #ModelValidation #DataScience #MLPipelines #FeatureEngineering #TargetLeakage #TimeSeries #Preprocessing #AUCScore #ModelPerformance #HealthcareAI #DataSciencePodcast #FexingoBusiness #BusinessPodcast #Technology #Analytics #DataDriven Keep every episode free: buymeacoffee.com/fexingo
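
Preprocessing leakage, one of the three types listed above, is often as simple as fitting a scaler before splitting the data. The sketch below contrasts the two orderings on hypothetical feature values; the helper names `fit_scaler` and `standardize` are illustrative and are not taken from the episode's checklist.

```python
import statistics

def fit_scaler(values):
    """Return the mean and standard deviation used to standardize a feature."""
    return statistics.mean(values), statistics.pstdev(values)

def standardize(values, mean, std):
    return [(v - mean) / std for v in values]

# Hypothetical feature values; the held-out rows come from a different range.
train = [1, 2, 3, 4, 5, 6, 7, 8]
test = [50, 60]

# Leaky: statistics are fitted on train and test together, so the
# training features already encode information about held-out rows.
leaky_mean, leaky_std = fit_scaler(train + test)
leaky_train = standardize(train, leaky_mean, leaky_std)

# Safer: fit on the training split only, then reuse those statistics.
clean_mean, clean_std = fit_scaler(train)
clean_train = standardize(train, clean_mean, clean_std)
clean_test = standardize(test, clean_mean, clean_std)

print(f"leaky mean = {leaky_mean:.1f}, train-only mean = {clean_mean:.1f}")
```

The same ordering applies to imputation, encoding, and feature selection: anything learned from data should be learned from the training split alone.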

May 21, 2026

6m

How a Single Number Reveals Which Models Fail in Production

On the premiere of The Data Science Podcast with Fexingo, Lucas and Luna anchor on a startling fact: 87 percent of machine learning projects never make it to production—and of those that do, nearly half degrade within the first six months. They drill into one specific metric—prediction drift—using a case from a mid-sized e-commerce company whose recommendation engine started recommending winter coats in July. Lucas explains how data scientists track distribution shifts with KL divergence and population stability indexes, while Luna questions whether the real problem is organizational, not technical. The hosts set the tone for the show: no vague ML hype, just concrete numbers, real failure stories, and the tools that actually fix broken models. Listeners walk away knowing exactly what drift looks like, why it matters, and how to build a simple monitoring dashboard using open-source tools like Evidently AI and Great Expectations. #MachineLearning #DataScience #ProductionML #ModelMonitoring #PredictionDrift #MLOps #DataDrift #EvidentlyAI #GreatExpectations #AIinProduction #ModelDegradation #DataQuality #DataSciencePodcast #Technology #FexingoBusiness #BusinessPodcast #EcommerceAI #LucasAndLuna Keep every episode free: buymeacoffee.com/fexingo
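
KL divergence, one of the two drift measures named above, compares the distribution a model was trained on with the one it sees live. This is a minimal sketch assuming discrete category shares; the three-category mixes are hypothetical and only loosely echo the winter-coats example.

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) in nats for two discrete distributions over the same categories."""
    # Terms with p_i == 0 contribute nothing; q_i is floored at eps
    # so the ratio stays finite.
    return sum(pi * math.log(pi / max(qi, eps))
               for pi, qi in zip(p, q) if pi > 0)

# Hypothetical share of recommendations per category: [coats, shirts, sandals].
training_mix = [0.5, 0.3, 0.2]
live_mix = [0.2, 0.3, 0.5]

no_drift = kl_divergence(training_mix, training_mix)
drift = kl_divergence(live_mix, training_mix)
print(f"KL with no drift = {no_drift:.4f}, KL after the mix flips = {drift:.4f}")
```

KL divergence is asymmetric, so a monitoring setup has to fix which distribution plays the reference role and keep that choice consistent over time.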

May 19, 2026

7m



