All Episodes

Data Science Tech Brief By HackerNoon — 140 episodes

#
Title
1

How We Built a Per-Plant CO2 Dataset for 4,551 Power Stations Worldwide

2

Eliminating Data Latency with Event-Driven Pipelines at Enterprise Scale

3

Scaling Self-Service Analytics in Regulated Banking With Metadata-Driven Design

4

How to Rotate Proxies Without Breaking Login Sessions

5

I Built an Open-Source Firebase Analytics Alternative Because I Hit 1M Events/Day Once Too Many

6

Your Redshift Cluster Is Probably Idle 85% of the Time — And You're Paying for All of It

7

What the Real Operating Data on AI Agents Tells Me as an Investor

8

Building Data Quality Into the Pipeline Instead of Cleaning Up After It

9

Why Speed Matters: How Performance in Analytics Saves Business from "Digital Paralysis"

10

Open Data Is Not a Product. Here's What It Takes to Make It One.

11

Why Scrapers Fail: Headers, Sessions, IP Reputation, and Request Patterns

12

I Built an AI-Assisted Data Quality Layer for Operations Dashboards

13

The Source Code Isn't Hidden - You Just Gotta Refocus Your Lens

14

Why Your Data Governance Framework Is Failing (And What You Can Do About It)

15

The Cloud Data Leak: Architecting SQL to Stop Financial Bleeding

16

Principal Components Analysis in TypeScript (Part 4): Turning PCA Into Interpretable Factor Analysis

17

Data Engineering Teams Need a Different Version of Agile

18

The LLM Veneer: When AI Sounds Smart but Has Nothing Real to Reason Over

19

Bad Ingestion Architecture Generates Million Dollar Snowflake and Databricks Bills

20

Optimizing Distributed Data Processing for ML at Scale

21

Why Finance Data Quality Needs Rule Engines, Not ML Hype

22

156 Blog Posts To Learn About Business Intelligence

23

Why Your Marketplace Scraper Keeps Getting Blocked (And Why It’s Not a Code Problem)

24

How I Decoded My Apple Watch Metrics: Taking a Look At The Raw Numbers (Part 2)

25

Why AI Agents Are Creating a New Kind of Data Engineer

26

The Architectural Limits of Data Lakes and the Rise of Lakehouses

27

The Economic Case for Investing in Youth Education

28

HiveMQ and TimescaleDB: It Just Works!

29

102 Blog Posts To Learn About Datasets

30

Why More Data Doesn’t Guarantee Better Insights in Modern Data Systems

31

500 Blog Posts To Learn About Data

32

228 Blog Posts To Learn About Data Visualization

33

The Hard Lessons of Managing a Data Science Team

34

95 Blog Posts To Learn About Data Storage

35

70 Blog Posts To Learn About Data Scraping

36

500 Blog Posts To Learn About Data Science

37

110 Blog Posts To Learn About Data Management

38

402 Blog Posts To Learn About Data Analytics

39

50 Blog Posts To Learn About Data Collection

40

427 Blog Posts To Learn About Data Analysis

41

Your Dashboard Isn’t Wrong - Your KPI Logic Is

42

The Hidden Cost of Scraping Everything (and Why Datasets Win)

43

500 Blog Posts To Learn About Big Data

44

263 Blog Posts To Learn About Analytics

45

They Got Lost in the Transformer, Episode 1: What Even Is an Embedding?

46

Kafka vs Azure Event Hubs: The Tradeoffs You Only See in Production

47

Clarifying the Difference Between Data Strategy, Analytics, and AI Governance

48

The “Store Everything” Cloud Model Is Breaking Under Modern AI Workloads

49

AI Belongs Inside DataOps, Not Just at the End of the Pipeline

50

Stop Torturing Your Data: How to Automate Rigor With AI

51

Minimum Incident Lineage (MIL): A Run-Level Evidence Standard for Reproducible Data Incidents

52

5 Ways Spark 4.1 Moves Data Engineering From Manual Pipelines to Intent-Driven Design

53

Beyond Prediction: Econometric Data Science for Measuring True Business Impact

54

Designing Economic Intelligence: Econometrics-First Approaches in Data Science

55

From Forecasting to BI: Inside Shravanthi Ashwin Kumar’s Data-Driven Finance Playbook

56

Causal Thinking in the Age of Big Data: Modern Econometrics for Data Scientists

57

Data Pipeline Testing: The 3 Levels Most Teams Miss

58

HSM: The Original Tiering Engine Behind Mainframes, Cloud, and S3

59

Navigating Architectural Trade-offs at Scale to Meet AI Goals in 2026

60

Will AI Take Your Job? The Data Tells a Very Different Story

61

You Don’t Need an API for Everything (Sometimes Scraping Is Enough)

62

How to Use Propensity Score Matching to Measure Down Stream Causal Impact of an Event

63

How to Analyze Call Sentiment With Open-Source NLP Libraries

64

How Bayesian Tail-Risk Modeling can save your Retail Business Marketing Budget

65

Architecting Trustworthy Healthcare Data Platforms Using Declarative Pipelines

66

When A/B Tests Aren’t Possible, Causal Inference Can Still Measure Marketing Impact

67

Why Data Quality Is Becoming a Core Developer Experience Metric

68

Why “Accuracy” Fails for Uplift Models (and What to Use Instead)

69

Turning Your Data Swamp into Gold: A Developer’s Guide to NLP on Legacy Logs

70

Data Monetization Strategies in Government Digital Platforms

71

Why Partner Data Became My Toughest Engineering Problem

72

PBIX Is Not Going Away - But PowerBI Will Never Work the Same Again

73

Smart Fire Protection: How AI Is Changing Preventive Maintenance Forever

74

Why More VARs and SIs Are Embedding Melissa Into Their Enterprise Solutions

75

Big Data as the New Compass of Competition

76

Srilatha Samala’s Agile Intelligence Approach to Enterprise Reporting as a Strategic Asset

77

The Hidden Cost of Bad Data: Why It’s Undermining Your AI Strategy

78

Data Platform as a Service: A Three-Pillar Model for Scaling Enterprise Data Systems

79

How RAG Improves Database Management

80

How To Power AI, Analytics, and Microservices Using the Same Data

81

From Data Fragmentation to Billion-Dollar Insights: The Vision of Manish Ravindra Sharath

82

Building a Layered Defense Against Web Scraping

83

Cosmo: The Graph Visualization Tool Built for Your Terminal

84

How Businesses Are Turning Space Data into a Tool for Risk, Resilience, and Sustainability

85

How Data Innovation Changed a State’s Infrastructure Engine

86

How to Optimize Your Marketing Budget Using Just Three Letters: MMM

87

Here's How ShareChat Scaled Their ML Feature Store 1000X Without Scaling the Database

88

Why You Shouldn’t Judge by PnL Alone

89

From "Decentralized" to "Unified": SUPCON Uses SeaTunnel to Build an Efficient Data Collection Frame

90

Enterprise Data Pipeline Revolution: Suresh Palli's Metadata-Driven Automation Success

91

Unified Data, Smarter Agents—Is Your Architecture Future-Proof?

92

Data-Driven Decisions at Scale: A/B Testing Best Practices for Engineering & Data Science Teams

93

Why You Should (Almost) Always Choose Sync Gunicorn Workers

94

Beyond the Ten Blue Links: How Generative AI Rewires Our Brains for Search

95

Need Web Data? Here Are the 3 Methods Everyone’s Using

96

Applying Transitive Closure to Sort Products Into Categories, Considering Nesting and Overlaps

97

98% of Data Strategies Fail: Let's Fix It

98

How To Measure The Results Of In-App Events When Onelinks Don’t Work

99

How AI-Powered Data Mapping is Democratizing Data Management

100

Data Engineering: What’s the Value of API Security in the Generative AI Era?

101

Say Goodbye to Outdated Diagrams: Automate Your Infrastructure Visualization

102

Why C-Suite Executives Won’t Cut it Without Data Skills Anymore

103

Meet New & Improved BigQuery: Single, Unified AI-Ready Data Platform

104

Decoding Transformers' Superiority over RNNs in NLP Tasks

105

How to Enable Auto-Start for Apache DolphinScheduler

106

Benchmarking Apache Kafka: Performance-per-price

107

When and When Not to Use Apache Kafka as a Database

108

A Leader's Guide to Data-Driven Success

109

Seamlessly Migrate Your On-Premise Data Pipeline to Azure with These Key Steps

110

Data Collection for Product Managers

111

Data Collection for Product Managers

112

Leveraging Data Granularity, Distribution, and Modeling for Effective Product Management

113

How Vectors, Rag and Llama 3 Are Changing First-Party Data

114

16 Best Sklearn Datasets for Building Machine Learning Models

115

Enhancing Audit Processes With Advanced Analytical Tools

116

Go Clean to Be Lean: Data Optimization for Improved Business Efficiency

117

Efficient Data Management and Workflow Orchestration with Apache Doris Job Scheduler

118

Scaling Ethereum: Data Bloat, Data Availability, and the Cloudless Solution

119

What Frontend Devs Want (From Backend Devs)

120

How to Build an AI Chatbot with Python and Gemini API

121

How to Set Up a Local DNS Server With Python

122

The Collective Loves Data: How Big Data Is Shaping and Predicting Our Future

123

Apache Doris for Log and Time Series Data Analysis in NetEase: Why Not Elasticsearch and InfluxDB?

124

Unlocking the Power of Data Lakes for Embedded Analytics in Multi-Tenant SaaS

125

The LinkedIn Nanotargeting Experiment that Broke All the Rules

126

Data Science Interview Question: Creating ROC & Precision Recall Curves From Scratch

127

Why Should Companies Outsource Data Processing?

128

The Role of Big Data in Developing New Medicines

129

Building CI Pipeline with Databricks Asset Bundle and GitLab

130

How I'm Building an AI for Analytics Service

131

Real-Time Anomaly Detection in Underwater Gliders: Experimental Evaluation

132

Real-Time Anomaly Detection in Underwater Gliders: Abstract and Intro

133

The Power of Universal Semantic Layers: Insights from Cube Co-founder Artyom Keydunov

134

A Comprehensive Guide to Building DolphinScheduler 3.2.0 Production-Grade Cluster Deployment

135

Why Monitoring a Distributed Database is More Complex Than You Might Expect

136

Outlier Detection: What You Need to Know

137

Instrument Variables and AB Testing – Part 1

138

Using Arrow Flight SQL Protocol in Apache Doris 2.1 For Super Fast Data Transfer

139

Data Science for Portfolio Optimization: Markowitz Mean-Variance Theory

140

10 Best Datasets for Time Series Analysis