
PODCAST · business

The CTO Podcast with Fexingo: Technical Leadership, Architecture, and Engineering Org

Lucas and Luna sit down in front of a whiteboard to dissect the decisions that shape technical organizations. Each episode of The CTO Podcast with Fexingo examines a specific engineering leadership challenge, from scaling a microservices architecture without creating a distributed monolith, to managing the cognitive load of a 200-engineer org, to choosing between a monorepo and polyrepo strategy based on team topology. The conversations are grounded in real-world cases: how Etsy restructured its data pipeline after a 2019 outage, why Stripe’s API versioning policy reduces breaking changes, or what Basecamp’s choice of SQLite over PostgreSQL says about product philosophy. Lucas brings the journalistic rigor, citing commit histories, RFCs, and postmortems, while Luna pushes back with the pragmatics of org dynamics, hiring constraints, and technical debt. There are no hot takes, no vendor pitches, no ‘best practices’ without trade-offs. Each episode ends with a specific tension left unresolved.

  1. 48

    How Netflix Rebuilt Its CDN for 300 Million Subscribers

    Netflix's content delivery network, Open Connect, carries a large share of the world's internet traffic at peak. In this episode, Lucas and Luna dive deep into the specific architectural decisions Netflix made to scale its CDN from 100 million to 300 million subscribers. They explore the shift from commercial CDNs to a peered, ISP-embedded appliance model, the move from spinning disks to NVMe SSDs, and the caching algorithms that optimize for long-tail content. The hosts also discuss how Netflix manages the trade-off between cache hit ratio and storage cost, and why they chose to build their own hardware. This episode is a masterclass in infrastructure scaling from one of the most demanding streaming platforms on the planet. #Netflix #OpenConnect #CDN #ContentDeliveryNetwork #StreamingInfrastructure #VideoStreaming #EdgeComputing #CacheOptimization #NVMe #ISP #Peering #LongTailContent #CacheHitRatio #HardwareDesign #BusinessAndTechnology #FexingoBusiness #BusinessPodcast #CTOPodcast Keep every episode free: buymeacoffee.com/fexingo

  2. 47

    How Walmart Rebuilt Its Supply Chain with Real-Time Data

    Walmart's supply chain is the largest in the world, moving over 5 billion units annually. In this episode, Lucas and Luna explore how Walmart rebuilt its supply chain on real-time data from edge to shelf. They break down the 2019 shift from batch processing to streaming events, the use of Kafka at massive scale, and how a machine learning model called 'Eddie' predicts demand by the hour. Lucas explains why Walmart moved its inventory management from mainframes to a cloud-native architecture built on Google Cloud, and how real-time visibility reduced out-of-stocks by 16% in pilot stores. The conversation covers the technical trade-offs: why they chose Apache Beam for processing, how they handle data locality across 4,700 stores, and what happens when a hurricane disrupts the supply graph. Luna pushes back on whether smaller retailers can replicate this, and Lucas outlines the core principles that transfer. This is a deep, non-obvious look at how the world's largest retailer treats logistics as a data platform. #Walmart #SupplyChain #RealTimeData #ApacheKafka #ApacheBeam #GoogleCloud #MachineLearning #InventoryManagement #DataEngineering #TechArchitecture #BusinessAndTechnology #CTO #Logistics #RetailTech #StreamingData #EdgeComputing #FexingoBusiness #BusinessPodcast

  3. 46

    How Datadog Monitors Its Own Infrastructure

    Episode 58 of The CTO Podcast goes inside Datadog's engineering org to explore how the company monitors its own 100-terabyte infrastructure. Lucas and Luna walk through Datadog's dogfooding culture, the architectural challenges of running a monitoring platform for itself, and how the team handles alert fatigue, distributed tracing, and log ingestion at massive scale. They discuss specific tools like the Datadog Agent, the trace-agent, and the custom time-series database built in-house. The episode includes concrete numbers: 30 trillion time-series points ingested daily, a 99.99 percent uptime target, and how the SRE team manages 8,000 hosts across multiple cloud providers. Tune in for a rare look at how the watcher watches itself. #Datadog #InfrastructureMonitoring #Dogfooding #SRE #Observability #TimeSeriesDatabase #DistributedTracing #AlertFatigue #CloudInfrastructure #EngineeringCulture #SiteReliabilityEngineering #DevOps #BusinessAndTechnology #FexingoBusiness #BusinessPodcast #CTO #TechnicalLeadership #Architecture

  4. 45

    How Figma Rebuilt Its Multiplayer Engine for 500 Users per File

    Figma's multiplayer engine lets hundreds of designers edit the same file simultaneously. How did they rebuild it from scratch to handle over 500 concurrent users per document without conflicts or lag? Lucas and Luna break down the architecture: the shift from CRDTs to a custom conflict-resolution layer, the 'change tree' data structure that replaced operational transforms, and the decision to move from WebSockets to WebRTC data channels for sub-200ms sync. They also discuss the engineering trade-offs: why Figma chose JavaScript over Rust for the client, how they handle undo/redo in a multi-user environment, and the surprising bottleneck: the browser's own garbage collector. A concrete look at real-time collaboration at scale. #Figma #MultiplayerEngine #RealTimeCollaboration #CRDT #OperationalTransform #WebRTC #ConflictResolution #JavaScript #BrowserPerformance #GarbageCollection #UndoRedo #ChangeTree #Engineering #Architecture #TechLeadership #BusinessAndTechnology #FexingoBusiness #BusinessPodcast

  5. 44

    How Cloudflare Built Its Global Network for 30 Million Requests per Second

    Lucas and Luna break down the architectural decisions behind Cloudflare's global edge network, which handles over 30 million HTTP requests per second. They explore how the company moved from a simple reverse proxy to a distributed system spanning 330 cities, the role of custom-built Nginx configurations, and the trade-offs between latency and consistency. Specific topics include the use of Anycast routing, the challenge of DDoS mitigation at scale, and how Cloudflare optimized its cache hierarchy for static content delivery. This episode is a deep dive for engineers and CTOs interested in high-performance networking and edge computing. #Cloudflare #EdgeNetwork #CDN #Anycast #DDoS #Nginx #Latency #Scalability #GlobalInfrastructure #BusinessAndTechnology #FexingoBusiness #BusinessPodcast #CTO #Engineering #Architecture #Performance #Networking #Optimization

  6. 43

    How Stripe Migrated Payment Routing to 99.999% Uptime

    Episode 55 of The CTO Podcast dives into how Stripe rebuilt its payment routing engine to achieve 99.999% uptime. Lucas and Luna break down the architectural shift from a monolithic routing layer to a distributed, deterministic system that handles millions of transactions per second. They explore the team's decision to move away from traditional load balancers, the role of formal verification in routing logic, and how Stripe's engineers stress-tested the system with simulated global outages. Along the way, they discuss the trade-offs between latency and consistency, and why a gradual canary deployment was critical. This episode offers concrete lessons for engineering leaders designing fault-tolerant systems at scale. #Stripe #PaymentRouting #99.999PercentUptime #DistributedSystems #Architecture #FaultTolerance #FormalVerification #CanaryDeployment #LatencyConsistencyTradeoff #PaymentProcessing #EngineeringLeadership #SystemDesign #HighAvailability #BusinessAndTechnology #FexingoBusiness #BusinessPodcast #CTOPodcast #TechLeadership

  7. 42

    How Datadog Monitors Its Own 100-Terabyte Infrastructure

    Episode 54 of The CTO Podcast: Lucas and Luna explore how Datadog, the monitoring giant, uses its own tools to manage a sprawling infrastructure that ingests over 100 terabytes of data daily. They dive into the dogfooding strategy, the architectural choices that keep observability scalable, and the surprising insight that Datadog runs its entire backend on a single PostgreSQL fork with custom sharding. Lucas explains the engineering org structure behind the monitoring team, and Luna questions whether dogfooding can blind teams to customer pain. Specific examples include how Datadog handles metric cardinality explosion and why they built a separate time-series database internally before launching it as a product. #Datadog #Observability #Dogfooding #TechLeadership #Infrastructure #PostgreSQL #Scalability #TimeSeriesDatabase #EngineeringCulture #Monitoring #CTOPodcast #FexingoBusiness #BusinessPodcast #Architecture #Sharding #MetricCardinality #SRE #CloudNative

  8. 41

    How Stripe Rebuilt Payment Routing for 99.999% Uptime

    Stripe's payment infrastructure processes billions of dollars annually, and its routing engine (the system that decides which bank or processor gets each transaction) is a marvel of distributed systems engineering. In this episode, Lucas and Luna explore how Stripe rebuilt its payment routing layer to achieve five-nines uptime, handling failures at the bank level in milliseconds without user impact. They break down the architecture: the state machine that tracks each transaction through six phases, the circuit-breaker pattern that isolates failing processors, and the decision-tree optimization that cut latency by 40 percent. Lucas explains why routing is the hardest problem in payments, more complex than fraud detection or compliance, and how Stripe's design influenced the broader fintech industry. Luna draws parallels to how other critical infrastructure systems, from DNS to CDNs, solve similar reliability problems. A concrete look at what it takes to move money reliably at internet scale. #Stripe #PaymentRouting #DistributedSystems #FiveNines #Fintech #Latency #CircuitBreaker #StateMachines #Reliability #Engineering #Architecture #BusinessAndTechnology #FexingoBusiness #BusinessPodcast #CTOPodcast #TechnicalLeadership #ScalingPayments #SystemDesign

  9. 40

    How Supabase Rebuilt Postgres for Real-Time Apps

    In this episode, Lucas and Luna explore how Supabase, an open-source Firebase alternative, built a real-time layer on top of PostgreSQL that handles millions of concurrent WebSocket connections. They break down the architecture behind Supabase's Realtime server, which uses PostgreSQL's logical replication and Elixir's BEAM VM to stream database changes to client applications with sub-second latency. Lucas explains why the team chose to fork PostgreSQL's replication slot mechanism and how they handle backpressure when clients fall behind. Luna questions the trade-offs of using WebSockets versus server-sent events for real-time data synchronization. The conversation also touches on Supabase's decision to build on AWS's Graviton processors to reduce costs and how the company scaled from zero to over 200,000 users without a dedicated infrastructure team. If you're building a real-time application or just curious about modern database architecture, this episode offers concrete insights into one of the most exciting open-source projects in the cloud space. #Supabase #PostgreSQL #RealTime #WebSockets #Elixir #BEAM #Database #Backend #Architecture #OpenSource #FirebaseAlternative #LogicalReplication #AWS #Graviton #Scalability #BusinessAndTechnology #FexingoBusiness #BusinessPodcast

  10. 39

    How Discord Rebuilt Its Voice Engine for Sub-50ms Latency

    In this episode of The CTO Podcast, Lucas and Luna dive into how Discord achieved sub-50 millisecond voice latency across millions of concurrent users. They break down the specific architectural changes Discord made: switching from Opus to a custom codec called Siren, rewriting their audio processing pipeline in Rust, and deploying edge relays in over 300 locations worldwide. The discussion covers why Discord chose to build its own transport protocol over WebRTC, how they handle packet loss with forward error correction, and the trade-offs between CPU usage and bandwidth. Lucas explains the key metric that guided their redesign, the 99th percentile one-way voice latency, and how they optimized for it without sacrificing audio quality. Luna questions whether the effort was worth it given Discord's core gaming use case, and Lucas argues that voice latency is the defining feature for real-time communication. The episode includes a brief donation segment near the end, seamlessly woven into the conversation about open-source tools and community support. Perfect for CTOs, engineering leaders, and anyone building real-time audio applications. #Discord #VoiceEngine #LowLatency #RealTimeAudio #SirenCodec #Rust #WebRTC #EdgeRelays #ForwardErrorCorrection #Sub50ms #GameChat #AudioPipeline #CTO #EngineeringLeadership #RealTimeCommunication #BusinessAndTechnology #FexingoBusiness #BusinessPodcast

  11. 38

    How Airbnb Rebuilt Search for 150 Million Guests

    In this episode, Lucas and Luna dive into Airbnb's multi-year effort to rebuild its search infrastructure to handle 150 million nightly searches. They explore the shift from a monolithic PostgreSQL-backed system to a custom search service built on Elasticsearch, the trade-offs between relevance and latency, and the team's decision to implement a two-phase ranking system with lightweight machine learning at query time. Specific numbers include Airbnb's pre-migration latency of 800 milliseconds for a single search and the post-migration reduction to under 200 milliseconds at peak. The discussion also covers how the engineering team organized around the project, the cultural challenges of migrating a core revenue system without downtime, and the unexpected lesson that algorithmic ranking reduced guest booking friction by 12 percent. Perfect for CTOs, engineering leaders, and anyone architecting large-scale consumer platforms. #Airbnb #SearchEngine #Elasticsearch #MachineLearning #SystemArchitecture #Latency #RankingAlgorithm #EngineeringCulture #TechLeadership #CTO #BusinessPodcast #FexingoBusiness #Fexingo #Podcast #TechInfrastructure #BigData #ConsumerTech #PerformanceOptimization

  12. 37

    How Postgres Powers 40 Percent of New Cloud Databases

    Lucas and Luna examine how PostgreSQL has quietly become the default database for modern cloud-native applications. They trace the journey from a 1996 open-source project to powering 40 percent of new database instances on AWS, Azure, and Google Cloud. The episode focuses on the architectural decisions that made Postgres scalable: its extension ecosystem, the rise of managed services like Aurora and Cloud SQL, and how its MVCC concurrency model handles mixed workloads. They also discuss why developers are migrating from proprietary databases and what Postgres's dominance means for the database industry. Specific examples include how Instacart uses Postgres for real-time inventory and how Citus extends it for sharding. #PostgreSQL #CloudDatabases #DatabaseArchitecture #OpenSource #AWS #AuroraPostgres #CloudSQL #MVCC #Citus #Instacart #DatabaseMigrations #ExtensionEcosystem #ManagedDatabases #BusinessTechnology #Business #TechLeadership #FexingoBusiness #BusinessPodcast

  13. 36

    How GitLab Runs Remote Engineering with 2000 Developers

    In this episode of The CTO Podcast, Lucas and Luna dive into the operational and cultural mechanics behind GitLab's all-remote engineering organization. With over 2000 developers spread across 65 countries, GitLab has become a case study in asynchronous work, written documentation, and intentional culture-building. Lucas walks through GitLab's handbook-first approach, how they structure teams around 'stable counterparts' to avoid silos, and the specific tools and rituals that keep a global engineering org aligned. Luna challenges the model's trade-offs: burnout risk in async environments, the difficulty of onboarding without synchronous mentorship, and whether remote scales differently for engineering versus other functions. Together, they unpack what GitLab learned when it hit 1000 engineers, and how it adapted. Specific numbers, concrete practices, and honest criticism. If you're building or leading a distributed team, this episode gives you the architecture behind the architecture. #GitLab #RemoteEngineering #AsynchronousWork #EngineeringManagement #DistributedTeams #HandbookFirst #DevOps #CTO #TechnicalLeadership #EngineeringOrg #RemoteCulture #ScalingTeams #Business #Technology #FexingoBusiness #BusinessPodcast #CTOPodcast #Fexingo

  14. 35

    How Monzo Rebuilt Its Core Banking Engine for Real-Time

    Lucas and Luna dive into how Monzo, the UK digital bank, replaced its legacy core banking system with a real-time event-driven architecture. They explore the technical bet on Apache Kafka as the source of truth, the migration from a batch-processing model to stream processing, and the engineering trade-offs involved in ensuring instant balance updates without breaking financial integrity. With specific numbers on transaction throughput and uptime targets, this episode unpacks a case study in modernizing financial infrastructure at scale. #Monzo #CoreBanking #EventDriven #ApacheKafka #RealTime #Fintech #StreamProcessing #Microservices #Architecture #Migration #FinancialServices #UKTech #Business #Technology #FexingoBusiness #BusinessPodcast #CTOPodcast #TechLeadership

  15. 34

    How LinkedIn Rebuilt Search for 950 Million Members

    LinkedIn's search team faced a massive technical challenge: how to serve relevant results to 950 million members across jobs, people, companies, and posts, all while respecting privacy and permissions. In this episode, Lucas and Luna dive into how the team rebuilt LinkedIn's search infrastructure using a real-time indexing pipeline and a custom retrieval engine called Galene. They discuss the trade-offs between relevance and speed, the decision to move away from Apache Solr, and how LinkedIn handles multilingual queries and typo tolerance. Specific numbers include: 2.5 billion search queries per week, 100 million daily active job searches, and a 40 percent reduction in query latency after the rebuild. #LinkedIn #SearchEngineering #Galene #RealTimeIndexing #ApacheSolr #InformationRetrieval #QueryLatency #MultilingualSearch #TypoTolerance #EngineeringLeadership #TechArchitecture #BusinessTechnology #FexingoBusiness #BusinessPodcast #CTO #PlatformEngineering #SearchRelevance #Infrastructure

  16. 33

    How HashiCorp Rebuilt Terraform for Multi-Cloud Scale

    In this episode, Lucas and Luna dive into HashiCorp's architectural overhaul of Terraform to handle multi-cloud deployments at massive scale. They explore the shift from a monolithic state management system to a modular, plugin-based architecture, the introduction of Terraform Cloud's real-time collaboration features, and the engineering decisions behind maintaining backward compatibility while scaling to over 100 million monthly runs. The hosts discuss the trade-offs between performance and consistency, the role of infrastructure as code in modern DevOps, and how HashiCorp's approach to provider abstraction enables organizations to manage hundreds of cloud resources across AWS, Azure, and Google Cloud seamlessly. A must-listen for engineering leaders and platform architects. #HashiCorp #Terraform #MultiCloud #InfrastructureAsCode #DevOps #CloudComputing #StateManagement #Architecture #BusinessAndTechnology #Podcast #FexingoBusiness #BusinessPodcast #CTOPodcast #EngineeringLeadership #PlatformEngineering #CloudScale #TerraformCloud #ProviderAbstraction

  17. 32

    How CockroachDB Survived the Cloud Database Wars

    Episode 44 of The CTO Podcast dives deep into how Cockroach Labs built a distributed SQL database that could survive not just server failures, but the competitive onslaught of AWS, Google, and Microsoft. Lucas walks through the key architectural decisions: the Raft consensus protocol, the geo-partitioning trick that made multi-region compliance possible, and the controversial move to make the product open-source but the enterprise features proprietary. Luna presses on how CockroachDB lost Google's internal adoption to Spanner but won over financial-services customers like JPMorgan. The episode also covers the inflection point in 2023 when CockroachDB hit $50 million in annual recurring revenue and how the team decided to prioritize horizontal scalability over SQL compatibility. Concrete numbers include the 4.5-year development cycle to GA, the 20x latency penalty for global writes before optimization, and the 99.995 percent uptime guarantee they eventually published. A behind-the-scenes note on listener support closes the episode. #CockroachDB #DistributedSQL #CloudDatabases #RaftConsensus #CockroachLabs #SpencerKimball #PeterMattis #BenDarnell #GoogleSpanner #AWS #JPMorgan #OpenSource #TechArchitecture #Scalability #BusinessAndTechnology #FexingoBusiness #BusinessPodcast #CTOPodcast

  18. 31

    How Vercel Rebuilt Its Edge Network for Sub-50ms Cold Starts

    Lucas and Luna dive into how Vercel redesigned its edge compute layer to achieve cold-start latencies under 50 milliseconds, even for complex serverless functions. They unpick the architecture behind Vercel's 'Edge Functions', from isolate pooling and Wasm-based sandboxing to regional pre-warming. The hosts discuss the trade-offs between JavaScript and Rust runtimes, how Vercel collaborates with Cloudflare on WinterJS, and why sub-50ms cold starts matter for real-time personalisation at scale. A concrete look at the engineering decisions that let developers run logic at the network edge without the traditional cold-start tax. #Vercel #EdgeComputing #Serverless #ColdStarts #Wasm #WinterJS #Cloudflare #Rust #JavaScript #PerformanceEngineering #CDN #WebAssembly #IsolatePooling #RegionalPreWarming #RealTimePersonalisation #Business #FexingoBusiness #BusinessPodcast

  19. 30

    How Slack Rebuilt Its Backend for 10 Million Daily Active Users

    In this episode, Lucas and Luna dive into the technical decisions behind Slack's backend overhaul as it scaled from a small team tool to a platform serving 10 million daily active users. They explore how Slack moved from a monolithic Ruby on Rails architecture to a service-oriented model using Java and C++, the critical choice of building its own message queue instead of relying on Kafka or RabbitMQ, and how the team tackled the 'unread counts' challenge that nearly broke the system. With specific examples like the Flannel service for real-time presence and the Vitess database sharding layer, this episode offers concrete lessons for CTOs and engineering leaders wrestling with growth. No vague platitudes, just the architecture decisions that kept Slack online during its hypergrowth phase. #Slack #BackendArchitecture #CTO #EngineeringLeadership #Scalability #Microservices #RealTimeMessaging #RubyOnRails #Java #CPlusPlus #Vitess #MessageQueue #Flannel #UnreadCounts #Business #Technology #FexingoBusiness #BusinessPodcast

  20. 29

    How Notion Scaled Its Real-Time Sync Engine

    Real-time sync has become table stakes for any collaborative product, but building it at Notion was anything but straightforward. In this episode, Lucas and Luna break down how Notion's engineering team moved from a naive polling model to a custom CRDT-based sync engine that handles millions of concurrent edits across documents, databases, and wikis. They walk through the key design decisions: why they chose a hybrid logical clock over vector clocks, how they handle conflict resolution without a central server, and the storage tradeoffs they made to keep latency under 100 milliseconds. Lucas also shares a concrete example of a sync bug that caused data loss for 48 hours in 2021 and how they rebuilt their test harness to prevent it from happening again. If you're building any kind of real-time collaborative app, this episode offers a rare behind-the-scenes look at what it actually takes to make 'instant sync' work at scale. #Notion #RealTimeSync #CRDT #Collaboration #DistributedSystems #Engineering #TechLeadership #Productivity #ConflictResolution #Database #Latency #Scalability #Startup #Business #Technology #FexingoBusiness #BusinessPodcast #CTOPodcast

  21. 28

    How Linear Uses Linear to Build Linear

    Episode 40 of The CTO Podcast explores Linear, the project management tool built by a team of seven engineers using what they ship. Lucas and Luna walk through Linear's architecture: a single TypeScript codebase, a custom sync engine built on SQLite and CRDTs, and how they handle optimistic updates with zero conflicts. The episode examines why the team chose not to adopt microservices, how they keep latency under 50 milliseconds even on shaky connections, and what happens when your dogfooding strategy means your entire infrastructure is also your product. Specific numbers discussed: seven engineers, 50 ms sync latency, zero merge conflicts on issues, and a 99.95% uptime target with no dedicated SRE team. A grounded look at how a small team ships a tool used by thousands of engineering orgs, including the one that built it. #Linear #ProjectManagement #SoftwareArchitecture #TypeScript #SQLite #CRDTs #Dogfooding #StartupEngineering #SyncEngine #OptimisticUpdates #SingleCodebase #SmallTeam #Productivity #Business #Technology #FexingoBusiness #CTOPodcast #EngineeringCulture

  22. 27

    How Shopify Handles Black Friday Traffic With Static Caching

    Lucas and Luna break down how Shopify prepares its infrastructure for the biggest shopping day of the year. They focus on a specific technique: using edge static caching to absorb 90 percent of read requests before they hit the application layer. The episode walks through Shopify's architecture for serving storefront pages from CDN nodes, how they invalidate caches when a merchant updates a product, and what happens when the cache misses. Lucas explains the trade-offs between stale content and site reliability, and Luna asks about the blast radius of a cache stampede. They also touch on how Shopify's approach differs from a generic CDN setup. By the end, listeners understand one concrete pattern for scaling read-heavy traffic without burning server capacity. #Shopify #BlackFriday #StaticCaching #EdgeComputing #CDN #SiteReliability #CacheInvalidation #TrafficSurge #EcommerceInfrastructure #ReadHeavyWorkload #CacheStampede #WebPerformance #RubyOnRails #Fastly #Cloudflare #Business #Technology #FexingoBusiness

  23. 26

    How Amazon Built Its One-Day Delivery Supply Chain

    In 2019, Amazon announced it would convert Prime shipping from two days to one day. Most people saw a marketing promise. Engineers saw a logistics nightmare. This episode unpacks how Amazon rebuilt its fulfillment network (restructuring inventory placement, rethinking sortation center algorithms, and launching its own air hub in Cincinnati) to make one-day delivery economically viable across millions of SKUs. Lucas and Luna walk through the key architectural decisions: how Amazon used machine learning to predict demand at the zip-code level, decoupled its fulfillment centers from its transportation layer, and absorbed a multi-billion-dollar cost that competitors couldn't replicate. They also touch on the trade-offs: higher inventory carrying costs, pressure on warehouse labor, and the environmental toll of speed. A grounded look at how the world's most demanding logistics system was rearchitected from the inside out. #Amazon #OneDayDelivery #SupplyChain #Logistics #Fulfillment #MachineLearning #InventoryManagement #SortationCenters #PrimeAir #CincinnatiAirHub #LastMileDelivery #OperationsResearch #Business #Technology #Engineering #FexingoBusiness #BusinessPodcast #CTOPodcast

  24. 25

    How GitLab Runs Remote Engineering with 2000 Developers

    In this episode, Lucas and Luna dive into how GitLab manages a fully remote engineering organization of over 2,000 developers. They explore the company's unique handbook-first culture, how they maintain code quality across time zones, and the specific tools they use for asynchronous communication. Lucas shares key metrics: GitLab ships 40 releases per year with a median merge request cycle time of under 6 hours. They also discuss how the company handles onboarding, performance reviews, and incident response without a physical office. A must-listen for anyone leading or building a remote engineering team. #GitLab #RemoteEngineering #EngineeringManagement #AsynchronousWork #DevOps #CodeReview #TechLeadership #BusinessAndTechnology #FexingoBusiness #BusinessPodcast #CTO #EngineeringOrg #RemoteWork #HandbookDriven #MergeRequest #Onboarding #IncidentResponse #Culture

  25. 24

    How Figma Scales Real-Time Collaboration With CRDTs

    Episode 36 of The CTO Podcast dives into how Figma built its real-time collaboration engine using Conflict-Free Replicated Data Types (CRDTs). Lucas and Luna unpack the architectural decision to move from Operational Transform to CRDTs, how Figma handles merge conflicts at scale, and the engineering tradeoffs behind its vector-based multi-user editing. They walk through the key design choices: why Figma chose a custom CRDT instead of off-the-shelf libraries, how it serialises operations for low-latency sync across hundreds of collaborators on a single file, and the surprising way it prioritises local responsiveness over consistency. Luna asks the hard questions about production incidents, and Lucas breaks down the monitoring approach behind Figma's 'real-time' guarantee. A concrete look at distributed systems theory meeting product design. #Figma #CRDT #RealTimeCollaboration #DistributedSystems #ConflictFreeReplicatedDataTypes #OperationalTransform #ProductDesign #Collaboration #Latency #Engineering #Architecture #Whiteboard #MultiUserEditing #Sync #VectorGraphics #BusinessAndTechnology #FexingoBusiness #BusinessPodcast

  26. 23

    How Elasticsearch Powers Netflix's Search and Observability

    Netflix runs one of the largest Elasticsearch deployments in the world: over 150 clusters, thousands of nodes, processing tens of billions of documents. In this episode, Lucas and Luna unpack how Netflix uses Elasticsearch not just for log aggregation, but to power its internal search, real-time monitoring, and even the titles you see when you open the app. They walk through the architecture behind Netflix's search, from how they handle partial matches across 17,000 titles to how they keep observability data flowing without crashing the clusters. Along the way, they cover shard sizing, index lifecycle management, and the painful lessons Netflix learned when Elasticsearch failed at scale. A practical episode for any engineering leader running search or observability at scale. #Elasticsearch #Netflix #SearchArchitecture #Observability #Logging #DistributedSystems #Sharding #IndexLifecycleManagement #RealTimeMonitoring #EngineeringLeadership #CTO #TechnicalDebt #Infrastructure #SiteReliabilityEngineering #BusinessAndTechnology #FexingoBusiness #BusinessPodcast #TheCTOPodcast

  27. 22

    How Discord Rebuilt Its Voice Engine for Latency

    In this episode of The CTO Podcast, Lucas and Luna dive into Discord's architectural overhaul of its real-time voice system. They explore how the team reduced latency from hundreds of milliseconds to under 50 milliseconds by switching from a traditional client-server model to a mesh-based WebRTC architecture. The discussion covers the trade-offs of running their own media servers versus outsourcing, the engineering challenge of synchronizing 50 users in a single voice channel without a central coordinator, and how Discord handled the transition without disrupting its 150 million monthly active users. Lucas explains the key insight: rather than optimizing the existing pipeline, Discord rethought the entire signaling and media routing layer around a 'selective forwarding unit' pattern. Luna presses on the operational cost of running proprietary infrastructure at scale, and Lucas shares the surprising finding that the rewrite actually reduced server spend by 30 percent. The episode closes with a reflection on when to rebuild versus patch. #Discord #VoiceEngine #WebRTC #LowLatency #RealTimeCommunication #MeshArchitecture #SelectiveForwardingUnit #CTO #EngineeringOrg #Scaling #Infrastructure #TechnicalLeadership #Business #Technology #FexingoBusiness #BusinessPodcast #TheCTOPodcast #Architecture Keep every episode free: buymeacoffee.com/fexingo

  28. 21

    How AWS Built Its Control Plane for 200 Services

    Amazon Web Services runs over 200 services, each with its own control plane. In this episode, Lucas and Luna break down how AWS's internal architecture team designed a unified control plane framework that handles millions of API requests per second across regions. They explore the concept of 'control plane as a platform' — a set of reusable primitives for authorization, rate limiting, and state management that lets service teams focus on business logic. Lucas walks through the key design decisions: separating data plane from control plane at the infrastructure level, using eventual consistency for global state, and the 'cell-based architecture' that isolates failures. Luna asks how this affects developers building on AWS today and whether the pattern is reproducible outside of hyperscalers. A specific look at one of the most complex distributed systems ever built, and what it teaches us about scaling engineering orgs. #AWS #ControlPlane #DistributedSystems #CloudArchitecture #EngineeringAtScale #TechLeadership #PlatformEngineering #FexingoBusiness #BusinessPodcast #CTOPodcast #AWSreInvent #CellBasedArchitecture #APIDesign #Authorization #RateLimiting #EventualConsistency #InfrastructureAsCode #Scaling Keep every episode free: buymeacoffee.com/fexingo

  29. 20

    How Stripe Runs a Global Payment Platform With 99.999 Percent Uptime

    Stripe processes hundreds of billions of dollars in payments annually. But behind the API is a reliability architecture that few people talk about. In this episode, Lucas and Luna dive into how Stripe engineers for five-nines uptime across its payment infrastructure: the layers of redundancy, the careful rollout strategy, and the incident response playbook that keeps money moving. They explore Stripe's use of circuit breakers, gradual canary deployments, and a global multi-region database topology that can survive an entire cloud region going dark. Specific numbers: Stripe's documented 99.999% uptime goal, the 30-minute maximum recovery time for critical services, and a weekly cadence for testing failure scenarios. If you're building systems where every millisecond counts, this is a masterclass in production resilience. No marketing fluff, just the engineering reality behind one of the most critical payment platforms on the internet. #Stripe #PaymentInfrastructure #ReliabilityEngineering #FiveNines #Uptime #IncidentResponse #CanaryDeployments #CircuitBreakers #MultiRegion #FaultTolerance #SRE #ProductionResilience #PaymentProcessing #GlobalInfrastructure #BusinessAndTechnology #FexingoBusiness #BusinessPodcast #CTOPodcast Keep every episode free: buymeacoffee.com/fexingo

  30. 19

    How Uber Rebuilt Its Maps for 40 Million Daily Rides

    Episode 31 of The CTO Podcast digs into how Uber's engineering team rebuilt its mapping and routing stack from scratch between 2019 and 2022 to handle over 40 million daily rides across 10,000 cities. We look at the specific reason they abandoned the old pipeline — vendor lock-in with Google Maps and a 40 percent cost increase in a single quarter — and how they designed a modular routing engine called Michelangelo Maps. Lucas explains the architecture: a C++ kernel for shortest-path that runs in under 50 milliseconds, a tile-based geocoding layer that reduced queries by 80 percent, and a machine learning model that predicts travel time to within 5 percent of actual trip duration. Luna pushes back on whether rebuilding a core piece of infrastructure that touches every single ride was worth the three-year timeline and the hundreds of engineers it took. We also touch on the trade-off between cost savings and reliability during the 2020 ridership drop. No hot takes — just the concrete decisions Uber's technical leadership made and the numbers that justified them. #Uber #Maps #RoutingEngine #MichelangeloMaps #CPlusPlus #Geocoding #MachineLearning #Architecture #Scaling #Infrastructure #CTOPodcast #Fexingo #BusinessAndTechnology #Engineering #TechLeadership #TravelTimePrediction #FexingoBusiness #BusinessPodcast Keep every episode free: buymeacoffee.com/fexingo

  31. 18

    How Spotify Migrated to Google Cloud Without Breaking Discover Weekly

    In 2016, Spotify announced it was moving its entire infrastructure from its own data centers to Google Cloud Platform. The migration took four years and involved moving over 1,200 services, petabytes of data, and the machine learning pipelines powering Discover Weekly, all while keeping the music streaming without audible interruption. Lucas and Luna break down how Spotify's engineering team pulled off one of the largest cloud migrations in tech history, the architectural decisions that made it possible, and the lessons for any organization facing a major infrastructure move. The episode also covers the surprising role of a custom tool called 'Sisyphus' and why Spotify chose to keep its own storage layer running on top of Google's network. #Spotify #GoogleCloud #CloudMigration #Infrastructure #MusicStreaming #DiscoverWeekly #MLPipelines #Sisyphus #DataCenters #Engineering #Architecture #Scalability #Business #Technology #FexingoBusiness #BusinessPodcast #TechLeadership #CTO Keep every episode free: buymeacoffee.com/fexingo

  32. 17

    How Stripe Uses Idempotency Keys to Prevent Double Charges

    Stripe processes billions of dollars in payments every year. One double charge could destroy trust. In this episode, Lucas and Luna break down how Stripe uses idempotency keys, a simple but brilliant engineering pattern, to guarantee that even if a network request is retried dozens of times, the customer is charged exactly once. They walk through a real-world example: a customer hits 'Place Order' twice, the first attempt succeeds, and the second attempt must not create a duplicate charge. Lucas explains the idempotency key lifecycle: generation, storage in Redis, TTL, and response replay. He contrasts Stripe's approach with a naive dedup table and explains why idempotency is a design philosophy that ripples through error handling, database transactions, and API contracts. Luna pushes on edge cases: what if Redis goes down? What about race conditions between write and read? Lucas covers the safety nets (conditional writes and single-node Redis with replication) and the trade-off between performance and consistency. The episode closes with practical advice for any engineer building payment or booking systems: start with idempotency from day one. #Stripe #IdempotencyKeys #PaymentInfrastructure #EngineeringPatterns #API #Redis #ErrorHandling #Idempotency #DistributedSystems #Fintech #SoftwareEngineering #TechLeadership #Business #Technology #FexingoBusiness #BusinessPodcast #TheCTOPodcast #DoubleCharge Keep every episode free: buymeacoffee.com/fexingo

  33. 16

    How Pixar Rebuilt Its Render Farm for Real-Time

    In this episode, we dive into how Pixar Engineering rebuilt its legendary render farm architecture to support hybrid real-time workflows without sacrificing the fidelity that made 'Soul' and 'Incredibles 2' possible. Hosts Lucas and Luna unpack the tradeoffs between batch rendering and real-time ray tracing, the shift to a unified storage fabric, and how Pixar's renderer, RenderMan, co-evolved with Disney's streaming push. We discuss the specific challenge of maintaining deterministic results across heterogeneous GPU clusters and how the team used a scene-graph abstraction to decouple authoring from rendering. A concrete look at the infrastructure behind animated movies and why it matters for any engineering org managing legacy systems under new constraints. #Pixar #RenderFarm #RealTimeRendering #RenderMan #Disney #AnimationTech #BusinessAndTechnology #CTOPodcast #FexingoBusiness #BusinessPodcast #EngineeringOrg #Architecture #GPUClusters #SceneGraph #StorageFabric #DeterministicRendering #LegacyModernization #HybridWorkflows Keep every episode free: buymeacoffee.com/fexingo

  34. 15

    How Stack Overflow Survived ChatGPT's First Year

    When ChatGPT launched in late 2022, many predicted Stack Overflow was dead. Traffic dropped 14 percent quarter-over-quarter in early 2023 as developers copied AI-generated code instead of browsing answers. By mid-2024, the site had stabilized and even recovered some traffic. In this episode, Lucas and Luna unpack Stack Overflow's survival playbook: why the moderation layer gave it staying power, how they launched OverflowAI without alienating their core community, and what the traffic data says about developer trust in AI-generated answers versus human-vetted ones. They also discuss a concrete lesson for any platform facing generative AI disruption—namely, that curation becomes more valuable, not less, when answers are easier to generate. The episode includes a short listener-support segment near the end. #StackOverflow #ChatGPT #OverflowAI #DeveloperCommunity #AIDisruption #Moderation #Curation #GenAI #DeveloperTrust #QAPlatform #TrafficDecline #ProductStrategy #CommunityManagement #BusinessAndTechnology #TechnicalLeadership #FexingoBusiness #BusinessPodcast #EngineeringCulture Keep every episode free: buymeacoffee.com/fexingo

  35. 14

    How Netflix Rebuilt Its Encoding Pipeline for Bandwidth Savings

    Lucas and Luna dive into how Netflix re-engineered its video encoding pipeline to shave bandwidth usage without sacrificing quality. They explore the technical trade-offs between constant bitrate and variable bitrate encoding, the role of per-title encoding optimization, and how the streaming giant uses machine learning to dynamically encode every frame. The episode also touches on why this matters for mobile users and emerging markets. Listeners get a concrete example of how a real-world engineering team turned a bandwidth problem into a competitive advantage. #Netflix #VideoEncoding #StreamingTechnology #BandwidthOptimization #MachineLearning #PerTitleEncoding #CBRvsVBR #EngineeringOrg #TechnicalLeadership #Architecture #CTOPodcast #FexingoBusiness #BusinessPodcast #TechInfrastructure #CodecOptimization #StreamingQuality #MobileStreaming #EmergingMarkets Keep every episode free: buymeacoffee.com/fexingo

  36. 13

    How Stripe Uses Idempotency Keys to Prevent Double Charges

    In this episode, Lucas and Luna dive into one of the most elegant patterns in distributed systems: the idempotency key. Using Stripe's payment API as the central case, they explain how a single HTTP header prevents duplicate charges during network retries, how Shopify applies the same pattern to order creation, and why idempotency is a fundamental principle for any system that deals with money, inventory, or state changes. The discussion covers the mechanics of idempotency keys, their role in exactly-once semantics, and practical trade-offs like key expiration and storage. Listeners will walk away understanding a concrete tool to make their own APIs safer. #Idempotency #Stripe #PaymentAPI #DistributedSystems #APIDesign #ExactlyOnce #HTTP #Shopify #RetryLogic #SystemsDesign #FaultTolerance #Engineering #Backend #TechnicalLeadership #FexingoBusiness #BusinessPodcast #CTOPodcast #Architecture Keep every episode free: buymeacoffee.com/fexingo

  37. 12

    How Monzo Keeps Its Banking App Running Like a Startup

    In this episode, we dive into how Monzo, the UK digital bank, maintains a startup-like engineering velocity while managing millions of transactions daily. We explore their use of event sourcing and the CQRS pattern to decouple read and write workloads, and how they keep their core banking ledger simple despite scaling to over 9 million customers. Lucas breaks down Monzo's approach to feature flags and gradual rollouts—treating every deployment as an experiment. Luna chimes in with her own experience from a fintech that tried a similar architecture and hit unexpected pain points. We also touch on how Monzo's engineering team stays lean by focusing on a small set of well-understood primitives. If you're building or running a platform where correctness and speed both matter, this episode offers a concrete case study in trading complexity for control. #Monzo #Banking #Fintech #EventSourcing #CQRS #EngineeringVelocity #FeatureFlags #Deployments #Microservices #StartupCulture #LeanEngineering #DigitalBank #UKFintech #Scalability #BusinessAndTechnology #FexingoBusiness #BusinessPodcast #CTOPodcast Keep every episode free: buymeacoffee.com/fexingo

  38. 11

    How Palantir Migrated to the Cloud Without Losing Security Clearance

    Palantir runs critical infrastructure for the US military and intelligence community. In 2020, the company began migrating its entire stack from on-premises government data centers to Amazon Web Services while maintaining top-secret security accreditation. This episode breaks down the technical architecture that made the move possible: how Palantir built a 'cloud bridge' that let legacy and cloud environments run in parallel, the zero-trust networking layer that replaced traditional VPNs, and the compliance automation that turned six-month audits into continuous monitoring. Lucas and Luna also discuss what the migration reveals about the future of defense tech procurement and why the Pentagon's Joint Warfighting Cloud Capability contract marked a turning point for Silicon Valley and Washington. #Palantir #CloudMigration #AWS #ZeroTrust #GovTech #DefenseTech #SecurityClearance #IL5 #FedRAMP #CloudBridge #Infrastructure #Compliance #Automation #Business #Technology #FexingoBusiness #BusinessPodcast #CTOPodcast Keep every episode free: buymeacoffee.com/fexingo

  39. 10

    How Google Rebuilt Its Search Index for AI

    When Google launched its AI-powered search overviews in 2024, it quietly replaced the core indexing pipeline that had powered search for two decades. Lucas and Luna dig into the engineering decisions behind that rewrite: why Google moved from a document-sorting system to a semantic embedding pipeline, how it retrained its ranking models without breaking traditional web search, and what the shift means for every engineer building with large language models. They trace the specific architecture — from the Incremental Document Index through the dual-rank retrieval layer — and explore the trade-offs between latency and relevance that Google engineers had to navigate. A concrete look at how the world's most-used software product reinvented its own foundation without anyone noticing. #Google #Search #AI #Engineering #Architecture #Indexing #LLM #Embeddings #Retrieval #SemanticSearch #Ranking #Infrastructure #Scalability #MachineLearning #Technology #Business #FexingoBusiness #BusinessPodcast Keep every episode free: buymeacoffee.com/fexingo

  40. 9

    How Airbnb Rebuilt Its Search for Instant Bookings

    In this episode of The CTO Podcast, Lucas and Luna dive into how Airbnb rewrote its search backend to handle instant bookings at global scale. They explore the specific engineering challenge: reducing search latency from 2.3 seconds to under 200 milliseconds while adding real-time pricing and availability filters. The conversation covers the architectural shift from a monolithic Rails app to a bespoke C++ search service using Apache Lucene, and the trade-offs between index freshness and query speed. Lucas explains why Airbnb chose not to use Elasticsearch, how they handled the 'cold start' problem for new listings, and the surprising role of machine learning in reranking results. Luna pushes back on the cost of custom infrastructure versus managed services. Perfect for anyone building real-time search or scaling a marketplace platform. #Airbnb #SearchArchitecture #RealTimeSearch #ApacheLucene #CPlusPlus #Microservices #LatencyOptimization #MarketplaceTech #EngineeringTradeoffs #MachineLearning #IndexFreshness #SearchReranking #BusinessAndTechnology #CTOPodcast #FexingoBusiness #BusinessPodcast #TechnicalLeadership #EngineeringOrg Keep every episode free: buymeacoffee.com/fexingo

  41. 8

    How Datadog Monitors Its Own Monolith at Scale

    Episode 20 of The CTO Podcast dives into a paradox: how does Datadog, the company that sells observability software, actually monitor its own massive monolith? Lucas and Luna walk through the architecture behind Datadog's internal dogfooding strategy — a single codebase that handles millions of metrics per second. They explore the tradeoffs of keeping a monolith versus microservices, how the engineering team built an internal tool called 'Watchtower' to catch regressions before they hit customers, and why Datadog's CTO decided against splitting the core observability pipeline into separate services. Along the way, they reveal a specific threshold: 1.2 million events per second per host, and how the team tracks it. A concrete look at how one company eats its own dog food at planetary scale. #Datadog #Observability #Monolith #EngineeringArchitecture #Dogfooding #Watchtower #Scalability #MetricsPipeline #CTO #TechnicalLeadership #BusinessAndTechnology #Fexingo #FexingoBusiness #BusinessPodcast #Podcast #SoftwareEngineering #Infrastructure #SRE Keep every episode free: buymeacoffee.com/fexingo

  42. 7

    How Notion Uses Atomic Blocks to Beat Document Chaos

    Notion has become the go-to tool for startups and engineering teams, but behind its clean UI is a deceptively complex data model. This episode breaks down how Notion's core abstraction—the atomic block—lets it handle everything from meeting notes to product wikis without collapsing into a mess of sync conflicts or schema drift. Lucas explains why Notion chose blocks over documents, how its real-time sync differs from Google Docs, and the engineering tradeoffs that keep the product fast even as users pile on nested databases. Luna presses on the hard part: how Notion manages block-level permissions and offline editing without breaking. If you've ever wondered why Notion feels like a database pretending to be a word processor—or why building a clone is harder than it looks—this episode gives you the architecture behind the magic. #Notion #AtomicBlocks #RealTimeSync #DataModel #CRDT #DocumentEditor #BlockBasedEditor #Architecture #EngineeringCulture #ProductivityTools #Database #OfflineEditing #Permissions #Scaling #StartupStack #BusinessAndTechnology #FexingoBusiness #BusinessPodcast Keep every episode free: buymeacoffee.com/fexingo

  43. 6

    How Shopify Handles Black Friday Without Breaking

    Lucas and Luna dig into the engineering behind Shopify's Black Friday infrastructure — specifically, how the platform absorbs 10,000 requests per second per store during peak traffic without cascading failures. They break down the shift from monolithic scaling to a 'cell-based' architecture where each merchant's data lives in an isolated shard, preventing one viral store from taking down the whole platform. Lucas explains the surprising bottleneck: not database queries, but TLS handshake overhead at the load balancer layer. Luna challenges whether this level of isolation creates operational complexity that offsets the reliability gains. They also touch on how Shopify's engineering team stress-tests with 'failure injection Fridays' and why the company chose to open-source parts of its sharding toolkit. The episode ends with a candid look at whether cell-based architecture is overkill for smaller platforms, and a quick nod to how listener support keeps the podcast ad-free. #Shopify #BlackFriday #Ecommerce #Scalability #CellBasedArchitecture #Sharding #Infrastructure #LoadBalancing #TLS #OpenSource #FailureInjection #EngineeringCulture #Business #Technology #FexingoBusiness #BusinessPodcast #CTOPodcast #TechnicalLeadership Keep every episode free: buymeacoffee.com/fexingo

  44. 5

    How Figma Uses Real-Time Sync Without Breaking Git

    In episode 17 of The CTO Podcast with Fexingo, Lucas and Luna dive into how Figma's engineering team builds a real-time collaborative design tool that coexists with a Git-based versioning backend. They explore the architectural decisions behind the CRDT-based sync engine, the trade-offs of using a custom WebSocket layer over HTTP, and how Figma avoided the pitfalls of operational transform (OT) that plagued Google Docs. They focus in particular on the 2022 incident where a conflict resolution bug caused a three-hour outage for 10% of users, and the subsequent redesign of the merge logic. The hosts also discuss how Figma's engineering culture prioritizes 'design-driven development' and why the team chose to write its own Rust-based WebAssembly (Wasm) module for performance-critical rendering. Tune in for a masterclass in reconciling real-time collaboration with deterministic history. #Figma #RealTimeSync #CRDT #WebSocket #Rust #Wasm #GitVersioning #ConflictResolution #EngineeringCulture #DesignDrivenDevelopment #SystemArchitecture #IncidentResponse #CollaborationTools #BusinessAndTechnology #FexingoBusiness #BusinessPodcast #TechLeadership #CTOPodcast Keep every episode free: buymeacoffee.com/fexingo

  45. 4

    How Palantir Builds Software for the US Military

    Lucas and Luna dive into how Palantir develops and deploys software for the US Department of Defense, focusing on its flagship Gotham platform. They explore the company's unique engineering culture, where every feature ships under a government deadline and every line of code faces audit. Specific cases include Palantir's work on Project Maven and on the Army's Tactical Intelligence Targeting Access Node (TITAN), and how the company maintains FedRAMP authorization. The hosts discuss the tension between speed and compliance, how Palantir uses a software-defined data integration layer called Foundry for logistics, and what commercial CTOs can learn from building under constant adversarial pressure. #Palantir #DefenseTech #SoftwareEngineering #GovernmentContracts #ProjectMaven #USMilitary #FedRAMP #GothamPlatform #Foundry #DataIntegration #SecureDeployment #DevSecOps #Compliance #TechAndWarfare #Business #Technology #FexingoBusiness #BusinessPodcast Keep every episode free: buymeacoffee.com/fexingo

  46. 3

    How Cloudflare Rebuilt Its Network for Zero Trust

    Lucas and Luna unpack how Cloudflare pivoted its entire infrastructure from a content-delivery network into a zero-trust security platform — a migration that touched every router, every data center, and every product line. They walk through the technical decision to build a new network layer (Argo Smart Routing) and the organizational challenge of running old and new stacks in parallel for years. They also explore the trade-offs Cloudflare made around latency vs. security, and why CEO Matthew Prince decided to offer the core zero-trust product for free to small businesses. A case study in platform rewrites that change the company's revenue model. #Cloudflare #ZeroTrust #NetworkArchitecture #InfrastructureMigration #ArgoSmartRouting #MatthewPrince #Security #EdgeComputing #CDN #PlatformRewrite #Latency #TechStrategy #EngineeringOrg #BusinessAndTechnology #FexingoBusiness #BusinessPodcast #TheCTOPodcast #TechnicalLeadership Keep every episode free: buymeacoffee.com/fexingo

  47. 2

    How Canva Migrated Millions of Users Without Downtime

    In this episode, Lucas and Luna dive into Canva's massive infrastructure migration that moved millions of daily active users to a new service architecture without noticeable downtime. They explore the technical decisions behind Canva's gradual migration strategy, the challenges of maintaining backward compatibility, and the lessons for engineering leaders facing similar scale transitions. With specific numbers on user growth and migration timelines, this episode offers a concrete case study in managing complexity at scale. #Canva #InfrastructureMigration #TechnicalLeadership #EngineeringOrg #Scalability #Microservices #BackwardCompatibility #ZeroDowntime #CloudInfrastructure #Architecture #CTO #EngineeringLeadership #BusinessAndTechnology #FexingoBusiness #BusinessPodcast #TechPodcast #PlatformEngineering #MigrationStrategy Keep every episode free: buymeacoffee.com/fexingo

  48. 1

    How Nubank Keeps Its Banking Platform Running

    Episode 13 of The CTO Podcast explores how Nubank, the Brazilian digital bank with over 100 million customers, maintains 99.99% uptime on its core banking platform. Lucas and Luna break down the specific architectural decisions that power Nubank's resilience: from its custom-built transaction engine to its chaos engineering culture. They discuss how the company reduced mean-time-to-recovery from hours to minutes, and why Nubank treats its database schema changes like code deploys. A rare look inside one of the most operationally rigorous fintech stacks in the world. #Nubank #Fintech #ChaosEngineering #BankingPlatform #Uptime #Microservices #DatabaseSchema #MTTR #Resilience #Brazil #EngineeringCulture #BusinessAndTechnology #FexingoBusiness #BusinessPodcast #CTOPodcast #SystemArchitecture #TransactionEngine #IncidentResponse Keep every episode free: buymeacoffee.com/fexingo

  49. 0

    How Basecamp Runs Engineering With Two Designers

    Lucas and Luna dissect Basecamp's engineering org: a 40-person company with just 2 designers and a founder who writes code. They explore how David Heinemeier Hansson structures product teams, why the company enforces a 40-hour work week, and the specific asynchronous communication norms that replace standup meetings. The hosts walk through the company's 'one product team per feature' model, its deliberate lack of a QA department, and the surprising statistic that Basecamp's engineering-to-designer ratio is 10-to-1. They debate whether this approach scales beyond small teams and what lessons it holds for CTOs at larger orgs. #Basecamp #DavidHeinemeierHansson #SmallTeams #EngineeringOrg #ProductTeams #AsyncWork #QALess #TwoDesigners #Rails #RubyOnRails #FlatStructure #CTO #TechLeadership #BusinessAndTechnology #FexingoBusiness #BusinessPodcast #EngineeringCulture #WorkLifeBalance Keep every episode free: buymeacoffee.com/fexingo

  50. -1

    How Slack Rebuilt Its Search Infrastructure for Scale

    Slack's search once ground to a crawl as workspaces grew. In 2024, its engineering team rewrote the search layer from a custom Lucene-backed index to a real-time Elasticsearch pipeline spanning 17 data regions. Hosts Lucas and Luna walk through the architecture: why they migrated from DocIds to opaque document keys, how they handled reindexing 2PB of data without downtime, and the tradeoff between query freshness and index latency. This episode gets into the concrete decisions—like why they chose a 90-second refresh window and how they used a dual-read pattern during cutover. If you're building anything that needs to scale from 10 users to 10 million, this one's for you. #Slack #SearchInfrastructure #Elasticsearch #Lucene #DataMigration #Indexing #RealTimeSearch #EngineeringLeadership #TechnicalDebt #SystemDesign #Scalability #Infrastructure #BusinessAndTechnology #CTOPodcast #FexingoBusiness #BusinessPodcast #EngineeringOrg #Architecture Keep every episode free: buymeacoffee.com/fexingo

ABOUT THIS SHOW

Lucas and Luna sit down in front of a whiteboard to dissect the decisions that shape technical organizations. Each episode of The CTO Podcast with Fexingo examines a specific engineering leadership challenge — from scaling a microservices architecture without creating a distributed monolith, to managing the cognitive load of a 200-engineer org, to choosing between a monorepo and polyrepo strategy based on team topology. The conversations are grounded in real-world cases: how Etsy restructured its data pipeline after a 2019 outage, why Stripe’s API versioning policy reduces breaking changes, or what Basecamp’s choice of SQLite over PostgreSQL says about product philosophy. Lucas brings the journalistic rigor — citing commit histories, RFCs, and postmortems — while Luna pushes back with the pragmatics of org dynamics, hiring constraints, and technical debt. There are no hot takes, no vendor pitches, no ‘best practices’ without trade-offs. Each episode ends with a specific tension left un

HOSTED BY

Fexingo

Frequently Asked Questions

How many episodes does The CTO Podcast with Fexingo: Technical Leadership, Architecture, and Engineering Org have?

The CTO Podcast with Fexingo: Technical Leadership, Architecture, and Engineering Org currently has 50 episodes available on PodParley. New episodes are automatically indexed when they're published to the podcast feed.

What is The CTO Podcast with Fexingo: Technical Leadership, Architecture, and Engineering Org about?

Lucas and Luna sit down in front of a whiteboard to dissect the decisions that shape technical organizations. Each episode of The CTO Podcast with Fexingo examines a specific engineering leadership challenge — from scaling a microservices architecture without creating a distributed monolith, to...

How often does The CTO Podcast with Fexingo: Technical Leadership, Architecture, and Engineering Org release new episodes?

The CTO Podcast with Fexingo: Technical Leadership, Architecture, and Engineering Org has 50 episodes. Check the episode list to see recent publication dates and frequency.

Where can I listen to The CTO Podcast with Fexingo: Technical Leadership, Architecture, and Engineering Org?

You can listen to The CTO Podcast with Fexingo: Technical Leadership, Architecture, and Engineering Org on PodParley by clicking any episode. We provide an embedded audio player for direct listening, and you can also subscribe via your preferred podcast app using the RSS feed.

Who hosts The CTO Podcast with Fexingo: Technical Leadership, Architecture, and Engineering Org?

The CTO Podcast with Fexingo: Technical Leadership, Architecture, and Engineering Org is created and hosted by Fexingo.