PODCAST · technology
Adventures in DevOps
by Will Button, Warren Parad
Join us as experienced experts discuss cutting-edge challenges in the world of DevOps, from applying the mindset at your company, to career growth and leadership challenges within engineering teams, to avoiding common antipatterns. Every episode you'll meet a new industry veteran guest with their own unique story.
-
278
What If Tools Are Not Expensive To Build
Developers spend more than 50% of their time reading code, making it the single largest expense in software engineering. Despite this massive cost, the industry rarely discusses or optimizes how we read code. So we've brought in Tudor Girba, CEO at Feenk, to help us rethink just how software engineering should be done. Instead of relying on manual reading and generic text editors, teams must shift toward building deterministic, contextual tools to directly extract information and answer questions about their systems. The suggested solution? Contextual and composable micro-tools, written by everyone, focused on exposing just the right information at the right time. This creates the opportunity for structural interrogation of your solution. And how many tools should we build? Well, if one example of a tool is testing, and 50% or more of your code can be tests, imagine what percentage of your software should actually be production related! Most importantly, generic tools fall short, but where can we find out how to build the right tools? Listen in to find out....
💡 Notable Links: ✨ Episode: IDE & Copilot & Critical Thinking; Book: Moldable software development; Wardley Map; Guest Request: Formal Verification
🎯 Picks: Warren - The real stuff: Underwood Ranches Sriracha; Tudor - The beaches of Normandy
-
277
DR: Staying resilient in the cloud
Welcome back to another, hopefully, relief from architectural existential dread. This week, we've pulled in Seth Eliot from Arpio (Ar-Pi-O, RPO, get it?) to dive headfirst into the beautiful, deeply expensive illusion that migrating your legacy infrastructure to a major hyperscaler magically grants it instant immortality. It doesn't. We break down the shared responsibility model for resilience, which was conveniently cribbed straight from the security model, and analyze how the foundational promise of automated fault isolation boundaries routinely crumbles. From cloud providers sticking multiple "independent" availability zones inside the exact same physical building, to multi-AZ cascading anomalies, to regional power grid failures, it's clear your provider's abstractions aren't nearly as resilient as their marketing slides suggest. Discussed within is the "Thundering Herd" phenomenon, which can't be ignored even when the failover clusters are designed correctly. From cross-organization KMS re-encryption loops to the horror of fragmented application logs across CloudFront edge regions, at the end of the day, true resilience isn't achieved by forcing your engineering team to implement features; it's about architecting your baseline confidently for the inevitability of production burning to the ground.
💡 Notable Links: ✨ Episode: Eat your security vegetables; ✨ Episode: Matt vibecodes; ✨ Episode: on DNS and isolation
🎯 Picks: Warren - Book: Moldable software development; Seth - Lockpick set
-
276
Eat your security vegetables
This week's adventure tackles the absolute absurdity of modern enterprise infrastructure, where a single company can easily find itself running multiple different CI/CD platforms due to unchecked mergers and acquisitions. We've brought in Chris Farris, an AWS Security Hero who consults with companies via Securosis, to dig deep, find the security cracks, and philosophize about the real-world impacts of tech debt in the AI age. Management rarely prioritizes standardization, leaving security teams to defend a chaotic swamp of mixed cloud providers, GitHub repositories, and nostalgic on-prem Bitbucket instances. We define this accumulated technical debt not as some abstract concept, but as literal potholes on the infrastructure Autobahn: annoying speed bumps that permanently damage velocity and set organizations up for an inevitable disaster. We contrast this with the evolution from old-school sysadmins cutting their fingers on rack screws to modern engineers spinning up entire architectures with a few lines of code, noting that the ease of deployment has far outpaced our willingness to clean up our own mess. The crisis is only accelerating now that the cost of writing code (but not of maintaining it) is rapidly approaching zero. While letting an AI agent autonomously build a website or manipulate an AWS sandbox over a single Saturday afternoon sounds magical, it creates a terrifying volume of unreviewed, context-devoid software. Compounding this systemic frailty, massive cloud provider layoffs mean the crucial institutional memory and human operational experience required to survive are walking right out the door. We expose the fundamental flaw of modern agentic tools: they completely lack fine-grained access control, operating on a dangerous all-or-nothing identity model. Until autonomous agents are engineered with actual conscience, consequence, and common sense, security teams will continue fighting a losing battle against a digital supply chain.
💡 Notable Links: Chris' Article on AI Tech Debt; Breaking Open Source: Malus - Article; Vercel Security Incident; ✨ Episode:
🎯 Picks: Warren - Rick & Morty S02 + S03; Chris - Risky Business: The latest actually good cybersecurity news
-
275
Automatic Data Pipelining: One More Turtle Ahead
We grabbed Donald Nguyen, co-founder and CTO at Corvic, to discuss the absurd complexities of enterprise data and multimodal inference. We explore how organizations habitually hoard mountains of useless, "dead" data just out of the sheer fantasy that someone might ask for it later. We highlight the fundamental disconnect where data collectors using tools like Airbyte and Kafka speak a completely different language than the business consumers analyzing it in Excel. True scale isn't just about managing petabytes; it's the absolute nightmare of extracting subjective business meaning from flat PDFs and invoices. In the deep end of vector embeddings, we discuss how translating data into a different semantic universe requires imposing a heavy business bias. Auditors and artists will view the exact same invoice completely differently, meaning your embedding model selection is incredibly subjective to the business context. The industry's desperate search for actual AI success stories beyond basic workflow automation is still ongoing as we laugh (and cry) at the reality that companies are likely budgeting 50% of an engineer's salary for LLM token usage, effectively enabling product managers to burn cash on infinite loops to generate prototype code. Reasonable or unreasonable? And lastly, we tackle the existential dread of securing autonomous AI agents. Because fine-grained access control for agent actions is basically an unsolved fantasy, we must treat their execution environments as entirely untrusted, relying on rigid sandboxes like AWS Firecracker VMs. Prompt injection attacks are an inevitable flaw of the transformer architecture, and the industry's best defense mechanism seems to be wrapping models inside of other models to validate the outputs. It is quite literally turtles all the way down, and the winner of enterprise security is simply the organization that manages to put one more turtle ahead of the attackers.
💡 Notable Links: Kuuk Thaayorre Aboriginal Tribe - Cardinal Directions; ✨ Episode: Generating automatic integrations at scale
🎯 Picks: Warren - Dr. NEMO: Clockwise circle pit; Donald - Book: InvestiGators
-
274
The Human Value Versus AI Legacy Code
We get down to business with Cassidy Williams, Senior Director of Developer Advocacy at GitHub, where we try to untangle the existential dread of modern software development. It includes the sheer absurdity of managing a platform that officially crossed the one billion commit mark in 2025. Currently absorbing a completely unreasonable 275 million commits per week, GitHub's technical debt is naturally showing its age under the weight of AI agents aggressively creating pull requests. And with the company's own Copilot advocating for more, we explore the daily reality of being the internet's punching bag during an outage, and how the "Tiny Wins" buy back developer affection by still shipping the critical features. Which of course is a small signal in the sea of the industry's collective identity crisis: vibe coding and the valley of AI-generated garbage. One suggested solution is strongly typed languages, which are skyrocketing in popularity because we desperately need rigid guardrails to babysit the hallucinated code our non-human agents are frantically pushing to production. Things have gotten so dire that we commiserate on missing the good old days of Stack Overflow, where instead of a chatbot agreeably telling you your terrible idea is great, a grumpy human engineer would just ruthlessly and honestly roast your architecture.
💡 Notable Links: Cassidy's post on Typed Language; Fermat's Last Theorem; Cassidy's newsletter; Book: 4-Hour Work Week; ✨ Episode: Typed Languages; ✨ Episode: Vibecoding; ✨ Episode: Productivity Isn't Real
🎯 Picks: Warren - Book: The Light Eaters; Cassidy - Obsidian Offline Wiki
-
273
Who needs a server?
Founder of Bespinian and long-time cloud solutions architect, Lena Fuhrimann, sits down with us to clarify the widespread confusion around serverless architecture. We discuss how serverless is often incorrectly equated solely with Function as a Service (FaaS), when it actually represents a broader spectrum on the abstraction ladder, including managed AI inference, container platforms, and databases. Lena shares her early career traps of building a fragmented landscape of sixty "nano-services" and explains why starting with a well-architected monolith and progressively breaking out microservices based on distinct resource or lifecycle requirements is a much saner approach. Then we shift to the drivers behind cloud migrations, emphasizing that the primary financial benefit of serverless isn't necessarily shrinking the monthly cloud provider bill, but rather optimizing your most expensive resource: engineering time. By offloading mundane infrastructure patching to the cloud provider, teams can focus entirely on delivering tangible business value to customers. But cost is still there too. We also explore the psychological challenges of adopting new paradigms, sharing a fascinating story of bridging the gap for a VM-loving engineer by introducing immutable infrastructure concepts through Packer and Ansible before fully transitioning them to containers. And of course we tackle the dreaded topic of "cold starts" and why complex workarounds, like building custom Lambda warmers to periodically call APIs, often defeat the core benefits of reduced total cost of ownership.
💡 Notable Links: Bespinian; Book: Drive - Motivation 3.0; ✨ Episode: Typed Languages, Haskell, and building monoliths
🎯 Picks: Warren - Better than coffee: Himmelstau tea; Lena - Home Assistant open source project and Awtrix Clocks
-
272
How to build a monolith the right way
We sit down with Ian Duncan, senior staff engineer on the stability team at Mercury, to discuss the delicate balance of choosing your tech stack and its implications. That means exploring the concept of the novelty budget, frequently known as "Choose Boring Technology". It emphasizes why companies should carefully spend their innovation tokens on things that actually move the needle, rather than reinventing the wheel. Mercury leverages simple technology like Postgres and EC2 instances alongside high-innovation bets like Haskell and Nix to maintain stability. The conversation unpacks the hidden complexities of over-relying on standard tools, sharing a cautionary tale about using a Postgres table as a massive queuing system until it consumed all the database resources and caused login failures. To solve architectural scaling without descending into nanoservice madness, we jump to discussing monolithic build systems. By leveraging hermetically sealed, modular build targets, teams can achieve massive parallelism and avoid endless local rebuilds while maintaining a single coherent view of the codebase. We also advocate for separating management tools from primary systems by utilizing dedicated control planes, and touch on the rising popularity of durable execution frameworks like Temporal to handle resilient workflows. And it turns out Ian might be a bigger advocate of microservices than he thought!
💡 Notable Links: Ian's blog; Book: Blah Blah Blah; Using Innovation Tokens; Novelty budget; Buck2
🎯 Picks: Warren - Why Archers Didn’t Volley Fire; Ian - Band - Gloryhammer
-
271
Infrastructure as code: why you can never avoid thinking
We explore the past and AI-driven future of Infrastructure as Code with Cloud Posse's Erik Osterman, discussing various IaC traumas. Erik maintains the world's largest repository of open-source IaC modules. Looking back at the dark ages of infrastructure, from the early days of raw CloudFormation and Capistrano to the rise and fall of tools like Puppet and Chef, we discuss the organic, messy growth of cloud environments, where organizations frequently scale a single AWS account into a tangled web rather than adopting a robust multi-account architecture guided by a proper framework. The conversation then shifts to the modern era of rapid integration of infrastructure development. While generating IaC with large language models can be incredibly fast, it introduces severe risks if left unchecked, and we explore how organizations can protect themselves by relying on Architectural Decision Records (ADRs) and predefined "skills". The hopeful goal is ensuring autonomous deployments are compliant, reproducible, and secure instead of relying on hallucinated architecture. Finally, we tackle the compounding issue of code review in an age where developers can produce a year's worth of engineering slop progress in a single week.
💡 Notable Links: Atmos framework; Checkov - IaC Validation; Code Rabbit; ✨ Episode: Agent Skills; ✨ Episode: All about MCPs
🎯 Picks: Warren - Project Hail Mary; Erik - Everybody's free to wear sunscreen & Book: The 10X Rule
-
270
GPU versus CPU: What is engineering really doing for us
We sit down with Jaikumar Ganesh, Head of Engineering at AnyScale, to explore the intricacies of heterogeneous compute. He unpacks the growing CPU/GPU divide, detailing how ML pipelines require precise orchestration: using CPUs for data reading and writing while leveraging expensive, massive-die GPUs for chunking and embedding. Warren brings the insight that, with AI agents rapidly changing how software is created, building is now a requirement of the business-focused team. And our guest shares how sales and marketing departments are increasingly using tools like Cursor and Claude to develop their own workflow automations. We discuss the question that this shift raises: what is engineering really doing for us? JK emphasizes that the core responsibility of the engineering organization is reliability. While anyone can generate code, running stable production software requires the deep "battle scars", robust observability, and meticulous release processes that only a dedicated engineering team can provide. That results in needing to find the right talent. But finding the talent to maintain this critical infrastructure isn't easy, which is why JK advocates for highly creative hiring strategies. He shares incredible success stories of bypassing traditional recruiting by running hiring ads in foreign-language movies at local movie theaters and setting up booths at social food festivals to find uniquely qualified candidates.
🎯 Picks: Warren - Archers Don't Fire Volleys; JK - Book: The Explorer's Gene
-
269
Upskilling your agents
In this adventure, we sit down with Dan Wahlin, Principal of DevRel for JavaScript, AI, and Cloud at Microsoft, to explore the complexities of modern infrastructure. We examine how cloud platforms like Azure function as "building blocks", which of course can quickly become overwhelming without the right instruction manuals. To bridge this gap, one potential solution we discuss is the emerging reliance on AI "skills": specialized markdown files. They can give coding agents the exact knowledge needed to deploy complex, poorly documented open-source projects to container apps without requiring deep infrastructure expertise. And we are saying the quiet part out loud, as we review how handing the keys over to autonomous agents introduces terrifying new attack vectors. It's the security nightmare of prompt injections and the careless execution of unvetted AI skills. It is also a blast from the past, as we compare today's downloading of random agent instructions to running untrusted executables from early internet sites. While tools like OpenClaw purport to offer incredible automation, such as allowing agents to scour the internet and execute code without human oversight, this has already led to disastrous leaks of API keys. We emphasize the critical necessity of validating skills through trusted repositories, where even having agents perform security reviews on the code before execution is not enough. Finally, we tackle the philosophical debate around AI productivity and why Dorota's "LLMs raise the floor and not the ceiling" is so spot on. The standout pick deserves a mention: a fascinating 1983 paper titled "Ironies of Automation" by Lisanne Bainbridge.
This paper perfectly predicts our current dilemma: automating systems often leaves the most complex, difficult tasks to human operators, proving that as automation scales, the need for rigorous human monitoring actually increases, destroying the very value that the original innovation was attempting to capture.
💡 Notable Links: Agent Skill Marketplace; AI Fatigue is real; Episode: Does Productivity even exist?
🎯 Picks: Warren - Paper: Ironies of Automation (& AI); Dan - Tool: SkillShare
-
268
There's no way it's DNS...
How much do you really know about the protocol that everything is built upon? This week, we go behind the scenes with Simone Carletti, a 13-year industry veteran and CTO at DNSimple, to explore the hidden complexities of DNS. We attempt to uncover why exactly DNS is often the last place developers check during an outage, drawing fascinating parallels between modern web framework abstractions and network-level opaqueness. Simone shares why his team relies on bare-metal machines instead of cloud providers to run their Erlang-based authoritative name servers, highlighting the critical need to control BGP routing. We trade incredible war stories, from Facebook locking themselves out of their own data centers due to a BGP error, to a massive 2014 DDoS attack that left DNSimple unable to access their own log aggregation service. The conversation also tackles the reality of implementing new standards like SVCB and HTTPS records, and why widespread DNSSEC adoption might require an industry-wide mandate. And of course we have the picks, but I'm not spoiling this week's just yet...
💡 Notable Links: Episode: IPv6; SVCB + HTTPS DNS Resource Records RFC 9460; Avian Carrier RFC 1149
🎯 Picks: Warren - Book: One Second After; Simone - Recommended diving locations in Italy and Wreck diving projects
-
267
Getting better at networking
We are joined by Daan Boerlage, CTO at Mavexa, as we tackle the long-awaited arrival of IPv6 in cloud infrastructure. Here, we highlight how migrating to an IPv6-native setup eliminates public/private subnet complexity and expensive NAT gateways, while entirely sidestepping the nightmare of IP collisions during VPC peering. Beyond the financial savings of ditching IPv4 charges, we explore the technical superiority of IPv6. Daan breaks down just how mind-bogglingly large the address space is, and focuses on how it solves serverless IP exhaustion while systematically debunking the pervasive myth that NAT is a security feature. We also discuss how IPv6's end-to-end connectivity paves the way for next-generation protocols like QUIC, HTTP/3, and WebTransport. The episode rounds out with a cathartic venting session about legacy architecture, detailing a grueling nine-year migration away from a central shared database that ironically culminated in a move to Salesforce. Almost by design, Daan recommends his pick, praising its intuitive use of signals and fine-grained reactivity over React. And Warren's pick explores storing data in the internet itself by leveraging the dwell time of ICMP ping packets.
💡 Notable Links: FOSDEM talk on the internet of threads; Hilbert Map of IPv6 address space
🎯 Picks: Warren - Harder Drive: what we didn't want or need; Daan - SolidJS
-
266
Varied Designer Does Vibecoding: Why testing always wins
In this episode, we examine how the software industry is fundamentally changing. We're joined by our expert guest, Matt Edmunds, a long-time UX director, principal designer, and Principal UX Consultant at Tiny Pixls. The episode kicks off by analyzing how early AI implementation in Applicant Tracking Systems (ATS) created rigid hiring processes that actively filter out the varied candidates who actually bring necessary diversity to engineering teams. Of course we get to the world of "vibe coding", and revisit the poor LLM usage highlighted in the DORA 2025 report, exploring how professionals without traditional software engineering backgrounds are leveraging models to generate functional code. Matt details his hands-on experience using the latest models of Claude Opus and Gemini Pro, successfully building a low-level C virtual audio driver in 30 minutes, driven by personal needs. We discuss the inherent challenges of large context windows, and coin the term "guess-driven development". To combat these hallucinations, Matt shares his strategy of using question-based prompting and anchoring the AI with comprehensive test files and documented schemas, which the models treat as an undeniable source of truth. Beyond the code, we look at the broader economic and physical limitations of the current AI boom, noting that AI providers are operating at massive financial losses while awaiting hardware efficiency improvements.
💡 Notable Links: Oatmeal on hating AI Art; Episode: DORA 2025 Report
🎯 Picks: Warren - Book: Start With Why; Matt - Book: Creativity, Inc.
-
265
DevOps trifecta: documentation, reliability, and feature flags
We dive into the shifting landscape of developer relations and the new necessity of optimizing documentation for both humans and LLMs. Melinda Fekete joins from Unleash, and suggests transitioning to a platform that helps get this right by utilizing LLMs.txt files to cleanly expose content to AI models. The conversation then takes a look at the June GCP outage, which was triggered by a single IAM policy change. This illustrates that even with world-class CI/CD pipelines, deploying code using runtime controls such as feature flags is still risky. Feature flags can't even save GCP and other cloud providers, so what hope do the rest of us have? Finally, we discuss the practical implementation of these systems, advocating for "boring technology" like polling over streaming to ensure reliability, and conducting internal "breakathons" to test features before a full rollout.
💡 Notable Links: Diátaxis - Who is this article for?; Fern - Docs Platform; CloudFlare - Feature Flag causes outage; AWS - Graceful degradation; Building for 5 nines reliability; Episode: Latency is always more important than freshness; Episode: DORA 2025 Report
🎯 Picks: Warren - Show: Bosch - LA Detective procedural; Melinda - Wavelength - Party Game
-
264
The Productivity Delusion: Gizmos, Resentment Metrics, and the Art of Deleting Code
Dorota, CEO of Authress, returns to apply the US Supreme Court’s definition of obscenity to a scandalous topic: Engineering Productivity. In a world obsessed with AI-driven efficiency, Dorota and Warren argue that software development productivity has nothing to do with manufacturing "gizmos" and everything to do with feelings. They dismantle the factory-floor mentality that equates typing speed with value, suggesting instead that the most productive work often happens while staring out a train window or disassociating in the shower. The conversation takes a dark turn into the reality of performance reviews. If productivity is subjective, how do you decide who gets promoted? Dorota proposes the "Resentment Metric": ignoring Jira tickets in favor of figuring out who the team has secret concerns about. They also roast the "100% utilization" fallacy, noting that a fully utilized highway is just a parking lot, and the same logic applies to engineering teams that don't schedule downtime for actual thinking. Ultimately, they land on a definition of productivity that would make any optimizer proud: deleting things. If the best code is no code, then the most productive engineer is the one removing waste, deleting replicas, and emptying S3 buckets. The episode wraps up with a credit-card-sized transformer (it's a tripod) and a book recommendation on why your international colleagues might be misinterpreting your silence.
💡 Notable Links: DevOps Episode: DORA 2025 Report; Research: Happy software developers solve problems better
🎯 Picks: Warren - Book: The Culture Map; Dorota - GEOMETRICAL Pocket tripod
-
263
Project Yellow Brick Road: Creative, Practical, and Unconventional Engineering
Episode Sponsor: Rootly AI - https://dev0ps.fyi/rootlyai
Paul Conroy, CTO at Square1, joins the show to prove that the best defense against malicious bots isn't always a firewall; sometimes, it’s creative data poisoning. Paul recounts a legendary story from the Irish property market where a well-funded competitor attempted to solve their "chicken and egg" problem by scraping his company's listings. Instead of waiting years for lawyers, Paul’s team fed the scrapers "Project Yellow Brick Road": fake listings that placed the British Prime Minister at 10 Downing Street in Dublin and the White House in County Cork. The result? The competitor’s site went viral for all the wrong reasons, forcing them to burn resources manually filtering junk until they eventually gave up and targeted someone else. We also dive into the high-stakes world of election coverage, where Paul had three weeks to build a "coalition builder" tool for a national election. The solution wasn't a complex microservice architecture, but a humble Google Sheet wrapped in a Cloudflare Worker. Paul explains how they mitigated Google's rate limits and cold start times by putting a heavy cache in front of the sheet, leading to a crucial lesson in pragmatism: data that is "one minute stale" is perfectly acceptable if it saves the engineering team from building a complex invalidation strategy. Practicality wins. Finally, the conversation turns to the one thing that causes more sleepless nights than malicious scrapers: caching layers. Paul and the host commiserate over the "turtles all the way down" nature of modern caching, where a single misconfiguration can lead to a news site accidentally attaching a marathon runner’s photo to a crime story. They wrap up with picks, including a history of cryptography that features the Pope breaking Spanish codes and a defense of North Face hiking boots that might just be "glamping" gear in disguise.
🎯 Picks: Warren - The North Face Hedgehog Gore-tex Hiking Shoes; Paul - The Code Book
-
262
Special: The DORA 2025 Critical Review
"Those memes are not going to make themselves." Dorota, CEO of Authress, joins us to roast the 2025 DORA Report, which she argues has replaced hard data with an AI-generated narrative. From the confusing disconnect between feeling productive and actually shipping code to the grim reality of a 30% acceptance rate, Warren and Dorota break down why this year's report smells a lot like manure. We dissect the massive 142-page 2025 DORA Report. Dorota argues that the report, which is now rebranded as the "State of AI-Assisted Software Development", feels less like a scientific study of DevOps performance and more like a narrative written by an intern using an LLM prompt. The duo investigates the "stubborn results" where AI apparently makes everyone feel like a 10x developer, while the hard results tell a different story: AI actually increases software and product instability rather than improving it. The conversation gets spicy as they debate the "pit of failure" that is feature flags (often used as a crutch for untested code) and the embarrassing reality that GitHub celebrates a mere 30% code acceptance rate as a "success." Dorota suggests that while AI raises the floor for average work, it completely fails when you need to solve complex problems or, you know, actually collaborate with another human being. In a vivid analogy, Dorota compares reading this year's report to the Swiss Spring phenomenon: the time of year when farmers spray manure, leaving the beautiful landscape smelling...unique. The episode wraps up with a reality check on the physical limits of LLM context windows (more tokens, more problems) and a strong recommendation to ignore the AI hype cycle in favor of a much faster-growing organism: a kitchen countertop oyster mushroom kit.
💡 Notable Links: AI as an amplifier truism fallacy; DORA 2025 Report; DevOps Episode: VS Code & GitHub Copilot; Where is the deluge of new software - Impact of AI on software products; Impact of AI on Critical Thinking
🎯 Picks: Warren - The Maximum Effective Context Window; Dorota - Mushroom Grow Kit
-
261
Browser Native Auth and FedCM is finally here!
Episode Sponsor: Incident.io - https://dev0ps.fyi/incidentio
"My biggest legacy at Google is the amount of systems I broke." Sam Goto joins the show with a name that strikes fear into engineering systems everywhere. As a Senior Staff Engineer on the Chrome team, Sam shares the hilarious reality of having the last name "Goto," which once took down Google's internal URL shortener for four hours simply because he plugged in a new computer. Sam gets us up to speed with Federated Credentials Management (FedCM), as we dive deep into why authentication has been built despite the browser rather than with it, and why it’s time to move identity from "user-land" to "kernel-land". This shift allows for critical UX improvements for logging in all users irrespective of which login providers you use, finally addressing the "NASCAR flag" problem of infinite login lists. Most importantly, he shares why you don't need to change your technology stack to get all the benefits of FedCM. Finally, Sam details the "self-sustaining flame" strategy (as opposed to an ecosystem "flamethrower"), revealing how they utilized JavaScript SDKs to migrate massive platforms like Shopify and 50% of the web's login traffic without requiring application developers to rewrite their code.
💡 Notable Links: HSMs + TPM in production environments; Get involved: FedCM W3C WG; The FedCM spec GitHub repo; TPAC Browser Conference
🎯 Picks: Warren - Book: The Platform Revolution; Sam - The 7 Laws of Identity and Short Story: The Egg By Andy Weir
-
260
Are we building the right thing?
Episode Sponsor: Incident.io - https://dev0ps.fyi/incidentio
Elise, VP and Head of UX at Unleash, joins us to talk all about UX. Self-identifying as probably "the annoying lady in the room", and with a career spanning nearly 30 years (starting before "UX" was even a job title), she dismantles the idea that User Experience is just about moving pixels around. Here we debate the friction between engineering, sales, and the customer. We get to the bottom of whether avoiding end-user interaction, understanding, and research is a career-limiting move for staff+ engineers. Or should you avoid forcing a world-class developer to facilitate a call with a non-technical user if it makes them uncomfortable? Warren calls out the "pit of failure" often faced by teams as they seek to introduce feature flags: the flags can become a crutch, leading teams to push untested code into production simply because they can toggle it off. And Elise dives into a great story recounting her consulting days where a company spent a fortune on a branding agency that demanded conflicting "primary colors" for a mainframe application used 8 hours a day. Her low-tech solution to prove them wrong? Listen and find out; this episode is all about bringing UX to Engineering.
💡 Notable Links: Ladder of Leadership - Book: Turn the Ship Around!
🎯 Picks: Warren - Growth.Design Case Studies; Elise - Paper on Generative UI: LLMs are Effective UI Generators
-
259
Why Your Code Dies in Six Months: Automated Refactoring
Episode Sponsor: Incident.io - https://dev0ps.fyi/incidentio Warren is joined by Olga Kundzich, Co-founder and CTO of Moderne, to discuss the reality of technical debt in modern software engineering. Olga reveals a shocking statistic: without maintenance, cloud-native applications often cease to function within just six months. And from our experience, that's actually optimistic. The rapid decay isn't always due to bad code choices, but rather the shifting sands of third-party dependencies, which make up 80 to 90% of cloud-native environments. We review the limitations of traditional Abstract Syntax Trees (ASTs) and the introduction of OpenRewrite's Lossless Semantic Trees (LSTs). Unlike standard tools, LSTs preserve formatting and style, allowing for automated, horizontal scaling of code maintenance across millions of lines of code. This fits perfectly into the toolchain that is the LLM and open source ecosystem. Olga explains how this technology enables enterprises to migrate frameworks, such as moving from Spring Boot 1 to 2, without dedicating entire years to manual updates. Finally, they explore the intersection of AI and code maintenance, noting that while LLMs are great at generating code, they often struggle with refactoring and optimizing existing codebases. We highlight that agents are not yet fully autonomous and will always require "right-sized" data to function effectively. Will is absent for this episode, leaving Warren to navigate the complexities of mass-scale code remediation solo. 💡 Notable Links: DevOps Episode: We read code; DevOps Episode: Dynamic PRs from incidents; OpenRewrite; Larger Context Windows are not better 🎯 Picks: Warren - Dell XPS 13 9380; Olga - Claude Code
-
258
AI, IDEs, Copilot & Critical Thinking
Microsoft's John Papa, Partner General Manager of Developer Relations for all things dev and code, joins the show to talk developer relations... from his Mac. He reveals his small part in the birth of VS Code (back when its codename was Ticino) after he spent a year trying a new editor every month. The conversation dives deep into "Agentic AI," where John predicts developers will soon become "managers of agents". But is it all hype? John and Warren debate the risks of too much automation (no, AI should not auto-merge your PRs) and the terrifying story of a SaaS built with "zero handwritten code" that immediately got hacked because the founder was "not technical". The episode highlights John's jaw-dropping war stories from Disney, including a mission-critical hotel lock system (for 5,000+ rooms) that was running on a single MS Access database under a desk. It's a perfect, cringeworthy lesson in why "we don't have time to test" is the most expensive phrase in tech, and why we need a human in the loop. John leaves us with the one question we must ask of all new AI features: "Who asked for that?" 💡 Notable Links: Impact of AI on Critical Thinking paper; LLMs raise the floor not the ceiling; DevOps Episode: How far along with AI are we? 🎯 Picks: Warren - Shokz OpenFit 2; John - Run Disney
-
257
Solving incidents with one-time ephemeral runbooks
Episode Sponsor: Attribute - https://dev0ps.fyi/attribute In the wake of one of the worst AWS incidents in history, we're joined by Lawrence Jones, Founding Engineer at Incident.io. The conversation focuses on the challenges of managing incidents in highly regulated environments like FinTech, where the penalties for downtime are harsh and require a high level of rigor and discipline in the response process. Lawrence details the company's evolution, from running a monolithic Go binary on Heroku to moving to a more secure, robust setup in GCP, prioritizing the use of native security primitives like GCP Secret Manager and Kubernetes to meet the obligations of their growing customer base. We spotlight exactly how a system can crawl GitHub pull requests, Slack channels, telemetry data, and past incident post-mortems to dynamically generate an ephemeral runbook for the current incident. Also discussed are the technical challenges of using RAG (Retrieval-Augmented Generation): to ensure fast, accurate search results during a crisis, they rely heavily on pre-processing data with tags and a service catalog rather than solely on less consistent vector embeddings. Finally, Lawrence stresses that frontier models are no longer the limiting factor in building these complex systems; rather, success hinges on building structured, modular systems and doing the hard work of defining objective metrics for improvement. 💡 Notable Links: Cloud Secrets management at scale; Episode: Solving Time Travel in RAG Databases; Episode: Does RAG Replace keyword search? 🎯 Picks: Warren - Anker Adaptable Wall-Charger - PowerPort Atom III; Lawrence - Rocktopus & The Checklist Manifesto
-
256
The IT Dictionary: Post-Mortems, Cargo Cults, and Dropped Databases
Episode Sponsor: Attribute - https://dev0ps.fyi/attribute We're joined by 20-year industry veteran and DevOps advocate Adam Korga, celebrating the release of his book IT Dictionary. In this episode we quickly get down to the inspiration behind postmortems as we review some cornerstone cases both in software and in general technology. Adam shares how he started in the industry, long before DevOps was a coined term, focused on making systems safer and avoiding mistakes like accidentally dropping a production database. We review the infamous incidents of accidental database deletion, by LLMs and humans alike. And of course we touch on the quintessential postmortems in civil engineering and flight, and on survivorship bias from World War II through analyzing bullet holes on returning planes. 💡 Notable Links: Adam's book: IT Dictionary; Knight Capital: the 45 minute nightmare; Work Chronicles Comic: Will my architecture work for 1 Million users? 🎯 Picks: Warren - Cuitisan CANDL storage containers; Adam - FUBAR
-
255
Vector Databases Explained: From E-commerce Search to Molecule Research
Episode Sponsor: Attribute - https://dev0ps.fyi/attribute Jenna Pederson, Staff Developer Relations at Pinecone, joins us to close the loop on Vector Databases. She demystifies how they power semantic search, their role in RAG, and some unexpected applications. Jenna takes us beyond the buzzword bingo, explaining how vector databases are the secret sauce behind semantic search, and sharing just how "red shirt" gets converted into a query that returns things semantically similar. It's all about turning your data into high-dimensional numerical meaning, which, as Jenna clarifies, is powered by some seriously clever math to find those "closest neighbors." The conversation inevitably veers into Retrieval-Augmented Generation (RAG). Jenna reveals how databases are the unsung heroes giving LLMs real brains (and up-to-date info) when they're prone to hallucinating or just don't know your company's secrets. They complete the connection from proprietary and generalist foundational models to business-relevant answers. 💡 Notable Links: Episode: MCP: The Model Context Protocol and Agent Interactions; Crossing the Chasm 🎯 Picks: Warren - HanCenDa USB C Magnetic adapter; Jenna - Keychron Alice Layout Mechanical keyboard (And get a 5% discount on us)
-
254
The Unspoken Challenges of Deploying to Customer Clouds
This episode we are joined by Andrew Moreland, co-founder of Chalk. Andrew explains how their company's core business model is to deploy their software directly into their customers' cloud environments. This decision was driven by the need to handle highly sensitive data, like PII and financial records, that customers don't want to hand over to a third-party startup. The conversation delves into the surprising and complex challenges of this approach, which include managing granular IAM permissions and dealing with hidden global policies that can block their application. Andrew and Warren also discuss the real-world network congestion issues that affect cross-cloud traffic, a problem they've encountered multiple times. Andrew shares Chalk's mature philosophy on software releases, where they prioritize backwards compatibility to prevent customer churn, a key lesson learned from a competitor. Finally, the episode explores the advanced technical solutions Chalk has built, such as their unique approach to "bitemporal modeling" to prevent training bias in machine learning datasets, as well as the decision to move from Python to C++ and Rust for performance, using a symbolic interpreter to execute customer code written in Python without a Python runtime. The episode concludes with picks, including a surprisingly popular hobby and a unique take on high-quality chocolate. 💡 Notable Links: Fact - The $1M hidden Kubernetes spend; Giraffe and Medical Ruler training data bias; SOLID principles don't produce better code?; Veritasium - The Hole at the Bottom of Math; Episode: Auth Showdown on backwards compatible changes 🎯 Picks: Warren - Switzerland Grocery Store Chocolate; Andrew - Trek E-Bikes
-
253
How to build in Observability at Petabyte Scale
We welcome guest Ang Li and dive into the immense challenge of observability at scale, where some customers are generating petabytes of data per day. Ang explains that instead of building a database from scratch (a decision he says went "against all the instincts" of a founding engineer), Observe chose to build its platform on top of Snowflake, leveraging its separation of compute and storage on EC2 and S3. The discussion delves into the technical stack and architectural decisions, including the use of Kafka to absorb large bursts of incoming customer data and smooth it out for Snowflake's batch-based engine. Ang notes this choice was also strategic for avoiding tight coupling with a single cloud provider's service, like AWS Kinesis, which would hinder future multi-cloud deployments on GCP or Azure. The discussion also covers their unique pricing model, which avoids surprising customers with high bills by providing a lower cost for data ingestion and then using a usage-based model for queries. This is contrasted with Warren's experience with his company's user-based pricing, which can lead to negative customer experiences when limits are exceeded. The episode also explores Observe's "love-hate relationship" with Snowflake, as Observe's usage accounts for over 2% of Snowflake's compute, which has helped them discover a lot of bugs but also caused sleepless nights for Snowflake's on-call engineers. Ang discusses hedging their bets for the future by leveraging open data formats like Iceberg, which can be stored directly in customer S3 buckets to enable true data ownership and portability.
The episode concludes with a deep dive into the security challenges of providing multi-account access to customer data using IAM trust policies, and a look at the personal picks from the hosts. 💡 Notable Links: Fact - Passkeys: Phishing on Google's own domain and It isn't even new; Episode: All About OTEL; Episode: Self Healing Systems 🎯 Picks: Warren - The Shadow (1994 film); Ang - XREAL Pro AR Glasses
-
252
The Open-Source Product Leader Challenge: Navigating Community, Code, and Collaboration Chaos
In a special solo flight, Warren welcomes Meagan Cojocar, General Manager at Pulumi and a self-proclaimed graduate of "PM school" at AWS. They dive into what it's like to own an entire product line and why giving up that startup hustle for the big leagues sometimes means you miss the direct signal from your users. The conversation goes deep on the paradox of open source, where direct feedback is gold but dealing with license-shifting competitors can make you wary. From the notorious HashiCorp kerfuffle to the rise of OpenTofu, they explore how Pulumi maintains its commitment to the community amidst a wave of customer distrust. Meagan highlights the invaluable feedback loop provided by the community, allowing for direct interaction between users and the engineering team. This contrasts with the "telephone game" that can happen in proprietary product development. The conversation also addresses the recent industry shift away from open-source licenses and the immediate backpedaling that followed, discussing the subsequent customer distrust and how Pulumi maintains its commitment to the open-source model. And finally, the duo tackles the elephant in the cloud, LLMs, building on the earlier MCP episode. They debate the great code quality vs. speed trade-off, the risk of a "botched" infrastructure deployment, and whether these models can solve anything more than a glorified statistical guessing game. It's a candid look at the future of DevOps, where the real chaos isn't the code, but the tools that write it. The conversation concludes with a philosophical debate on the fundamental capabilities of LLMs, questioning whether they can truly solve "hard problems" or are merely powerful statistical next-word predictors.
💡 Notable Links: Veritasium - the Math that predicts everything; Fact - Don't outsource your customer support: Clorox sues Cognizant; CloudFlare uses an LLM to generate an OAuth2 Library 🎯 Picks: Warren - Rands Leadership Community; Meagan - The Manager's Path by Camille Fournier
-
251
FinOps: Holding engineering teams accountable for spend
In this episode of Adventures in DevOps, we dive into the world of FinOps, a concept that aims to apply the DevOps mindset to financial accountability. Yasmin Rajabi, Chief Strategy Officer at CloudBolt, joins us to demystify it, as we acknowledge the critical challenge of bringing together financial accountability and engineering teams who often are not paying attention to the business. The discussion further explores the practicalities of FinOps in the context of cloud spending and Kubernetes. Yasmin highlights that a significant amount of waste in organizations comes from simply not turning off unused systems and not right-sizing resources. She explains how tools like Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA) can help, but also points out the complexities of optimizing across horizontal and vertical scaling behaviors. The conversation touches on "shame back reporting" as a way to give engineering teams visibility into costs, while emphasizing that providing tooling and insights is more effective than simply telling developers to change configurations. The episode also delves into the evolving mindset around cloud costs, especially with the rise of AI and machine learning workloads. While historically engineering salaries eclipsed cloud spending, the increasing hardware requirements for ML and data workloads are making cost optimization a more pressing concern. Spending-conscious teams are increasingly asking about GPU optimization, even if AI/ML teams are still largely focused on limitless spending to drive unjustified "innovation". They conclude by discussing the challenges of on-premise versus cloud deployments and the importance of addressing "day two problems" regardless of the infrastructure choice. Picks: Warren - Lions and Dolphins cannot make babies; Aimee - The Equip Protein Powder and Protein Bar; Yasmin - Bone Broth drink by 1990 Snacks
-
250
The Auth Showdown: Single tenant versus Multitenant Architectures
Get ready for a lively debate on this episode of Adventures in DevOps. We're joined by Brian Pontarelli, founder of FusionAuth and CleanSpeak. Warren and Brian face off by diving into the controversial topic of multitenant versus single-tenant architecture. Expert co-host Aimee Knight joins to moderate the discussion. Ever wondered how someone becomes an "auth expert"? Warren spills the beans on his journey, explaining it's less about a direct path and more about figuring out what it means for yourself. Brian chimes in with his own "random chance" story, revealing how they fell into it after their forum-based product didn't pan out. Aimee confesses her "alarm bells" start ringing whenever multitenant architecture is mentioned, jokingly demanding "details" and admitting her preference for more separation when it comes to reliability. Brian makes a compelling case for his company's chosen path, explaining how their high-performance, downloadable single-tenant profanity filter, CleanSpeak, handles billions of chat messages a month with extremely low latency. This architectural choice became a competitive advantage, attracting companies that couldn't use cloud-based multitenant competitors due to their need to run solutions in their own data centers. We critique cloud providers' tendency to push users towards their most profitable services, citing AWS Cognito as an example of a cost-effective solution for small-scale use that becomes cost-prohibitive with scaling and feature enablement. The challenges of integrating with Cognito, including its reliance on numerous other AWS services and the need for custom Lambda functions for configuration, are also a point of contention. The conversation extends to the frustrations of managing upgrades and breaking changes in both multitenant and single-tenant systems and the inherent difficulties of ensuring compatibility across different software versions and integrations.
The episode concludes with a humorous take on the current state and perceived limitations of AI in software development, particularly concerning security. Picks: Warren - Scarpa Hiking shoes - Planet Mojito Suede; Aimee - Peloton Tread; Brian - Searchcraft and Fight or Flight
-
249
Should We Be Using Kubernetes: Did the Best Product Win?
Episode Sponsor: PagerDuty - Check out the features in their official feature release: https://fnf.dev/4dYQ7gL This episode dives into a fundamental question facing the DevOps world: Did Kubernetes truly win the infrastructure race because it was the best technology, or were there other, perhaps less obvious, factors at play? Omer Hamerman joins Will and Warren to take a hard look at it. Despite the rise of serverless solutions promising to abstract away infrastructure management, Omer shares that Kubernetes has seen a surge in adoption, with potentially 70-75% of corporations now using or migrating to it. We explore the theory that human nature's preference for incremental "step changes" (Kaizen) over disruptive "giant leaps" (Kaikaku) might explain why a solution perceived by some as "worse" or more complex has gained such widespread traction. The discussion unpacks the undeniable strengths of Kubernetes, including its "thriving community", its remarkable extensibility through APIs, and how it inadvertently created "job security" for engineers who "nerd out" on its intricacies. We also challenge the narrative by examining why serverless options like AWS Fargate could often be a more efficient and less burdensome choice for many organizations, especially those not requiring deep control or specialized hardware like GPUs. The conversation highlights that the perceived "need" for Kubernetes often emerges from something other than technical superiority. Finally, we consider the disruptive influence of AI and "vibe coding" on this landscape. How could we not? As LLMs are adopted to "accelerate development", they tend to favor serverless deployment models, implicitly suggesting that for rapid product creation, Kubernetes might not be the optimal fit. This shift raises crucial questions about the trade-offs between development speed and code quality, the evolving role of software engineers towards code review, and the long-term maintainability of AI-generated code.
We close by pondering the broader societal and environmental implications of these technological shifts, including AI's massive energy consumption and the ongoing debate about centralizing versus decentralizing infrastructure for efficiency. Links: Comparison: Linux versus E. coli. Picks: Warren - Surveys are great, and also fill in the Podcast Survey; Will - Katana.network; Omer - Mobland and JJ (Jujutsu)
-
248
Mastering SRE: Insights in Scale and at Capacity with Aimee Knight
In this episode, Aimee Knight, an expert in Site Reliability Engineering (SRE) whose experience hails from Paramount and NPM, joins the podcast to discuss her journey into SRE, the challenges she faced, and the strategies she employed to succeed. Aimee shares her transition from a non-traditional background in JavaScript development to SRE, highlighting the importance of understanding both the programming and infrastructure sides of engineering. She also delves into the complexities of SRE at different scales, the role of playbooks in incident management, and the balance between speed and quality in software development. Aimee discusses the impact of AI and machine learning on SRE, emphasizing the need for responsible use of these tools. She touches on the importance of understanding business needs and how it affects decision-making in SRE roles. The conversation also covers the trade-offs in system design, the challenges of scaling applications, and the importance of resilience in distributed systems. Aimee provides valuable insights into the pros and cons of a career in SRE, including the importance of self-care and the satisfaction of mentoring others. The episode concludes with us discussing some of the hard problems, such as the on-call burden for large teams and the technical expertise an org needs to maintain higher complexity systems. Is the average tenure in tech decreasing? We discuss it and do a deep dive on the consequences in the SRE world. Picks: The Adventures In DevOps: Survey; Warren's Technical Blog; Warren: The Fifth Discipline by Peter Senge; Aimee: Sleep Token (Band) - Caramel, Granite; Will: The Bear Grylls Celebrity Hunt on Netflix; Jillian: Horizon Zero Dawn Video Game
-
247
Exploring MCP Servers and Agent Interactions with Gil Feig
In this episode, we delve into the concept of MCP (Model Context Protocol) servers and their role in enabling agent interactions. Gil Feig, the co-founder and CTO of Merge, shares insights on how MCP servers facilitate efficient and secure integration between various services and APIs. The discussion covers the benefits and challenges of using MCP servers, including their stateful nature, security considerations, and the importance of understanding real-world use cases. Gil emphasizes the need for thorough testing and evaluation to ensure that MCP servers effectively meet user needs. Additionally, we explore the implications of MCP servers on data security, scaling, and the evolving landscape of API interactions. Warren chimes in with experiences integrating AI with Auth. Will stuns us with some nuclear fission history. And finally, we also touch on the balance between short-term innovation and long-term stability in technology, reflecting on how different generations approach problem-solving and knowledge sharing. Picks: The Adventures In DevOps: Survey; Warren: The Magicians by Lev Grossman; Gil: Constant Escapement in Watchmaking; Will: Dungeon Crawler Carl & Atmos Clock
-
246
No Lag: Building the Future of High-Performance Cloud with Nathan Goulding
Warren talks with Nathan Goulding, SVP of Engineering at Vultr, about what it actually takes to run a high-performance cloud platform. They cover everything from global game server latency and hybrid models to bare metal provisioning and the power/cooling constraints that come with modern GPU clusters. The discussion gets into real-world deployment challenges like scaling across 32 data centers, edge use cases that actually matter, and how to design systems for location-sensitive customers, whether that's due to regulation or performance. Additionally, there's talk about where the hyperscalers have overcomplicated pricing and where the simplicity of a flatter pricing model and optimized defaults is better for everyone. There's a section on nuclear energy (yes, really), including SMRs, power procurement, and what it means to keep scaling compute with limited resources. If you're wondering whether your app actually needs high-performance compute or just better visibility into your costs, this is the episode. Picks: The Adventures In DevOps: Survey; Warren: Jetlag: The Game; Nathan: Money Heist (La Casa de Papel)
-
245
Ground Truth & Guided Journeys: Rethinking Data for AI with Inna Tokarev Sela
Inna Tokarev Sela, CEO and founder of Illumex, joins the crew to break down what it really means to make your data "AI-ready." This isn't just about clean tables; it's about semantic fabric, business ontologies, and grounding agents in your company's context to prevent the dreaded LLM hallucination. We dive into how modern enterprises just cannot build a single source of truth, no matter how hard they try, all the while knowing that one is required to build effective agents utilizing the available knowledge graphs. The conversation unpacks democratizing data access and avoiding analytics anarchy. Inna explains how automation and graph modeling are used to extract semantic meaning from disconnected data stores, and how to resolve conflicting definitions. And yes, Warren finally coughs up what's so wrong with most dashboards. Lastly, we quickly get to the core philosophical questions of agentic systems and AGI, including why intuition is the real differentiator between humans and machines. Plus: storage cost regrets, spiritual journeys disguised as inference pipelines, and a very healthy fear of subscription-based sleep wearables. Picks: The Adventures In DevOps: Survey; Warren: The Non-Computability of Intuition; Will: The Arc Browser; Inna: Healthy GenAI skepticism
-
244
Incident Vibing: The Self-Healing System - DevOps 242
Sylvain Kalache, Head of Developer Relations at Rootly, joins us to explore the new frontier of incident response powered by large language models. We dive into the evolution of DevRel and how we meet the new challenges impacting our systems. We explore Sylvain's origin story in self-healing systems, dating back to his SlideShare and LinkedIn days. From ingesting logs via Fluentd to building early ML-driven RCA tools, he shares a vision of self-healing infrastructure that targets root causes rather than just restarting boxes. Plus, we trace the historical arc of deterministic and non-deterministic tools. The conversation shifts toward real-world applications, where we're combining logs, metrics, transcripts, and postmortems to give SREs superpowers. We get tactical on integrating LLMs, why fine-tuning isn't always worth it, and how the Model Context Protocol (MCP) could be the USB of AI ops, even though it is still insecure. We wrap by facing the harsh reality of "incident vibing" in a world increasingly built by prompts, not people, and how to prepare for it. Picks: Warren: There is no AI Revolution; Sylvain: Incident Vibing and Rootly Labs SRE event on April 24th
-
243
Decentralized Chaos: Web3 Infra, NodeOps, and the Art of Blockchain Load Balancing - DevOps 241
This week, Paul Marston from Ankr joins the crew to unpack the madness that is modern blockchain infrastructure. From his wild career transition out of financial services into 24/7 node ops for Web3, Paul shares the brutal truth about uptime expectations, decentralization challenges, and why hard forks are more like enterprise schema upgrades with a community twist. If you've ever wondered why managing a blockchain node is like owning a temperamental pet server, this one's for you. The team goes deep on the nitty-gritty of load balancing across dozens of chains, explaining why routing traffic to the "wrong" archive node could ruin your day, and how Ankr's custom load balancer is basically magic for JSON-RPC calls. Warren tosses out wild scenarios about encrypted data smuggling via blockchain, while Will confesses his angry typing habit (yes, it's back). The discussion gets even more fun with debates on innovation vs. rigor, Web2's forgotten best practices, and why testing in prod might not be such a dirty word after all. But don't think it's all crypto and code. Paul shares battle-won wisdom from running over 100 chains across bare metal, giving us a peek at the operational sophistication and automation involved. From Terraform templates to Docker configs, he walks through the process of onboarding new chains and tuning for performance. The episode also touches on emerging risks like data exfiltration via public blockchains, and why AI (used wisely) might just be the sidekick DevOps always needed. And of course, memes: we talk a bit about this one, Tree Swing Product Development. Picks: Warren: Dvorak Keyboard Setup and Logitech K295; Will: Quirky Record Player from Miniot; Paul: Super Whisper - Voice Transcription Tool
-
242
Observability in the CI/CD Pipeline with Adriana Villela - DevOps 240
In this episode, Will and Warren welcome Adriana Villela (CNCF ambassador, Dynatrace advocate, and host of the Geeking Out podcast) for a wide-ranging conversation on observability in CI/CD pipelines. Adriana shares her journey from "On Call Me Maybe" to her own podcast, her work with OpenTelemetry, and why observability isn't just for SREs anymore. The crew digs into how telemetry should be integrated across the software development lifecycle, from development to QA to production, and what that really looks like in modern teams. Adriana drops knowledge on CI/CD failures, distributed traces, and even how to bring observability to other parts of the business like recruiting and onboarding. She also explains how she got involved in the OpenTelemetry end-user SIG and what's next for the observability movement. Things get personal as we trade war stories about SVN, terrible version control systems, reusable grocery bags, and the ethics of AI log parsers. Adriana closes with a powerful take: observability is a team sport, and the better we play it, the more effective (and environmentally conscious) our systems can become. Picks: Warren: Adventures In DevOps survey - How can we make it better for you?; Adriana: Bouldering, which she recommends both as a physical activity and a therapeutic mental reset, especially when traveling; Jillian: Expeditionary Force; Will: Iron Neck and Purpose & Prophet
-
241
Building Engineering Excellence with Ganesh Datta of Cortex - DevOps 239
In this episode, I (flying solo today!) sat down with Ganesh Datta, the CTO and co-founder of Cortex, to explore what it really means to drive engineering excellence at scale. And spoiler: it's not just about better dashboards or fancy developer tools. It's about treating software development like the competitive advantage it is. We went deep into the why behind internal developer portals (IDPs) and how they're transforming platform engineering, developer experience, and organizational maturity. Ganesh shares how Cortex came to life, from being paged at 2am for a mystery Game of Thrones-named microservice (yep, we've all been there), to realizing that every other business function had a system of record except engineering.
Key Takeaways:
IDPs are like CRMs for Engineering: Just as sales teams wouldn't function without a CRM, modern engineering orgs shouldn't be flying blind without a structured, centralized developer portal.
Engineering Excellence = Business Outcomes: Whether it's reliability, security, or platform efficiency, IDPs help codify best practices and align teams toward measurable goals.
Start Small to Win Big: You don't need to overhaul everything on day one. Start with a pain point you already know, like production readiness, and improve that incrementally.
SREs and Platform Engineers Love IDPs: Because it gives them the data, ownership visibility, and real-time checks they need, without the honor-system chaos.
Developer Experience is Just the Beginning: Tools like Cortex aren't just about dev productivity; they're about creating resilient, aligned, scalable engineering orgs.
We also geeked out about everything from naming services ("Brewer" for a feature extraction tool? Chef's kiss.) to the surprising power of reading 15 minutes before bed to improve sleep quality. Yep, we went there! If you're part of an engineering team (or leading one) and want to know how to move faster and smarter, this is the episode for you.
-
240
Modern DevOps Challenges: Automation, AI, and Scaling in 2025 - DevOps 238
Is AI making us more productive? And where's the value? That's the big debate in this episode as we sit down with Zach Lloyd, founder and CEO of Warp. Warp isn't just another terminal; it's an AI-powered reimagination of the command line. From replacing traditional terminal commands with English instructions to integrating AI-driven automation, Warp is pushing the boundaries of how we work. And finally we find out what Warren is using as a keyboard.
In this episode, we discuss the evolution of AI in DevOps, whether LLMs are truly intelligent or just glorified word predictors, and the existential question: will developers eventually be replaced? Warren remains skeptical, questioning the real ROI of AI tools, while Jillian embraces the potential for more accessibility and efficiency, especially if it means she doesn't have to type. Will and Zach counter with insights on the actual adoption of AI among developers and where the technology is heading.
As always, we wrap up with our Picks of the Episode, including an engaging sci-fi book, a thought-provoking AI philosophy experiment, a self-promotion plug, an academic research paper, and a medieval travel guide. Tune in for an episode full of sharp opinions, tech insights, and the ongoing war between efficiency and skepticism.
Picks:
Dungeon Crawler Carl (Book)
Dabble of DevOps AI Data Discovery Tool
The Impact of Generative AI on Critical Thinking (Research Paper)
Granola AI Meeting Notes
A Travel Guide to the Middle Ages (Book)
-
239
Matt Lea Discusses Cloud War Games and Elevating Everyday DevOps
We dive into the world of cloud architecture and engineering with a fascinating discussion led by our hosts Warren Parad and Will Button, and joined by our special guest, Matt Lea. Matt, hailing from Wisconsin, is the driving force behind innovative projects like CloudWarGames.com, a platform designed to enhance DevOps training and hiring through engaging problem-solving scenarios. As we explore his journey, from coaching gymnastics to developing digital training ecosystems, you'll discover how Matt's experiences shape his unique perspectives on technical challenges, team dynamics, and the ever-evolving landscape of cloud solutions. Whether you're curious about the technical intricacies of infrastructure or seeking inspiration for your own career path, this episode offers a captivating look at the intersection of technology, creativity, and human connections. So, sit back, relax, and get ready to explore the world of DevOps in a whole new way.
-
238
Mastering Infrastructure as Code: Lessons from Matt Gowie's Consultancy Experience - DevOps 235
In this episode, our hosts Will Button, Warren Parad, and Jillian are joined by guest Matt Gowie from Masterpoint. Together, they delve into the complex world of infrastructure as code, discussing best practices, challenges, and the human side of consulting in the DevOps space. Matt shares his journey from software development to running his own consulting agency focused on Terraform and OpenTofu. The conversation covers everything from the nuances of using Terraform workspaces and the implications of large-scale infrastructure management to the critical soft skills needed for a successful consulting career. Whether you're a seasoned professional or just venturing into DevOps, this episode is packed with valuable insights and practical advice for navigating the ever-evolving landscape of technology.
🎯 Picks:
DevOps Days Zurich - Systems Thinking at Authress
Dabble of DevOps AI Data Discovery Tool
Dungeon Crawler Carl
Shear Comfort Seat Covers
-
237
Optimizing Cloud Databases with novel algorithms
In this episode, we sit down with Barzan Mozafari, MIT alum and University of Michigan professor, to explore how AI-driven automation is revolutionizing cloud database optimization. Barzan shares his experiences in bridging academia and industry, and how his work in AI-powered database tuning is reshaping cloud infrastructure efficiency. We discuss the challenges of AI adoption in cloud computing, addressing concerns like implementation risks, security, and trust in autonomous agents. Barzan explains how machine learning models can optimize performance, reduce cloud costs, and automate database management, freeing engineers from tedious manual tuning.
Additionally, we explore the future of AI in database systems, the evolving landscape of public and private datasets, and what the next decade holds for data-driven automation. Whether you're a DevOps engineer or a database architect, this episode is packed with insights into the intersection of AI and cloud technology.
🎯 Picks:
Warren - L8 Conference 2024 (Warsaw)
Barzan - Book: Never Split the Difference by Chris Voss
Jillian - Infinity Nikki (Open-world dress-up game)
-
236
AI, law, and automation with John Maly - DevOps 234
In this episode, hosts Warren Parad and Will Button sit down with John W. Maly, an attorney with a master's degree in computer science from Stanford University, to discuss the fascinating intersection of AI and the legal system. John shares insights from his book "Juris Ex Machina," a sci-fi exploration of a future where AI replaces humans in the jury system. The conversation dives deep into the current state and future potential of AI, touching on its overhyped status, potential vulnerabilities, and security concerns. As they navigate the topic of AI's integration in society, John, Warren, and Will explore riveting ideas about AI's role in the modern world and its implications in diverse fields, from dating apps to deepfake detection. Join us as we tap into the complexities and innovations of AI technology and ponder its future impact on society and the legal system.
🎯 Picks:
Psycho-Pass
Book: Extraordinary Popular Delusions and the Madness of Crowds
Book: Juris Ex Machina
TheraGun - Muscle Massager
-
235
The Impact of Open Source on Business and Development Practices with Daniel Loreto - DevOps 233
Hosts Warren Parad and Will Button dive into a compelling discussion with guest Daniel Loreto, founder and CEO of Jetify. The conversation revolves around innovative solutions in the DevOps world, particularly focusing on the use of Nix and DevBox to streamline developer environments. Daniel shares insights from his vast experience at prominent tech companies like Google, Airbnb, and Twitter, detailing how Jetify is leveraging AI and agents to enhance software development. The trio explores the challenges of reproducible development environments, the complexities of ML development, and the strategic benefits of open sourcing tools. Along the way, they touch on the impact of AI agents on the industry and the balance between innovation and practical application. Prepare for an engaging episode filled with technical insights and thoughtful reflections on the future of software development.
-
234
AI-Powered Ads: How digital marketing is transforming
Will is absent this week due to an infrastructure issue, so Warren and Jillian take the reins in a fascinating discussion with Hikari Senju, founder of Omneky. Hikari shares how AI is revolutionizing ad technology by automatically generating creative assets tailored to different audiences and platforms. His insights reveal how businesses can scale their advertising while maintaining brand consistency and reducing content creation costs.
The conversation explores the balance between automation and human oversight, the ethical implications of AI-driven marketing, and how ad personalization technology is evolving. Hikari also dives into the infrastructure behind Omneky, discussing cloud-based AI deployment, model selection, and strategies for ensuring ad content stays fresh and relevant. If you've ever wondered how AI is reshaping the future of advertising, this episode is for you.
🎯 Picks:
Warren - Dune
Hikari - Accelerando
Jillian - Data Discovery with Dabble of DevOps AI
-
233
Exploring the Role of AI in DevOps
Join host Warren Parad and co-hosts Jillian and Will Button as they delve into a compelling conversation on the pervasive influence of AI across industries. Special guest Alex Kearns from Wiz shares his expertise on the real-world applications of AI, navigating through its rapid evolution and discussing both the opportunities and challenges it presents. From the impact of generative AI on business processes to intriguing ethical considerations, this episode provides valuable insights for professionals in the DevOps field. Tune in as the panel explores the dynamic relationship between technology, responsibility, and innovation, offering listeners a thought-provoking exploration of AI's role in shaping the future.
🎯 Picks:
Warren - Short Paper: Comparing genomes to computer operating systems
Will - Book: Juris Ex Machina
Jillian - Resourcely (Fraim)
Alex - Keychron K2 Wireless Mechanical Keyboard + MX Master 3 Mouse
-
232
Real-World Testing: Insights from Rainforest QA Expert AJ Funk - DevOps 231
In today's show, we dive into the intricate world of quality assurance and testing strategies with our special guest, AJ Funk. AJ, a seasoned software engineer at Rainforest QA, shares his unique journey from playing professional baseball to developing cutting-edge QA solutions. Joining us are co-hosts Will Button and Jillian, along with fellow guest Matteo Collina.
AJ walks us through the evolution of quality assurance at Rainforest QA, emphasizing the importance of balancing confidence and velocity in testing. He highlights innovative approaches like using visual layers for testing, eliminating the need for extensive code-based tests, and explains how their no-code solutions empower teams to maintain high-quality standards efficiently.
From discussing the myth of 100% test coverage to the role of AI in QA, AJ and our hosts explore practical strategies for developers. We also touch on the importance of real-world testing environments, handling microservices, and tips for leveraging Rainforest QA's robust tools effectively.
Join us for a thought-provoking conversation that covers everything from the basics of end-to-end testing to advanced QA practices, and even takes some entertaining detours into the personal lives of our speakers. Whether you're a seasoned developer or just starting out in the tech industry, this episode is packed with insights you won't want to miss. Tune in for some expert advice, a few laughs, and a whole lot of valuable information!
-
231
Simplifying DevOps - DevOps 229
Ready to show off your coding skills to the world? Not so fast. In this episode, Will and Jillian discuss why developers need to simplify their product with the end goal in mind: the customer. They share some awesome examples of how to do this, how you can win Future You's approval, and the steps to create a smooth user experience.
“I think it's a hard mental shift to say that my area of expertise shouldn't be visible in the product. But, you need to understand the end goal. My goal is to automate myself out of a job, then move on.”
Will Button
In This Episode:
Jillian shares a killer example that should inspire all DevOps people to simplify their process
As a programmer, you want to tell the customer how great your programming is, right? Hold on… Will shares a different perspective
How Will approaches his programming that avoids all customer confusion and creates a seamless experience
What Jillian believes is MORE important than learning how to code (sometimes)
Why you need these TWO things that will earn Future You's approval
Picks:
Jillian - Airflow and Strapi
Jillian - Non-techy: Harriet the Hamster Princess
Will - Until the End of Time by Brian Greene
-
230
Navigating Salesforce DevOps Challenges and AI Innovations - DevOps 228
In today's episode, Will and Jillian dive deep into the evolving landscape of Salesforce DevOps, joined by the experienced Vernon Keenan. This episode covers a range of critical topics including recent changes in Salesforce's AI product offerings and the complexities surrounding their pricing. Vernon takes us on his personal journey from telecom to pioneering a tax engine on Salesforce, leading to his current focus on Go and Kubernetes-driven DevOps solutions.
They unpack the challenges developers face with Salesforce's SFDX and metadata API, explore how companies like Copado, Flosum, and Gearset are tackling these issues, and discuss innovative approaches from firms like Elements Cloud. The conversation also ventures into the realm of cognitive DevOps, AI-driven virtual employees, and the economic and societal impacts of these technologies. Additionally, they touch upon the shifting responsibilities in Salesforce management and the rising importance of CIOs in this space.
Whether you're grappling with Salesforce's limitations or seeking insights on how AI is transforming the DevOps field, this episode is packed with expert advice, industry insights, and forward-looking perspectives. Stay tuned, as they also address public policy concerns, the potential for job displacement, and the future of SaaS DevOps. Listen in and join as they navigate the future of technology and workforce transformation!
Socials:
LinkedIn: Vernon Keenan
-
229
Kubernetes Schema Validation Tools with Eyar Zilberman - DevOps 227
Eyar Zilberman joins the adventure to discuss Kubernetes schema validation tools. The panel jumps in and discusses the power, pros, and cons of the different kinds of schema validation.
Links:
Why you need to use Kubernetes schema validation tools
A Deep Dive Into Kubernetes Schema Validation
Datree.io
Eyar Zilberman - DEV Community
LinkedIn: Eyar Zilberman
Twitter: Eyar Zilberman (@eyarzilb)
Picks:
Jillian - GitHub | cloudposse/terraform-example-module
Jonathan - Sid Meier's Memoir!: A Life in Computer Games
Will - Paperlike
ABOUT THIS SHOW
Join us in listening to the experienced experts discuss cutting edge challenges in the world of DevOps. From applying the mindset at your company, to career growth and leadership challenges within engineering teams, and avoiding the common antipatterns. Every episode you'll meet a new industry veteran guest with their own unique story.
HOSTED BY
Will Button, Warren Parad