EPISODE · Apr 3, 2020 · 12 MIN

Whiteboard Confessional: My Metaphor-Spewing Poet Boss & Why I Don’t Like Amazon ElastiCache for Redis

from Last Week In AWS Podcast · host Corey Quinn

About Corey Quinn

Over the course of my career, I’ve worn many different hats in the tech world: systems administrator, systems engineer, director of technical operations, and director of DevOps, to name a few. Today, I’m a cloud economist at The Duckbill Group, the author of the weekly Last Week in AWS newsletter, and the host of two podcasts: Screaming in the Cloud and, you guessed it, AWS Morning Brief, which you’re about to listen to.

Links

CHAOSSEARCH
Redis
Amazon ElastiCache
Twitter: @QuinnyPig

Transcript

Corey Quinn: Welcome to AWS Morning Brief: Whiteboard Confessional. I’m Cloud Economist Corey Quinn. This weekly show exposes the semi-polite lie that is whiteboard architecture diagrams. You see, a child can draw a whiteboard architecture, but the real world is a mess. We discuss the hilariously bad decisions that make it into shipping products, the unfortunate hacks the real world forces us to build, and that the best thing to call your staging environment is “theory”. Because invariably whatever you’ve built works in the theory, but not in production. Let’s get to it.

On this show, I talk an awful lot about architectural patterns that are horrifying. Let’s instead talk for a moment about something that isn’t horrifying: CHAOSSEARCH. Architecturally, they do things right. They provide a log analytics solution that separates out your storage from your compute. The data lives inside of your S3 buckets, and you can access it using APIs you’ve come to know and tolerate, through a series of containers that live next to that S3 storage. Rather than replicating massive clusters that you have to care for and feed yourself, you now get to focus on just storing data, treating it like you normally would other S3 data: not replicating it, not storing it on expensive disks in triplicate, and fundamentally not having to deal with the pains of running other log analytics infrastructure.
Check them out today at CHAOSSEARCH.io.

When you walk through an airport (assuming that people still go to airports in the state of pandemic in which we live), you’ll see billboards saying, “I love my slow database, says no one ever.” This is an ad for Redis. And the unspoken implication is that everyone loves Redis. I do not. In honor of the recent release of Global Datastore for Amazon ElastiCache for Redis, today I’d like to talk about that time ElastiCache for Redis helped cause an outage that led to drama.

This was a few years back, and I worked at a B2B company. B2B, of course, meaning business-to-business; we were not dealing direct-to-consumer. I was a different person then, and it was a different time. Specifically, the time was late one Sunday evening, and my phone rang. This was atypical because most people didn’t have that phone number. At this stage of my life, my default answer when my phone rang was, “Sorry, you have the wrong number.” If I wanted phone calls, I’d have taken out a personals ad. Even worse, when I answered the call, it was work.

Because I ran the ops team, I was pretty judicious in turning off alerts for anything that wasn’t actively harming folks. If it wasn’t immediately actionable and causing trouble, then there was almost certainly an opportunity to fix it later during business hours. So, the list of things that could wake me up was pretty small. As a result, this was the first time that I had been called out of hours during my tenure at this company, despite having spent over six months there at this point. So who could possibly be on the phone but my spineless coward of a boss? A man who spoke only in metaphor. We certainly weren’t social friends, because who can be friends with a person like that?

“What can I do for you?”

“As the roses turn their faces to the sun, so my attention turned to a call from our CEO. There’s an incident.”

My response was along the lines of, “I’m not sure what’s wrong with you, but I’m sure it’s got a long name, and it is incredibly expensive to fix.” Then I hung up on him and dialed into the conference bridge.

It seemed that a customer had attempted to log into our website recently and had gotten an error page, and this was causing some consternation. Now, if you’re used to a B2C or business-to-consumer environment, that sounds a bit nutty because you’ll potentially have millions of customers. If one person hits an error page, that’s not CEO level of engagement. One person getting that error is, sure, still not great, but it’s not the end of the world. I mean, Netflix doesn’t have an all-hands-on-deck disaster meeting when one person has to restart a stream. In our case, though, we didn’t have millions of customers. We had about five, and they were all very large businesses. So, when they said jump, we were already mid-air.

I’m going to skip past the rest of that phone call in the evening because it’s much more instructive to talk about this with the clarity lent by the sober light of day the following morning, and the post mortem meeting that resulted from it. So, let’s talk about that, after this message from our sponsor.

In the late 19th and early 20th centuries, democracy flourished around the world. This was good for most folks, but terrible for the log analytics industry because there was now a severe shortage of princesses to kidnap for ransom to pay for their ridiculous implementations. It doesn’t have to be that way. Consider CHAOSSEARCH. The data lives in your S3 buckets in your AWS accounts, and we know what that costs. You don’t have to deal with running massive piles of infrastructure to be able to query that log data with APIs you’ve come to know and tolerate, and they’re just good people to work with. Reach out to CHAOSSEARCH.io. And my thanks to them for sponsoring this incredibly depressing podcast.
So, in hindsight, what happened makes sense, but at the time, when you’re going through an incident, everything’s cloudy, you’re getting conflicting information, and it’s challenging to figure out exactly what the heck happened. As it turns out, there were several contributing factors, specifically four of them. And here’s the gist of what those four were.

Number one, we used Amazon ElastiCache for Redis. Really, we were kind of asking for trouble.

Two, as tends to happen with managed services like this, there was a maintenance event that Amazon emailed us about. Given that we weren’t completely irresponsible, we braved the deluge of marketing to that email address, and I’d caught this and scheduled it in the maintenance calendar. In fact, we specifically were allowed to schedule when that maintenance took place. So, we scheduled it for a weekend. In hindsight: mistake. When you’re having maintenances like this happen, you want to make sure that they take place when there are people around to keep an eye on things.

Three, the maintenance was supposed to be invisible. The way that Amazon ElastiCache for Redis works is you have clusters, and you have a primary and you have a replica. The way that they do maintenances is they wind up updating the rep...

Join me as I continue a new series called Whiteboard Confessional by exploring a time in a previous life when Amazon ElastiCache for Redis caused an outage that led to drama, what it was like to work for someone who can be described as a “metaphor-spewing poet,” how every event and issue makes sense in retrospect, why you should never schedule important maintenance on a weekend, how Amazon ElastiCache for Redis works, the four contributing factors that led to the outage in question, why blameless post mortems are only blameless if you have that kind of culture driven from the top, and more.
