EPISODE · Apr 4, 2019 · 17 MIN

Jay Ashe from Cava - Elixir in Production

from Elixir Wizards · host SmartLogic LLC

We talk with Jay Ashe from Cava about their current and past Elixir projects and how they are deployed.

Jay Ashe - Cava
Find Jay elsewhere online: https://twitter.com/jgashe

0:40 - Give us a quick overview of the Elixir projects you have in production.
CAVA is a fast-casual Mediterranean restaurant chain with 75 stores across the US. Elixir and Phoenix power CAVA’s online ordering platform (order.cava.com and the CAVA app). We’ve got a REST (and websockets) API sitting behind React and our mobile apps, and we use Phoenix templates for some of our back-of-house systems.

1:11 - Why are you using Elixir in production?
We have from the start! The application was originally implemented by Chris Bell and his team at Made by Many. Chris, by the way, has a fantastic talk from ElixirConf 2016 that goes into our architecture and how we use Elixir and OTP constructs to model our business logic. Chris will occasionally talk about the CAVA project on his Elixir podcast, ElixirTalk.
Chris’ Talk - https://www.youtube.com/watch?v=fkDhU-2NWJ8

1:58 - What are some of the high level advantages / disadvantages of Elixir, from your perspective?
Advantages: Elixir and Phoenix give you Rails-esque productivity and developer experience that scales. I think Phoenix channels are a great example of this. Build a channel with complex real-time functionality and let it scale effortlessly.
Disadvantages: Hiring and onboarding, depending on your mindset, can be difficult. If you’re used to hiring for experience in your stack, it’s just going to be more difficult. Lately we’ve started doing one-hour weekly knowledge shares that cover Elixir basics and are closely tied to how we use them. For example: here’s a test case, and here are all of the test helpers we have set up that will help you write that test. We also just sent a new Elixir dev to Lonestar Elixir.

3:59 - What do you use to host your Elixir app?
Heroku.

How do you deploy your application?
Heroku-buildpack-elixir: https://github.com/HashNuke/heroku-buildpack-elixir

4:44 - Are you able to get zero downtime deploys?
As close as possible! We get that out of the box with Heroku. When we deploy, Heroku won’t point traffic to the new dyno until the app is healthy. We make extensive use of Phoenix channels over websockets, and our clients will reconnect automatically and transparently.

5:10 - Do you cluster the application?
Nope.

5:52 - How does your Elixir App perform compared to others in your environment?
I can’t really talk about numbers here, but Elixir is not at all our bottleneck. We don’t have other production applications.

6:25 - How are you solving background task processing?
Quantum for cron jobs, GenServers for everything else. We’re running a single Elixir application that handles all synchronous and async processing.

7:07 - What libraries are you using?
- Phoenix
- Phoenix_swagger for API documentation that integrates with controller tests: https://github.com/xerions/phoenix_swagger
- Ex_rated for rate limiting calls to our integrations: https://github.com/grempe/ex_rated
- Timex and Calendar for datetime support with timezones: https://github.com/bitwalker/timex
- A combination of HTTPotion and HTTPoison for HTTP clients, but I’m interested in trying Mint: https://github.com/ericmj/mint https://github.com/appcues/mojito
- Bamboo for transactional emails, like order confirmations: https://github.com/thoughtbot/bamboo

8:59 - 3rd Party Services (e.g. Email, Payment Processing, etc.)
Sendgrid for email, Google for geocoding, Slack for some internal alerting of application health, LevelUp for payments. https://www.thelevelup.com/

10:07 - Do you have a story where Elixir saved the day in production?
Yes and no. I could tell this story by explaining the issue we saw and the underlying cause at the same time, but I think it would be more fun to tell it the way our team experienced it.

One day at lunch our application started going down. Lots of 500 errors. Red lights flashing. Panic ensuing. Lunch is our busiest time of day, so 1) we thought it was load related and 2) we really needed to fix it.

None of our traditional resources (database, CPU, memory) were constrained, and our synchronous integrations were fine. Our logs were littered with errors from an analytics integration that ran asynchronously on GenServers, but it didn’t seem related because we could see the same error logs at times when our application was otherwise healthy. The team that used the analytics didn’t have a pressing need for them, and we deprioritized fixing the issue because the bug we were working on was so much more important (that’s foreshadowing).

I spent a little time looking at websockets, but I was easily able to match the load of the websocket portion of our application on my local machine with no degradation in performance (thanks, Phoenix), so that was out.

At this point the issue was happening every day at lunch and I was getting annoyed at seeing the logs from the analytics integration when debugging, so I spent about 15 minutes finding and fixing the issue (a bad API key, basically). Voila, issue gone. Time to grab some lunch.

We spent a while coming up with an explanation for this. Eventually we learned about max_restarts on a supervisor. By default, if a supervisor’s children are restarted more than 3 times within 5 seconds, the supervisor gives up: it stops restarting them and shuts down along with its children. So if another process (like the one handling a web request) tries to call a process that wasn’t restarted, the caller crashes, and we’d start to get 500 errors, customers couldn’t log in, mass confusion.

So there are a few takeaways from this story:
- For a while, Elixir saved the day in production. A supervision tree prevented failures in the analytics process from affecting customers, until the rate of our failures exceeded the max_restarts limit.
- Our supervision tree needed some love, though, clearly.
- Monitor your resources. CPU is a resource, but calls to another API are also a resource and can get unhealthy too.

15:00 - Are you using any cool OTP features?
GenServers, definitely. There’s lots we can do asynchronously, especially in terms of our integrations. One process per store is a cool model that scales well and keeps issues isolated to a single store.

15:50 - If you could give one tip to developers out there who are or may soon be running Elixir in production, what would it be?
If you’re on a small team, Heroku or a similar provider might give you a lot of value in terms of infrastructure you can set up and forget.

Learn more about how SmartLogic uses Phoenix and Elixir.
Special Guest: Jay Ashe.
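
The max_restarts behaviour from the 10:07 story can be reproduced in a few lines. The following is a minimal sketch, not CAVA's actual code: `AnalyticsWorker` and `RestartDemo` are hypothetical names, and the worker crashes on demand instead of calling a real analytics API. It assumes a `:one_for_one` supervisor with the default limits written out explicitly (`max_restarts: 3`, `max_seconds: 5`).

```elixir
defmodule AnalyticsWorker do
  # Hypothetical stand-in for an integration worker; it fails on demand.
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  @impl true
  def init(:ok), do: {:ok, %{}}

  @impl true
  def handle_call(:ping, _from, state), do: {:reply, :pong, state}

  @impl true
  def handle_cast(:fail, _state) do
    # Simulates the failing integration call (for example, a bad API key).
    raise "bad API key"
  end
end

defmodule RestartDemo do
  # Hypothetical demo module: crashes the worker until the supervisor gives up.
  def run do
    # Trap exits so this process survives when the linked supervisor exits.
    Process.flag(:trap_exit, true)

    {:ok, sup} =
      Supervisor.start_link([AnalyticsWorker],
        strategy: :one_for_one,
        max_restarts: 3,
        max_seconds: 5
      )

    # Four crashes in quick succession: one more than max_restarts allows.
    Enum.reduce(1..4, nil, fn _, last_pid -> crash_worker(last_pid) end)

    # The supervisor terminates instead of restarting the worker a fourth time.
    sup_exit =
      receive do
        {:EXIT, ^sup, reason} -> reason
      after
        2_000 -> :still_running
      end

    # A caller that depends on the worker now fails, as the web requests did.
    call_result =
      try do
        GenServer.call(AnalyticsWorker, :ping)
      catch
        :exit, {:noproc, _} -> :noproc
      end

    {sup_exit, call_result}
  end

  # Waits for a freshly (re)started worker, crashes it, and waits for it to die.
  defp crash_worker(last_pid) do
    pid = await_new_worker(last_pid)
    ref = Process.monitor(pid)
    GenServer.cast(pid, :fail)

    receive do
      {:DOWN, ^ref, :process, ^pid, _reason} -> pid
    end
  end

  defp await_new_worker(last_pid) do
    case Process.whereis(AnalyticsWorker) do
      pid when is_pid(pid) and pid != last_pid ->
        pid

      _ ->
        Process.sleep(10)
        await_new_worker(last_pid)
    end
  end
end
```

Calling `RestartDemo.run()` prints a GenServer crash report for each failure, then shows the supervisor exiting and the follow-up call failing because no worker is registered any more. One possible mitigation, under the same assumptions, is to put non-critical workers under their own supervisor with limits tuned to their failure pattern, and to have callers handle a missing process rather than crash.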

Links: ElixirConf 2016 - Selling Food With Elixir by Chris Bell, Heroku-buildpack-elixir, Phoenix Swagger, Ex_rated, Timex, Mint, Mojito, LevelUp

