We are joined today by Ben Curtis, from Honeybadger. Ben has been a web developer and entrepreneur for the past 20 years. He enjoys working on the web, building products, and seeing people get to use those products.

Ben talks about the why and the how of building Honeybadger (currently just a five-person team!), the beauty of a good developer blog such as their own, replacing conference engagement digitally now that people aren’t conferencing, engaging with users via newsletters, and how they use technologies such as Terraform, Ansible, Fargate and Amazon Web Services (AWS) in production.

Should you find a burning need to share your thoughts or rants about the show, please spray them at devrel@newrelic.com. While you’re going to all the trouble of shipping us some bytes, please consider taking a moment to let us know what you’d like to hear on the show in the future. Despite the all-caps flaming you will receive in response, please know that we are sincerely interested in your feedback; we aim to appease. Follow us on the Twitters: @ObservyMcObserv.

Jonan:

Hello, and welcome back to Observy McObservface, the observability podcast we let the internet name and got exactly what we deserve. My name is Jonan. I’m on the developer relations team here at New Relic, and I will be back every week with a new guest and the latest in observability news and trends. If you have an idea for a topic you’d like to hear me cover on this show, or perhaps a guest you would like to hear from, maybe you would like to appear as a guest yourself, please reach out. My email address is (jonan@newrelic.com). You can also find me on Twitter as @thejonanshow. We are here to give the people what they want. This is the people’s observability podcast. Thank you so much for joining us. Enjoy the show.

I’m joined today by my guest, Ben Curtis, from Honeybadger. How are you this morning Ben?

Ben:

I’m doing well. Thanks.

Jonan:

So Ben, tell us a little bit about yourself and what this Honeybadger product is and does maybe.

Ben:

Sure. I guess I consider myself a web developer, but I’ve been an entrepreneur and a web developer for the past, I don’t know, 20 years. And I really enjoy working on the web and building products for the web and seeing people get to use those products.

Jonan:

Building developer products is a joy. Like, I am a huge fan of anything that I can build specifically for developers. It reminds me of what I came into software to do. Like I was here just for the fun of it, and building a product for other developers gives me that tight feedback loop, where they are showing me that fun again because they’re experiencing my product, and it’s making their lives easier. And they’re excited about it.

Ben:

Definitely, I mean, one of the main reasons we built Honeybadger is because we felt like developers deserved awesome tools and we felt that we could, we could build one of those. And, yeah, it’s great having the feedback from other developers when they enjoy using your product.

Jonan:

You released Honeybadger how long ago?

Ben:

So we launched in 2012.

Jonan:

I remember at the time going to GORUCO, I think in New York, the Gotham Ruby Conference, and seeing someone launch an error monitoring product called Errplane. And at the time, thinking, Oh, this is yet another error reporting tool. It was maybe a popular thing to go after at the time. Do you remember that product, or others of that era? Like, is that perception true in your experience?

Ben:

Yes. And yes. I remember Errplane. In fact, Errplane, I don’t know if you recall, but they became Influx. So those founders went on to create Influx because they needed to create a time-series database for Errplane. And after they built that database, they realized, Hey, wait a minute, there are a lot of error trackers out there, but there aren’t so many time-series databases, and everyone has to build the time-series database, so why don’t we build that instead? And so, yeah, Paul and his team went off and did that. And they folded Errplane not too long after that, but it definitely was a time where there were a bunch of exception trackers showing up at the same time. I believe Honeybadger, Errplane, Sentry, Rollbar… we all launched around that same time frame.

Jonan:

That’s amazing. I hadn’t actually realized that that’s where Influx came from. I am familiar with InfluxDB, but I didn’t realize it was the same founding team. I just remember this Errplane getting announced at GORUCO. And then I checked in on it maybe a year or two later, and it appeared defunct. You know, they still had to sign up for something, but it wasn’t like growing and becoming more popular. So that’s really interesting. I hear that Influx thing is catching on, though. Do you use it?

Ben:

So we have used it in the past and loved it, but we’re not currently using it. We’ve done other things.

Jonan:

Other things like what?

Ben:

So we have used the heck out of Postgres, that’s our favorite data store and the basis of, you know, where we put most of our stuff. We also recently have become more fond of DynamoDB. It’s really got quite a good scaling story, as you might expect. And you know, along the way, we played a bit with Cassandra as well. We had that deployed in production for a while, and that was a lot of fun, but eventually, the administration burden of that just got too onerous, and we moved that kind of workload to DynamoDB.

Jonan:

You’re a pretty lean team over there at Honeybadger, right?

Ben:

Yeah. So we are five employees.

Jonan:

[sigh] sure.

Ben:

And for many years, there were only three of us. Whenever we try to deploy something, we try to make sure that it’s as low of an administrative burden as possible. [laugh].

Jonan:

You are apparently doing a real good job of that because it’s an amazing product for five people to have built and maintained all these years, eight years old as a company. And most of that life was with three people.

Ben:

Yeah. Yeah. We like to feel that we punch above our weight a bit when it comes to the impact per employee.

Jonan:

I feel like that is a fair statement.

Ben:

[laughs].

Jonan:

That’s amazing to me, I’m still kind of shocked because I knew that Honeybadger had been a startup that was, you know, you didn’t take investment, right? You bootstrapped from the beginning.

Ben:

That’s right. Yeah.

Jonan:

And that’s still the case. You still have not raised money.

Ben:

It’s still 100% customer-funded. Yeah.

Jonan:

I love this. This is what I want to build someday. Some company that I can own entirely in a way that I’m not driven by anyone else’s expectations about revenue. And I can focus on the things that I feel matter because, very often, startups lose sight of developers. I mean, almost every time I feel like the typical path of startups that I have experienced is, you start off, and you get a couple of founders, you have some early money, and then eventually there’s another raise. And suddenly you’re going from like 10 people to 150 in one year, and then maybe to 500 in the next year. And a couple of years later, that product is being sold and marketed and developed for the enterprise customer because they land their first enterprise customer. And it’s a whale, and it supplants the need for all of the rest of their actual customer income. Eventually, the developers think, “Wow, this product’s kind of terrible now, it’s just for enterprise people. I’m not interested.” And then the company turns around and tries to re-win the developer market to bolster the enterprise sales because it turns out you’re only able to sell to the enterprise by nature of having the developer’s interest in the first place. This is a lifecycle that I see just repeated over and over again with these companies, and your approach seems so much more sane and much more fulfilling and more likely to keep me wanting to work on the same thing for 20 years.

Ben:

Yes. Yeah. All true. Yeah. We see that pattern quite a bit. And you know, if you get on that funding train, you have to make sure that your growth matches the expectations of those who have invested in you. And in our world, it seems like the playbook is you go sell to the enterprise to make sure you get that return on that investment. And we were very intentional in starting Honeybadger about opting not to have an investment. We’ve had people contact us because, again, the space was pretty hot at the time. And we just felt like, you know what? We don’t want another boss. Like, we’re happy running our own show. We don’t want someone else calling the shots. And it has been, at times, a very slow process because we had to, you know, take things slowly. We didn’t have a bunch of money to hire those 50 people, but it’s always been a really fun process. And having 100% ownership, you really feel like, Hey, this is my baby. I’m going to, I’m going to make it as successful as I possibly can because it’s all mine. [laughs].

Jonan:

Yeah. I love that dream. I am sure that Honeybadger doesn’t need my well-wishes. I think you’re going to be very successful over the coming years, but I am rooting for you from the outside.

Ben:

Thanks.

Jonan:

Specifically, because you have this focus on the developers still that I think is pretty rare, I read your blog. It’s one of the few blogs that I revisit regularly, and an example that I held up to my own team recently, trying to get them to focus on a developer blog as a technical resource and not so much as a product resource. I know where to find your product information and your marketing material that describes Honeybadger. [chuckles].

Ben:

Right.

Jonan:

I’m not here for that. I want to find the tactical details of things that are interesting to me. I mean, the review posts that you’ve had coming through lately have been fascinating. You’ve got database posts on here all the time. There are things that are relevant to software developers but not necessarily directly related to your product. What inspired you to take that tack? I guess it’s the developer focus, but is there more to it than that?

Ben:

Yeah, so, I mean, it helps that the three co-founders are all developers, right? And so we market like how we want to be marketed to, right. We don’t do outbound sales. We don’t do a bunch of cold emails. You know, we provide great information. We provide great service, and we feel like that kind of speaks for itself. And the product kind of stands on its own. In fact, the marketing stuff recently, the content in particular, we’ve kind of revamped how we did that. It used to be that Starr would write a bunch of content. She would write blog posts that would talk about Ruby in depth and exceptions and that sort of thing. But over time, we found it got to be unsustainable because of the depth that started to go into those things. And she’d just have to be writing all the time. So recently, in the past several months, she has set up a system where we’re actually hiring authors to write blog posts for us. And of course, of course, paying them well to do so; we don’t believe in shortchanging developers at all. And that has been a fantastic success for being able to bring great blog posts, like you mentioned, to our blog. And we feel like if we can provide that resource to developers, that will just naturally result in them checking out the product and being like, Oh, Honeybadger speaks to me as a brand, as a company. And I’m willing to give them a shot. And, you know, you can’t directly measure that. You can’t say, Oh, well, this particular visitor hit this particular page and then signed up and, you know, etc., etc. But we do see over time that, yes, that does help people become aware of who we are because of that fantastic content. And it’s worked out really well for us.

Jonan:

I imagine it has. I remember when it came across the Twitterverse that Honeybadger was looking for people to do some paid blogging for them. And I was surprised by that number that you were paying because so often people think they’re just hiring someone who’s effectively writing documentation, but it’s an intense and time-consuming creative process to write a good effective blog post.

Ben:

Yes. Um-hmm.

Jonan:

And I have a similar problem where when someone tasks me with writing about a thing, and I don’t actually take subject suggestion very well, it’s really hard for me to work a blog post that someone brought to me; it’s much easier for me to say, how do I want to describe Ruby Metaprogramming? Well, I’m definitely going to use the space analogy because spaceships are awesome. And then I, you know, go weaving down this trail, but that could take including the research behind it, you know, two weeks, it could take me 80 hours or a hundred hours of work…

Ben:

Certainly.

Jonan:

…to get that blog post done. One of the advantages that developer advocates have is that we can then reuse that content in a lot of places; I’m not just producing this blog post. So I have conference talks and, you know, streaming or YouTube videos about it. I became familiar with your product because you were so involved with the conference community.

Ben:

Yeah.

Jonan:

And now that we live in a world without conferences, how are you replacing that engagement?

Ben:

Yeah, so we were really into conferences back in the early days of Honeybadger, let’s say 2013, 2014, we were everywhere. And there were a bunch of regional Ruby conferences at the time. So it was easy to be everywhere, right? There’s always something happening. And so we were all over the place. But yeah, these days, of course, conferences aren’t happening so much, and sponsoring an online conference just doesn’t give you quite the same impact as, you know, sponsoring or presenting at an in-person conference. And of course, it’s still an experiment. I believe that everything in business is you just try stuff and you see what works. But our current experiment is we’ve shifted that conference budget to podcast sponsorships. So, we know a number of podcasts that are talking to the people that we care about, the developers who are in our audience. And it’s a great way for us to put great content out there, not directly, but through other podcasts, like No Plans to Merge. And we’ve done several this year, and that’s worked out pretty well for us. Again, that’s not one of those things that you can track and say, Oh, well, you know, person A listened to podcast B, and now they signed up, and they’re on plan C. But we do have a little survey in our onboarding process: Hey, you know, how’d you hear about us? What are you trying to do? And we do have people dropping in, Oh, I heard about you on, you know, No Plans to Merge or whatever, or in a newsletter. And so it works. Like people show up as a result of hearing about us through podcasts or newsletters that are geared towards them.

Jonan:

I often see people trying to measure this thing in developer relations. It’s very common. We have a lot of ways that we try to measure it. My fundamental belief is that oftentimes you’re talking about goodwill in the community and, net promoter score discussion aside (I have a lot of problems with NPS), it’s really hard to measure how much developers like you or dislike you as a company, how well you’re serving their needs. There are a lot of tools that we use: we can use sign-ups, we can give people cookies and promo codes and things like that. But I think that you’re better off, in the beginning, using this experimental model that you’re talking about. We try a thing. If it works, we will see the impact of that. We’ll hear back from our customers that they’re enjoying it. And the piece where you are producing valuable content for the community, again, means to me that you are focused on educating and lifting up developers at large, and that’s the beginning and the end of that story, in my opinion. If you get to the end of that and you think, now, how do I sell to them? Then you’re missing the point. The point is that you gain benefits from communities by being active, contributing members of those communities. And you’re not there for the benefit. If you’re there for the benefit and the outcome, then you lose sight of the real goal, which is just to lift people up and make the world a little better, hopefully.

Ben:

Yeah, really our goal, our whole passion and mission as a business, is to help developers have a better life in their development world, right? We want them to have a better experience with our tools. We want them to have a better experience in their day-to-day work. And that means providing an awesome tool, providing awesome service, providing great content that helps them level up as developers. We have a newsletter as well that we send out to people, called the Leveling Up newsletter. And it’s all about, you know, taking the lessons that we’ve learned and that have been learned by others that have gone before us, packaging them up in a way that developers who are just coming in or who want to get to the next stage in their career can use and make their lives better. And that’s really what we’re about.

Jonan:

I see this newsletter thing coming back up a lot recently. I feel like newsletters are experiencing a resurgence, now that my inbox is mostly useless to me for work. I try very hard, at least, to get my peers to interact with me through Slack. I love that there’s a record of history. It’s very easy to thread. Anyone who comes along later can join in the conversation and participate in a way that is visible to everyone. With email threads, you spend a lot of time trying to include the right people and catch people up and forward this. And it’s basically like a direct message tool that is slow and bad and confusing to use, and no one does it correctly. Like, that’s email. And now I have Slack, and really inside Slack, I don’t even want to use the DM structure if I can help it. I just want people to have public conversations. Obviously, that’s different if you’re having like an HR discussion or something like that. But the newsletter thing is big lately for some reason, maybe because the inbox now has some vacant space that we can fill with valuable pieces. What sort of things have you been talking about in your newsletter recently? What have you been seeing as valuable content for the community that you’re pushing out through the newsletter?

Ben:

Well, actually, as you mentioned, we take the content that we’re making for our blog, and we put that into our newsletter. So we might recap two or three blog posts in a particular newsletter, put a little new content around them and package them up and send that out. So we actually have a process where Starr, as she is creating content for the blog, is also creating summary content Josh can use. Because Josh is in charge of the newsletter, Josh can then package up that information into a new newsletter email and send that out. So it’s a great virtuous circle. We then have developers who come back to us, responding to the newsletter, saying, “Oh, thank you so much for this story or this particular information. I’ll tell you about an experience that I had.” And, you know, we get some of that coming back to us, and it’s great.

Jonan:

And the content cycle continues to grow.

Ben:

Exactly.

Jonan:

So I hear that you have, I suppose we should eventually talk about DevOps or at least, or observability, right? At least DevOps. We’ve got a lot of things that we talk about on this podcast. I think it’s more important to me to just have a real conversation and give people insight into the things that are interesting. But the bit that you mentioned before you came on here is that you’d been working with Terraform lately. And I’m curious how you’re using it and how you find Terraform as a technology. Because I remember hearing about HashiCorp at that same conference at GORUCO. I think I may have spoken to Mitchell Hashimoto. I’m not sure if he was there. It was someone who was there who mentioned this Vagrant thing to me and how much simpler my life could be if I was just using a VM to do this thing. So I’m curious to hear how you’re using Terraform and what you think of the product.

Ben:

Yeah, so we are big fans of their software. We use Terraform quite a bit, and we use Consul quite a bit as well. And we at Honeybadger, we’re 100% hosted at Amazon, and everything is in VMs. So we’re not, we’re not big fans of Docker, at least not yet. And we use Terraform to orchestrate all the infrastructure that we have at AWS. So we’ll have, say for example, our API endpoints, we’ll have a Terraform configuration that handles the load balancers, the target groups, the listeners, the SSL certificate for those load balancers, the auto scaling group with its launch template that spins up a bunch of instances. All of that will be in a Terraform configuration. And we’ll have that for our API endpoints. We’ll have a separate one for our Sidekiq workers that are processing the pipeline. We’ll have a separate one for our app instances that are running the main Rails app that we deploy. So if there’s something to deploy, then Terraform is deploying it. We’re big, big believers in automation at Honeybadger. That’s, I mean, that’s how we punch above our weight: we don’t have people doing things. We have robots doing things. And Terraform, combined with Ansible for deploying changes to instances once they’re in place by Terraform, gives us the automation that we need to be able to manage a whole fleet of stuff without having to spend a lot of time doing it.

Jonan:

Some people may be less familiar with Terraform as a technology, but if you’re coming from the Kubernetes ecosystem, you’re familiar with controllers, certainly. And a controller would be responsible for doing these types of things, although I feel like in a more specific way. A controller is, I want to add a specific facet of a specific feature to my Kubernetes cluster. And a Terraform config is kind of overarching. You describe it as something that configures your ELBs and your instances and everything all at one time. But for a single, I guess, application or project, you have one Terraform config. Correct?

Ben:

That’s right. It’s super handy to have a repeatable process and to be able to detect any drift that might happen because as you run Terraform apply to apply the changes that you specified in your config, then it’ll tell you, Oh, by the way, I’m going to add this egress, or I’m going to change the certificate on that load balancer. And you can see really quickly, like, what things have diverged from what you think they are. Another benefit that’s been really huge as we’ve gone through the compliance process and gone through the auditing process is having all of that infrastructure as code; our auditor can say, OK, well, tell me about this change that got made to production. And then you can go back and you can look in the GitHub history and you can say, OK, well, here’s the ticket that caused the beginning of this change. And here’s the commit that said, OK, we’ve increased this number of instances or, you know, we changed this firewall configuration or whatever it is. And so you have that audit chain that shows what’s happening to your production infrastructure. So you can say, yes, we can verify that these firewalls are definitely doing their thing. And you know, those ports are locked down, etc.
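The drift check Ben describes boils down to comparing the configuration kept in version control with what is actually running and listing the differences. The following is a minimal Python sketch of that idea only; it is not how Terraform is implemented, and the function name `plan_changes` and the load balancer settings are hypothetical, chosen purely for illustration.

```python
from typing import Any, Dict, List


def plan_changes(desired: Dict[str, Any], actual: Dict[str, Any]) -> List[str]:
    """Return human-readable actions needed to make `actual` match `desired`."""
    actions = []
    # Walk every setting that appears on either side, in a stable order.
    for key in sorted(desired.keys() | actual.keys()):
        if key not in actual:
            actions.append(f"+ add {key} = {desired[key]!r}")
        elif key not in desired:
            actions.append(f"- remove {key} (was {actual[key]!r})")
        elif desired[key] != actual[key]:
            actions.append(f"~ change {key}: {actual[key]!r} -> {desired[key]!r}")
    return actions


# Hypothetical settings: what the config in version control says...
desired = {"certificate": "cert-2020", "instance_count": 4, "egress_443": True}
# ...versus what is running after someone edited things by hand.
actual = {"certificate": "cert-2019", "instance_count": 4, "manual_rule": "0.0.0.0/0"}

for line in plan_changes(desired, actual):
    print(line)
```

An empty list means nothing has drifted. Anything else is a change someone has to review before it is applied, which is also what gives an auditor a trail to follow.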

Jonan:

This compliance piece is really important. I think it’s a huge part of the value that people get from embracing Terraform and similar technologies that they don’t understand until they know that pain. But I was involved with some of the work around compliance that we did at Heroku back in the day. And the audit trail especially is a difficult beast to tackle when you’ve been moving quickly and growing in many directions at once. And then someone comes along and says, well, how exactly did this one line of code get introduced into the system, who created it, who approved it? And when was it deployed and on what machines? And suddenly you’ve got a little bit of work to do, but the automation really improves that process. And, I guess, really every part of making software. There are, in the end, not very many things that can’t be done by a robot, as far as controlling infrastructure. You are using Ansible as well. And I am curious to hear how Ansible fits into the equation because I think of Ansible as a tool that I would use similarly to Terraform. But again, I have much less knowledge of the space than you do. I know what Ansible does in the sense that it will be given a new instance and it will turn that instance into something valuable; I’ll install NGINX and set it up with the NGINX config, and then it points to this server and this server, and I’m throwing up web apps. So tell me how Ansible fits into yours.

Ben:

Well, you can use Ansible in a lot of places instead of Terraform because it can do things like setting up load balancers and that sort of stuff as well. But the way we’ve decided to go at Honeybadger is we use Terraform for everything that’s infrastructure and then Ansible for everything that’s on an instance, on a VM. So as an example, we’ll have, let’s say, an API instance that has a special role. It has certain software, it has, you know, our endpoint, our ingestion code that might be a Rack app. And it has particular settings to handle a bunch of traffic, right? We’ve increased the ulimits, the limits on open files and things like that, right? So there are some specific settings that happen to, let’s say a [inaudible 00:22:37] to the instance. And Ansible walks through all those changes that we want to make. So it’ll set up maybe SMTP configs so that we can email the results from cron jobs off the box easily enough without having to run Postfix at all. It’ll add that NGINX config, as you mentioned. It will take the GitHub repo that has our collector, and it’ll deploy that to the instance. So anything that you might do via the shell on an instance to set it up for production use, we use Ansible for that. So it puts the configurations down there. It starts all the software. And then, we also use Ansible for creating AMIs, or Amazon Machine Images. So these are our golden images that we use as part of our auto-scaling groups in that launch template that I mentioned. So that when an API instance, let’s say the API servers, their CPU is going up, and so we have an auto-scaling rule that spins up a new instance, and that instance uses an AMI that’s been configured by Ansible. That’s ready to go. It has all the software on it. It’s configured. It doesn’t need to be doing any bootstrapping. It just goes. And if we have, say, a configuration change to make that we want to persist, we’ll spin up a new instance. We’ll run Ansible on it to make sure that it’s in sync with what we think should be on there. And then we’ll prep that image, make a new golden, and then update our auto-scaling group with that information so that it’s ready to be deployed for any auto-scaling needs that we have.
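Running a configuration tool over an instance "to make sure that it’s in sync" relies on each step being idempotent: it changes the machine only if the machine is not already in the desired state. Here is a small Python sketch of that property. It is an illustration under assumptions, not Ansible’s code; the helper `ensure_line` and the `limits.conf` file are hypothetical stand-ins for a real task and a real config file.

```python
import tempfile
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Make sure `line` is present in the file; return True only if a change was made."""
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # already in the desired state, so do nothing
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True


with tempfile.TemporaryDirectory() as tmp:
    limits = Path(tmp) / "limits.conf"  # hypothetical file standing in for a real config
    print(ensure_line(limits, "* soft nofile 65536"))  # first run changes the file
    print(ensure_line(limits, "* soft nofile 65536"))  # second run is a no-op
```

Because a second run reports no change, the same set of steps can be pointed at a fresh instance or at one that has been up for months, and the result can then be baked into a new image.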

Jonan:

You call them golden masters because this is your GM AMI.

Ben:

Yup.

Jonan:

I was hearing from someone the other day criticizing people who say AMI, like, I’ve got an AMI up there. They complained that it’s two syllables. Developers, as it turns out, have a lot of opinions about names. I appreciate that AMI. I like the explicit call out to the fact that this is an acronym. In conversations with new developers, which I have often, they will very reasonably, I think, assume that the word that you were saying is just a word that they haven’t looked up, and go looking in a dictionary for it, not finding it because it’s an acronym. So say, for your API, I’m working at Honeybadger and I realize we have seven instances of the API running, and we’re getting overwhelmed because we just got mentioned by Oprah, and we need to double that, right? Then I would go to Terraform and ask it to double it. And underneath the hood, Terraform would use Ansible to create the new instances.

Ben:

Well, well, actually, you wouldn’t have to do anything because we have an auto-scaling policy that just watches the CPU usage on those instances and automatically increases the number of running instances based on that target. So we have a target of, say, 75% CPU; if it gets up to there, it’ll launch a new instance, so you’re done. All that is set up via Terraform. So you might say, Oh, you know what? 75% is maybe too high. Maybe they’re getting too high. Maybe we want that to be 60% instead. So we’ll go, and we’ll update the Terraform configuration for that and apply that configuration. So then we’ll have that new configuration out there, but on a day-to-day basis, we’re not doing any kind of intervention. Like, the automation rules are just taking care of all of that. And then if we have a configuration change that we need to make, maybe we have a new rule that we want to ignore in our NGINX logs, for example, it’s like we decided that we want to ignore the load balancer, you know, health checks. So then we would go to Ansible. We would update the NGINX config that’s in our repository there and then use Ansible to apply that new, updated configuration to instances, and then create a new AMI from that. And then add that to our Terraform config.
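A CPU target like the 75% Ben mentions can be pictured as a proportional rule: grow or shrink the fleet so that the same load, spread across the new number of instances, lands near the target. The Python sketch below is a simplification and an assumption, not the algorithm AWS uses (a real policy also involves alarms, cooldowns and more cautious scale-in); `desired_capacity` and its parameters are hypothetical names.

```python
import math


def desired_capacity(current_instances: int, avg_cpu_percent: float,
                     target_cpu_percent: float, max_instances: int) -> int:
    """Pick an instance count that brings average CPU back toward the target."""
    if target_cpu_percent <= 0:
        raise ValueError("target_cpu_percent must be positive")
    # Same total load, spread over more (or fewer) instances.
    needed = math.ceil(current_instances * avg_cpu_percent / target_cpu_percent)
    # Never drop below one instance or exceed the configured ceiling.
    return max(1, min(needed, max_instances))


# Seven API instances running hot at 90% average CPU:
print(desired_capacity(7, 90, 75, max_instances=20))  # with a 75% target
print(desired_capacity(7, 90, 60, max_instances=20))  # with a stricter 60% target
```

Lowering the target from 75% to 60% makes the same load call for more instances, which is the kind of tuning Ben describes doing with a one-line change to the Terraform configuration.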

Jonan:

So now you have a new golden master AMI going forward. Everyone’s going to use that AMI when they spin up a new instance. So you’re back to the robots running the ship.

Ben:

Exactly.

Jonan:

When things break, what goes wrong? What could happen here? It sounds to me like you live in this, this observability utopia where you don’t actually have any problems in your own stack, but I bet that’s not true.

Ben:

We do have the occasional problem. More often than not, the problem that we have is caused by some sort of human intervention. [laughs] For example, this past week, I was working on our API stack, and there was some manual drift. I had gone in and I had made a manual change in the AWS console to handle some, I don’t know if you remember this, but a few months ago one of the SSL certificate providers, one of the certificate authorities, their certificate expired, and it caused a bunch of sites around the web to have problems because all of a sudden their SSL certificates were invalid because one of the members of the chain was now expired. In our case, the certificate we were using was valid, was not expired, but our chain certificate was expired. So when that happened, I went in really quick in the AWS console, and I set up a new load balancer with a new SSL certificate and worked around the problem. And then this week I was like, OK, you know, it’s time to go and resolve that drift. And when I did, unfortunately, we had a configuration problem that caused a security group to drop its ingress rules. And all of a sudden, it was dropping traffic. So yeah, [laughs], so I had to do a quick fix of that. So typically, our errors, the problems that we have at Honeybadger, are caused by me [laughs] when I’m not using the automation. [laughs].

Jonan:

This is a familiar story. People describe this as PEBCAC, right?

Ben:

Right.

Jonan:

The problem exists between people and the computer. Yeah. I actually have done many similar things in the past because when I’ve gotten into DevOps work, it was typically that I owned an application in production, along with a couple of team members who had much more systems and infrastructure experience than I did, and so the playbooks, the run books, were written by them. And my experience was walking through a run book, which I can do. I’m comfortable with a command-line environment, and I can operate there. But when I’m building my own things, I tend to do a lot of [inaudible 00:28:12]. And that is not a great way to approach this sort of problem that we all have, keeping our servers online. But in an emergency, you often find yourself in that position. I was at a company one time earlier in my software career. And one of my first tasks on that team was to log in to the production database and manually sew records back together that had become detached by some bug or something. And this system, it was like a user has this many reports, but every fourth report doesn’t have the user’s ID on it. So I’m looking up the reports that have a null user ID and finding what user they should belong to, and sewing the database back together from the inside. But I think as a general rule, if you find yourself logging into production anything in this day and age, you probably need to be looking for a better approach. Would you agree?

Ben:

Yeah, totally agree. I mean, it does happen. There are instances where data gets polluted, and you have to get it fixed. But you know, like many people who are listening, I’m sure, I had an experience early in my career where I just hosed a table in the production database because I ran a query without a WHERE clause. Right. And whoops, there goes all the data. So I agree that, yeah, you should do everything you can to avoid having to hop on a console in production because it is so easy to make a mistake that has pretty, pretty dramatic consequences.
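
To make the missing-WHERE mistake concrete, here is a minimal Python sketch. It is not something described in the conversation: the helper names `is_unbounded_write` and `guarded_execute` are hypothetical, the regex check is deliberately naive (it would be fooled by comments or string literals), and an in-memory SQLite database stands in for production.

```python
import re
import sqlite3

# Hypothetical, deliberately naive check: a DELETE or UPDATE statement
# that contains no WHERE keyword anywhere is treated as "unbounded".
_WRITE = re.compile(r"^\s*(delete|update)\b", re.IGNORECASE)
_WHERE = re.compile(r"\bwhere\b", re.IGNORECASE)


def is_unbounded_write(sql: str) -> bool:
    """Return True for DELETE/UPDATE statements with no WHERE clause."""
    return bool(_WRITE.match(sql)) and not _WHERE.search(sql)


def guarded_execute(conn: sqlite3.Connection, sql: str, params=()):
    """Run the statement unless it would touch every row in a table."""
    if is_unbounded_write(sql):
        raise ValueError("refusing to run DELETE/UPDATE without a WHERE clause")
    return conn.execute(sql, params)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reports (id INTEGER PRIMARY KEY, user_id INTEGER)")
    conn.executemany("INSERT INTO reports (user_id) VALUES (?)", [(1,), (2,), (None,)])

    # A scoped delete goes through.
    guarded_execute(conn, "DELETE FROM reports WHERE user_id IS NULL")

    # The "whoops, there goes all the data" statement is rejected.
    try:
        guarded_execute(conn, "DELETE FROM reports")
    except ValueError as err:
        print(err)

    print(conn.execute("SELECT COUNT(*) FROM reports").fetchone()[0])
```

A guard like this only narrows the blast radius of a console session; it does not replace the broader advice here, which is to avoid needing the console at all.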

Jonan:

Well, you could look at it another way. There is a very popular new form of SaaS around chaos engineering, where people are building chaos engineering companies. Gremlin is the one that I’m most aware of. A lot of smart people are working at Gremlin. I’m a huge fan. And you could look at it like you are your own gremlin. You just log in; you save so much money you would have paid Gremlin to break your stuff if you’re just in [inaudible] thing.

Ben:

Yeah. It was really fun. There was a conference that Josh had attended, and I believe it was Monitorama, which is a fantastic conference.

Jonan:

I love it. I love it.

Ben:

Someone, I think one of the speakers, was talking about horror stories from their production, and someone had issued a FLUSHALL on their production Redis database.

Jonan:

I’ve heard this so many times. There’s that specific command; there should be three prompts when you FLUSHALL.

Ben:

[laughs] So, Josh came back to our Slack, and he said, “Ben, what if I told you that I had just run this in production?” And you know, I had a mini heart attack, and he’s like, “Well, I didn’t, but it’s good to think about, isn’t it?” I’m like, yeah. So we, you know, we did a kind of tabletop exercise, like, hey, how do we avoid that sort of scenario happening in our production thing? And yeah, it’s definitely good to be thinking about those kinds of things and come up with a way, hopefully, to not be your own Gremlin. Yeah. I like that.

Jonan:

There’s a term for this that is escaping me right now, I guess it’s a game day, where you put together imaginary situations, imaginary scenarios that would go on, or you even play them out. Tell me a little bit about that process, because just assume that I am learning all of this from scratch. I mostly am, yeah.

Ben:

Yeah. [laughs] We’ve had a couple of new employees show up at Honeybadger over the years, just a couple. And during that process, we learned, like, hey, there are gaps in our documentation, right? There are things that a new employee just doesn’t have in their head because they haven’t been around for eight years. And so we took quite a bit of time to write up, you know, the runbooks like you mentioned: these are the problems that we’ve experienced in the past, and this is how we fixed them. So if you encounter this, this is what you should do. And I think as you walk through that, you really start to realize, oh, well, there’s this thing. And there’s that thing. And we could do this better. And really, I think what we did is we started making a list. OK, well, you know, we really shouldn’t have to log in to fix this particular problem. So how can we write some automation around that to fix that problem for us? And then just start working on that list. Right?

So one of the best things that you can do if you’re a DevOps person, especially if you sit back and realize you’ve got a lot in your head, is to document the heck out of everything. Like, literally, if someone could walk through your documentation and do exactly what you do, then I think you’ve reached a point where you’ve documented well enough, where they don’t need any of the knowledge inside your head. And that documentation, when you look at it, if you’re a little bit embarrassed by all the things you’ve got to do, let’s say to roll over a database or whatever, that’s OK. Like, those are just task lists for you now to say, OK, we can make some better automation around this. We can put Patroni on our Postgres cluster so we can have automated failover, or whatever the thing is, so that you don’t have to have a human waking up at 4:00 a.m. to handle something.

Jonan:

I’ll do this often when I’m building up a new Dockerfile; I’m more familiar with the container ecosystem. Although having worked at Oracle for a long time, I significantly nerfed my ability to work with any kind of server, but I’ve caught up on the container ecosystem lately. And when I’m building my container into what I want it to be, I like the advantage of having that Dockerfile to run things, you know, programmatically; I’m configuring this thing. But for learning, in my case, I’ve got to figure out sometimes what I’m trying to accomplish. I don’t know the exact command I’m trying to run or what config I’m copying where. And so if I just put it into the Dockerfile and spin up the container and it’s wrong, because I checked in the container, and then I spin it back up, then the feedback loop is quite long. What I’ve been doing instead recently is just configuring it manually with all of my commands. And then I just take my bash history, and I can go through and delete the bits that I don’t need. And I am actually creating all of these artifacts as I go through. It does seem like a lot of work to go and document your processes, but the real crux of what went on is captured in that bash history. And then I can write prose around each of the steps or just add a comment. And the next person to come along, the new hire, knows how everything is accomplished. In an emergency, that person may be getting paged at 2:00 a.m. to solve a problem while I’m on a flight somewhere and unable to help them or be involved. So I have to try and prepare those people. New hires act as pain magnets, kind of: they can find the really awful parts of your process and the system if you listen carefully. When you hire someone new, and it takes them a week and a half to set up a development environment, that’s a real good sign that you’ve got some work to do.

Ben:

Yeah. And the documentation process is fantastic for creating some of that blog content as well. I know that in the past few weeks, I’ve been working on experimenting with Docker and AWS Fargate and basically replicating our entire stack, but without using VMs. So I want to have Docker containers running in Fargate, talking to various things. And so I went and created a Terraform configuration that does all that for me. And you know, it took a while to nail everything down. And I looked at that, and I’m like, you know what? I should really document this really well. And as I was doing that, I’m like, you know what? This would make a great blog post. Like, how do you run a Rails app in production on Fargate? Like that’s a great blog post, right? So, now I’m working on that. And I think that blog posts can be great internal documentation for your team, with the added benefit that they are great marketing content for your company as well.

Jonan:

I am sensing a theme here, in that basically every part of your business is repurposed for efficacy in some other sphere. And that vision of having the internal documentation for process mostly make up a blog post is actually really smart. I think a lot of companies play things close to their vest that they don’t need to. “Well, this is a secret. We can’t just tell people we’re using Terraform. Then they’re going to hack all of our Terraform.” Well, no, they probably want to hear that you’re using something that is a stable, mature technology, and not Jonan’s awesome made-up container orchestration system, right? Some long series of bash scripts that I talked someone into letting me write instead of using a proper product to accomplish this. So the experience working with Fargate, how has it been?

Ben:

I am loving it. Yeah, I’m really, really enjoying that. You know, I think Heroku really popularized the whole, you know, 12-factor config and having stuff in the environment. And that was, I think, a watershed moment in web development in particular. And to me, Terraform and Fargate continue that tradition of having all that configuration really well documented, because a Docker container itself is pretty well documented. As you were talking about, the Dockerfile lays out exactly, you know, what steps we took to get to where we are at the end. And then the Fargate configuration, when you put it on top, it’s like, OK, now what’s everything in that environment that has to happen for that Docker container to actually work? You know, what environment variables are we injecting? And that’s all documented right there in that Fargate configuration. And then, you know, step back another layer: OK, well, what other services does this container depend on? Well, if there’s a Redis service that it needs, OK, we can deploy an ElastiCache cluster. And if there’s a database that it needs to talk to, OK, we can deploy an RDS cluster, right? And all these configuration things live in that Terraform configuration. And then you have it all in one place. Like, from the user all the way down to the last bit of code that you are putting in that container, it’s all thoroughly documented and reproducible. It’s fantastic stuff.
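
The 12-factor idea of keeping configuration in the environment can be sketched in a few lines of Python. This is an illustration only, not Honeybadger’s setup: the variable names `DATABASE_URL`, `REDIS_URL`, and `WEB_CONCURRENCY` and the `AppConfig` and `load_config` names are assumptions chosen for the example. In a Fargate deployment, values like these would be injected through the task definition’s environment settings.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class AppConfig:
    """Everything the container needs from its environment (hypothetical names)."""
    database_url: str
    redis_url: str
    web_concurrency: int = 2


def load_config(env: Mapping[str, str] = os.environ) -> AppConfig:
    """Build the config from environment variables, failing fast on gaps."""
    required = ("DATABASE_URL", "REDIS_URL")
    missing = [name for name in required if name not in env]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return AppConfig(
        database_url=env["DATABASE_URL"],
        redis_url=env["REDIS_URL"],
        web_concurrency=int(env.get("WEB_CONCURRENCY", "2")),
    )


if __name__ == "__main__":
    # A plain dict stands in for the environment the platform would inject.
    example_env = {
        "DATABASE_URL": "postgres://db.internal:5432/app",
        "REDIS_URL": "redis://cache.internal:6379/0",
    }
    print(load_config(example_env))
```

Failing at startup when a variable is missing keeps the list of required settings explicit in one place, which is the documentation benefit described above.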

Jonan:

I am a huge fan. I hear you using a lot of AWS services. Are you mostly Amazon or all Amazon?

Ben:

We are 100% Amazon.

Jonan:

You know, the cool thing these days is to be multi-cloud because [laughs] it significantly increases the need for your product.

Ben:

Well, it significantly increases the need for well-paid DevOps people. So I mean, of course, it’s a great, it’s a great idea. [laughs].

Jonan:

Well, people would buy more Honeybadger if you talked about using multi-cloud [laughs] [inaudible 00:37:47].

Ben:

Sure. We actually started Honeybadger not on AWS. We started at a hosted colo facility, and it was great. It was cheap, which was the primary concern at the time.

Jonan:

Sure.

Ben:

But, over time, we found just the automation ability that AWS has, and I’m sure, you know, Google Cloud has. I’m sure Azure has. I just don’t know them as well. But having that automation ability to recover from failure was really key in getting us to switch. So yeah, it turned out to be more expensive, of course, cause it’s, I mean, they, they charge you for that. But being able to sleep at night is priceless, really.

Jonan:

Right. I think that a lot of people overlook the cost of convenience. Time is the thing that you can’t get back. I myself overlook this regularly when I do things like remodel my own home. I’m just going to add a bathroom here. I could hire a lot of people to build that bathroom while I did consulting work in software and pay for many bathrooms. But it’s interesting to me; it’s fun for me to learn, to read plumbing code. I know it sounds like a strange thing, but those detailed bits of knowledge that are hidden away, you know, in the software world, to bring that analogy back around, I’ve got some piece of documentation on the internals of how a service operates in the Amazon ecosystem. That stuff is fun for me to go and learn. And in some cases it’s intensely frustrating when the documentation is missing, but in general, knowing the intricate details of a system and how things work in layers all the way down to, as you’re saying, the little bits of software or the literal bits on a CPU, knowing that entire circle is a lot of fun for me. And one of the things I see happening now in the DevOps ecosystem, I guess... I have an interesting problem trying to call it DevOps, because DevOps is the methodology. Is it the site reliability engineering industry, or the observability industry? What would you call it?

Ben:

I’ll take any of those above things. I think they’re all great. [laughs].

Jonan:

OK. So this infrastructure ecosystem, we’re creating these layers of abstraction, but at some point, I wonder if we do harm to newcomers to the community in that they don’t have to understand the details down below that create those things. And maybe they’ll encounter some bug where the tooling that they’re using is not addressing something, right? The Docker engine itself is broken, and you need to run an strace to figure out what’s wrong with that process. And that’s not a skill that a lot of people will develop because they’ve never had to do it. So I’m curious what you think about that: are these high levels of abstraction harmful for developers and their ability to solve their own problems?

Ben:

I don’t think so. I think the help that they bring more than outweighs the harm. You know, being able to get started so easily with, again, web development is my thing. So getting started so easily in the web development world these days compared to what you had to understand, let’s say, 10 or 15 or 20 years ago, that’s huge. Like, the audience is so much bigger now. I want everybody to be a developer, right? Because then that’s a bigger audience to buy Honeybadger. Right? [laughs]

Jonan:

Right. Exactly.

Ben:

But you know, less capitalistic. I do want everyone to be able to enjoy development if that’s what they want to do. And I wouldn’t want to say, well, OK, Bob, you have to understand how a CPU works, and you have to understand how assembler runs, and then you have to understand C, and etc., etc., all the way up before you can put out a hello world page. Right? I don’t think that’s fair to newbies, but at the same time, I do think that if someone’s interested, they will find a way to get past those hurdles of, Oh, my Docker container isn’t running. Why? Right. If they get to the point where it’s painful enough that they really want to solve that problem because I agree with you about, you know, talking about the plumbing codes, like learning those things to me, is just fascinating. I could just spend all day every day just reading up on stuff and learning new things and never really doing anything with it.

And that’s part of what makes developers tick, I think, is learning about details and minutiae that really nobody needs to care about, but it’s fun anyway. So I think that having those abstractions gives a great landing pad for people who are new to it. And for those that are interested enough to dig in deeper when they need to, it’s all there, and there’s nothing hidden from them. So I think, yeah, I’m a big fan. You know, I love Rails, for example. It covers so much of the plumbing work that you really don’t need to worry about that you used to have to worry about. And I love things like, you know, containers and VMs and all this abstraction that AWS handles for me. So, I mean, even though I love running a Linux box and I have for, let’s say, 25 years or whatever, I don’t really want to do that on a day-to-day basis. Right. I just want it to work. [laughs]

Jonan:

Yeah.

Ben:

And I think those abstraction layers help get us to that point where it just works at least most of the time.

Jonan:

I think that’s a sentiment common to Rubyists. And I subscribe, I think in many cases. Well, for example, a lot of Rubyists use Macs. We use them. And my friend, when I was first getting into it, I was trying to throw jabs about a thing that I didn’t really understand. I’m like, well, you’re using a Mac to do this thing. And my friend said, “It’s the best Linux machine I’ve ever used. And I don’t have to worry about getting my projector to work. It just works.” It’s, I guess it’s BSD based, right? But it’s the best POSIX machine that I’ve ever used. And he made an excellent point. I eventually came around. Very fortunately, my first engineering manager offered me a budget for my laptop, and I was ready to buy this ASUS gaming laptop that would have melted my kneecaps off of me. And I was like, I’m just going to run Linux. It’s going to be great. And my manager came to me and was like, I want you to learn those things, but I don’t want you to learn those things now. I want you to learn Ruby right now and get really good at Ruby. And then you can come back around to those things; they will always be there for you. And I feel the same way about the abstraction that you do. I think that a lot of people get unnecessarily concerned, but if you look at it from the software perspective, you’re exactly right that we’re not starting with binary code, right? [chuckles] There’s a common joke with the keyboard for the ultimate hacker, and it has a zero and a one on it.

Ben:

Yeah.

Jonan:

It’s going to take you a real long time to, you know, make a string, right? [laughs] Build up the ability to make a string and then add another string to it. Right. That’s no way to go about development. And I feel like the DevOps ecosystem, the infrastructure ecosystem, was there not super long ago. And now we’re getting these abstraction layers that build on top of each other. So my question to you is, where does it top out? I think that dynamic languages, maybe in software, are near the top of what’s available today. And now, there’s even the no-code movement where people are just using a WYSIWYG editor online to write a full web application with API integrations and all sorts of things. I’m not sure no-code is entirely there yet, but maybe it will be. So what’s the infrastructure ecosystem equivalent experience to that no-code movement, or what’s coming next, do you think?

Ben:

It’s hard to say. I am fond of the no-code ideas. I think we’re in the early days yet. I think there’s going to be a lot of interesting stuff happening. I’m not ready to hang up my code editor just yet. But I think on the infrastructure side, there is still a whole lot you have to understand, you know, about how, for example, deploying a web app works, like why do you need load balancers? And you know, that sort of thing. And I think Heroku, in particular, did a lot to help abstract that sort of thing, but still, you have to worry about dynos and memory and CPU usage and request backlogs.

There are still a lot of details that you have to be aware of from a performance standpoint, from a monitoring standpoint, if you want to have a reliable system. So maybe we’ll get to a point where these systems are better about telling you what you really need to care about. I’ve been an administrator for a long time, like forever, you know, back in the Nagios days, right? You’re like, OK, so you’ve got to monitor your CPU, and you’ve got to monitor your disk usage, and you’ve got to monitor the network traffic. And if any of these things hit certain levels, you have to get an alert so that you can go and do something about it. Right. And we’ve gotten away from that, thankfully. And now we’re caring more about, you know, what’s my latency and, you know, what’s my 90th percentile response time, which are much better measures than, you know, how much CPU is my particular instance using.
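
As a small illustration of why a 90th percentile response time says more than an average, here is a Python sketch using the nearest-rank method. The sample latencies are made up for the example, and nearest-rank is only one of several common percentile definitions; monitoring tools may interpolate differently.

```python
import math
from statistics import mean


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


if __name__ == "__main__":
    # Made-up response times in milliseconds: most requests are fast,
    # but two out of ten are very slow.
    latencies_ms = [100] * 8 + [2000] * 2

    print("mean:", mean(latencies_ms))
    print("p50: ", percentile(latencies_ms, 50))
    print("p90: ", percentile(latencies_ms, 90))
```

With data shaped like this, the median looks healthy and the mean lands on a value that no request actually experienced, while the 90th percentile surfaces the slow tail that users notice.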

So maybe the future is we get to a better place, you know, and maybe that’s something [along the lines 00:46:35] of RUM, where we’re doing real user monitoring. And we’re saying, OK, well, people are experiencing this slowdown and, you know, tools like County call help us pick out exactly where that is happening, right? Oh, there’s a network issue that’s causing a high or, you know, whatever it is. So maybe the future is, instead of having to page me at 4:00 a.m. to handle that network issue because a load balancer is starting to fail, maybe the system’s like, “Oh, I detected that there is a networking issue. And so I just started a new load balancer for you. And I’m auto-replacing the unhealthy thing, and you don’t have to worry. In fact, you didn’t even know about it. And I sent you an email the next day saying, by the way, this happened.” And we can see that in places here and there in various technology stacks today. But I think that we could get to a point where we can make that better, right? Where we can say, we don’t have to worry so much. And maybe you can specify what your target SLA is. You know, you care that someone gets a response in 400 milliseconds, we will make that happen.

We will deploy whatever instances or whatever networking, or maybe even PoPs. We’ll put it out into, you know, like, who does that? Cloudflare. They have their, you know, Cloudflare Workers, right? Which are doing work for you throughout their PoPs throughout the world. So maybe the system will be smart enough to say, OK, I’ll just deploy your code to all my endpoints. And then that response time will drop through the floor, you know, and you don’t have to worry about it. I think that would be pretty cool.
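
The idea of declaring a latency target and letting the system act on it can be sketched as a tiny decision function in Python. Everything here is hypothetical: the `desired_instances` name, the cap of 20 instances, and especially the crude assumption that latency falls in proportion to the number of instances, which real systems rarely obey. It is a sketch of the shape of the idea, not of how any particular platform works.

```python
import math


def desired_instances(current: int, observed_ms: float, target_ms: float,
                      max_instances: int = 20) -> int:
    """Suggest an instance count for a latency target.

    Crude assumption (labeled as such): latency scales inversely with
    the number of instances, so being 2x over target asks for 2x capacity.
    """
    if current < 1:
        raise ValueError("current must be at least 1")
    if target_ms <= 0:
        raise ValueError("target_ms must be positive")
    if observed_ms <= target_ms:
        return current  # already meeting the target; change nothing
    wanted = math.ceil(current * observed_ms / target_ms)
    return min(wanted, max_instances)


if __name__ == "__main__":
    target_ms = 400  # the 400 millisecond target mentioned in the conversation
    for observed_ms in (300, 800, 4000):
        print(observed_ms, "->", desired_instances(4, observed_ms, target_ms))
```

In this sketch the human states only the target; the decision about how much capacity to run, and the next-day email saying what was changed, would belong to the system.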

Jonan:

And you’re just telling the system then, Hey, I don’t want anyone to ever have to wait for more than a third of a second to see any content across this entire structure. And you have some neural network somewhere that knows how to achieve that.

Ben:

Yep.

Jonan:

It has access to the right tools, and it just makes the world shaped like what you want it to be. And then it can certainly still alert you to problems, but it’s alerting you that there was a problem and “I fixed it, Boss.” Right?

Ben:

Exactly.

Jonan:

We’ve got this under control. I like that vision for the future very much. And I actually think we are closer than a lot of people realize because, more than other areas, maybe other industries certainly, this is a question of numbers. If this number and this number and this number are this, then this. There are concrete paths to accomplish things. People don’t very often just kind of, like, hold their number of instances on some dial, and they’re like, you know, my gut tells me, I think I need 55, but maybe 40. You know, it’s not a very intuition-guided exercise. Although there’s certainly something to be said for being intuitive enough to find the bug and track it right down, which is not a thing that we have to do if we’ve all installed Honeybadger in our systems. I encourage you all to check it out, and check out Fargate, which is another thing we talked a lot about, and Terraform. These are all great technologies; at least be familiar with what’s out there. And then you can just make more informed decisions about the choices you make. So many different tools do the same thing. So Ben, I wonder if maybe you have some advice for our audience. If someone is just starting out, maybe they want to follow in your footsteps. They want to bootstrap their own company and, in 10 years, be where you are with Honeybadger. What advice would you give yourself or that upcoming entrepreneur that you wish you’d had back in the day?

Ben:

I think the best thing I could recommend is just to develop an attitude of learning. We’re always going to have stuff we don’t know. And if we’re in the attitude where we always want to learn new things and deepen our experience and our knowledge in whatever the realm is, I think that is the key to success. You know, if you decide, hey, I want to be an entrepreneur, well, then there are plenty of resources out there for you to learn about that environment, you know, from MicroConf, from the Indie Hackers community, all the way up to VC-backed Airbnb, right? I mean, there’s so much information out there on how to accomplish that particular career path that I think really the key is just to develop a love of learning. And that’ll take you far.

Jonan:

I agree with you a thousand percent. I think that’s how most of us ended up here. I was looking for a job that would let me learn a lot every day for the rest of my life. And I’ve certainly found it. And you’ve taught me a lot today. Ben, thank you for coming on the show. I really appreciate you taking the time.

Ben:

Oh, my pleasure. Thanks for having me.

Jonan:

Thank you so much for joining us for another episode of Observy McObservface. This podcast is available on Spotify and iTunes, and wherever fine podcasts are sold. Please remember to subscribe, so you don’t miss an episode. If you have an idea for a topic or a guest you would like to hear on the show, please reach out to me. My email address is jonan@newrelic.com. You can also find me on Twitter as @thejonanshow. The show notes for today’s episode, along with many other lovely nerdy things are available on developer.newrelic.com. Stop by and check it out. Thank you so much. Have a great day.

Jonan spends most of his time staring into tiny boxes and pushing buttons. He likes Ruby, Go, machine learning and playing with robots.