Observy McObservface Episode 15: Pens, Pages, and Pain – Taming Alert Storms with Quintessence Anx

In this episode, Quintessence Anx, Developer Advocate at PagerDuty, talks about battling alert fatigue, whether mean time to recovery and post-mortems are the right way to measure things, the evolution of ChatOps, and why it's important as a mentor to introduce newbies to your professional network. At PagerDuty, she helps train people on patterns for notifications, alerts, and stopping alerts so you don't get woken up unless you absolutely should be.

Should you find a burning need to share your thoughts or rants about the show, please spray them at devrel@newrelic.com. While you’re going to all the trouble of shipping us some bytes, please consider taking a moment to let us know what you’d like to hear on the show in the future. Despite the all-caps flaming you will receive in response, please know that we are sincerely interested in your feedback; we aim to appease. Follow us on the Twitters: @ObservyMcObserv.

Hello and welcome back to Observy McObservface, the observability podcast we let the internet name and got exactly what we deserve. My name is Jonan. I’m on the developer relations team here at New Relic, and I will be back every week with a new guest and the latest in observability news and trends. If you have an idea for a topic you’d like to hear me cover on this show, or perhaps a guest you would like to hear from, maybe you would like to appear as a guest yourself, please reach out. My email address is jonan@newrelic.com. You can also find me on Twitter @thejonanshow. We are here to give the people what they want. This is the people’s observability podcast. Thank you so much for joining us. Enjoy the show.

Jonan Scheffler: I am joined today by my guest, Quintessence. How are you, Quintessence?

Quintessence Anx: I am great. How are you?

Jonan: I'm hanging in there. I am excited to be back on a podcast and excited to talk a little bit about a lot of the things you do. Actually, I was noticing before this episode started the fountain pens, and I feel like that should be a priority for us. We're sitting here on video, and I'm looking at the background. Quintessence has, I think, maybe eight bottles of ink with various glitters and things in them that she was showing me. This is like the fountain pen collection. There's a rack of maybe 20 fountain pens, not counting the ten that you held up on your desk. Would you tell us a little bit about your fountain pen addiction?

Quintessence: Sure. So there are more bottles of ink than maybe he indicated. So there are the eight larger bottles, which are mostly J. Herbin, for anyone who knows what that means. And then I have a bunch of tiny Diamines over here, and there's about 25 of those because I bought the Inkvent Calendar last year. So, that's all a thing. I write with fountain pens. I do normal writing; I don't tend to do calligraphy, although sometimes I do with Pilot Parallels, which is a different type of fountain pen. And I just really like being able to take notes with that type of pen, partly because it's just fun to mess around with the ink stuff but also because it actually has a lot less hand fatigue. I grew up mostly in the latter half of the '90s/early 2000s; it was right when digital rollover happened. So I still take notes by pen even if I type them up and pop them in Google Drive or whatever so the team can see them. My initial notes are usually in pen. When I'm drafting out anything, talks, content webinars, et cetera, I'm usually doing drafts in pen, scribbling things out in an idea cloud, and then I will transfer it to something that actually makes sense to other brains besides mine.

Jonan: [Laughs]

Quintessence: And that's when I digitize things.

Jonan: I do that when I'm sketching out a talk or if I'm doing an outline for a post or something or if I'm doing architecture diagrams, I can draw or sketch out what a page is going to look like. There are a lot of tools to do mock-ups and things in digital form. But I think, like you, it was about that time—I learned how to write cursive when I was coming through elementary school. It was a significant part of our education years. We started with D'Nealian, where you put the little tails on the letters and transition. It was a lot. And now my kids are like, "Dad, why didn't you just type?" It's a whole different world on its own. So, we are ostensibly here to talk about observability, speaking of observing fountain pen ink on pages. See, that was a good transition.

Quintessence: [Chuckles]

Jonan: That's called a segue in professional podcasting.

Quintessence: Is it?

Jonan: Yeah. So right now, you're working at PagerDuty.

Quintessence: Yes.

Jonan: And you are primarily responsible for waking people up in the middle of the night, I think, right? Is this your full-time job?

Quintessence: My personal job is to help train you on patterns for notifications, alerts, stopping alerts, and stuff so you don't get woken up unless you absolutely really should be woken up.

Jonan: This makes a lot more sense to me than my original pitch. I think that my elevator pitch for PagerDuty could probably use some work. But I think at this point, PagerDuty is kind of the default for the industry. If there are competitors out there, good news PagerDuty: I don't know who they are, and I don't care.

Quintessence: Cool, because we like that. We like you. [Laughs]

Jonan: PagerDuty has been out there in the communities where I've been for my whole career. I've always had PagerDuty swag. I'm sure that I own several shirts and many, many stickers from PagerDuty. So what you're doing there now is developer advocacy and then, presumably out in the communities talking about things. What sort of things are you talking about most recently for PagerDuty?

Quintessence: The most frequently requested talk I give, still, even from my pre-PagerDuty days, is about alert fatigue. I wrote it back when I was finishing up my stint as an actual SRE at the last full-time engineering role I had as I transitioned into DevRel. But it's still very relevant because it talks about how to make sure that your alerts and notifications stay in sync with your actual development. So when you're changing endpoints while you're alerting on those endpoints, you either accidentally mute something because it's not giving you the data you think it should or vice versa; it's always firing. And then there are conversations around alert storms where there's a problem in a segment, component, or service that impacts everything within its blast radius, so you're getting 50 notifications about that, and just how to clean that stuff up. So that's the talk I give most. I also give a few talks that are more loosely in the DevOps space that are really more about trying to maintain process flow, process ideas, or incident management directly; we talk about that a lot.

Jonan: This is a thing that has been evolving rapidly from my perspective. I know that for people who operate as SREs and deeper in SRE communities, this has all been coming for a long time. But in the receiving end of the alerts and the incidents, I have seen a lot of change very quickly for the industry at large. I remember when I was first getting pages about an application, that it was exactly that; it was a constant stream. And then if you don't acknowledge them within a certain period of time, then the boss is getting woken up. You're trying to respond to an incident live, and at the same time, more alerts are coming in from all of these different systems, and you're clicking on your PagerDuty to get them all to silence, so the boss doesn't have to drop in unnecessarily. Now we're in this world where people are getting that dialed in, and we also have these real, I don't know if rigid is the right word, but well-formed incident response processes. You have a commander, and the channel opens, and they are updating people about the incident using bots; a lot of the time, it's chat-based workflows. This is the thing that PagerDuty has been well involved with.

Quintessence: Yeah. We basically codified an incident command structure that we based off of the Southern California firefighters' ICS, the Incident Command System, because they had a very similar response. We basically matched the pattern that they kind of built. And the idea is when you have a response structure, as you mentioned, you have a commander who is running at the top, and they're making the top-level decisions of who's doing what. And this is for, just to be clear, major incident response when you have multiple teams working. If you have a single human working, you can tighten this up, right?

Jonan: Yeah.

Quintessence: But when you have multiple people working on the same problem but different facets of it, you need someone who's steering the ship or keeping track communally of the information. So you have the commander and their deputy kind of tag-teaming that, and then you also have someone who's scribing who is just documenting everything that's happening, or you can use ChatOps to assist and build timelines with that. And then you also have communications liaison so that way, the people who are working on whatever is going on, the various services, aren't the people that are expected to tell execs what's going to be posted externally-facing if it needs to be, that's not their job; their job is to get everything up. And then it's the liaison's job to do internal and external comms.

Jonan: So you have an appointed PR representative for the incident. So you've got the commander in there being like, "Everything is burning, but tell the executive team it's kind of smoldering." And then the comms person goes and manipulates the executives.

Quintessence: The information should be accurate, but it needs to be factual. And by that, I mean you want to make sure that you're including a succinct description of what actually is happening, usually at a 15 to no longer than a 30-minute interval, depending on what you need. But the idea is they don't necessarily need to know if I rebooted Cassandra. They need to know if the database or if the service is back up and running.

Jonan: This is like the executive summary version of events that are going down.

Quintessence: Right. And then you're going to have a post-mortem after everything is restored. People have a chance to think through what worked and what didn't, and that's going to be something else that they would get a copy of or direct access to, right?

Jonan: Yeah. And this post-mortem process, I have heard people talk about things like MTTR and post-mortems as being not the way to do things, but I don't have enough context to know what they would even propose as an alternative. Do you know what people might be hinting at when they're talking about that? Is it just the phrasing 'post-mortem'? I don't really know. I have no answer for this question, but I have heard people be critical of both of those terms, MTTR specifically, like, mean time to recovery is the wrong thing to measure, people say. Do you know anything about that?

Quintessence: So MTTR usually we've seen it or use it as mean time to resolve.

Jonan: Resolve is what I meant. Yes. Okay.

Quintessence: Yeah. So when you have MTTR, there are certain things that you should track and certain things that you should target because if you make your goal about hitting the wrong metric, it's about the metric and not about the process, you can inadvertently introduce some negative behaviors along the path to get there. So MTTR is a good metric to have. You shouldn't have it by itself. You need to have it with, I mean, people talk about MTTI and other metrics, KPIs, and are you within your error budget? You need all of them. It's not one of those things where if you can only track one, track one, you know; it's not the '90s. You have so much data, so track whatever you please. You need context. Let's say the mean time to resolve is five minutes, and I know that's unbearably short because I'm saying resolve and not acknowledged, but we're going to go with it. If we say that the mean time to resolve is five minutes and someone else points out, "Hey, that is atypically short." Well, what's the context for that information? Are most incidents things that are not human-actionable that are being auto remediated by a script? Because in that case, yeah, the mean time might be significantly skewed. You have no context to that data. So whenever you're getting data points like that, I would say the main issue isn't MTTR or really any of the other individual metrics: it's that you're establishing metrics with no context, and that will burn you.

Jonan: Because you end up getting skewed numbers. If you have something that is being resolved automatically with a script, but you have a long-running incident that really should be addressed as an organization, that'll be kind of masked by the fact that you have a lower MTTR.

Quintessence: I wouldn't say that the number is necessarily skewed because if that's the way your org works, if most incidents are resolved in ways that you can automate, bravo. Do it up. But you need to know that context because if you're capacity planning humans on machine response, you're going to have a bad time.
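[Editor's sketch, not part of the recording] One way to picture the "metrics need context" point above is to report MTTR together with the split between auto-remediated and human-resolved incidents. The record fields, incident IDs, and numbers below are hypothetical and chosen only so the overall mean lands on the five minutes used in the conversation; this is a minimal illustration, not how PagerDuty computes anything.

```python
from statistics import mean

# Hypothetical incident records: minutes to resolve, and whether a script
# resolved the incident without a human (auto-remediation).
incidents = [
    {"id": "INC-1", "minutes_to_resolve": 1, "auto_remediated": True},
    {"id": "INC-2", "minutes_to_resolve": 2, "auto_remediated": True},
    {"id": "INC-3", "minutes_to_resolve": 1, "auto_remediated": True},
    {"id": "INC-4", "minutes_to_resolve": 16, "auto_remediated": False},
]


def mttr(records):
    """Mean time to resolve in minutes, or None when there are no records."""
    times = [r["minutes_to_resolve"] for r in records]
    return mean(times) if times else None


def mttr_with_context(records):
    """Return the overall MTTR alongside the split that explains it."""
    auto = [r for r in records if r["auto_remediated"]]
    human = [r for r in records if not r["auto_remediated"]]
    return {
        "overall": mttr(records),
        "auto_remediated": mttr(auto),
        "human_resolved": mttr(human),
        "auto_share": len(auto) / len(records) if records else None,
    }


report = mttr_with_context(incidents)
print(report)
```

With this made-up data the overall figure is 5 minutes, but the one incident that needed a person took 16. Planning human on-call capacity from the overall number alone would be planning "humans on machine response."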

Jonan: It's almost like people are the important part of tech.

Quintessence: It is almost like people are the important part.

Jonan: As soon as we all get converted to robots, that's going to save us so much time.

Quintessence: Yeah. We'll play outside again.

Jonan: [Chuckles] Yeah.

Quintessence: And robots will make all the machines and hope it doesn't go wrong. [Chuckles]

Jonan: I haven't seen any movies about that going terribly wrong. I think we'll be okay. If we're going to go a few years out from here, following the current evolution of these things, this new command structure that you had mentioned briefly, having a large team or a large coordination effort across multiple teams it's an important time to have this ICS in place. But I quite like being able to appoint myself deputy, and commander, and liaison, and I run comms for myself, for my blog when I have my database go down.

Quintessence: [Chuckles]

Jonan: Where do you see that evolving from here, the next generation of this plan? What's the next level? ChatOps is becoming a much more popular choice around this.

Quintessence: So ChatOps is part of the puzzle, but I don't think it's going to truly replace as long as we have humans involved in the process. So the more automation you introduce or automated remediation as you might call it, the less the humans need to intervene, the less you're going to need to implement a structure around incident response because that kind of structure isn't about the machine response: the code or whatever you've done is about the machine response. So I don't see the structure itself changing too much. I see the types of incidents that we're responding to shifting kind of the way that you can't actually automate away ops; you can make ops responsible for automation. You're shifting the labor.

Jonan: Yeah, this makes a lot of sense. So in the context of what I work with this observability space, there's a big transition to AI, and this is actually the part that I've been waiting for and that I'm really excited about, being able to just pay attention when things matter and have computers tell you when things matter. There is actually a thing that computers are super good at, and that's detecting anomalous data. We should be building that into these structures. So I'm excited to see that happen. But do you think that that has applications here? It seems like to me AI would be able to put us in the right direction, at least, right?

Quintessence: And we do talk to that, and we do have certain functions about that available within our product ecosystem. It's not a separate feature that you have; it's just something that goes into the overall implementation. But the idea is going back to like alert storms, if you can say, "Oh, there are all these notifications going out. We're going to aggregate them and send out one push to your phone; you're getting one call, to your point earlier, one call and you're not acting on 50 notifications, you're acting on 1, and it's propagating down to everything that's affiliated with that incident."

Jonan: Bundling up all of the alerting under one category.

Quintessence: Right. And that's a very machine learning type of thing, not that you can't also manually intervene and say, "This belongs here." But when you're talking about machine learning, you can say, "Oh, I pattern recognize that these things go together."
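[Editor's sketch, not part of the recording] The aggregation idea can be illustrated with a plain rule rather than machine learning: alerts that share a service and arrive within a time window collapse into one group, and each group would produce one notification. The tuple layout, the `window_seconds` default, and the service names are assumptions made for this example; it does not describe PagerDuty's implementation.

```python
def group_alerts(alerts, window_seconds=300):
    """Group alerts that share a service and arrive within one time window.

    alerts is an iterable of (timestamp_seconds, service, message) tuples.
    Returns a list of groups; each group stands for a single notification.
    """
    groups = []
    open_group = {}  # service -> index of its most recent group
    for ts, service, message in sorted(alerts):
        idx = open_group.get(service)
        if idx is not None and ts - groups[idx]["first_seen"] <= window_seconds:
            groups[idx]["alerts"].append(message)
        else:
            groups.append(
                {"service": service, "first_seen": ts, "alerts": [message]}
            )
            open_group[service] = len(groups) - 1
    return groups


# Hypothetical alert storm: 50 alerts from one service, 1 from another.
storm = [(i, "database", f"replica {i} unreachable") for i in range(50)]
storm.append((10, "billing", "queue depth high"))

groups = group_alerts(storm)
for g in groups:
    print(g["service"], len(g["alerts"]))
```

Here 51 alerts become two notifications. A learned model, as described in the conversation, would replace the fixed "same service, same window" rule with patterns recognized from past incidents.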

Jonan: We have a lot of these scripts to restart things, and it ends up creating very dangerous scenarios sometimes from a systems perspective where you have a lot of flapping or something. Like, you haven't implemented a circuit breaker pattern on this API, and so the system just keeps killing it and restarting it, not knowing that it's never actually able to receive requests for the first few minutes that it stands up. It becomes this exacerbated scenario off of these kinds of things.

Quintessence: That goes more into alert design, and that goes into alert fatigue. So when you're talking about something that's flapping, if something's flapping, you're likely, not necessarily, but likely notifying on the wrong condition. You know it's flapping; flapping is not anomalous. What is anomalous? Alert on that. So if flapping is normal, you can either resolve the flapping if possible or adjust the timing, or change the endpoint that you're checking or whatever it is, right?
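[Editor's sketch, not part of the recording] "Adjust the timing" can be pictured as alerting on a sustained failure instead of on every state change. The function names and the threshold of three consecutive failed checks are hypothetical; real monitoring tools expose this idea under their own settings.

```python
def count_flaps(check_results):
    """Count healthy/unhealthy transitions in a series of check results."""
    return sum(
        1 for prev, cur in zip(check_results, check_results[1:]) if prev != cur
    )


def should_alert(check_results, consecutive_failures=3):
    """Return True only when the most recent N checks all failed.

    check_results is a list of booleans, oldest first (True means healthy).
    """
    if len(check_results) < consecutive_failures:
        return False
    return not any(check_results[-consecutive_failures:])


flapping = [True, False, True, False, True, False]
sustained_outage = [True, True, False, False, False]

print(count_flaps(flapping), should_alert(flapping))
print(count_flaps(sustained_outage), should_alert(sustained_outage))
```

The flapping series changes state five times but never pages, while the sustained outage changes state once and does. Whether that is the right condition depends on the service, which is the alert-design question raised above.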

Jonan: Yeah. I think we have a relatively well-versed audience, but this flapping word that we keep using, the system is coming up and then dying and then coming up and then dying on and off kind of behavior that you see in these systems. So in the future, we can expect AI to simplify the process of detecting those anomalous things. Humans will likely always be involved. We won't get to go play outside. But the ChatOps piece, I was interested to hear why you think that is not going to be kind of the way forward. I mean, you did say that it would be relevant.

Quintessence: It will be relevant. I don't think it will fully replace based on how we talk about incident management, I guess. So let's talk about ChatOps. When you're talking about using it as a part of your response, either for major or non-major incidents, it's a place where you can have chatter. You could have it adjacent to a web comm or whatever you've spun up to discuss what's going on. But when you're using it, you're not going to use it as a replacement for other types of things just by the nature of how chat works unless we pivot that, but that's another communication mechanism. What I would see for ChatOps is things you need to know about but not right now or things you need to talk about and read later. I know we all live in Slack, whatever your suite of choice is, but it's probably Slack — let's be real. So we all live in Slack, in our community Slacks and everything else, and when you are talking about things, it's mostly async. It almost feels real-time because we might live in there, especially now that we're all bored.

[Laughter]

Quintessence: We do kind of live in there, but chat by its nature is async. Even when you're texting with a friend, you might fire off the message, and then you go read or do whatever you're doing for work, whatever time of day it is that this makes sense; that's async communication. And then chat in Slack is kind of the same. It's mostly async unless you're spinning up a channel for something. You're sending out something for someone to read later, later today, sometime tomorrow. But an incident is live stream communication, so you're not going to use it as live stream communication wholly. Like I said, it'll be a part of the chain, but it's not going to replace the whole chain just because of how it works, I think.

Jonan: I think this is a wise take. You're going to end up with hopefully a lot of the incident report, all of the aftermath of this thing having happened recorded carefully because we're all interacting there. So you are up on video conference at the same time; how do you prevent loss of data there? There will be things said in that conference that are not translated over.

Quintessence: Unless you have a scribe that does. So we have an integration with Slack that will pull in the Slack messaging from whatever channel you select. It will pop it into a timeline that it'll create as part of the incident within the entire structure of what incident means within the PagerDuty app. And when you do that, you can establish a timeline. You still need a scribe to keep more accurate; I shouldn't say more accurate, more well-described descriptions of events. But you could have someone post in Slack, a scribe, and say, "Rebooted cluster at..." timestamp. And then wait, and "Results of rebooting are..." And that's their job, and then that gets pulled into the app. But then outside of that, they're also taking notes that are more fully fleshed out — this particular command reboots only this cluster and not all of everything. And they're explaining what actually happened "Noticed on restart that it produced a few of the following anomalous errors in the logging," and then they're explaining how long it took to restart, etc. So that's the scribe's job.
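[Editor's sketch, not part of the recording] The timeline described above, short chat posts plus the scribe's fuller notes, can be modeled as two timestamped sources merged in time order. The `TimelineEntry` type, the source labels, and the sample timestamps are all hypothetical; this does not reproduce the PagerDuty Slack integration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TimelineEntry:
    at: datetime
    source: str  # "chat" or "scribe" (hypothetical labels)
    text: str


def build_timeline(chat_messages, scribe_notes):
    """Merge (timestamp, text) pairs from both sources, ordered by time."""
    entries = [TimelineEntry(at, "chat", text) for at, text in chat_messages]
    entries += [TimelineEntry(at, "scribe", text) for at, text in scribe_notes]
    return sorted(entries, key=lambda entry: entry.at)


# Made-up timestamps for illustration only.
chat = [
    (datetime(2020, 1, 1, 3, 5), "Rebooted cluster"),
    (datetime(2020, 1, 1, 3, 12), "Results of rebooting posted"),
]
notes = [
    (datetime(2020, 1, 1, 3, 8), "Command restarts only this cluster"),
]

timeline = build_timeline(chat, notes)
for entry in timeline:
    print(entry.at.isoformat(), entry.source, entry.text)
```

Keeping the source on every entry lets a post-incident review tell the terse live messages apart from the scribe's more detailed notes.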

Jonan: And in the case of one of these happening then at PagerDuty, you end up with a report that is well enough detailed that you think you could recreate it again, like, you can run a fake version of that event to kind of drill on it. Is that a thing? Is that a thing that you do regularly?

Quintessence: I don't personally do it regularly, but when you think about what chaos engineering is, we do failure Fridays. And the incidents are real in the sense that you're intentionally breaking things. So it's not quite what you mean. You're thinking I meant dummy incidents, and I'm saying test in prod.

[Laughter]

Quintessence: Responsibly test in prod. Before anyone goes too far with that, responsibly test in prod. So when you're talking about running incidents, one of the things that we talk about is don't mute your notifications during an experiment because that's part of the experiment. If the alerts don't fire as part of the experiment, oh no, because you know they should be; you just fired that script off. You know that something's wrong; that’s part of it. So you want to make sure that you're testing your alerts as part of your experiments, and that's not a dummy experiment. But it's still very relevant to making sure they're well-designed too, and they're going to the right people, that they contain the right information. If you get one of those alerts that gives you so little detail that you don't even know what system's impacted, you need to go in and adjust the wording.
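[Editor's sketch, not part of the recording] Treating alerts as part of the experiment can be reduced to two checks after a run: did every alert that should have fired actually fire, and does each one carry enough detail to act on? The alert names, the `REQUIRED_FIELDS` schema, and the payload shape are assumptions made for this example.

```python
REQUIRED_FIELDS = ("service", "summary", "severity")  # hypothetical schema


def review_experiment(expected_alert_names, fired_alerts):
    """Compare the alerts that should have fired with those that did.

    fired_alerts maps alert name -> payload dict.
    Returns the names that never fired and, per fired alert, any
    required fields that are absent or empty.
    """
    missing = sorted(set(expected_alert_names) - set(fired_alerts))
    incomplete = {}
    for name, payload in fired_alerts.items():
        empty = [field for field in REQUIRED_FIELDS if not payload.get(field)]
        if empty:
            incomplete[name] = empty
    return {"missing": missing, "incomplete": incomplete}


# Hypothetical experiment: two alerts expected, one fired without a service.
expected = ["db_unreachable", "api_error_rate"]
fired = {
    "db_unreachable": {
        "service": "",
        "summary": "something is wrong",
        "severity": "high",
    },
}

result = review_experiment(expected, fired)
print(result)
```

In this made-up run, `api_error_rate` never fired and `db_unreachable` does not say which system is impacted, so both would be follow-ups from the experiment.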

Jonan: Yeah. This whole test in production, I come mostly from the application development side of the house, but it's always been very important to me to convey to people when I'm counseling them on setting up their infrastructure to copy your environments exactly. I think it similarly applies in this case. If you can, to the best of your ability, get your staging and your production to match. I've worked places where staging had been taken over by sales, and they were using it to demo, so it was inaccessible to a developer. Those kinds of problems in a systems world where you are testing in production by running chaos engineering, that makes a lot of sense to me. And then, by turning off your notifications, you're effectively creating this disparity. But there are other things that just can't be recreated, presumably.

Quintessence: Right. To your point, you want to keep the two environments or three, depending on how many you have, as pretty much mirror-exact as possible and as it makes sense. If you're going to scale maybe one down so that it's smaller, you do not want to scale it down in a way that you can reasonably predict will impact the outcomes because maybe you don't want to spend however much it would cost to run up (I'm going to use AWS problems because that's what I'm familiar with). [Chuckles] You do not want to have that running on your bill if you don't have anything actually running in those environments. But at the same time, you don't want to change the net behavior. So it's important to be careful. Several jobs ago, when I was still in my junior phases of things, the longest production incident that I got to partake in ended up having a net resolve time of a week, back before we had the slim error budgets we do have now, but still, it was very hard. And the core issue ended up being dev and prod were not in sync. It passed in dev; it crashed production.
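[Editor's sketch, not part of the recording] Keeping environments "mirror-exact as it makes sense" can be checked mechanically by diffing their settings while allowing a short list of intentional differences, such as scale. The setting names and values below are hypothetical, and real environments would need their configuration exported into dictionaries like these first.

```python
MISSING = "<missing>"  # marker for a setting present in only one environment


def config_drift(dev, prod, allowed_to_differ=()):
    """Return {key: (dev_value, prod_value)} for settings that differ.

    Keys listed in allowed_to_differ (for example instance counts that are
    scaled down on purpose) are skipped.
    """
    drift = {}
    for key in sorted(set(dev) | set(prod)):
        if key in allowed_to_differ:
            continue
        dev_value = dev.get(key, MISSING)
        prod_value = prod.get(key, MISSING)
        if dev_value != prod_value:
            drift[key] = (dev_value, prod_value)
    return drift


# Hypothetical settings for two environments.
dev = {"db_engine_version": "11", "instance_count": 1, "tls": True}
prod = {
    "db_engine_version": "12",
    "instance_count": 6,
    "tls": True,
    "read_replicas": 2,
}

drift = config_drift(dev, prod, allowed_to_differ=("instance_count",))
print(drift)
```

The deliberate difference in instance count is ignored, while the database version mismatch and the setting that exists only in production are reported, which is the kind of "passed in dev, crashed production" gap described above.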

Jonan: This reminds me that one of the hard parts to recreate if you are doing these kinds of dummy incidents instead of testing in prod would be that actual load on the systems, which seems pretty relevant in this equation. If you're setting up a whole pretend environment and testing against that to check your workflows, you could be using synthetics and things to drive traffic to those, I suppose. But if you have access to chaos engineering, there is a thing called Gremlin. I guess there are probably other people doing this, right?

Quintessence: There probably are. I'm most aware of them in the space; we do workshops with them. We actually did a conference last month. I had to think about that for a second because time's weird. [Laughs]

Jonan: Yeah. Wow. I really liked the name of the product. I like the idea of having actual Gremlins in the house running about and mucking with things. I think chaos engineering is going to be a very relevant piece for more companies going forward. I hope so. I hope that this becomes the way that we do things that you are actually able to test the effectiveness of the processes you've put in place around these things in a way that is safe because, with chaos engineering, there is a chaotic element to it. But when you talk about it that way, I think it is sometimes frightening to enterprises. They're like, "I'm sorry, you're just going to break everything on purpose?"

Quintessence: Yeah, but it goes into being able to recover agilely. And I wish I could remember the name of the talk, but one of the last conferences I got to actually attend in person this year was DevOpsDays in New York City. And one of the speakers there was talking about how you need to fail with practice. So when you're doing something like chaos engineering, which this talk wasn't specifically about, it was more about resilience. But in this case, it's wrapped up in the same idea where part of practicing the failure part is not just testing your workflows and whatever; it's practicing the failure part. And he talked about one of the examples. I think it was the 1700s or 1800s where the naval captains that were more risk-prone instead of risk-averse actually had longer, steadier, better careers. Why? Because when the unexpected happened, they had already done it themselves, whereas the people who were more regimented and followed the rules a little too much, in this sense, they were not able to adapt.

Jonan: Because they were less likely to be dealing with the crises that would arise.

Quintessence: Yes.

Jonan: In that context, the fact that I regularly light electronics on fire, that's a good thing. I'm used to putting out fires.

Quintessence: If you're talking about metaphorical fires with electronics, sure.

Jonan: [Laughs]

Quintessence: Don't burn your breadboard, please.

Jonan: Don't burn your breadboard. I have not actually lit anything on fire in a while.

Quintessence: [Laughs]

Jonan: When I was a kid, I once took a toy, and I removed the 4 D batteries from it and then took an old lamp cord straight out of the wall and hooked it onto the battery terminals; that did not go well for that toy. It turns out that's a significant voltage jump. Yeah.

Quintessence: Yeah. I remember when I was in high school, one of the physical experiments we did was we broke open a disposable camera, and the teacher trying to be helpful was like, "Don't touch the discharge element from the flash." And I was like, "Why would you..." And as I was saying the sentence, "Why would you..."

Jonan: [Laughs]

Quintessence: One of my classmates was just like, "What is..." And he didn't get hurt, hurt because it was a small disposable.

[Laughter]

Quintessence: I was like, "Oh, no. Okay. Now I know why you said that thought out loud. Okay."

Jonan: Don't touch the capacitor. There's some big ones in those old projector televisions. I've seen capacitors in there as large as my fist, and I’m reaching around them to try and rewire things I had to replace in the motherboards. These kinds of things, though, I think that you bring up an interesting point with the naval analogy. We have systems all over the place. We are in large part building these things around the structure that already exists in human society like the application of this ICS concept, this is what firefighters use to organize humans, and now we are using computers in the same way. I expect that to continue. And I'm sure there are many more examples of the ways that we organize ourselves as a society being reflected in technology. I am a little bit disappointed about the reverse influence. I think that as technology has come to shape society in particular directions, I am less pleased with the outcome. But I think that the points you make with the naval analogy make a lot of sense as far as where we are going as an industry. We are going to have to get used to the idea that things will break and prepare ourselves for things being broken by breaking them intentionally. And maybe if chaos engineering is an off-putting concept because it has the word chaos in it, just think of it as a planning exercise.

Quintessence: Yeah. Right. I know that Gremlin says a lot; I'm sure that anyone else in that space would say the same. If you've never done an experiment on your system, don't start by testing in production. Run your experiments in dev staging until you're used to running experiments and then graduate and run them elsewhere as makes sense.

Jonan: It makes perfect sense to me. If you were back in time a bit earlier on in your career and you could take something with you, what would you choose from today? What has really been so influential that you'd carry back in the time machine?

Quintessence: As a thought, or as a technology, or?

Jonan: Anything, including fountain pens.

Quintessence: Okay. I'd probably bring back one of those. Most of the stuff that's highly beneficial to young me is less about specific technology like, hey, you know that Cloud Foundry thing you're doing? Kubernetes is going to take off very soon; you haven't heard of it yet, and you'll care. So there's stuff like that. But in order to build a better junior career, it's all the mentor stuff, making sure you get attached to a good mentor, which I did eventually. They can guide you through the technology because ultimately, it doesn't matter that Kubernetes superseded Cloud Foundry, because some technology will supersede Kubernetes too. It matters that you can adapt and understand enough of the abstraction that you can flow with it.

Jonan: When I was first entering the industry, I spent a lot of time worrying about what I was learning — do you really think this is the right language? Do you think I should be using this framework or this framework? Down to the individual library. I would quiz senior developers on like, "Do you think this library is better or this one's better? Which one should I focus on learning?" And you get to a point where you realize that tools are pretty interchangeable ultimately. If you're starting to understand the fundamental concepts behind them, they transfer really quickly. A senior engineer can learn a new language or framework certainly to competency; I think in about a month, like, they can start shipping something new.

Quintessence: A senior, yeah.

Jonan: Yeah, right? I think when I was early on, that was certainly not the case.

Quintessence: Yeah, because you're learning all the underlying everything.

Jonan: Yeah, figuring out data structures and all. I wonder what other advice you might have for your young self. So find a good mentor, don't worry too much about the tools you're choosing.

Quintessence: Learn how to navigate the landscape of hiring both localized to you or localized for the remote hiring area, which is a little harder to define. I do still mentor, and one of the things that you can do for newbies that's very important to them is make sure that they know your professional network, and you start to integrate them to the areas of your professional network that intersect with what they need to know and what their career goals are because you really need them to start learning to empower and advocate for themselves because you're not going to be attached to their hip or anything about it. They need to be able to do it, and normally they can. They just need that push in the right direction.

The other thing that I really didn't like hearing as a newbie, and I encourage people to not do this when they start mentoring, is, "Think of a project that is around something you want to learn and then just go build what you want to learn." You're going to be limited by your ignorance of the types of solutions you think to find and how you think to implement them because you're not the industry expert. But if you're the mentor and you are the industry expert, then you can define a project that's appropriately scoped and has the appropriate recommendations to be like, "These are the concepts that you're going to need in this industry. So while you're researching this broader project, know these few things in here and be mindful of them as you navigate through." You don't want to give them a template to just copy and paste, but you want to give them the push for the terminology to search for to find that information.

Jonan: Yeah. This was the most valuable part for me as I was starting out; it was having that notebook next to my desk. And every time I heard an acronym, or a technology, or an app inside the company that I didn't know, I would write it down. And I just had five, 10 minutes at the end of my day where I'd sit down with someone and say, "Okay, just tell me enough to Google. I want to figure it out on my own. I just don't know how to search for this thing. I'm trying to figure out what's happening with the system calls in this thing." "Okay. Look up strace," right?

Quintessence: Exactly. And the other thing, and this goes into intersections and diversity, and inclusion and stuff, but something that mentors can be aware of is, and it's not usually with malice, skipping over the malice, we know what malice looks like. People will respond to other humans with different intersections in different ways. So when I was a newbie, one of the things that I noticed when I had another more junior engineer that was male, he would usually get answered with the pattern, "Look up X, Y, Z," whatever it was, whereas I would be answered with the pattern "Let me show you so you can replicate it." And as a one-off response, that doesn't mean anything, but if you do that over and over again, you're training proactive versus reactive engineers. I actually talk about this a lot when we're talking about how you're mentoring up-and-coming engineers. And so when people start to say things later in their career about "I wonder why male versus non-male engineers respond differently," or "I wonder why white engineers versus non-white engineers respond differently." It's like, I guarantee you, in their up-and-coming days when they were being mentored subtly, and maybe not maliciously, they were guided to behave that way. It's the whole who's the person who is going to do and then apologize later, and who's the person who will ask first before they do? And all of that is socialized.

Jonan: This is a really excellent take. You have a lot of excellent takes. I'm so glad that I had you on this show. You presumably type these words on the internet sometimes. Where would people find you?

Quintessence: If you Google my Unquantified Serendipity presentation for this particular area that we're talking about now, I have it up on Notist. And there are a few recordings of it at different conferences I've given. I've given it both as a five-minute lightning talk where I go through it very quickly, but I've also given it as a fuller 25-minute version.

Jonan: And if they wanted to find you at The Fountain Head, go to the source of your entity online. Would Twitter be the place you're mostly hanging out?

Quintessence: Yeah, Twitter is the place I mostly hang out and respond to DMs. I don't log on to LinkedIn much; I log into it once a month-ish. So I will respond to you faster on Twitter.

Jonan: I'm surprised that LinkedIn actually achieved any kind of relevance in our work given where it started. I remember that we just wholly ignored it for years and years. I didn't even have anything on there. I think my name was on there, and that was ten years ago.

Quintessence: 10 years ago, yeah.

Jonan: And now we have this LinkedIn thing being pretty central, and people message me on there regularly expecting an SLA of some variety, I'm not sure. [Chuckles] Just hit me up on DMs. I leave my DMs open. For those who don't, just tag them in a tweet, go talk to the developers on Twitter. This is a great way to achieve this process of making friends in your industry that is so important right now in the absence of physical events. I was always telling people -- I go to code schools a lot because I came through a code school. Go around to every meetup you can find. Go and get the perspective and figure out what people are actually doing. And now that's just gone. It's just vaporized.

Quintessence: Well, it's gone, but it's also shifted. So, for example, I live in Buffalo, New York, and the closest major city is not New York City, which is downwards; it’s Toronto, which is upwards. And I would not normally, with any regularity, be able to make it to Toronto to visit any of the dev people, DevRel, DevOps, to visit that community just because it's about a two-ish hour drive with the way that traffic flows into the city. But virtually, borderless, I can actually make time to attend their meetups or partake in their Slack or whatever. It's easier to get broader communication. There's a lot of small meetups that are disappearing because they didn't have the membership to really transfer to virtual well. We had a lot of that happen in Buffalo, too, where we got one super meetup instead of the more topical meetups. Yeah.

Jonan: And I think we'll flip back around on that. I think the mentorship piece is vitally important. It's also kind of a big ask. Sometimes I used to use meetups as an opportunity to form mini mentorships with people, so-and-so knows about this thing, and I just hit them up in rotation—you kind of keep a list of who knows what. I think when and if we recover from this pandemic, those physical events will return, and I would continue encouraging people to attend.

Quintessence: Yeah.

Jonan: So we can find you on Twitter. Your prediction for next year is that the robots are going to take over the world, and we're all going to hang out outside. And you're welcome back on the show. I hope that you will come back in a year so we can compare and analyze our past experience. We'll go through the post-mortem.

Quintessence: We'll go through the post-mortem, and we'll see if anything can sustainably reopen and if vaccines end up working out.

Jonan: I'm betting on both of those, I am.

Quintessence: I'm hoping. I'm not betting, yet.

Jonan: I've got $3 on it. I'll bet you $3 a year from now we can go outside and are vaccinated.

Quintessence: Okay. Let's change it from money to a latte or something. I like lattes.

Jonan: I will buy you a puppy. I'll buy you ten puppies. I'll just ship them to wherever you're living unexpectedly.

Quintessence: No living companions as gifts. Luna cat would be distressed.

[Laughter]

Jonan: Speaking of, we did not mention the fact that you have a companion living in a little aquarium there that I want to --

Quintessence: I have a mantis shrimp, which is neither a mantis nor shrimp. It's just the common name.

Jonan: What is your mantis shrimp's name?

Quintessence: I call her Cora. So when I first got her, she was very orange in color. She's about the length of your pinkie, to give viewers an idea of her dimensions. She is not a peacock mantis. They are about eight-ish inches when they're fully mature, so she's not that large. But every time she molts, she has slowly gotten greener and greener. So now she's less coral-colored, which is how I named her, and more like algae-colored.

Jonan: She's much more on theme for PagerDuty these days.

Quintessence: Yeah.

Jonan: Excellent.

Quintessence: That's my girl.

Jonan: Well done, Cora. We will check in with you next year and see the transition. If you change companies, Cora may have to rebrand, but we'll see how it goes, hopefully not.

Quintessence: Nice.

Jonan: Thank you for coming on the show, Quintessence. I really appreciate it. And I look forward to having you back in a year.

Quintessence: Awesome. I look forward to speaking with you.

Jonan: Take care.

Thank you so much for joining us for another episode of Observy McObservface. This podcast is available on Spotify and iTunes, and wherever fine podcasts are sold. Please remember to subscribe, so you don’t miss an episode. If you have an idea for a topic or a guest you would like to hear on the show, please reach out to me. My email address is jonan@newrelic.com. You can also find me on Twitter as @thejonanshow. The show notes for today’s episode, along with many other lovely nerdy things, are available on developer.newrelic.com. Stop by and check it out. Thank you so much. Have a great day.

By Jonan Scheffler

Jonan Scheffler is a former Director of Developer Relations at New Relic. Jonan spends most of his time staring into tiny boxes and pushing buttons. He likes Ruby, Go, machine learning and playing with robots.

