Grafana With Brandy Smith

Posted on Tuesday, Aug 20, 2024
This week Brandy Smith joins Mandi to talk all things Grafana, and some cool Raspberry Pi projects!

Transcript

Mandi Walls: Welcome to Page It to The Limit, a podcast where we explore what it takes to run software in production successfully. We cover leading practices used in the software industry to improve the system reliability and the lives of the people supporting those systems. I’m your host, Mandi Walls. Find me at LNXCHK on Twitter.

Mandi Walls: All right, welcome back folks. Welcome back from our little bit of summer break. It’s been a few months since we’ve had a regular guest on, so today I’d like to welcome Brandy Smith. Brandy, welcome to the show. Tell us a bit about yourself and what it is that you do.

Brandy Smith: Sure thing. Thanks for having me. I’m Brandy Smith. I’m a staff solutions engineer, over at Grafana Labs. A little bit about me. I’ve been in the tech industry for more than a decade now, and I’m really passionate about observability. So, my career in the tech space, it’s kind of been all over the place, but it’s heavily focused on cloud, working with customers and making them successful on their digital transformation journeys. So whether that is giving them prescriptive guidance, talking them through best practices or getting hands-on and helping them implement their large architectures. And then I’ve spent time at Google. I’ve spent time at AWS, AppDynamics over at, now they’re part of Cisco, but over at AppDynamics. And then now I’m over at Grafana, and I’ve been here at Grafana for a couple of years now, but I’ve actually been using Grafana for, gosh, I think it’s been six or so years in my home lab, Grafana and Prometheus.

Mandi Walls: Awesome. So for folks who aren’t familiar with Grafana, do you just want to take us through the basics there? What it is, what it does, it’s open source so folks out there can get a chance to try it out if they want to.

Brandy Smith: Yeah, so a lot of folks know Grafana for the front end dashboarding, but it’s so much more than that. It’s an open and composable observability stack. That’s what you’ll see out on our website. But what does that mean exactly? It’s got a little bit of everything for metrics, logs, traces, there’s a cloud version of it, there’s an enterprise version, there’s the open source version and on the cloud version, there’s even a free tier. So if you don’t want to go and set up the open source version for whatever it is that you’re monitoring and tinkering with, there’s a free tier for cloud. So you can go and set that up and we host the backend for you as well. And Grafana has a ton of different plugins to popular tools. So maybe you are using BigQuery in your day-to-day life, we have a plugin for that. We have an integration with PagerDuty, actually. I’ve got a ton of customers that are both customers of PagerDuty and Grafana. And so I think we’ve got more than 200. It may even be more than 300 plugins, sorry, marketing team or anybody from Grafana that’s seeing this, we grow so fast, being open source, we have a big community, and so we’ve got a ton of folks that are contributing always. And it feels like that number is growing all the time.

Mandi Walls: Yeah, absolutely. And we’ll put some links in the show notes for folks who can find more information and join your community if they’re in for it and contribute maybe that would be great. Yeah, the plugin ecosystem, and we have it too. It’s an interesting way to get everything aligned for people. How do you find folks leveraging that when they want to do a digital transformation, when they want to improve their observability? Are you finding they have most of the things that they want? The plugin system is really growing, I think across everybody, but is it well populated on your side there too?

Brandy Smith: Yeah, so a lot of times I’ll see customers, and this is even before joining Grafana, I’ll see enterprises that’ll say, Hey, I’ve got all of these different tools and they’re kind of spread out all over the place and I just need to see them in one place so that when an incident occurs, I’m not hopping from system to system to system. I just kind of need to see everything. And it’s not just when an incident occurs, it can be just day to day life. But I mentioned just when an incident occurs because that’s kind of the most critical time where you don’t want to be jumping from tool to tool.

Mandi Walls: Yeah, absolutely.

Brandy Smith: Yeah. And if you do need to jump from tool to tool, it’s nice to be able to just link out to it from one place. And I guess the other thing is if there’s not a plugin, you can make your own and it’s really easy to publish them out in the community. I think it’s the same for PagerDuty, right?

Mandi Walls: Yeah. We have a process where you can submit whatever you want to, really, and see what folks adopt and go from there.

Brandy Smith: Yeah, it’s the same over here. There’s a process around it, but it’s fairly simple. And so I’ve seen folks, and I’ve even got a couple of things that I’ve dabbled in. They’re mostly internal over here, but creating the plugin, we have this notion of scenes where it’s an app that sits within Grafana, and so you could say like, Hey, I want my custom collection of dashboards to where it acts as if an app, and you could create that and publish it as well. And I think that’s pretty cool because you can kind of say, there’s nothing that fits my use case, but I’ve created this thing and now I do have something.

Mandi Walls: The customizability is super important. We see that too. We don’t want to force folks to change the way they work, just use your tool. And they’re comfortable with the kinds of telemetry and things that are already in their mind that are important about their application. They know more about that than we do for sure. We have no idea what you’re looking at in your stuff, but that customizability has to be super valuable to everyone who uses it.

Brandy Smith: Yeah. Yeah. That’s actually what kind of really got me excited about observability in the first place. So I was working with a customer on a data streaming project, so I started my career in data analytics, but I was working with a customer on a data streaming project and I needed to build a dashboard. The use case was streaming data from an item that was outside of the cloud and it was going to be streaming into the cloud, and then we’d build a dashboard on top of it. And to get the data from outside of the cloud, I was just using a Raspberry Pi, and I don’t think you can see it because my camera’s blurred, but I’ve got this Raspberry Pi smart car up here. I’m always tinkering with something technical. But that Raspberry Pi ended up turning into this Raspberry Pi smart car. But that was a long time later.

Brandy Smith: Sorry, I’m kind of going on a side tangent here. Let’s go back. So I ended up coming up with this, okay, let’s have a Raspberry Pi, let’s put a Sense HAT on it. Let’s collect some metrics about the environment around us, and then we’ll ship that into the cloud. And then once we do that, we’ll build the dashboard because it’s very simple. You can kind of show the flow of data and then we’ll build a dashboard. And the dashboarding tool that we were using, it was just very BI centric. And it’s not like any of the big names. I don’t even think it exists anymore today because this was a long time ago, but it’s very BI centric and I couldn’t visualize the metrics and they were metrics scraped from Prometheus. And so it was just one of those, okay, what’s out there? And I was chatting with a friend that was at a different customer.

Brandy Smith: I was at my weekly office hours on site, and he was like, you should check out this open source tool. This is again, six years ago. He’s like, you should check out this open source tool. And that’s when I learned about Grafana. I built. I was like, well, this is really cool. And that’s kind of when I started down this path of observability. And I don’t even know if we were calling it observability at the time, to be honest with you. I think it was just still monitoring, but it’s really so much more than that and there really is a distinction between the two.

Mandi Walls: Well, let’s go into that. It’s been a while since we’ve covered observability on the show. So for folks who are maybe new to the concept and for PagerDuty, we’re going to receive whatever you want to give us. So whether you’re learning off of traces or regular monitoring, do whatever you want. But for folks who are looking for something that’s a little bit more sophisticated than regular monitoring, take us through what observability sort of gets them versus just regular monitors.

Brandy Smith: Yeah, so I’ll give the primer of observability first. I think it’s this ability to get the insight and understand your systems, all of your systems. So the full stack internal state by monitoring the internal outputs and external outputs, excuse me. So you’ve got the three pillars, metrics, logs, and traces, and those are those outputs. And I almost like to describe it as having a window into how the system’s performing and that helps detect and troubleshoot those issues. And for me, good observability is proactive. I tell all my customers, anybody that I talk to, good observability is proactive. And those metrics and traces are the absolute basic observability data. And then if you can correlate those, that helps identify and resolve issues quicker. And so having that for the primer, a common misconception is that observability and monitoring are the same thing. And that’s actually a very common question that I get. For me, how I explain it is that monitoring tells you when something’s wrong, and then observability lets you kind of dig in and ask why it went wrong. And so that monitoring world is often reactive. Observability is proactive. And I have a daughter that started driving a couple years ago and when she was driving and had a check engine light that came on, I had the perfect analogy for this. I’d like to think of it as the difference between a check engine light and a car diagnostics tool. Well, monitoring is a check engine light, but observability takes all of those check engine lights and plugs into the diagnostics tool and says, okay, here’s what’s going on.

Mandi Walls: Yeah, excellent. And I think of it sometimes too as, did you ever have an ant farm when you were a kid, or have one in the classroom, and it’s got a glass side, so you can see all the things that are going on. And I kind of think my perception of observability is more of a glass window into all the gears and moving things that are going on in your system, especially now that we’re in distributed systems. It was different when we lived in the monolith, everything was in one place. It was in one service, probably, that was the whole point. It was all in there. You have one place you’re tracing from. I worked with AOLserver, which would allow you to get in and look at threads. There was a native interface for that. It was a totally different ballgame than the one we’ve got now. We are trying to figure out how all these things connect to each other, and this thing uses API, and that thing uses RPC, and here it’s calling this and there’s the database connection, and it failed somewhere in the middle, but good luck. Yeah.

Brandy Smith: Yeah, no, you nailed it too. Talking about distributed systems, that has completely changed everything.

Mandi Walls: Yeah, I think folks kind of forget that for a while we didn’t kind of need so much sophisticated observability stuff. We were working in monoliths. It’s the devolution of that that’s sort of driven an entire other tool set revolution. And the tools are amazing, but we didn’t totally need them before. Now we definitely do.

Brandy Smith: Yes. Yes, a hundred percent.

Mandi Walls: So for folks that you work with, your customers and stuff, what do you see as their first issue, their main component that they’re like, okay, now we finally need something like this to help us out. Are they being driven by like you say outages, or is it just more of a need to know for their KPIs, something like that?

Brandy Smith: Yeah, so I think it’s a couple of things. Sometimes I’ve seen it be an outage. It’d be like something major comes up and they’re like, okay, we need to, we’ve got all the data, but we need a place to visualize it. Ideally, that’s not how it goes. And I’m like, please don’t wait until it gets to that level of use. A lot of times it’s a case of, Hey, I’ve actually got CloudWatch. I’ve got Google Monitor. I’m old school Google Cloud. And so I’m always trying to call it Stackdriver, but I think it’s Google Operations Suite or Google Cloud Monitoring. So for the Google Cloud folks, please don’t come at me.

Brandy Smith: So you’ve got your CloudWatch, you’ve got your Google operations, you’ve got your Azure Monitor, and so you’re kind of hopping around between all of those. And a lot of times folks are like, oh my gosh, I missed this 505 or 404 error that was popping up, and it’s because I was actually really focused on AWS this week and it was happening over here in GCP, or I missed it because it was in the logs, not in the metrics. And they’re kind of hopping from system to system, and they just want that single pane, or they may already be using something like PagerDuty and they may say, Hey, we’ve got all these other tools. How do we actually get alerts from this data into something like PagerDuty? How do we integrate our metrics, logs, and traces with something like this? And so they’ll come to us with that, and we’re like, oh, well, we can bring all of these in, connect in via plugins, and the data lives where it is in most of these cases, unless you’re scraping metrics and you put it in our metrics backend. That’s a completely different story, though. So you connect in via plugins and then you can connect in via the integration. And so that’s one of the ways. The other thing may be, okay, we’re completely revamping and we’re going from a monitoring strategy to observability, and we’re going from monolith to microservices. And with that, we’re kind of changing from monitoring to observability, which is a big digital transformation change.

Mandi Walls: Absolutely. And it’s good that they’re being intentional and not sort of coming at it accidentally or flailing and maybe not finding it in time and being a bit more proactive about it.

Brandy Smith: And there’s a handful, there’s been a handful of cases where it’s like, oh my gosh, we had an outage, it wasn’t massive, but it was an outage, and we need to figure out what we need to do so it doesn’t happen again.

Mandi Walls: Yeah, absolutely. We see that as well. It’s just kind of the nature of the beast. Things are working well until they’re not, and then you have to do something. So along with that, as folks come to you, one of the questions we often ask on the show is if there’s a myth that you want to sort of bust about observability or these kinds of things. Is there anything that folks come to you with sort of repeatedly, where you have to reset their mindset on maybe some preconceived notions, or they don’t quite understand exactly what you’re offering, or they kind of had the wrong idea about it?

Brandy Smith: So I think it’s a couple of things. I think it’s the observability versus monitoring piece that we talked about. When it comes to Grafana specifically, it’s that we’re just dashboards. We have a whole end to end suite. And so that’s another piece of it. And we have integrations with other great tools. So that’s a big thing there. And then the other thing that I see a lot of, and this is kind of more of a newer thing that I’ve seen probably in the past three months or so, is really thinking outside of the box when it comes to what you can do with Prometheus. I haven’t talked a whole lot about Prometheus, but network data, for example, I’ve helped a couple of customers recently. There’s SNMP exporters for Prometheus, and I think a lot of people know that, but I love to bend and really push the limits with technology.

Brandy Smith: And so I’ll have a customer that will say, Hey, I need to monitor X, Y, Z thing. Can I do it? Or I need to get observability of X, Y, Z thing. Can I do it? And it’s like, well, let’s go see if there’s a Prometheus exporter and let’s go. And I guess my advice there is if something doesn’t exist, you can write a Prometheus exporter. An exporter can exist for anything, and you can make a dashboard or you can make a plugin. So especially if it has an API. And so my advice is think outside of the box because I’ve done some really cool stuff over the past few months with customers to where it’s like, I didn’t think about that for an observability use case. And so networking is one of those examples to where I’ve had customers take out what I would say is legacy network monitoring tools.

Mandi Walls: Oh, sure.

Brandy Smith: With just Prometheus and the SNMP exporter, that’s pretty cool stuff.
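
To make the "you can write a Prometheus exporter" advice concrete, here is a minimal sketch of what a custom exporter could look like, using only the Python standard library. It is an illustration, not anything shown on the episode: the `read_sensor` function, the `homelab_` metric prefix, the `room` label, and port 9200 are all hypothetical placeholders, and a real exporter would more likely be built on the official `prometheus_client` library. The only thing the sketch relies on is the Prometheus text exposition format, which is what a scrape of `/metrics` expects to receive.

```python
"""Minimal sketch of a custom Prometheus exporter (standard library only)."""
from http.server import BaseHTTPRequestHandler, HTTPServer


def read_sensor() -> dict:
    """Hypothetical data source; replace with a real API call or sensor read."""
    return {"temperature_celsius": 21.5, "humidity_percent": 40.0}


def render_metrics(readings: dict, room: str = "office") -> str:
    """Render readings as gauges in the Prometheus text exposition format."""
    lines = []
    for name, value in sorted(readings.items()):
        metric = f"homelab_{name}"  # hypothetical metric prefix
        lines.append(f"# HELP {metric} Latest {name} reading.")
        lines.append(f"# TYPE {metric} gauge")
        lines.append(f'{metric}{{room="{room}"}} {value}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the current readings at /metrics for Prometheus to scrape."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(read_sensor()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 9200) -> None:
    """Start the exporter; the port is an arbitrary choice for this sketch."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the endpoint, and a Prometheus scrape job pointed at that host and port would then pick up the two gauges. Swapping `read_sensor` for a call to any system that has an API is the "an exporter can exist for anything" idea in practice.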

Mandi Walls: And I would imagine it’s a lot more flexible and powerful having worked with some of those legacy environments. They are what they are, and they were meant to do what they were meant to do, but they were probably meant to do it in 2003, and that was great. And

Brandy Smith: Just

Mandi Walls: For everything else, we want different and improved stuff now. Yeah.

Brandy Smith: Yeah. You nailed it. You nailed it. And some of the other things like serverless, being able to get serverless observability, so I have a blog out there on our site, but just deploying, I’ve deployed the Grafana Agent into Fargate, for example, on AWS. And a lot of folks were like, oh, I didn’t know you could deploy an agent on it. And it’s like, I mean, it works. It works and it collects metrics, logs, traces, you can do it. And that all happened because I had something that was failing. I was like, I’m just going to try it. And I actually tried it for the first time before I even worked here. I was over at AWS and I was really frustrated with something that was failing and needed it to work. And so my advice when it comes to observability is just give it a go. Try OpenTelemetry first. If that doesn’t work, maybe try an agent for whatever tool that you’re using, whether it’s our Alloy agent or any other observability tool’s agent, and you never know, it may work. That’s exactly what happened with me. I was like, it worked. I documented the process and now lots of folks use it.

Mandi Walls: Yeah, that’s super awesome. I feel like folks do have those kinds of baseline questions, especially about serverless. It’s super interesting, but it’s a totally different kind of thinking and environment and you feel like you’re missing all the other components that you might have had in your more native environment when you push things over there. So we’ll definitely link to that in the show notes for folks who are working in serverless environments. It’s super interesting. Awesome. Sweet. So you mentioned OpenTelemetry. I don’t think we’ve covered OpenTelemetry on the show yet. It’s on my back list. But what is OpenTelemetry and how do you integrate that with Grafana?

Brandy Smith: Okay, so OpenTelemetry or OTel, it’s this collection of different APIs, SDKs, and tools, and you can actually use it to collect and export telemetry data. So that’s going to be metrics, logs and traces. And I believe that profiles are coming soon. And what it sets out to do is standardize on telemetry data. And so it’s a CNCF project. And when it comes to Grafana and using OTel, so we’ve been OTel native as long as I know, but we have our agent called Alloy. It was previously called Grafana Agent. So if you’ve heard the Grafana Agent terminology and you’re watching this, Alloy is just the rebranding of Grafana Agent, it’s our OTel collector. And so lots of folks have an OTel collector. So if you use AWS for example, they have the ADOT collector and I’m trying to think of other cloud collectors, but I know the ADOT collector, you can add it to a Lambda layer for example.

Mandi Walls: Oh, nice. Okay.

Brandy Smith: Yeah. So Alloy is our OTel collector, but it’s our agent too. And so essentially you would just go out and deploy Alloy and ship it to, when you set up Grafana Cloud or your Grafana Enterprise or Grafana in general, excuse me, you have a list of your endpoints and one of those is going to be an OTel endpoint. You ship to that. OpenTelemetry is great because, so one, you’ve got that set of standards, I guess you could say standards when it comes to metrics, but two, it makes it to where essentially you’re vendor agnostic. You could say, Hey, I’m collecting these 20 metrics from my application and I’m using Grafana today. I want to go use another tool, or I’m collecting on AWS, now I want to port over to Grafana. It doesn’t matter, you’re vendor agnostic. You can take your OTel collector and port from vendor to vendor, collect from multiple vendors at once. It doesn’t matter who the vendor is. It’s essentially making it vendor agnostic across the board.

Mandi Walls: Awesome. That’s super powerful. We’ll have to cover it more in depth at some point I think in the show for folks out there. So changing gears a little bit, so you mentioned your home lab. What kind of cool fun stuff are you working on right now? You’ve got your car in the background, but what else have you been working on?

Brandy Smith: Yeah, so I love Raspberry Pis. I could talk about them all the time. One of my first projects was, this is going to probably date me, but I had a Motorola Atrix with the lap dock. I don’t know if you’re familiar with that at all, but it was this classical phone that you could turn into a laptop and I saw this Raspberry Pi project where you could turn the Raspberry Pi into a laptop. Now today you can just go buy a laptop kit. So I’m like, that’s not fair. Back then it was a very hacky project. And so that was my very first Raspberry Pi project. But since then I’ve got a bunch of little Raspberry Pis with Sense HATs and they’ll tell you the temperature and humidity and all of those fun things around the house. And essentially that tells me, is my climate control in the house working efficiently? Did the kids leave the lights on, because it has light sensors in it. And so I can say like, Hey, I’ll turn the upstairs lights off. You’re wasting electricity, those kinds of things.

Mandi Walls: Modern parenting. Modern parenting, yes,

Brandy Smith: Yes. That’s one use case. I haven’t fully finished it out yet, but I’ve got a little setup. We have a tortoise and so I’ve got a little setup for the tortoise enclosure, but I need to finish it and it’ll use the same thing, a Raspberry Pi and a Sense HAT. I’ve got, so my oldest daughter was doing rocketry when she was in high school for her ROTC thing. And one of the things that I wanted to do was like say, could we take these rockets that they’re doing and help them get better for their competitions? And it’s just like model rockets, but could we put these little sensors on Raspberry Pis to collect metrics, logs and traces and see, okay, if we adjust these pieces of it, if we adjust certain things, could we get higher launches? Could we get all these things? And so that was one of the things, there’s actually a little module you can get.

Brandy Smith: It’s built on an ESP32 board and it’s called Rocket ROCKIT. They’ve customized an ESP32 board so you can do that. And so those are a handful of things. I really love just collecting the data and visualizing it and gaining insights. So like I said, I started my career in data analytics and the Raspberry Pi smart car. So that guy I’ve got connected into a Grafana dashboard and because that’s my oldest Raspberry Pi, it was overheating for a while. And so I was able to go and just using open source at the time, I was able to go and say, okay, why is this thing overheating? What’s happening with it, and I was able to troubleshoot and dig in. And my favorite Prometheus exporter is actually node exporter for that reason because it gives you so many metrics and you can just really zero in on what’s going on.

Mandi Walls: That’s so cool. Oh my gosh, that sounds like a very cool mom thing with the rockets right there. I’m just like, we did rockets, but it was the eighties. They didn’t do anything. We certainly weren’t going to get any telemetry off of them. And that’s like NASA level cool stuff right there to launch those and know what’s going on. That is super cool. Hundred percent. Well this has been great. Is there anything else you’d like to share with folks out there who are interested? Any recommendations for getting started or anything like that?

Brandy Smith: So I would say if you’re interested in getting started with observability, there’s a ton of resources out there to just take a look at what’s going on. There’s a blog and I can share it so that we can link it here on just getting started with OpenTelemetry and instrumenting for metrics, logs, and traces, and it’s auto instrumentation, I believe, for a Java app. I believe that there’s a sample app and everything that goes with it. Really easy to get up and going there. And once you get started there you’re like, oh, what can I do next? I’ve also got some other resources for self-serve learning just for observability. There’s some Grafana stuff in there too that I’ll share. The next thing that I would say is if you’re wanting to get deeper in Grafana, we do webinars, workshops, we’re open source, so there’s a ton of resources, but you’ll see a ton of those pop up.

Brandy Smith: So just check us out on the community forum, check us out, and I’m always happy to answer questions so you can connect with me on LinkedIn. I’m always happy to answer questions. You’ll see me hosting webinars, workshops, all of those things all the time. And I think all the recordings are up on our site too. You can always join in and tune into those. My favorites are the workshops because you get to get hands on and we provide the environments to you. And so those are fun. But that would be my suggestion on getting started. There’s so many free resources in the open source community, so go check it out and again, reach out if you have questions. I’m an open book.

Mandi Walls: Perfect. That sounds great. Like you said, there’s so many things to do. I feel like folks kind of maybe get overwhelmed, so having a place to start and sort of kick off their learning journeys, super helpful for the folks that we talk to. So that’s great. Thank you so much for that.

Brandy Smith: Yeah, yeah, of course.

Mandi Walls: So this has been super fun. Thank you so much for, I’m super fascinated about your Raspberry Pi work. The telemetry stuff is great. We talk to folks and our customers who are working on those projects, but the Raspberry Pi stuff’s super interesting. I have never gotten my hands on any of them, so haven’t dug into that environment yet. But we’ve talked to people who have a lot of their home lab stuff and my last guest on our book club, he’s got stuff deployed in his house for, is there water on the floor in the basement and all those kinds of things. So it’s super.

Brandy Smith: I saw one of those kits and I am going to have to go watch that one because I saw one of those kits and it’s on my wishlist.

Mandi Walls: They’re so cool. The things that folks can do with them now, it feels like it’s an interesting sub-market right now for all that sort of home automation things. It’s just me here, so I’m not watching anybody leave lights on or anything like that, but it would be super fun someday for my garden when things need water or need more fertilizer or something like that.

Brandy Smith: Yeah, I’ll send you a tutorial. I sent one to a friend, he’s a customer, but he’s also a friend of mine, and I think the tutorial uses an Arduino, but that’s there. It talks about seeing if your seeds need water and it looks at the weather. It does all these things. It’s really, really cool. And I think they have a kit that you can get that talks you through installing all the patches and all of that. So it’s pretty cool stuff. And there’s also one out there that somebody has used with Grafana to monitor their beehives, but it’s all based on Raspberry Pi and open source Grafana.

Mandi Walls: Wow, that’s super exciting. I haven’t gotten to having bees yet, but very serious about my tomatoes, so they’re super important out there. It’s a good time for tomatoes. Alright, well Brandy, thank you so much for coming on. This has been so fun. I am so impressed with all the work that you’ve done and so excited about our partnership with Grafana, helping people out there get all the things that they need so they’re not flailing when it comes time for an incident and they get those dashboards and everything put together. Great. So everybody has what they need. So thank you so much for coming along.

Brandy Smith: Absolutely. Thank you for having me. I’ve enjoyed chatting with you today.

Mandi Walls: Awesome. For everyone else out there, we’ll be back in a couple of weeks with another episode. In the meantime, I will wish you all an uneventful day.

Mandi Walls: That does it for another installment of Page It to The Limit. We’d like to thank our sponsor, PagerDuty, for making this podcast possible. Remember to subscribe to this podcast if you like what you’ve heard. You can find our show notes at pageittothelimit.com and you can reach us on Twitter at pageit2thelimit, using the number two. Thank you so much for joining us and remember, uneventful days are beautiful days.

Guests

Brandy Smith

Brandy has spent more than a decade in the tech industry. She has a wealth of experience architecting and implementing solutions for customers on their journeys through digital transformation in a myriad of technical roles at industry-leading tech companies including Google, AWS, and Cisco. She is passionate about observability and solving customers' problems. Outside of work, Brandy loves spending time with family, hiking, and tinkering with all things technical.

Hosts

Mandi Walls (she/her)

Mandi Walls is a DevOps Advocate at PagerDuty. For PagerDuty, she helps organizations along their IT Modernization journey. Prior to PagerDuty, she worked at Chef Software and AOL. She is an international speaker on DevOps topics and the author of the whitepaper “Building A DevOps Culture”, published by O’Reilly.