Scaling Runbook Automation Across Enterprise

Posted on Tuesday, Apr 15, 2025
Right before hopping on stage at PagerDuty on Tour London, Andy Slater joins us to chat about automation and how Specsavers scaled from 0 to over 1000 automated runbooks.

Transcript

Daniel Afonso (00:09): Welcome to Page It to the Limit, a podcast where we explore what it takes to run software in production successfully. We cover leading practices used in the software industry to improve both system reliability and the lives of the people supporting those systems. I’m your host, Daniel Afonso, and you can find me on X, LinkedIn and Bluesky at danieljcafonso. Welcome everyone. So [00:00:30] we are back for one more episode here at PagerDuty on Tour in London. Right now with me, I have one of our amazing speakers who will also be doing a session today. I’m going to let him introduce himself first. So tell me about you, tell me who you are.

Andy Slater (00:43): Yeah. Hi everyone. I’m Andy Slater. I’m the enablement lead for Observability and AIOps at Specsavers, based here in the UK, but part of the Global IT team. My role is really interesting. It’s kind of the classic hybrid multifaceted role, but effectively it kind of distills down [00:01:00] to being the bridge between our development team and the rest of the business, liaising with projects, understanding demand workloads, translating really high-level non-technical requirements into sort of functional requirements and technical requirements. And then working with our team to deliver that in a little bit of a Scrum master slash light product owner type approach.

Daniel Afonso (01:23): And what are some things you’re passionate about? Tell me about yourself.

Andy Slater (01:27): I’m an IT grad, although I’m not necessarily hands-on in [00:01:30] the role I do now, but I’m kind of really passionate about technology. I’m passionate about the way we use technology at Specsavers. We have a really kind of core mission and a real kind of value to add to our customers. Everything that we do is geared up to improving the sight and eye care and healthcare of people’s eyes and ears, and everything we do is truly distilled down to that kind of focus on our customers that visit our stores, that may need help, may not know they need help. [00:02:00] We consider ourselves almost as kind of a health provider first, retailer second.

Daniel Afonso (02:06): Okay. Yeah, that’s really exciting. Could you tell me what are you doing here at PagerDuty on Tour London?

Andy Slater (02:11): Yeah, so I’m really excited to be speaking to Marty later on stage and talking about how we use runbook automation at Specsavers to really kind of deliver and scale the amount of work we’re able to deliver, react to incidents, resolve incidents without the need for kind of human in the loop in a lot of cases. So yeah, some cool stuff that we’re doing and to share that [00:02:30] later on.

Daniel Afonso (02:32): And speaking about automation, this is more of a fun question and I’ve been doing it with some of the other speakers. So if you’re listening and you’re listening to the PD on Tour special episodes that we’re doing here, I’ve always had a version of this question. So the question I have for you is how would you explain automation to a five-year-old?

Andy Slater (02:53): Yeah, so I have a five-year-old at home, so I should probably have this answer nailed down. But yeah, she thinks I spend a lot of time talking [00:03:00] to people on the computer and sending emails on my phone. So it doesn’t necessarily help with the answer to this question, but I suppose to distill it down, our purpose in automation is to save people time and effort. How can we take away really repetitive tasks, things that people have to do daily or weekly or things that can take a lot of time, but they’re actually quite basic tasks with lots of different parts to that task, where making a mistake in part two ruins the whole [00:03:30] thing. You have to do the whole thing again. Can we use automation to actually make things better in that regard? So yeah, I suppose if I was kind of explaining it to my daughter, she’s really into the Wallace and Gromit films and particularly The Wrong Trousers, and Wallace wakes up every morning and, instead of getting out of bed and getting dressed and brushing his teeth and making his breakfast, all that kind of happens and he kind of ends up at the breakfast table covered in toothpaste and food.

(03:54): That’s sort of what we do.

Daniel Afonso (03:57): I love that answer. That’s [00:04:00] why I keep asking this one because there’s so many different stories and it’s so fun. Thank you for that. So one thing that you’re going to be talking about in your session a bit later, so this is kind of a tease if you’re listening to the podcast, I recommend that you then go watch the recording, which is going to be available and it’s going to be linked here on the platform, so you should be able to also go through them. One thing you mentioned is you had a bit of a journey going from zero to a thousand runbooks. So could you walk us through [00:04:30] how you approached building your first automations and what made them successful enough to scale to this number that you have right now?

Andy Slater (04:38): Yeah, sure. So I wasn’t around when we first started using Rundeck. That was probably over a decade ago now. And I suppose it started, as a lot of people do, very organically, very much a kind of, we could use this to do a few bits here and this could help. And then that kind of spiraled from a couple of developers doing stuff in the open source [00:05:00] Rundeck community platform to growing and growing and growing and more and more people involved. And then we started to need to think about, well, how do we formalize this a little bit more? How do we put a few more guardrails around it? How do we keep the kind of almost open source community feel to that within Specsavers? But again, just protect ourselves and mean that migrating through the release cycles and things like that became a little bit easier.

(05:28): Our platform, [00:05:30] we are responsible for the reliability of the platform. That was getting a little bit more difficult as there were more directions people were trying to take it in and more things that people were doing. So we made a challenging migration from community to enterprise probably two years ago now, and that enabled us to not necessarily draw a line in the sand, but think about how we wanted it to look going forwards and design to that and then bring things through. So rather than just do a direct [00:06:00] lift and shift, we tried to do as much future casting as we possibly could, as to what’s this going to look like in a few years’ time. We’re now at that point and I’d probably say we’re maybe 75, 80% successful in what we thought, but that’s kind of the nature of it. Things move and they constantly move and there’s more features we can take advantage of and more things we can do.

(06:20): And we’ve now started to become less and less of the doers of everything, which is still a huge amount that our [00:06:30] team do. But trying to give that access and availability to not just developers across the business, but people within our IT service desk. They can start to automate the issues that they see day to day and we adopt either sort of a pair programmer type approach or kind of a mentor type approach. And we can, as I say, apply that layer of governance and apply some polishing and best practice and standards to the work they deliver, but it allows us to grow that exponentially. And yeah, it’s really that kind [00:07:00] of rolling stone gathers pace type approach, which has really worked for us and really worked well.

Daniel Afonso (07:05): That’s really exciting, like, it’s expanding. It just keeps growing and growing and rolling. I’m really excited to see what’s coming next. And speaking of what’s coming next, what’s next for Specsavers in the automation world at this point?

Andy Slater (07:18): So like every organization, we are partway through one digital transformation and starting two or three other digital transformations at the same time. As I said before, it never stops, right, in IT. So [00:07:30] yeah, that will open lots of new doors and bring lots of new challenges to us as a platform team, but we are spending a lot of time focusing on how do we get more and more people utilizing the platform and kind of democratize that access as best as we can. I talked about the kind of service desk getting involved. We’ve just run a really successful pilot with our security team and our InfoSec team, which is growing exponentially as I’m sure it is in every organization at the moment. [00:08:00] So starting to tackle different challenges and create a secure area for them in a slightly different way to how we’ve used Rundeck before, but they’re really starting to see the power and the value of that, and their sort of runbook executions have grown exponentially by the day as well. And we’ve got a number of different internal products that will be released over the next five years. And it’d be interesting to see how runbook [00:08:30] automation fits into that new world because a huge proportion of what we do is looking at some of our legacy infrastructure, but that’s growing and that percentage is starting to shift as we shift percentage-wise in terms of using slightly more modern microservices infrastructure. Again, the same challenges everybody’s facing. Right?

Daniel Afonso (08:50): Yeah, I think that’s pretty common right now everywhere for everyone. So I don’t want to take more time from you. I just have one more question and [00:09:00] the question is, what is the one thing you are currently excited about and you look forward to exploring for the rest of the year?

Andy Slater (09:04): Yeah, well, I’m certainly excited to hear a little bit more about generally like AI today. I think it’s kind of, AI is one of those horrible marketing buzzwords that means so much and so little all at the same time. But I think that generative AI and that kind of copilot type approach is going to be massive for us, particularly as we grow with I suppose less deeply technical people using the platform, [00:09:30] the ability to kind of make them a coder. We can start to enforce some coding standards, and trying to push everyone towards Python is our goal. It’s not going to work for everything, but actually it doesn’t matter whether someone’s a Python developer or not, they can use generative AI to produce the code and just kind of smooth that access in and make it super easy for everybody to kind of get involved.

Daniel Afonso (09:54): Yeah, that seems definitely really exciting. AI is everywhere at this point. You cannot miss it. And there’s definitely [00:10:00] going to be some discussions about it here on PD on Tour. So if you’re watching or listening at home at this point, feel free to go watch the recording. And he’s going to have a session with Martin later today. So also go there and listen to it. Andy, if anyone wants to connect with you online, where can people find you?

Andy Slater (10:18): Yeah, you can find me on LinkedIn and I think they’ll share my email address probably as part of the slideshow, or get in touch through your PagerDuty account team. We love talking to other customers to see what they’re doing that we can steal. And equally, [00:10:30] we are not afraid to show what we’ve done and share that kind of knowledge as well.

Daniel Afonso (10:35): Thank you so much, Andy. Thank you for being here and for everyone back at home or wherever you’re listening, have an uneventful day. That does it for another episode of Page It to the Limit. We would like to thank our sponsor, PagerDuty, for making this podcast possible. Remember to subscribe to this podcast [00:11:00] on your favorite streaming service. If you like what you’ve heard, you can find our show notes at pageittothelimit.com and let’s continue this conversation on PagerDuty Commons at community.pagerduty.com. Thank you so much for joining us and remember, uneventful days are beautiful days.

Show Notes

Additional Resources

Guests

Andy Slater

A Certified Scrum Alliance Product Owner & Scrum Master. His role is to bridge the gap between developers, different technical teams and the business to ensure they’re continuing to drive value from their Observability and AIOps tools. Across the course of a working week he tends to wear a number of different hats, rotating through the roles of IT Service Delivery Manager, Product Owner, Business Analyst and Consultant. He facilitates and leads learning sessions and drives community programmes, as well as working closely with their 3rd party onshore and offshore teams and solution providers to define best practice, governance and working standards.

As an IT grad and self-confessed bit of a tech geek, Andy tries to keep his finger on the pulse and regularly attends industry and tech community events and expos, as well as being a regular judge and contributor to Computing magazine awards and events, all of which enable him to talk technical and business in the right context.

Hosts

Daniel Afonso (he/him)

Daniel Afonso is a Senior Developer Advocate at PagerDuty, SolidJS DX team member, Instructor at Egghead.io, and Author of State Management with React Query. Daniel has a full-stack background, having worked with different languages and frameworks on various projects from IoT to Fraud Detection. He is passionate about learning and teaching and has spoken at multiple conferences around the world about topics he loves. In his free time, when he’s not learning new technologies or writing about them, he’s probably reading comics or watching superhero movies and shows.