Mandi Walls: Welcome to Page It to the Limit, a podcast where we explore what it takes to run software in production successfully. We cover leading practices used in the software industry to improve the system reliability and the lives of the people supporting those systems. I’m your host, Mandi Walls. Find me at LNXCHK on Twitter. All right. Welcome back, folks. This week my guest is Abby Bangser and we’re going to talk about platform engineering, cool stuff like that. So Abby, tell us about yourself and what you do.
Abby Bangser: Yeah, absolutely. Hi, I’m Abby. I work at Syntasso as a principal engineer and our goal at Syntasso is to help platform engineers thrive, to help them with their jobs and how to build internal developer platforms and internal platforms that support organizations.
Mandi Walls: Just to level set for everyone out there: maybe they’ve heard platform engineering as a term, they’ve probably seen it somewhere on LinkedIn. It’s very popular right now. But from your side, what is platform engineering and why should folks be thinking about it?
Abby Bangser: Yeah. It’s a great question because it’s absolutely on the hype cycle right now. Another role that I play is I am a co-lead of the platforms working group within the CNCF or the Cloud Native Computing Foundation. And that is a not-for-profit community-driven group that is trying to help people with wrestling with exactly that question you just asked me. And we’ve released a few papers. Back in April of 2023, we released a whitepaper that defined platforms and that’s sort of the outcome of platform engineering, right? So platforms are the ability to standardize and support the delivery of software in your organization and speed that up through that standardization and through that support of many teams, platform engineering being the act of delivering those platforms. Right? The act of designing what does the organization need and then buying or building the pieces of that platform that you need to be able to support your teams. What’s been exciting is we’ve also progressed as a group and released a maturity model more around that platform engineering side, that how do you actually deliver these platforms? And that was just released at the end of 2023 in I think November. And that focuses on how you invest in those internal teams that support these platforms, how you create feedback loops of adoption with the users of those platforms because they are products that need people to actually use them to be able to get benefit from them, how those users will interface with it, and basically, how do you use platform engineering to create a software product that is a platform that helps your business be successful?
Mandi Walls: How did we get here? I understand the opposite is probably complete chaos, but how did we get to the point where we now have an entirely sort of new discipline for folks to look at?
Abby Bangser: For me, one of the big questions I get a lot aligned with that question is, isn’t this going back to the world before DevOps? Didn’t we already go here?
Mandi Walls: Right. We played this game before, right?
Abby Bangser: We’ve played this game before and I feel like we’re restarting. And in many ways we are restarting, but we’re restarting with learnings, and that’s what we do all the time in software. We’re learning from our past, we’re trying to iterate on things. The same thing happened with containers, the same thing happened with virtual machines and all these things; we’re just iterating. So if you look at the past, we were in a world where some people built software and other people had to run it in production, and that was painful because the feedback loops weren’t reaching the right humans. So we iterated on that and said, let’s make sure the people who build software are also the people who run that software, so they get that feedback loop from users. This is great. Then we created DevOps. And that solved a lot of the problems of the divide, but it created new problems. All of a sudden we have super high cognitive load on our teams. What does full stack mean? Does it go down to plugging in the servers, or only down to the cloud providers? Even if it’s only down to the cloud providers, do you need to know how to secure and optimize your database as well as design a really fantastic user experience for a mobile app? I don’t know. That maybe doesn’t feel right for a small group of humans. So we had new problems, and now we’re iterating on those new problems. And what we’re doing is learning. We’re learning that we don’t want to divide how we deliver software into builders and operators, because we need feedback loops. We don’t want to go back to the old way. But we do need to divide somewhere. And so we’re using, I think, some of the concepts behind domain-driven design in software, and this is where Team Topologies comes in. 
The authors of that book from 2019 have done a great job of verbalizing this idea of dividing the organization so that each group is building and delivering something that other teams depend on, but dividing those things in a way that is manageable by a reasonably sized group of humans. And so what we’re doing is taking these internal tools and turning them into products that can be built, managed, and operated by one team and depended on by other teams, who can then build, manage, and operate on top of those platforms.
Mandi Walls: So what does it look like for folks? Back in the bad old days, if I needed more storage, I put a ticket in for the storage team and they tell me they have no [inaudible 00:05:34] left, so it’s going to be eight weeks until we get something else added to the data center or whatever. And we’re not going back to that, but what’s a good platform look like for engineers?
Abby Bangser: I will say that in some places we are getting closer to that big bad old world than we’d like to, just to call it out. I don’t think that’s good, but sometimes I think it’s reasonable to call out a path that was reasonable for people to take but that may also lead you to pain. So if you find yourself at the beginning of this path, step back and have a think about what it might look like in a year or five years. And I say this because I was there. I have delivered internal platforms at organizations for many years now, and many times the solutions focused on automation of infrastructure: things like Terraform modules, Chef cookbooks, Puppet, and all these things, as well as Helm charts when we came into the Kubernetes world. And I started to expose those to my users, the software teams. I said, well, if you’d like a database, you can use this Terraform module to create your database, and if you’d like to deploy to Kubernetes, use this Helm chart, fill in the values file, and you’ll be great. And that was helpful, except that I was still in this queuing world. You mentioned the queuing world: if you have to put in a ticket and ask for something, it takes weeks to rack and stack the servers. But it’s still a queue if someone has to make a pull request into a repo and get it approved by someone they don’t have direct control over or real-time collaboration with. And I think that’s where some of these template-based platforms are ending up. So you asked me what good looks like. Good looks like a world where you have self-service, on-demand access to things. You no longer need to wait for pull requests to be approved, because the safety nets of those pull requests have been built into the response to a request: the automation includes all the safety-net checks and things like that. 
And it looks like the ability to basically make calls to APIs so that you can get access to the tools when and where you need them. Do you need a CLI or direct API calls from your continuous integration and deployment pipeline? Maybe. Maybe though you want a website that you can click around because that would be helpful to you in your context. And by building platforms as APIs, you can then also build those interfaces on top of those APIs to meet your customers and meet the users where they need to be at any given time.
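To make the idea of safety nets built into the response concrete, here is a minimal Python sketch. It assumes a hypothetical in-house database request API; the names `DatabaseRequest`, `request_database`, and the policy values are illustrative assumptions, not part of any real platform or of the guest’s work. It shows the checks a reviewer might otherwise apply by hand in a pull request running as part of answering the request itself.

```python
from dataclasses import dataclass, field

# Hypothetical policy values; a real platform would define its own.
ALLOWED_SIZES = {"small", "medium", "large"}
REQUIRED_TAGS = {"owner", "cost-center"}


@dataclass
class DatabaseRequest:
    name: str
    size: str
    tags: dict = field(default_factory=dict)


@dataclass
class RequestResult:
    accepted: bool
    problems: list


def check_request(req: DatabaseRequest) -> list:
    """Run the checks a reviewer would otherwise apply in a pull request."""
    problems = []
    if req.size not in ALLOWED_SIZES:
        problems.append(f"size '{req.size}' is not one of {sorted(ALLOWED_SIZES)}")
    missing = REQUIRED_TAGS - set(req.tags)
    if missing:
        problems.append(f"missing required tags: {sorted(missing)}")
    return problems


def request_database(req: DatabaseRequest) -> RequestResult:
    """Answer the request immediately instead of queueing it for approval."""
    problems = check_request(req)
    if problems:
        return RequestResult(accepted=False, problems=problems)
    # Provisioning would start here (hypothetical step, omitted in this sketch).
    return RequestResult(accepted=True, problems=[])
```

A caller, whether a CLI, a CI pipeline, or a portal, gets an answer straight away, for example `request_database(DatabaseRequest("orders", "medium", {"owner": "team-a", "cost-center": "42"}))`, rather than waiting in a review queue.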
Mandi Walls: Yeah. And what kind of things do you see folks including in their platform? [inaudible 00:08:11] you start out with maybe your cloud infrastructure. Where do you go from there? Do you also include monitoring and metrics and observability and maybe PagerDuty in there somewhere?
Abby Bangser: Yeah. I think in platforms we talk about capabilities, and then there are the cross-functional aspects of those capabilities. Your capabilities often do start quite low level, things like a database or a bucket, something that maps directly to infrastructure, but they can build on each other quite well. So you have the ability to create a database and a web service and a bucket and a queuing system. All of a sudden you have the ability to just request a test environment that depends on those things, and you no longer have to think about the database and the queuing system and all those things. You just get your test environment, and so on up the stack. So yes, I think you often start with those building blocks, make them solid, and then build up as you go. Each of those then has characteristics that help reduce the load on developers. We’ve talked in the past about shifting security left and getting earlier and faster feedback, but shifting left doesn’t actually remove complexity or remove risk, like the risk of misapplied security policies. All it does is make it someone else’s job and, hopefully, find issues sooner. Cool, great. When we start talking about pushing down into a platform, now we’re actually abstracting away the complexities and building in repeatability and confidence in the application of those policies and requirements. So when you ask whether monitoring would be involved, whether observability would be involved, whether security and compliance would be involved: yes, ideally all of those would be applied across all the components your platform exposes and just built in for you, including things like being able to page people. What are the processes? How do you sign people up as owners of a service, and therefore put them on the rotation in PagerDuty, and things like that? 
So all of that I think can be abstracted away and that’s what the platform engineering team should be focusing on doing for you.
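Here is a minimal Python sketch of the layering described above: low-level building blocks (database, queue, bucket) composed into a higher-level test environment, with cross-cutting concerns applied once to every component. All names here (`Component`, `apply_cross_cutting`, `make_test_environment`) are hypothetical, and setting `monitored` and an `owner` label only stands in for real monitoring and on-call wiring.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    kind: str  # e.g. "database", "queue", "bucket"
    name: str
    labels: dict = field(default_factory=dict)
    monitored: bool = False


def apply_cross_cutting(component: Component, owner: str) -> Component:
    """Concerns the platform team builds in once, for every component it exposes."""
    component.labels["owner"] = owner  # hypothetical label used to route pages to the owning team
    component.monitored = True  # stands in for wiring up metrics and alerting
    return component


def make_database(name: str, owner: str) -> Component:
    return apply_cross_cutting(Component("database", name), owner)


def make_queue(name: str, owner: str) -> Component:
    return apply_cross_cutting(Component("queue", name), owner)


def make_bucket(name: str, owner: str) -> Component:
    return apply_cross_cutting(Component("bucket", name), owner)


def make_test_environment(app: str, owner: str) -> list:
    """A higher-level capability composed from the lower-level building blocks."""
    return [
        make_database(f"{app}-db", owner),
        make_queue(f"{app}-events", owner),
        make_bucket(f"{app}-assets", owner),
    ]
```

The design choice being illustrated is that the team asking for `make_test_environment("checkout", "team-payments")` never touches the ownership or monitoring logic; it comes along with every building block.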
Mandi Walls: Yeah. That’s a lot. How do organizations figure out they want one of these or it’s time to invest in a platform engineering project? Because it sounds like it’s going to be a big investment and a lot of resources to get something like this built and the trade-offs downstream of that. Where’s the tipping point? It sounds like yes, there’s a lot of frustration, there’s a lot of things like people were trying to do too many things. So when do you step in and say, hey man, it’s time for you to have a platform?
Abby Bangser: Yeah, I’ve heard some people try and put numbers to it and I just am not even going to try.
Mandi Walls: Probably all vibes man. We don’t want numbers.
Abby Bangser: So rather than trying to put numbers on it, I think the work we did on the maturity model speaks to the characteristics of what you need and what you want. We made the levels of the maturity model speak to the requirements of your organization. Are you at a stage where you just need to be able to move fast and you’re not really worried about a lot of things? Great. You still need a platform, but your platform might be direct API calls to one of the cloud providers. It’s still a platform. You are not literally buying the pieces, building your own computers, and working your way up from there. So let’s just acknowledge we all have platforms. But from there, eventually you’re going to need to get to an operational state. You’re going to have users who depend on your software. You’re going to have a growing set of internal tools and services that require maintenance, and you start to grind to a halt on new delivery if that maintenance becomes too large. So when you need to get to operational, when you need to be able to run your systems at the scale you need, and you’re finding your team is no longer suited to that scale, you need to start investing more in your platform. And this is where you probably do want a dedicated team, because you want to be more feature driven and start to consolidate some of the different solutions people built during the provisional state. But you can stay there for a very long time if you’ve got only a handful of devs, or even a handful of teams. Where you really go to the next level of internal platform investment is when you need to start scaling. And scaling often means organizational scale: you’re scaling your people, you’re scaling your teams, you’re scaling the reach of your sales and things like that. 
It can be scaling of your systems, but operational can often stretch quite far with horizontal autoscalers and things, and it all works. And then eventually you’re a scaled organization, and what you’re looking for is cost savings and optimizing your organization. That’s the next level of big investment in your platform, where you say, it’s not good enough that we can onboard a new team easily or create a new service easily. The actual issue is that doing those things still costs us time and energy. We want to really fine-tune things. So those are our four levels of investment, and I don’t think they map to specific numbers as much as to vibes, vibes of what your organization requires to be successful.
Mandi Walls: Yeah, I totally see that. It gets to the point where you feel like I’m too busy doing all these things that aren’t delivering direct customer value and that’s not where you want your engineers to be. You want them in that path to deliver things that customers want and not fiddling around in the cloud too much.
Abby Bangser: You want that, but you also acknowledge that without the security and the investment in performance and the investment in all these things, you’re going to also screech to a halt and you need those things. And that’s exactly right. It’s that divide and that team topologies model between the stream aligned teams who are investing in the things that actually make your business money that are aligned to your business requirements and the platform teams and the platform group that is accelerating those teams. So they are both working towards the business goals, but they’re doing so from different perspectives with different lenses.
Mandi Walls: Yeah. So what’s the shape of a platform engineer, someone who’s going to be actually doing that work? It feels like they need to know a lot of how APIs work and how to manage all that stuff, but also you’re going to interface with the people who are using your thing a lot more than if you’re building things for external customers. So there’s a bit of being a little bit more empathetic maybe than you might have to be if you’re just building backends for customers you’re never going to meet.
Abby Bangser: I feel like empathetic is such a good call, because you do see your customers more, but you also feel like you are them more. If you’re a software engineer building something for medical research and you’ve never done medical research, you’re sort of like, tell me what you need and I’ll build it for you. I’ll listen to you; you’re the expert in this space. When you’re an engineer building for other engineers, if you don’t have empathy and don’t actually reach for the golden rule, treat others how they want to be treated, you sometimes end up building based on what you think you need, and that can actually be more dangerous. So you asked about the shape. I’d speak more to the shape of the team, because just like any product team, you need different skills that are very unlikely to be found in any one person. So build a team that can achieve it. Build a team that has someone with user-research-oriented skill sets and interests who can think about the user interface side of things. Build a team with people who have software delivery experience building APIs: how do you organize your verbs, how do you version these things, how do you roll out and deprecate things? That’s a skill set you’re going to need. But you’re also going to need people who can actually implement the backend of those APIs, who understand the infrastructure and the scalability, risks, and performance of those infrastructure capabilities. I have not found those skill sets combined in single humans before, but I do find them in teams, which is exactly what you look for in any other product team. You need people who do front end and backend. You need people who can test the product and who can plan the product into the future. 
And we just need to apply that now to our internal products as much as our external.
Mandi Walls: Yeah. And I feel like one place where I see that fall apart more than other things is that there’s no product manager for some of these internal built tools and you’re just kind of like… Folks assume that particular team of engineers can operate without a product manager and you’re like, whoa, dog, we totally need a product manager on this thing and we don’t want to forget that that’s a whole other discipline that’s going to help us move forward too.
Abby Bangser: Absolutely. And you said product manager and that’s so key because I think at most what you get is project management at these levels. I won’t say at most because I’ve actually met some amazing internal developer portal and platform product managers recently, but it’s a newer discipline. I think many organizations spend their time with project management at most. You’re exactly right, and this is where that maturity model, one of my personal goals of being involved in the work was to help teams call out where they need the support. And so when you move from that ability to just automate things and make them operational, to making them scalable and make them actually work for the organization, you need to start treating it like a product. You need to start having roadmaps that don’t just speak to what’s on fire most, but also what has the opportunity to make a difference. And that is where you start needing those skill sets of product management. So great shout on that.
Mandi Walls: And these become critical paths then, right? This thing has to be up. It has to be reliable, it has to be available so that everybody else in the house is able to get their work done as well. So it’s not toys, it’s actual mission-critical product streams they’re putting together here.
Abby Bangser: Absolutely. You’re building your own internal cloud, right? You’re building what AWS, GCP, Azure, they give us from public cloud resources where they’re API driven and completely managed services for you. That’s what you’re building internally on top of tools like public clouds and other SaaS providers that you might depend on. So all these public cloud providers, they have on-call engineers, making sure that your databases are running and that they’re up to receive API requests and all that. This is where we’re not going back to that archaic age of build separate from operate. You are building and running and operating this platform for your organization, meaning you will be on call, you will be in PagerDuty, you may have a different shape of callouts, you might have much higher level callouts during the working day of your engineers, which depending on your organization might be 24 hours a day, but yeah, you need to be on call for that. You need to be supporting that just like any other product needs to be supported.
Mandi Walls: Yeah, definitely. I’m learning a whole lot because I didn’t have a good feel for platform [inaudible 00:18:55], because in our space too, things are kind of muddied. There’s a whole lot of chatter about internal developer platforms, things like Backstage and yada yada, that I think is muddying the conversation a little bit about what platform engineering is doing. It’s getting a little confused back and forth between these IDPs and platform engineering. How do they relate?
Abby Bangser: Do you remember the whole, well, it continues today so you don’t have to remember too far back, but CD? Is it continuous delivery or continuous deployment? Or both? Or neither? IDP: is it internal developer platform or internal developer portal? I don’t know. Both? So you’re right, it’s a muddy world out there right now. When you mention something like Backstage, in my opinion that falls under the portal space, and that doesn’t come with any negative connotations, just grouping connotations. A portal is about an interface that is typically graphical in nature and typically web facing, not always, but typically. And there are some really great tools out there that do that. Backstage is a great example of an open source one; there’s Compass from Atlassian, there’s Port from getport.io, and there are a bunch of these tools that focus on collecting information into a web interface so that you can understand what’s behind the scenes. Some of these tools also help you create the logic behind it, but many of them focus mainly on the interface. Now, what’s behind there? That’s why I mentioned those APIs, because the graphical user interface of a web-based portal is good, but it’s not complete. What happens when you need to trigger creation of a test environment from your CI/CD pipeline as much as from your laptop? Those calls are going to look very different. So keeping that logic separate from any one interface, including your portal, is important. The ship has sailed on the IDP acronym, and even platform versus portal is, I think, difficult to differentiate at this point. I’m not one to get into arguments over whether you’re using a word correctly. What matters is that we all acknowledge there is value in interfaces, and there is value in shared business logic in an API behind those interfaces, so that you are not locked into any one interface, whatever you want to call it.
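The point about keeping logic out of any single interface can be sketched in a few lines of Python. This is an illustrative assumption, not how any named portal works: one hypothetical function, `create_test_environment`, holds the rules, and two thin interfaces call it, a CLI (as a laptop or CI/CD pipeline might use) and a JSON handler (as a portal backend might use). The 1 to 72 hour limit is a made-up rule.

```python
import argparse
import json


def create_test_environment(app: str, ttl_hours: int) -> dict:
    """Shared business logic: the only place the (hypothetical) rules live."""
    if ttl_hours < 1 or ttl_hours > 72:
        raise ValueError("ttl_hours must be between 1 and 72")
    return {"app": app, "environment": f"{app}-test", "ttl_hours": ttl_hours}


def cli(argv=None) -> dict:
    """Interface 1: a command line, e.g. for a laptop or a CI/CD pipeline step."""
    parser = argparse.ArgumentParser(prog="platform-env")
    parser.add_argument("app")
    parser.add_argument("--ttl-hours", type=int, default=8)
    args = parser.parse_args(argv)
    return create_test_environment(args.app, args.ttl_hours)


def handle_portal_request(body: str) -> str:
    """Interface 2: a JSON handler that a web portal's backend could call."""
    payload = json.loads(body)
    result = create_test_environment(payload["app"], payload.get("ttl_hours", 8))
    return json.dumps(result)
```

Both `cli(["checkout", "--ttl-hours", "4"])` and `handle_portal_request('{"app": "checkout", "ttl_hours": 4}')` go through the same validation, so swapping or adding an interface does not duplicate the rules.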
Mandi Walls: Awesome. Yeah, we’re looking into [inaudible 00:21:19], we’ve got a new plugin for that and all that other kind of stuff. It’s an interesting conversation and I get the value of having those kinds of things aligned so that folks are there. One other weird question, what’s the adoption like? In the past as I’ve worked with larger organizations, there’s always a couple of teams that will be like, no man, we’re going off on this side quest. We don’t want to run what everybody else is running. We really, really need this one special crystallized thing that nobody else wants to know about or whatever. And then you’re stuck with this weird little thing in your infrastructure that doesn’t look like anything else. How do you get those kinds of wacky people to come on board? The platform’s great, the water’s fine, we’re all going to learn to do things together.
Abby Bangser: Yeah. So adoption is part of the journey of building a platform for sure. A lot of people will tell you that platforms should only ever try to support the 80% case, because otherwise you’re stretching too far into the edges. I have a personal theory that that’s because a lot of platforms aren’t flexible enough to stretch, and especially when you’re getting prepackaged solutions, they can’t do the things at your edges, so they have to say 80%. But that’s just a bit of a theory. I think the intention behind saying focus on the paved road, focus on the main use case, is very important, and I wouldn’t want to detract from that point: focus on the 80% first and then worry about the edges. How do you bring them on board? A lot of people will say, oh, you need your platform to be optional. You need people to want to use it, or else you won’t get real feedback on it. And it’s like, okay, until you’re in a compliance-heavy environment, or really any enterprise-level organization where you need that consistency of access and things. So it’s really hard to say optional. The only way I think you can really talk about optionality is by making clear what good looks like at every level. Your optionality is highest in pre-production environments, in those sort of explorer-style teams or products within an organization. And it’s lowest in your production, moneymaking, compliance-heavy environments. Define what is required of each of those environments, and make sure people can test themselves against those requirements at all times before they start delivering something to that environment, which will then enable them to make the best decisions for them. 
So if you require PCI compliance in production because you’re taking credit card details, or a certain level of scale, or other things, make sure those requirements are made visible to the application teams. And if they use the platform-provided services, they tick all of those boxes; it’s just done. It’s going to be very hard to build your own environment and still tick those boxes. But if you don’t make those boxes visible, people are just going to look at the platform and go, I don’t want to use that. I can do it myself. I can make a VM; that’s all that thing really does. And they won’t realize what they’re missing. So make visible what it is you’re trying to abstract away, so that people can make an informed decision about whether to pave their own way. But also acknowledge that if they’re trying something new, that’s okay. It’s even encouraged, because the only way you learn how to progress your platform offerings is if other people try things. So if there’s a world where you have an exploration-style product in a non-production, non-customer-data kind of world, offer opportunities to let people try things out and just set other expectations. It needs to be tagged in a certain way so that we at least know what infrastructure is running. It needs to be scanned so we know we’re not running insecure images. So certain base-level security, compliance, and cost-efficiency requirements still apply, and teams might meet them using some brand new technology that isn’t on your platform, and that’s okay. Make it visible; that’s the only way you can get people to accept and appreciate what your platform is actually offering.
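One way to make per-environment requirements visible, so teams can test themselves against them before they deliver, is to publish them as data and offer a self-check. The Python sketch below is a hypothetical illustration: the requirement names (`tagged`, `image_scanned`, `pci_compliant`, `autoscaling`) and the idea that a platform-provided service satisfies all of them out of the box are assumptions drawn loosely from the examples above, not a real standard.

```python
# Hypothetical, published requirement sets per environment; names are illustrative only.
REQUIREMENTS = {
    "exploration": {"tagged", "image_scanned"},
    "production": {"tagged", "image_scanned", "pci_compliant", "autoscaling"},
}

# Assumption: what a platform-provided service reports as satisfied out of the box.
PLATFORM_DATABASE = {"tagged", "image_scanned", "pci_compliant", "autoscaling"}


def published_requirements(environment: str) -> list:
    """What any team can read up front before choosing to pave its own way."""
    return sorted(REQUIREMENTS[environment])


def unmet_requirements(environment: str, satisfied: set) -> list:
    """Let a team test itself against an environment's requirements at any time."""
    return sorted(REQUIREMENTS[environment] - satisfied)
```

A do-it-yourself VM that is only tagged and scanned passes the exploration check but shows `autoscaling` and `pci_compliant` as unmet for production, which is exactly the gap the platform closes when teams use what it offers.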
Mandi Walls: Appreciate is such a good word for that, right? Yeah. We don’t want you to struggle with all these other things. We want you to acknowledge, appreciate, and have a good time doing your job without being bogged down by all these other crazy bits that are also required. We acknowledge you don’t have the expertise for those and don’t need to, because we’ve got the experts over here in this other team and they’re going to help you out. [inaudible 00:25:41]. Awesome. I’m thrilled. I learned so much more than I was getting out of some of the hype about platform engineering. So this has been great. A question we ask quite a bit on the show: are there any myths you’d like to bust about platform engineering? What do folks get wrong right off the bat, where you’re just like, no, that’s not what this means, this is not what we’re after?
Abby Bangser: One thing I have come to realize is that I don’t believe you can buy your platform, but I also don’t believe you should build it all from scratch. People will say, platform engineering is expensive. Those are some really highly paid engineers and they’re not really making your company any money. They’re not building the products you’re selling. Why don’t you just buy the product? That’d be great if you could, but the reality is that off-the-shelf purchases often stifle the level of customization you need, right? Your business ends up having to bend itself to the products you purchased, and that’s okay at a certain stage: when you’re small and just getting started, you’re sort of building your processes around the products you’ve purchased. But once you’ve grown to a certain scale, you need the custom business processes that are part of what makes your business special. There’s a reason your business is special and not like your competitors. How do you enable that special sauce inside your organization? You have to build it. Now what ends up happening is people have this build versus buy debate. And I just don’t think that’s the answer, because no, you also shouldn’t be building from scratch. Just like we don’t want our software teams building everything from plugging in the servers all the way up, we also don’t want our platform teams building from plugging in servers all the way up. We want them standing on the shoulders of giants as well. Buy as many platforms as a service, software as a service, and tools as you can possibly leverage and benefit from. But you need ways to tie these together. And this is where the idea of a thinnest viable platform, which again came out of Team Topologies, and Gregor Hohpe’s idea of a floating platform come in. 
When I say you can’t buy your platform, I mean that you need to build the interfaces, the APIs that your engineers are going to talk to, because otherwise you are tying your organization to the implementation of the tools you have purchased today. So build your APIs, build your platform with your logic and your business processes in it, but make it as thin as you can. Make it only the bits that are unique to your organization and outsource the rest to tools if you can. So the myth is that you need to either build or buy your platform, when in reality you have to build it, but you can build it really thin and buy the rest.
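A thin platform layer over purchased tools might look something like the Python sketch below. It is a hypothetical illustration of the idea, not the guest’s design or any product’s API: `QueueProvider` is the narrow interface the organization owns, `VendorAQueues` and `VendorBQueues` are made-up adapters standing in for bought services, and the only in-house logic is an assumed naming convention.

```python
from typing import Protocol


class QueueProvider(Protocol):
    """The narrow interface the platform depends on, instead of a vendor's API."""

    def create_queue(self, name: str) -> str: ...


class VendorAQueues:
    """Hypothetical adapter for one purchased, managed queue service."""

    def create_queue(self, name: str) -> str:
        return f"vendor-a://{name}"


class VendorBQueues:
    """Hypothetical adapter for an alternative service."""

    def create_queue(self, name: str) -> str:
        return f"vendor-b://{name}"


def create_team_queue(provider: QueueProvider, team: str, purpose: str) -> str:
    """The thin, organization-specific layer: naming rules live here, the heavy lifting is bought."""
    name = f"{team}-{purpose}".lower()  # hypothetical in-house naming convention
    return provider.create_queue(name)
```

Because engineers call `create_team_queue` rather than a vendor SDK, replacing `VendorAQueues()` with `VendorBQueues()` changes only which adapter is passed in, not the API the rest of the organization depends on.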
Mandi Walls: Yeah. Awesome. Excellent. That makes total sense. You mentioned the CNCF platform group and some other things there. We’ll put the links to those websites in the show notes and the maturity model and all that great stuff. Are there other places folks can be going to learn more?
Abby Bangser: On that thread, there’s going to be a co-located day at KubeCon in Paris in March. That’s going to be fantastic. We’ve had the highest number of submissions for a first-time co-located day ever, with 225. So it’s going to be a really exciting day with a lot of energy around it. If you’re in Paris for KubeCon, please come to that. As for other communities, there are a lot of them, but many are built around tools right now. So I don’t know if I have any strong recommendations that are as tool agnostic as the CNCF group. That doesn’t mean you shouldn’t investigate the tool communities, because they bring a lot of information and a lot of opportunity. But yeah, there aren’t really any other agnostic ones to look into.
Mandi Walls: And where can folks find you online? Where are you living these days?
Abby Bangser: I’m all over. I’m on all the socials, so LinkedIn, Bluesky, Twitter, Mastodon, and I mentioned the CNCF Slack. If you’re in there, that’s a really great place to get hold of me and have a bit of a chat about your experiences with platform engineering, because I’d love to hear them.
Mandi Walls: Awesome. And we want to hear from folks out there too. If you’re interested in this, give us a shout. Yeah, this has been great. I’ve learned a lot. I hope our audience has learned a bunch as well. It’s been great having you on. Thanks for coming to the show.
Abby Bangser: Thanks for having me.
Mandi Walls: Awesome. For everyone else out there, we’ll wish you an uneventful day. We’ll be back again in a couple of weeks. That does it for another installment of Page It to the Limit. We’d like to thank our sponsor, PagerDuty, for making this podcast possible. Remember to subscribe to this podcast if you like what you’ve heard. You can find our show notes at pageittothelimit.com and you can reach us on Twitter @Pageittothelimit using the number two. Thank you so much for joining us and remember, uneventful days are beautiful days.
Abby is a Principal Engineer at Syntasso delivering Kratix, an open-source cloud-native framework for building internal platforms on Kubernetes. Her keen interest in supporting internal development comes from over a decade of experience in consulting and product delivery roles across platform, site reliability, and quality engineering.
Abby is an international keynote speaker, Team Topologies Advocate, and co-host of the #CoffeeOps London meetup. Outside of work, Abby spoils her pup Zino and enjoys playing team sports.
Mandi Walls is a DevOps Advocate at PagerDuty. For PagerDuty, she helps organizations along their IT Modernization journey. Prior to PagerDuty, she worked at Chef Software and AOL. She is an international speaker on DevOps topics and the author of the whitepaper “Building A DevOps Culture”, published by O’Reilly.