What is lifecycle management and why does it matter to platform teams?
Learn why software lifecycle management isn't just about speeding up Day 0 creation, but about the ongoing upkeep and maintenance covered by infrastructure and security lifecycle management from Day 1 to Day N.
» Transcript
Hey, excited to be here today talking about lifecycle management. Oftentimes when we think about building modern cloud applications, we’re thinking about: how do I build the app and get it running in the cloud? And oftentimes that’s, I’ll call it, the easy part.
The other challenge, the other 80%, is the full lifecycle as we think about building and maintaining that application over a long period of time. And so today that’s what I want to talk about.
» What is lifecycle management?
What is lifecycle management? What are the concerns that we need to be thinking about and how does that play into application and infrastructure and security management? So with that, let’s jump right in.
When I think about lifecycle management, I like to think about it in terms of: what are the Day 0, Day 1, Day N things that need to be done? What do I mean by that?
» Day 0
If I start at Day 0, I’m building a new application, for example, and I want to deploy that thing into the cloud or provision resources or write code for it. The first thing I need to do on Day 0 is I actually need to create it.
Whether I’m talking about creating the application or creating the underlying infrastructure, that is a creation problem that sits at the heart of Day 0, and that gets me to Day 1.
» Day 1 and Day 2
Now I’ve created my thing. Almost invariably, unless we’re doing something very simple or it’s really only meant to be used one time for an event or something like that, there’s often a need to update it. And the updates could be driven by many different things.
The updates could be because we have a new feature we want to add. This might take us to Day 2. When we think about that transition from Day 1 to Day 2, there’s a lot of different things that could cause an update.
It could be, like I said, a new feature. We’re expanding something. It could be it was a bug fix, something is wrong. It could be we need to do some other form of update or change. Maybe we need to scale up or down. There’s a bunch of things that could happen that cause us to go and have to make an update.
But fundamentally, that’s how I think about the difference between Day 1 and Day 2 — we’re now past the initial creation. Now of course, Day 2 is really an ongoing thing. You can think about this as a perpetual process because obviously I might have more updates, more fixes, more capabilities, etc.
» Day N
But eventually, over a long enough time horizon, you get to the final day, Day N, the terminal day, and this is where you might finally be deleting it. Maybe it was an application that was only meant to be used for a certain period of time, or you’ve built a replacement for that application, or it’s no longer needed.
So by the time you get over here, it could be that you’ve end-of-lifed the particular thing, it’s run its course. It could be that maybe this was actually only used in a development environment, so it was never meant to be a long-lived thing. I’m bringing it up, I’m doing some dev tests and then I’m tearing it down. And so this could be part of cleanup of unused resources.
There are a lot of things that drive you through this lifecycle. But fundamentally, you can see that we can think about a lifecycle for a lot of different things.
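To make the Day 0 / Day 1-2 / Day N framing a little more concrete, here is a minimal Python sketch that models a lifecycle as a small state machine. The phase names, the `create`/`update`/`delete` actions, and the `Lifecycle` class are hypothetical and exist only for illustration. The point is that creation happens once, updates are open-ended, and deletion is terminal.

```python
from enum import Enum


class Phase(Enum):
    """Coarse lifecycle phases, loosely mapped to the Day 0 / Day 1-2 / Day N framing."""
    NOT_CREATED = "day_0"   # nothing exists yet; the only concern is creation
    RUNNING = "day_1_2"     # created; receives an ongoing stream of updates
    DELETED = "day_n"       # end of life; the thing has been torn down


# Allowed transitions: create once, update any number of times, delete once.
TRANSITIONS = {
    (Phase.NOT_CREATED, "create"): Phase.RUNNING,
    (Phase.RUNNING, "update"): Phase.RUNNING,
    (Phase.RUNNING, "delete"): Phase.DELETED,
}


class Lifecycle:
    """Hypothetical tracker for one application, resource, or secret."""

    def __init__(self, name: str):
        self.name = name
        self.phase = Phase.NOT_CREATED
        self.history: list[str] = []

    def apply(self, action: str, reason: str = "") -> Phase:
        key = (self.phase, action)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {action!r} {self.name} while {self.phase.name}")
        self.phase = TRANSITIONS[key]
        self.history.append(f"{action}: {reason}" if reason else action)
        return self.phase


app = Lifecycle("billing-api")
app.apply("create")                      # Day 0
app.apply("update", "new feature")       # Day 1, Day 2, ...
app.apply("update", "bug fix")
app.apply("update", "scale down")
app.apply("delete", "replaced by v2")    # Day N
print(app.history)
```

The history has one create, one delete, and as many updates as the thing lives long enough to need. That open-ended middle is where most of the effort goes.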
» Concerns of Day 1-N
What are the concerns in each of these phases? You’ll start to realize that they’re slightly different. At Day 0, I have nothing. I’m at the start. So what we have to deal with is nil. I don’t have a lot of concerns. My major concern is: how do I create this? And often, how quickly can I create this?
When I think about that concern from Day 0 to Day 1, oftentimes it’s really about enablement. How am I enabled to create this thing? How quickly can I go do that thing? I don’t have anything. So however quickly I can get to Day 1, that’s a win.
Once I get past that though, once I’ve done the initial creation, now I’m in this Day 1, Day 2, all the way to Day N land, my concerns actually start to evolve. They start to become a little bit different.
I do have an existing thing. Maybe I do have a bug I need to fix or there’s a security update I have to apply or if something’s not working, I have to scale it up. You can start to see those set of concerns are actually somewhat different.
You start to get into these operational concerns that span the rest of this curve here. And oftentimes these fall into a number of different categories.
» Cost
One of them could be cost. How much am I spending to run this application? Maybe I provisioned an extra-large database, or I’m running 50 copies of this application when I really only needed five. Somewhere around Day 55, my boss emails me and says, “Why is this thing so expensive for us to run? It has four users?” So there might be a cost concern that forces a set of updates as we optimize the architecture.
» Risk
At the same time, almost invariably for most things, there’s a set of risk concerns and it comes up in different ways.
It could be that we’re running a Rails application and a critical vulnerability is discovered that allows someone to do a SQL injection. Okay, that’s obviously bad. We don’t want that type of security vulnerability in our application.
So that’s a risk concern: how quickly can I update my application on Day 50 or Day 100 after I’ve deployed it, to patch that now-known vulnerability?
Or my app is doing really well, it’s going gangbusters, users keep logging in, and I have a scale problem. How quickly can I update it? So it could be security risk, or it could be operational risk because the thing might go down under load. There are different types of risk, but generally speaking, the concerns fall into these two categories.
» Clean up
Ultimately, even when you get to end of life and cleanup, that’s a cost thing. I’m turning it off because no one’s using it or it’s end-of-life. Why should I pay for a thing that nobody’s using? So you can think about it as a cost concern.
» Focus on post-Day 0
So Day 0 feels really good for a short period of time. And I think that’s something that I always try and remind people: oftentimes as you think about designing any sort of a process, there’s sometimes this natural tendency to focus on optimizing for the Day 0 experience. And that often makes sense. Can I make my app teams more productive? That’s obviously a thing everybody wants. But you don’t want to do it at the cost of Day 1, Day 2, or Day N. And the reason for that is Day 0 lasts for one day. Once it’s created, you’re in this world, and this world lasts for a long time. In some cases, there are applications that are still running decades after they were created.
So you really want to think about where you spend your time optimizing a lifecycle. Where are my calories spent? It’s the Day 1 to Day N part of the arc that tends to be high calorie. So how do I optimize that part, rather than favoring Day 0 at its expense?
This is a little bit abstract when we talk about lifecycle management, and obviously this is something that applies to lots of different things. Almost anything can be described as having a lifecycle. So I want to make it a little bit more concrete.
» Application Lifecycle Management
Let’s talk about, for example, an application that we want to deliver. What does that actually look like? Day 0 of an application is relatively straightforward. I need to write the application, hopefully test it, and package it. At some point I say, great, I’ve written that application, it’s ready to go. I deploy it into a production environment and now it’s running.
Well, very quickly, I probably run into all of these things. Maybe my users love it, and they’re like, “Great, I have this additional feature I need you to add,” or “There’s a bug. Can you fix this thing? It doesn’t work.” Or, like I said, I might want to scale up or down.
I deployed version one of my app and now I have version two, version three, version N, until eventually I reach end-of-life. You can see an application very much follows a standard lifecycle. So again, I could get really, really good at how I scaffold and get to the MVP part of my application. That’s Day 0. But over time, what becomes the hard part is: hey, I’m doing an upgrade in production from version one to version two. I have actual users. How do I make sure this doesn’t cause downtime?
That’s a great example of a problem you don’t have at Day 0, because you don’t have an app yet, so what would downtime even mean? Only once I get into the later stages does this become an issue.
Downtime you can think of as a form of operational risk. I don’t want to create an outage that’s an operational risk for my customers. Maybe that actually hurts my revenue because this is a revenue generating product. So it might actually have implications for other things.
You can start to see how these things drive Day 1 through Day N. And what I want to spend my time optimizing is how to make it easy to deploy a new version, and to do it in a way that’s high confidence, that has minimal downtime, and that isn’t going to be super expensive because I’m bringing up a full parallel infrastructure or something long-running.
This becomes a classic example within application lifecycle management. This is where you’ll start to get more sophisticated about doing blue-green deploys, canary deploys, and techniques like that. You might bring in more sophisticated monitoring tools to make sure you’re monitoring different metrics and SLOs, and it’s all in service of optimizing that arc of how you manage the lifecycle of an application.
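A canary deploy with an SLO gate can be sketched in a few lines. This is a hypothetical illustration and does not describe any particular tool’s behavior. `observe_error_rate` stands in for a query against your monitoring system, and the traffic steps (5%, 25%, 50%, 100%) and the 1% error-rate threshold are assumed values. Traffic shifts to the new version in stages. As soon as the observed error rate breaches the SLO, the rollout rolls back to 0%.

```python
from dataclasses import dataclass, field
from typing import Callable, Sequence


@dataclass
class CanaryResult:
    promoted: bool          # True if every step passed the SLO check
    final_weight: int       # percent of traffic on the new version when the rollout stopped
    log: list[str] = field(default_factory=list)


def run_canary(
    observe_error_rate: Callable[[int], float],   # hypothetical hook into your monitoring
    steps: Sequence[int] = (5, 25, 50, 100),       # assumed traffic percentages
    slo_error_rate: float = 0.01,                  # assumed SLO: at most 1% errors
) -> CanaryResult:
    """Shift traffic to the new version in steps, rolling back if the SLO is breached."""
    log: list[str] = []
    for weight in steps:
        rate = observe_error_rate(weight)
        log.append(f"{weight}% on new version, error rate {rate:.3f}")
        if rate > slo_error_rate:
            log.append("SLO breached, rolling back to 0%")
            return CanaryResult(promoted=False, final_weight=0, log=log)
    return CanaryResult(promoted=True, final_weight=steps[-1], log=log)


# A healthy release, and one that regresses once it sees real load at 50%.
healthy = run_canary(lambda weight: 0.002)
regression = run_canary(lambda weight: 0.002 if weight < 50 else 0.05)
print(healthy.promoted, regression.promoted, regression.final_weight)
```

Stepping traffic up gradually keeps the blast radius of a bad release small. This addresses the downtime and operational-risk concern, and it does so without necessarily standing up a full parallel environment the way a blue-green deploy does.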
» Infrastructure Lifecycle Management
You might think about the same thing for infrastructure. Oftentimes on Day 0 I have a blank cloud account, and there’s nothing running in it. The first thing I need to do is create my first set of infrastructure. Maybe I’m creating a VM or a Kubernetes cluster. Maybe I’m defining an AWS Lambda function. It doesn’t really matter.
So I did a ‘create’, and now I have some infrastructure. Invariably, much like my application, it needs to be updated at some point. I need to scale up my Kubernetes cluster. I need to change the version that’s running. I have to redefine the Lambda with a new definition. I’m scaling up and down. All of these same things that might happen at the app layer also might happen at the infrastructure layer.
The infrastructure layer also tends to have all of these same operational concerns. So maybe my update isn’t even related to my app. It might be that there was an OpenSSL vulnerability and I have to patch the underlying image that the application is running on, whether it’s a container image or a VM image. So I might be patching.
It might be scale-driven. But oftentimes at this infrastructure layer, as we’re evolving, we also think about things like compliance. Maybe I’m scaling my app up and down, but that’s relatively simple. What if I’m deploying new pieces of infrastructure and I need to be PCI compliant, but I didn’t think about the fact that these things are actually within PCI scope, and I’m doing things that are outside of my policy?
As you start to think about this process of Day 2, Day 3, Day N, it’s really about baking in as many of these controls as possible. How am I managing my infrastructure costs as I’m scaling up? Maybe instead of going from 1 VM to 10 VMs, I go from 1 VM to 5 VMs. But it’s also about managing the risk of those other things.
Did I introduce a security problem? Did I not patch things? Am I out of compliance? All the way until ultimately, hopefully, we’re turning off that infrastructure when it’s end of life. I think oftentimes, this ends up being a huge element of waste in modern infrastructure because we see things that were meant to be dev test infrastructure, that were only meant to be used for a short period of time, that are still running months or years later versus properly thinking about a lifecycle and saying, “That dev environment is auto destroyed after 30 days.”
Those are examples of thinking about this lifecycle upfront. It’s usually easy to create the mess and it’s hard to make sure someone’s cleaning it up at the end of the day.
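The “auto destroy after 30 days” policy is easy to express as a periodic sweep. This is a minimal sketch that assumes each environment records a creation timestamp and an `env` tag. The `Environment` type, the tag values, and the `expired` function are all hypothetical. The sweep only selects environments. A real one would hand the results to whatever provisioning tool created them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Environment:
    name: str
    created_at: datetime
    tags: dict[str, str]


DEV_TTL = timedelta(days=30)   # the 30-day policy from the text
EPHEMERAL = {"dev", "test"}    # assumed tag values for short-lived environments


def expired(envs: list[Environment], now: datetime, ttl: timedelta = DEV_TTL) -> list[Environment]:
    """Select dev/test environments older than the TTL; production is never selected."""
    return [
        e for e in envs
        if e.tags.get("env") in EPHEMERAL and now - e.created_at > ttl
    ]


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
envs = [
    Environment("feature-x", now - timedelta(days=45), {"env": "dev"}),
    Environment("load-test", now - timedelta(days=3), {"env": "test"}),
    Environment("checkout", now - timedelta(days=400), {"env": "prod"}),
]
for env in expired(envs, now):
    # A real sweep would call your provisioning tool's destroy step here.
    print("would destroy", env.name)
```

Because the policy is keyed on a tag set at creation time, cleanup is decided on Day 0. Nobody has to remember to do it on Day N.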
Application lifecycle management (ALM) might be a concern for developers, and infrastructure lifecycle management (ILM) might be a concern for operations. Our security teams have the same challenge.
» Security Lifecycle Management
If you think about a security lifecycle, there’s often a similar analogy. I might be writing an app and then deploying it on some infrastructure that’s created, but my app needs to talk to a database. So the first thing I’m going to do is create a secret. In this example, I have a database username and password that allow my application, running on some infrastructure, to talk to that database.
But now this application has been running for a year, nobody’s changed that username and password, and all my developers have seen it. And by the way, we checked it into source code and someone pasted it into a Jira ticket. So it’s a secret, but it’s not really all that secret. It’s in all sorts of different systems.
So when we start to think about the lifecycle management of this secret, at some point we should probably rotate it or update the secret because it’s been exposed in all these different systems. We’d like to make sure that we change it periodically, so if someone finds whatever was in Jira or sees that it was pasted in the source code, it’s not the same secret anymore, it’s been rotated in the meantime.
You can start to see how this creates dependencies. If I rotate my database credential, I might have to redeploy the application for it to pick up that new secret. So these different lifecycles interact with one another, and you can start to see how it becomes complex over time.
Ultimately, it’s the same thing at the end of life: when that application is no longer being used, we don’t want a secret to still exist for that database, because that’s just unnecessary risk. Someone might find that credential, use it to access the database, and find user data in it, even though the application that needed it no longer exists.
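Here is a minimal sketch of the rotation dependency. The `Secret` type, the 90-day maximum age, and the consumer names are illustrative assumptions. Rotating a stale credential returns the list of applications that need a redeploy to pick up the new value. That is exactly where the secret lifecycle and the application lifecycle meet.

```python
import secrets
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class Secret:
    name: str
    value: str
    rotated_at: datetime
    consumers: list[str] = field(default_factory=list)   # apps that read this secret


def rotate_if_stale(secret: Secret, max_age: timedelta, now: datetime) -> list[str]:
    """Rotate a secret older than max_age; return the apps that must redeploy."""
    if now - secret.rotated_at < max_age:
        return []                                  # still fresh, nothing to do
    # Generate a new random credential. In a real system this value must also be
    # applied to the database itself before the old one is revoked.
    secret.value = secrets.token_urlsafe(24)
    secret.rotated_at = now
    return list(secret.consumers)


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
db = Secret("orders-db", "hunter2", now - timedelta(days=365), ["orders-api", "reporting-job"])
print(rotate_if_stale(db, timedelta(days=90), now))   # ['orders-api', 'reporting-job']
```

Returning the consumers makes the cross-lifecycle dependency explicit: a rotation is not finished until every dependent application has been redeployed with the new value.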
Ideally, at the end of the lifecycle of that credential, we’re also destroying the underlying password. So this also has a full continuum of creation, update, deletion, and along the way you probably have an inventory as well. How do I discover credentials that are in my environment such that I’m making sure that they’re managed? And same thing with asset inventory. Do I have an actual inventory of my various infrastructure? And ideally I have an inventory of all of my applications as well.
» Bringing everything under central management
Oftentimes there’s a whole set of things just out there in the infrastructure estate that maybe aren’t being managed or aren’t being tracked, because someone created them, we lost track of them, and we weren’t applying proper lifecycle management. They’re orphaned, but they exist. You’re paying for them, and they’re in your environment. You have secrets that were created, and maybe no one’s using them anymore, but they’re still valid. You could still use them to log in to a database if you were able to find them.
So there’s also the cleanup work of discovering and inventorying these things so that you can apply proper lifecycle management to them, and that applies to almost any of these categories. Think of applications that were created and then abandoned, that are still running but are unmaintained and haven’t been patched or updated in a while.
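Discovery and inventory boil down to a reconciliation: compare what actually exists in the environment with what your inventory says is managed. This is a minimal sketch using plain sets of resource identifiers. The identifiers are made up, and the sketch assumes the discovery step itself (cloud APIs, secret scanning) has already happened.

```python
def reconcile(discovered: set[str], managed: set[str]) -> dict[str, set[str]]:
    """Compare what exists in the environment against what the inventory manages."""
    return {
        "orphaned": discovered - managed,   # running (and costing money), but nobody owns it
        "missing": managed - discovered,    # inventory says it exists, but it doesn't
        "managed": discovered & managed,    # under lifecycle management as intended
    }


discovered = {"vm-web-1", "vm-web-2", "db-orders", "vm-old-demo", "secret/legacy-db"}
managed = {"vm-web-1", "vm-web-2", "db-orders", "vm-web-3"}
result = reconcile(discovered, managed)
print(sorted(result["orphaned"]))   # ['secret/legacy-db', 'vm-old-demo']
print(sorted(result["missing"]))    # ['vm-web-3']
```

Orphans are the candidates for cleanup or for adoption into proper lifecycle management. Entries under `missing` usually mean the inventory itself has drifted.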
That’s the way I like to think about this. Ultimately, when you think about where people start, it’s Day 0. So there’s a focus on how we get this done really, really quickly. I think what people often don’t realize is that, over time, most of your calories, most of your time spent, is in the Day 1 to Day N part of the curve. I think that’s where it becomes important to optimize over time, and ideally to have a consistent approach to how you’re doing it across applications, across infrastructure, and across different elements of your security estate. Otherwise, this becomes a really high drag on the business.
Day 0 feels really good for maybe the first 30, 60, 90 days, however long it takes to get past that, but Day 1 to Day N can feel bad forever, and I think that’s really the big difference. I hope this video was useful and gives some food for thought as you think about how to apply lifecycle management to different parts of the estate.