
The Nomad Autoscaler

An autoscaler helps ops environments automatically respond to demand, reduce waste, and generally reduce manual work. See the new official Nomad Autoscaler in action with this live demo.


How do you ensure increases in user requests do not overload your applications while your platform runs as cost-effectively as possible? Autoscaling enables users to dynamically scale up or down their infrastructure based on the true load of their applications. It has become a critical capability for any workload orchestrator. This talk will discuss the architecture and features of the Nomad autoscaler, including a hands-on demo using the autoscaler in real-world scenarios.

Transcript

Hi, everyone. I hope you are well. I hope you're safe wherever you are joining us from around the world.

My name is James Rasell. You may know me as @jrasell. I am an engineer on the Nomad ecosystem team for HashiCorp. I'm here today to talk to you about the Nomad Autoscaler. This is a new project from the Nomad team.

Who is the Nomad ecosystem team? We currently consist of 4 engineers: Luiz Aoqui, Chris Baker, Jasmine Dahilig, and myself, and we are tasked with building and supporting Nomad integrations.

Nomad Integrations Enhance the Experience

What are these Nomad integrations? They are integrations that aren't vital to running Nomad itself, but they enhance the experience.

Once you have your Nomad cluster running, maybe you have a few jobs. Some of the things that we're building will enhance your cluster and make it easier for you to run, and maybe more secure.

But included in these integrations, we do some work within the Nomad core. For that reason, we work quite closely with the core team. We review, we design, and we rubber duck with them very closely.

One of the core benefits of working on integrations is that we exercise these endpoints quite heavily, the API and such, and so we can sometimes find interesting bugs. That's maybe in my case because I don't read the docs properly, and so I send some requests that are maybe full of garbage or something like that. So that's quite helpful.

What Nomad Solves

For those of you who are not familiar with Nomad, what exactly is it? It's quite a complex topic that Nomad tries to solve, but we package it and expose it simply.

It can run multi-cloud, it can run on-prem, and it has a native notion of multi-region, multi-datacenter, federated clusters.

No matter what your internal team topology looks like, we can support a very wide variety of workloads. We have first-class support for Docker, Windows, Java, VMs; we have everything. Even in the community, there are some great plugins. There are some great drivers such as Podman, FreeBSD jails, and even a great new Windows driver.

And we have native integrations with many of the other HashiCorp tools. We have this Lego box-style approach: Take what you need at the time and then build.

Before you start a new project, we always need to look at why. Why do you need this new tool in this plethora of tools that you already have? And why autoscale?

I've put these in order of how I believe an operator or a manager might look at the importance.

  • Respond to demand: As your product becomes more popular, you're going to want to ensure that you meet the increase in requests. You shouldn't penalize users, and good work that increases traffic shouldn't end up doing harm. As well, if you have service-level agreements, you want to make sure they are met.

  • Reduce manual work: You might be able to scale manually, because you might know some of your traffic patterns; but you might not, and responding to unpredictable loads is very important. Even if you only ever scale to a predictable load, it's going to cause stress on your operators, the human beings who work there. Manual changes are often time-consuming and error-prone, and this can make the changes worse than if you had done nothing.

  • Reduce waste: This is something that I've definitely dealt with before, and I'm sure a lot of you have as well. As your product or your application becomes more popular, maybe you add new features, maybe you add new team members who help identify new features: data scientists, quality assurance. Your running costs can increase quickly, and quite silently, in cloud environments. Once you get to that point, your manager comes over to your desk and says, "We need to get this under control," and it takes a fair amount of work. Cost focus is an important part of business maturity, and especially in today's climate, it's very important to keep it under control.

The 4 Types of Auto-Scaling

So now we've got the why. Next, how?

There are 4 types of auto-scaling:

  • Horizontal application scaling

  • Horizontal cluster scaling

  • Vertical application scaling

  • Vertical cluster scaling

Horizontal scaling is the most common. We have lots of prior art. We have things like Amazon's auto-scaling groups, Azure scale sets, and Google Cloud instance groups.

In traditional environments, this might be a 1:1 mapping. We might have 1 instance of your application per server, and as you scale that out or in, you have a direct correlation. In a Nomad environment, or any containerized environment, that direct correlation starts to break down.

Horizontal application scaling is where we raise or lower the number of running instances of your application based on demand. What are you scaling those task groups onto, in this case for Nomad? That's going to be some form of server.

If you keep scaling your application indefinitely, you might reach a point where you have nowhere to go. You can't scale anymore; there's no more CPU left. This is where horizontal cluster scaling comes into play. And it's the idea of adding servers or removing them from your scheduled resource pool.

And then we go into vertical auto-scaling. This is a lot less common, but it's becoming a lot more in vogue as a topic. And it has a lot more underlying complexity than horizontal auto-scaling.

When you run your application, it's very hard to understand what resources you need to run. You might pick some sensible defaults based on a bit of testing, and you might use these defaults across different types of applications because they use the same underlying language.

They're also fixed for the full application lifecycle. If your application needs a lot of memory when it starts, but then has a small memory footprint once that's finished, there's nothing you can do.

The same goes for server sizing. You'll probably base this on basic requirements and some estimated costs, but that's about it. And it's very hard once you have this in place to change it. It might stay that way for 2 years. And if you want to change it, it's very time-consuming.

The Nomad Autoscaler

This is where I'm going to introduce the Nomad Autoscaler. It's an official HashiCorp autoscaler for Nomad. The current focus is on horizontal application auto-scaling.

We're currently working hard to develop and design this, and we're taking a lot of the lessons learned from existing auto-scaling offerings, both from the community and from the more enterprise offerings used in the cloud.

Changes from Community-Driven Autoscalers

So, what did we change as part of this initial release? Previously community-driven autoscalers have integrated in several different ways.

Policies lived in job parameters, or were stored as state inside the scaler itself. To scale, you had to read, modify, and then re-register the whole job back to Nomad. It was mildly insecure as well. You also had no way of knowing what scaling events had happened unless the autoscaler exposed telemetry or state.

So one of the things we did was add a new job specification stanza, scaling, which holds scaling policies. And as part of this, we introduced new endpoints. We have endpoints for reading policies, and we have endpoints for scaling and for seeing scaling events. This also makes it easy to build your own autoscaler. If you have special requirements, then hopefully these endpoints reduce the amount of pain to integrate with Nomad.
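To give a rough idea of where that stanza lives, here is a minimal sketch using the documented scaling block (the job and group names are made up; the policy contents are covered later in the talk). The policies then surface through the scaling API, for example endpoints such as /v1/scaling/policies for listing policies and /v1/job/:job_id/scale for scaling a job.

    job "webapp" {
      group "app" {
        count = 1

        # Nomad itself only validates enabled, min, and max;
        # the policy contents are interpreted by the Nomad Autoscaler.
        scaling {
          enabled = true
          min     = 1
          max     = 20

          policy {
            # autoscaler-specific configuration goes here
          }
        }

        task "app" {
          # ...
        }
      }
    }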

The Nomad Autoscaler itself is a brand-new project. We have our own release cadence separate from the Nomad binary. There are some version requirements as we develop the Nomad API, as we make changes. We keep a version matrix in the readme for that.

But with the nightly builds, particularly, because we're iterating so fast and trying to add new features, if you see a feature that was added a couple of days ago, or even a day ago, you can use the nightly build to test it out.

We want to get feedback quick. We want to get feedback early on what we're building. We also ship the autoscaler as a Docker image and as binaries for multiple different OS architectures. Our preference is to run the Nomad Autoscaler on Nomad itself, so Nomad can manage it. But if you have a reason why that is not appropriate, you can take one of these binaries and run it in whatever manner is applicable to your use case.
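As a sketch of the preferred option, running the agent as a Nomad job with the Docker driver might look roughly like this (the image tag and config path are illustrative):

    job "autoscaler" {
      group "autoscaler" {
        task "autoscaler" {
          driver = "docker"

          config {
            image   = "hashicorp/nomad-autoscaler:0.4.5"  # tag is illustrative
            command = "nomad-autoscaler"
            args    = ["agent", "-config", "local/config.hcl"]
          }

          # In practice a template stanza would render local/config.hcl with
          # the agent configuration shown later in the demo.
        }
      }
    }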

It's inherently difficult for us to support every use case that might exist. For this reason, we've architected the autoscaler around plugins at very distinct points. We'll discuss that in a minute. We use the widely used Go plugin package, which is used throughout HashiCorp in some of the major products. We want to make it so that we can include as many use cases as possible.

We also hope to make the most efficient use of Nomad as we can. We don't want an external tool to impact Nomad. At the very least, it might slow down deployments and that slows down requests. At the very worst, it could cause the cluster to crash, and it might impact all your running applications.

So we use blocking queries, internal caching, to make sure this doesn't happen. This has been one of the larger pieces of work we've focused on. And that's always still a focus of what we're working on when we're designing.

How the Autoscaler Works

How can you, as an operator, a developer, a sysadmin, control the autoscaler? If we ignore all the config components that are used to configure the agent, the parsing and so on, we have 2 main components: the policy handling and evaluation, and the plugin management system. And that's pretty much it.

If we take a look at the evaluation flow, it's what we have in this slide. It's a closed-loop control system. It's similar to when you regulate the pressure of a tank in an industrial plant or to keep your car going at the same speed via cruise control, or even keeping the temperature in your house constant with the thermostat.

The input we have is the policy in the case of the Nomad Autoscaler. Think of a thermostat, and the policy would be your desired temperature, no matter what the conditions are outside. Then we have the controller, the yellow box in the diagram on screen that is the Nomad Autoscaler core or the controller. And then the purple, we have the target.

In the thermostat, this might be a radiator, or it might be an air conditioning unit. In the Nomad Autoscaler, particularly for the horizontal application scaling, this is the Nomad cluster and the Nomad task group counts. And even the target keeps feeding back information. What's the status? Give us some metrics so we can continually compare the current state and the desired state.

How do we configure this? This is a scaling policy. This one on screen is one that you could take; you can put this in one of your jobs right now and it will go and scale. We're explicitly configuring some fields just to give you a full view of what it might look like.

Now we'll go through every section, to try and explain a little bit so you understand more.
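Since the on-screen policy isn't reproduced in this transcript, here is a minimal sketch of the same shape, written against the documented scaling policy syntax (the metric name and values are illustrative, and field names may differ slightly from the autoscaler version demoed):

    scaling {
      enabled = true
      min     = 1
      max     = 20

      policy {
        cooldown            = "5m"
        evaluation_interval = "10s"

        check "concurrent_sessions" {
          source = "prometheus"
          # Must currently return a single value, e.g. average sessions per instance.
          query  = "scalar(avg(app_current_sessions))"

          strategy "target-value" {
            target = 5
          }
        }
      }
    }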

In the top section on screen, we've got these 3 values, enabled, min, and max.

Enabled lets you administratively disable or enable the scaling policy. Say your application is having problems. Or say you don't want the autoscaler to interact with it for a bit while you make a few changes or decide something. You can set this to false, and the autoscaler will skip this evaluation, giving you better control, and we won't interfere with it.

The minimum and maximum will always be adhered to by the autoscaler.

Say the autoscaler takes this policy and decides, "I want to scale to 21." The autoscaler will look at that and say, "No, you can't do that; we're going to lower that value to 20." This just stops the situation where maybe you have a bad metric, maybe something is misconfigured or misfiring, so we don't just run away, scaling forever and causing more problems.

This is the only part that's validated by Nomad itself. So if you want to build your own autoscaler, with a policy that you believe is more fit for your purpose, this is the only section you really have to adhere to. Everything else you can shape however you wish.

Setting the Cool-Down

Then we get into the policy section: the cool-down and the evaluation interval. These both take time durations, which is pretty common in Nomad. The cool-down defaults to 5 minutes; that is a fairly standard value for cool-down across most of the cloud providers and most of the autoscalers.

Why do you need a cool-down? When new instances of your application are starting because of a scaling action, you want your load balancer to start distributing traffic to them, and you want your metrics to catch up with what's going on.

This just prevents thrashing. It prevents scaling too rapidly and not letting the system settle. Thrashing can cause high load on your Nomad servers. Thrashing can cause pretty bad instability in your applications.

And we have the evaluation interval. This defaults to 10 seconds, and it controls how often the Nomad Autoscaler will evaluate your policy, how often it will compare the current state to the desired state.

These 2 values in combination control the overall policy and how it works.

If you have low values for this—1 second—you're going to get very rapid actions. They're going to change very often. It's going to be probably quite dangerous. If you have higher values, you're going to have more conservative scaling, slow, maybe steady, but maybe you will miss out on some traffic because you won't be reacting to it quickly enough.

The APM Section

Then we have the application performance management (APM) section. This is where we define which APM we would like to talk to, and the query that we're going to run and gather data back from. In this example, it isn't an actual properly formed query; that would be too long.

But here, for example, we're looking for the number of concurrent sessions to the instances of our application. There is a current requirement that the query must return a single value. We have discussed internally loosening this and allowing an array.

The reason we're very reluctant to do this is that APMs such as Prometheus, such as Datadog, they're much better at performing these types of queries. They're built to do this. They're much more efficient.

We don't want the autoscaler to turn into its own APM, which then brings a whole new set of problems.

The Target Section

The target defaults to the Nomad target, so you can omit it if you're doing application auto-scaling; it's shown here for completeness.

The target plugin, as I mentioned when discussing the controller, has 2 responsibilities. It alters the number of running instances of your application, but it also supplies status information: How many are running? When was the last actual scaling event?

This is so we can capture things like out-of-band changes. Say you have a rogue program that is running and accidentally scaling when it shouldn't be. The autoscaler can look at that and skip scaling for that evaluation. We don't want to impact what's going on externally.

Auto-Scaling Policy Strategy

Then we move to the strategy. The strategy is kind of the brains; it's where the math happens.

What we've got here, we have the target value. A target value aims to keep the specified or the supplied metric at the correct value that you put in that configuration block.

If we take this example, where we're looking at the number of connections we have open to our application: if we have 2 instances running and, say, 15 current sessions returned, the target value strategy would recommend moving to 3 instances.
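Roughly, the arithmetic for that example works out like this (a sketch of the idea, not the exact implementation):

    sessions per instance = 15 / 2        ≈ 7.5
    scaling factor        = 7.5 / 5       = 1.5
    new count             = ceil(2 × 1.5) = 3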

This would take us from between 7 and 8 current sessions per instance down to 5, which is our target. And if we take a holistic view of things, this is what it looks like: we have a single strategy plugin and a single target plugin.

Then we have 2 APMs. Why is this? Well, Nomad is quite inefficient for this. Every time we do a lookup, we have to make 1+N calls, where N is the number of allocations you're running. And as I said, we don't want to turn the autoscaler into an APM.

This slide is a visual representation of where those config parameters in the policy fit in this plugin ecosystem. You can see they're pretty well divided up. This is so that, if you change a value, you know exactly what part of the system is going to be impacted.

Say you change what the cool-down is. That's going to be handled by Nomad Autoscaler core and you know where to look.

An Autoscaler Demo

But there's only really so much slides can teach us.

For me personally, seeing and doing are much better at helping understand the topic. So we're going to look at this demo, which aims to run through some real-world scenarios. It probably fundamentally follows what you might experience when you're thinking about introducing an autoscaler to your environment.

What happens when we don't have auto-scaling? How do you introduce auto-scaling using the dry-run flag? And the third point is, what might this look like when you have tested it and deployed it properly?

The final one is just going to be very brief, but we touch on scaling to 0 for real flexibility. The demo isn't geared to that, but I want to introduce the topic.

You'll see we're using Prometheus for all of the metrics rather than Nomad. Nomad can only supply CPU and memory. These are quite fickle to try and get consistent scaling from. This is why we're using Prometheus.

In the demo itself, what is actually the worst that could happen? Probably quite a lot of things, but we're going to go and take that risk anyway.

I'm using Shipyard to build all my infrastructure pieces locally: nice, simple, based on containers, and it should take less than a minute to start.

What is it deploying? We're deploying Nomad and Consul. We're using a Unicorn Rails app behind Nginx, with a cache and a database, to simulate an application stack. It just serves a static response, a single static page.

All services are using Consul Connect. Why is that? Well, we get some very interesting metrics from the Envoy sidecars, which we can use to scale. And also because Consul Connect is pretty cool, right?

This just takes a moment to start, brings up all our containers. Once that's done, we'll have everything locally. We've got Nomad, we've got Consul, and we've got a really good application with some great metrics loaded for us.

You can see that we have a Nomad server in a HashiConf-digital region in our Valencian datacenter. And we can go over and just take a little look at the Nomad UI, see what's starting. And you can see we've got our jobs, apart from Grafana. That's usually the more lethargic one to start. But we've got our monitoring database, Nginx, and our Unicorn application.

And so now we can go over to the service UI and see how our service looks. You can see the path our requests will take, and you can even see the duration of the request and the response code that we got back.

Now we can go and have a look at the Grafana dashboard. We've got some interesting tiles up. We've got the memory usage, we've got the number of running instances and the average requests per instance. And this is going to be the metric, the thing we're looking at.

We can also see the total requests, and we can see the response code. Obviously we want 200 response codes, but what are we going to actually get when this starts running?

You will see that we have no scaling policies running. This job will not scale. The Unicorn app cannot scale. Nomad doesn't know anything about it.

So we're going to start to simulate some customer loads. It's not too much; we're running it for 5 minutes with a concurrency of 2, but this will quickly overwhelm your application.

On the left, you can see quite quickly the number of requests is increasing. At the start we get 200s, but it doesn't take long for us to start getting 500 responses back. That's not good for your customers, right?

Looking at the graph, you've got about 50% 200s and 50% 500s. So 50% of the requests to your application are failing, and that's probably going to stay pretty consistent.

The average requests per instance keep going up. It doesn't get any better. We don't have anything in our app to self-heal, to improve the situation. And if we cancel the load, we're left at 50-50. That's not the best, right?

What can we do to improve that? We can run the autoscaler. We have an autoscaler, and we can start testing out how auto-scaling might improve our environment, how it might improve our customer feedback.

What we're going to do is to run an autoscaler. As I said earlier, we're going to run it as a job on our Nomad cluster. The configuration is quite minimal for this.

If we take a look in the file, you can see that we're setting up our Nomad clients, so it points to the right address with the right region. We are going to instantiate a Prometheus client so that we get some good metrics. And we know where to gather our data from.

We're also loading a strategy, which is the target value plugin, unfortunately the only one we have available at the moment.

And then a couple of different parameters: the cool-down, just so that I can show you that you can change it and how it works, and also a bind address.
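Put together, an agent configuration along those lines might look roughly like this (the addresses, port, and cool-down value are illustrative, and the block names follow the autoscaler documentation):

    nomad {
      address = "http://nomad.service.consul:4646"
      region  = "hashiconf-digital"
    }

    apm "prometheus" {
      driver = "prometheus"
      config = {
        address = "http://prometheus.service.consul:9090"
      }
    }

    strategy "target-value" {
      driver = "target-value"
    }

    policy {
      default_cooldown = "1m"
    }

    http {
      bind_address = "0.0.0.0"
      bind_port    = 8080
    }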

What's that bind address used for? The Nomad Autoscaler exposes a health check. You can see that we get a 200 OK back, so you can at least have some confidence the autoscaler is up and running.

That can be used as a Consul health check, for example, in your job, just to provide added assurance.
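If you run the agent as a Nomad job, wiring that up might look something like this (the port label is made up, and the /v1/health path is an assumption based on the agent's HTTP health API):

    service {
      name = "nomad-autoscaler"
      port = "http"

      check {
        type     = "http"
        path     = "/v1/health"
        interval = "10s"
        timeout  = "2s"
      }
    }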

With that running, we're going to check the autoscaler there and have a look at the logs.

This is a nice way to do it through Nomad: we look at standard error to see the agent logs.

You'll see that we've launched all the plugins we were expecting, all 4 of them, all defaults. And we have started our blocking query on the policies, for efficiency.

We also have the health server up so that you can query it and make sure it's OK. Then we're going to trigger a new allocation of our Unicorn application. We want to make sure that we are going to be able to scale this properly.

Autoscaler Dry Run

This job is configured with a dry run.

It can be quite scary to deploy an autoscaler in an application straight away. So we have a thing called "dry run." Dry run lets you deploy a scaling policy for auto-scaling and let it be evaluated without changing the ultimate counts of your application.

This slide shows the scaling policy of the job we've just deployed. You can see that we've got the min, the max, the enabled values.

You can see our policies. We're using this rather grotesque query, but this is effectively asking Prometheus: What is the average number of connections open to our Unicorn instances? And it gives us a single value. Let's say currently you have 10 open connections per instance of your application. That's what we want to look at.

Further down you can see that we've got this strategy. We're looking for 6 connections per instance. That's going to be our target value.

And then we have our target. We're using the Nomad target, but critically we have this dry-run flag. The autoscaler will check this and won't make changes to your underlying application.
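In the policy, that part looks roughly like this (the exact placement and spelling of the flag reflect the autoscaler version demoed, so treat it as illustrative):

    target "nomad" {
      dry-run = "true"
    }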

We can have a look, see what's going on with our Unicorn application. You'll see that we've got a new allocation. That's great. That's expected to happen.

We've registered the new job, and now we're going to add some load to get some customer data running through. We have to wait a couple of minutes for Prometheus to update, but we're probably not going to see much change in the response codes.

We're not actually scaling the job; this is just a dry run, so we're probably still going to fail. But if we look at the autoscaler logs, you'll see that it looks at the load, looks at the metric, and does want to scale up because of this factor of 1.5. It wants to scale up to 2. But, as you will see, dry run is true.

This means the autoscaler is not going to submit that change, but it's there. If you have a look at the Unicorn status again, you'll see that nothing has changed. We haven't made any changes to your system; we're just informing you what might happen.

So this gives you a really nice way to understand what's going to happen. Maybe you want to play around with metrics, play around with the counts, just fine-tune before you let anything loose into a cluster where it might impact your customers, or at least impact one of your environments.

Once you've done that, you might want to remove that flag and deploy the same job again, but with the dry run removed. You don't have to remove it altogether; you can just set it to false, and it will still run.

And we have this cool-down of 1 minute, kind of sensible. Maybe you want more, maybe you want less, but for the purpose of the demo, it works beautifully.

Just a reminder: The default is 5 minutes, which might be a little long for this demo, but is perfectly sensible elsewhere.

Submitting a Job

We're going to submit that job to our cluster, which will not give us a new allocation. Changing a flag in the scaling policy doesn't require us to re-register the job in a destructive way. It will give us a new version, as you can see.

With all of that done, we now have a job that is capable of scaling with some decent defaults, and we have an autoscaler that can do that.

Now what we're going to do is start incrementally increasing traffic into our application. This is more like what you might actually experience in your environment.

You may get blasts of traffic if you haven't fine-tuned, but hopefully this is going to work very well.

We start with some load here, and we start making requests. If we head over to the Grafana UI, it's slow, but already you can start to see the number of requests per instance increasing; very helpful.

And you'll see that we're getting 200s. That's great. That's what we want. But as this load increases, from what we've seen, we might start to struggle. Because we now have the dry run removed, the Nomad Autoscaler will actually enact change.

You can see that we have a desired count of 3 because of the load that we have, and you will see there is no dry run. And if we take a look at the green line on the screen, which is the number of running allocations of our job, you can see that it starts to increase. It's moved up to 3.

So the autoscaler has submitted that job and we've scaled. And conversely, the number of requests per instance, number of active connections, has dropped down to an acceptable figure. And we haven't had any 500s.

We've managed to take that load, make it work, and not return any bad responses to our customers.

Let's go ahead and add some more load to the application. You've added a new feature, maybe some new cat GIFs to your application. That's going to be pretty popular, right?

What does the autoscaler want to do? It's come out of cool-down, it's looked at the metric, and it's decided, "We need more instances of this application to handle this load." We've requested 5 instances, and we have put ourselves into a 1-minute cool-down, just so that these new allocations have time to start up.

Nginx has time to distribute traffic, and then we can let the metrics settle down and re-evaluate. And if you have a look at Grafana, you can see that's exactly what's happening; the lines are converging. We have more instances coming online, and the requests per instance are lowering. The allocations are handling the traffic and, importantly, we're still not getting any 500s.

For the final bit, let's add one more round of load to our cluster, just to make sure this isn't a fluke, that we're not lying about it. Again, it might take a minute.

We're in a 1-minute cool-down, so it will take a little while before the autoscaler does anything. But you will see that, just as I click here, the number we're looking at increases.

And critically you'll see that the autoscaler has wanted to run 9 instances, but has dropped it down to 8, to make sure it doesn't violate the maximum parameter, and then it told Nomad, "I want 8 allocations of this job."

If we go back to the dashboard, we'll start seeing this again coalescing together. You'll see that we now have 8 allocations of our application, and we're hitting 6 requests per allocation, which is exactly what we wanted for our target.

You'll see that, if we go along this graph, there are no 500s. We've incrementally increased that load as might happen with your website, and we haven't dropped any customer requests on the floor. We haven't returned any 500s.

It's ideal for what you want.

We will take the load off now; we don't need it anymore. We see a couple of 500s there. I'm not sure where the difference comes from; maybe it's Nginx talking to a backend that has just gone away.

Nginx has a very basic configuration, so there's probably some tuning to be done there.

And that's great. That's what you might deploy to your production environment, right? Now we have no load, and you'll see that the autoscaler will gradually start to rein things in.

We don't need 8 allocations of the application running, so we'll start to gradually reduce that number as the load dissipates away. This gives us a nice curve, which, if you're running an e-commerce site, might follow your most popular times: maybe the morning, lunchtime, and the evening when people have more time.

We get these peaks and troughs, which is what you want. Here we can see our requests have dropped down to nothing, and we start to reduce the number of allocations we have running. This will continue for a few moments, as it just keeps dropping down. We should come out of our cool-down in a moment. Hopefully the autoscaler should take some action now.

And there we go. The autoscaler wanted to go to 0. Because of our minimum threshold, it changed that to 1, to make sure it doesn't violate the minimum, and then it submitted that to Nomad.

Now we should have 1 allocation of our application left running. From the autoscaler's standpoint and from your metric standpoint, that is enough to fulfill all the traffic you have coming into your application.

That's what you might have in your production environment.

A Look at Scaling to 0

The final bit of the demo, as I said, isn't really fit for this piece, but I want to show it off. There was a great talk done 2 to 3 weeks ago by HashiCorp developer advocate Erik Veld on cloud bursting using Nomad and the Nomad Autoscaler. If you're interested in more about scaling to 0, I highly recommend it.

Scaling to 0 is quite useful, if you have an app consuming from a queue, or if you have physical hardware and then get an influx of requests. It's very hard to rack up a server, bring it online very quickly. This really helps those situations.

We're going to run this job. What we've changed in this job's scaling policy is that the minimum count is now 0. This job is allowed to scale to 0. That's fine. This is probably not ideal if you're serving live requests, but if you have a very good API gateway or proxy, then maybe you could get away with it.
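The only change from the earlier policy is the minimum (a sketch):

    scaling {
      enabled = true
      min     = 0   # allow the task group to scale all the way down
      max     = 8

      # policy block unchanged from the earlier example
    }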

We're going to run that job. This will probably scale down to 0, and quite quickly; we are not in cool-down.

The next evaluation is going to happen almost immediately. There's no metric, there's no data, we don't have any requests. So we scale to 0.

The UI is a bit misleading. The job isn't dead. I would class it more as in hibernation. But this is great.

For this final bit, you can see from the Grafana dashboard that we have scaled down to 0, with no requests coming in.

If we look across the full run of the demo, you can see where we've come from. We had no scaling. Then we tried it in dry-run mode to see what would happen, and then we deployed it, maybe to our production environment.

So you get this really nice increase in traffic, and you see that we managed to meet all of that, and that's perfect.

That's the end of the demo. The most important thing at the end of a demo: Destroy everything; make sure you don't have anything left behind.

Autoscaler's Future

That's enough of the present. What does the future of the autoscaler look like?

As a team, as a company, we have some features in progress and some plans, but any feedback is really welcome, and it may help shape the future of our development process.

That being said, if you have a very particular use case, something that is very specific to you, the plugin architecture means that it's quite easy and straightforward to build a plugin for your use case if you so wish and you have the resources. We're always happy to help advise and just provide feedback.

Scaling the application we demoed, that will get you so far. But what if your app becomes so popular that you run out of space on your cluster?

This is where we introduce horizontal cluster auto-scaling. We are targeting Amazon Web Services Auto Scaling Groups (ASGs) first, generally because of the popularity in the community.

This will allow us to scale out based on your allocated resources or any kind of constraints you have, such as needing more GPUs in your cluster.
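Cluster scaling policies hadn't shipped at the time of this talk, but for a flavor of the shape, a policy like this (field names taken from the later AWS ASG target documentation, purely illustrative here) might look roughly like:

    policy {
      cooldown            = "2m"
      evaluation_interval = "1m"

      check "cpu_allocated" {
        source = "prometheus"
        query  = "..."   # e.g. percentage of allocated CPU across the client nodes

        strategy "target-value" {
          target = 70
        }
      }

      target "aws-asg" {
        aws_asg_name        = "nomad-clients"   # illustrative
        node_class          = "hashistack"      # illustrative
        node_drain_deadline = "5m"
      }
    }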

The work on this plugin and this feature will also drive a lot of the core maturity. We tried to build the first iteration, the first releases, in a way that there aren't any assumptions that we're just doing horizontal application scaling, but that's not a guarantee.

This will help us smooth out any of that. Also, cluster scaling is a lot more complex, so this will really drive some maturity in the core offering.

Getting the initial AWS target done may take a bit of time to design and implement. Implementing other cloud providers should be a lot quicker, since a lot of the heavy work will be done initially through the AWS target and its design.

We'll initially be targeting, potentially, Google Cloud instance groups for additional cloud provider cluster auto-scaling. As I said, if you have any feedback, if you think another provider is going to be more beneficial, just tell us; we'll definitely take that on board.

The plugin nature allows us to build new plugins fairly quickly, and we know that the catalog we have is slightly limited. One of the biggest time sinks in building a new plugin is learning that provider: understanding what calls we should be making and which are the most efficient calls to make.

We're initially targeting, potentially, Datadog as an APM, a PID controller strategy, and even a scheduled strategy. By scheduled strategy, I mean: say you have an application that is only used during office hours. The schedule can say, "Scale my application from 0 to 5 at 8:00 in the morning, and then at 6:00 at night, take the application back down to 0," and save myself some money.

If we take a look back, this is the ecosystem that we have right now. It's minimal, it's useful. But it is limited to application scaling.

What might this look like in a few months, in a year, in a few weeks?

This is the kind of landscape we're driving toward. It has the flexibility to meet hopefully most use cases, and even for unique use cases it provides an opportunity to accommodate them with some simple interfaces.

If you have your own custom strategy, that might sit there too.

And while the picture focuses on the plugin ecosystem, this doesn't mean we're neglecting the Nomad Autoscaler core.

We'll be working on efficiency, high availability, observability. So, as an operator, what is happening? And even just the core auto-scaling feature set. Maybe there is an auto-scaling feature that Azure uses that is quite a neat feature. And you want that within the Nomad Autoscaler. We'll be looking at things like that as well.

If you're interested in learning more about Nomad or Nomad Autoscaler, these links will take you to the homepages so you can give them a go and understand a bit more.

If you have any questions or feedback, please use the Nomad Discuss forum. We're pretty active on there. We try to monitor it and respond as quickly as we can.

If you're interested in learning more about auto-scaling as an academic subject, this paper gives a really good look at the current state of auto-scaling and clouds. It looks at the main providers and their auto-scaling offerings, and gives a really good deep dive into what they are, how they work, and how they compare.

The second link is a recent paper by Google, which discusses vertical application scaling, one of the cool, new, in-vogue topics. It particularly focuses on what they call "Autopilot," which is their Borg vertical application scaler. They're very interesting papers; if you have some time, I'd recommend reading them.

Thank you for watching. Thank you for taking the time to listen to me today. I would like to thank everyone who has helped me and provided feedback on my demo and my slides, and also everyone who has made this conference possible. It's a new challenge, and it's been great to be part of it.

If you have any questions or feedback, feel free to reach out at the Discuss forum, as I mentioned, or you'll be able to find me online with the @jrasell handle. Enjoy the rest of your conference. Goodbye.
