
Cluster Scaling with the HashiCorp Nomad Autoscaler

Jul 09 2020 · Luiz Gustavo Ferraz Aoqui

Back in March, the HashiCorp Nomad team announced the tech preview release of our new project, the Nomad Autoscaler. It brought horizontal application autoscaling capabilities to your Nomad workloads, so you don’t have to worry about manually managing your task group count values anymore.

Today we are happy to announce the new release of the Nomad Autoscaler, which is now in beta.

The highlight of this release is the long-awaited horizontal cluster autoscaling capability. This feature lets you automatically add clients to, or remove them from, your Nomad cluster as load changes, with initial support for Autoscaling Groups in AWS. It’s built on top of the existing functionality of the Nomad Autoscaler, so it’s easy to get started.

»Getting Started with Cluster Scaling

With horizontal application autoscaling, the scaling policy was defined in the jobspec itself, using the new scaling block. With cluster scaling, we don’t have a specific job to attach our policy to, so we added the ability to load policies from files. You can specify a directory where your policies are located using the -policy-dir flag or in the Nomad Autoscaler configuration file:

policy {
  dir = "..."
}
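As a rough illustration of the discovery step, the Python sketch below collects candidate policy files from a directory such as the one passed via -policy-dir. It is not the autoscaler's own code: the function name `discover_policy_files` is hypothetical, and treating only `.hcl` files as policies is an assumption made for the example. Parsing each file into a policy is left out.

```python
from __future__ import annotations

from pathlib import Path

# Assumption for this sketch: only files ending in ".hcl" are treated as policies.
POLICY_SUFFIXES = {".hcl"}


def discover_policy_files(policy_dir: str) -> list[Path]:
    """Return policy files found directly inside policy_dir, sorted by name."""
    root = Path(policy_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"policy directory not found: {policy_dir}")
    return sorted(
        path
        for path in root.iterdir()
        if path.is_file() and path.suffix in POLICY_SUFFIXES
    )
```

Each returned path would then be parsed as one scaling policy; sorting keeps the load order deterministic between restarts.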

Scaling policy files are written in HCL, the same syntax used to write Nomad jobspecs. Here is an example cluster scaling policy:

enabled = true
min     = 1
max     = 10

policy {
  cooldown            = "2m"
  evaluation_interval = "1m"

  check "cpu_allocated_percentage" {
    source = "prometheus"
    query  = "scalar(sum(nomad_client_allocated_cpu*100/(nomad_client_unallocated_cpu + nomad_client_allocated_cpu))/count(nomad_client_allocated_cpu))"

    strategy "target-value" {
      target = 70
    }
  }

  check "mem_allocated_percentage" {
    source = "nomad_apm"
    query  = "cpu_high-memory"

    strategy "target-value" {
      target = 70
    }
  }

  target "aws-asg" {
    dry-run             = "false"
    aws_asg_name        = "hashistack-nomad_client"
    node_class          = "hashistack"
    node_drain_deadline = "5m"
  }
}
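To make the numbers in this policy concrete, here is a minimal Python sketch of what the `cpu_allocated_percentage` check and the `target-value` strategy compute. It assumes the strategy scales the current count by the ratio of the observed metric to the target, rounds up, and clamps the result to the policy's `min` and `max`. The helper names and the `NodeCPU` type are hypothetical, and the real strategy may add refinements (such as a tolerance band to avoid flapping) that are omitted here.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class NodeCPU:
    """Hypothetical per-client CPU figures, in MHz."""
    allocated_mhz: float
    unallocated_mhz: float


def avg_allocated_cpu_pct(nodes: list[NodeCPU]) -> float:
    """Mean per-node share of allocated CPU, expressed as a percentage."""
    if not nodes:
        raise ValueError("no client nodes reported metrics")
    per_node = [
        100 * n.allocated_mhz / (n.allocated_mhz + n.unallocated_mhz) for n in nodes
    ]
    return sum(per_node) / len(per_node)


def target_value(current: int, metric: float, target: float,
                 min_count: int, max_count: int) -> int:
    """Scale the count by metric/target, round up, then clamp to [min, max]."""
    factor = metric / target
    desired = math.ceil(current * factor)
    return max(min_count, min(max_count, desired))
```

For example, three clients that are each 90% allocated give a metric of 90, so with a target of 70 the sketch asks for ceil(3 × 90 / 70) = 4 clients, which stays within the policy's bounds of 1 to 10.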

If you're familiar with Nomad's application scaling policies, this will look similar to the scaling block in a jobspec. However, cluster autoscaling brings a few changes to scaling policies that are worth mentioning.

First, each policy can now have one or more check blocks. Previously, a policy could only consider a single metric value when making scaling decisions, which was very limiting for something as complex as deciding when to scale your cluster.

With multiple checks, you can now specify multiple queries targeted at retrieving the different metrics that are relevant to your infrastructure. The Nomad Autoscaler will run them and pick the result that is most appropriate for the current situation.
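One way to picture how several checks are reconciled is to rank their outcomes. The sketch below assumes a "safest action wins" rule: scaling out is preferred over doing nothing, which is preferred over scaling in, and when two checks agree on the direction the larger resulting count is kept. Treat this rule, and the `Direction`, `CheckResult`, and `pick_result` names, as illustrative assumptions rather than the autoscaler's documented behavior.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum


class Direction(IntEnum):
    # Ordered from least to most "safe" under the assumed rule.
    SCALE_IN = 0
    NONE = 1
    SCALE_OUT = 2


@dataclass(frozen=True)
class CheckResult:
    name: str
    direction: Direction
    count: int


def pick_result(results: list[CheckResult]) -> CheckResult:
    """Prefer scale-out over no action over scale-in; break ties by the larger count."""
    if not results:
        raise ValueError("policy has no check results")
    return max(results, key=lambda r: (r.direction, r.count))
```

Under this rule, a CPU check asking for 4 clients outweighs a memory check asking for 2, so the cluster is not shrunk while one metric still signals pressure.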

As before, you can use one of the available APM plugins to read your metrics from different sources. Currently we support Prometheus and native Nomad metrics. We are working on adding support for more sources, and external plugins can be easily deployed alongside the autoscaler.
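The pluggable design described above amounts to dispatching each check's `source` to a matching metrics backend. The in-process Python registry below is a hypothetical simplification (the real autoscaler uses its own plugin system rather than an in-memory map), and the `APMRegistry` class and its methods are invented for illustration.

```python
from __future__ import annotations

from typing import Callable, Dict

# Hypothetical signature: a backend takes a query string and returns one value.
QueryFn = Callable[[str], float]


class APMRegistry:
    """Hypothetical in-memory map from a check's `source` to a metrics backend."""

    def __init__(self) -> None:
        self._backends: Dict[str, QueryFn] = {}

    def register(self, source: str, fn: QueryFn) -> None:
        if source in self._backends:
            raise ValueError(f"source {source!r} is already registered")
        self._backends[source] = fn

    def query(self, source: str, query: str) -> float:
        backend = self._backends.get(source)
        if backend is None:
            raise LookupError(f"no APM plugin registered for {source!r}")
        return backend(query)
```

Adding a new metrics backend then only means registering another query function under a new source name, without touching the policy evaluation logic.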

The second important addition is the new aws-asg target plugin, which, as the name suggests, interacts with an Autoscaling Group on AWS. When scaling your cluster, the Nomad Autoscaler takes care of the labor-intensive process of draining clients and adding instances to, or removing them from, your AWS ASG.
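The ordering matters when removing capacity: clients should be drained, giving their allocations up to `node_drain_deadline` to move elsewhere, before their instances leave the ASG. The Python sketch below only plans those steps. `ClientNode`, `plan_scale_in`, and the naive "first matching nodes" selection are hypothetical, and a real implementation would call the Nomad and AWS APIs instead of returning tuples.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ClientNode:
    node_id: str
    instance_id: str  # hypothetical ID of the EC2 instance backing this client
    node_class: str


def plan_scale_in(nodes: list[ClientNode], current: int, desired: int,
                  node_class: str, drain_deadline: str) -> list[tuple[str, str, str]]:
    """Return ordered (action, target, arg) steps: drain every chosen node first,
    then remove the backing instances from the ASG."""
    remove = current - desired
    if remove <= 0:
        return []
    candidates = [n for n in nodes if n.node_class == node_class]
    if len(candidates) < remove:
        raise ValueError("not enough nodes in the node class to remove")
    chosen = candidates[:remove]  # naive choice; a real selector would be smarter
    steps = [("drain", n.node_id, drain_deadline) for n in chosen]
    steps += [("terminate", n.instance_id, "") for n in chosen]
    return steps
```

Only nodes in the policy's `node_class` are eligible, so clients outside the ASG being scaled are never drained.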

»Trying it Out For Yourself

We prepared a demo so you can try autoscaling a cluster for yourself. It uses HashiCorp Terraform and Packer to provision the entire infrastructure on AWS, so it’s easy to follow along.

»Let Us Know What You Think

As of today, the Nomad Autoscaler is out of tech preview and in beta. We are always happy to hear from our community, so if you have questions, comments, feature requests, or any other feedback, feel free to file an issue or find us on our discussion forum.
