HashiCorp Nomad 0.2

We are proud to announce the release of Nomad 0.2. Nomad is a distributed, scalable and highly available cluster manager and scheduler designed for both microservice and batch workloads.

The initial public release of Nomad was almost two months ago and we have been busy extending the system, improving user experience, and fixing bugs.

Nomad 0.2 brings many new features including service discovery, a system scheduler, restart policies, new constraint types, numerous client improvements, and much more. Please see the full Nomad 0.2 CHANGELOG for more details.

Download Nomad 0.2 here or read on to learn more about the major new features and improvements in Nomad 0.2.

»Service Discovery

A scheduler enables efficient placement of tasks on nodes, scaling of jobs, and handling of task failures. This introduces a discovery challenge because a task's health, count, and location are dynamic. Nomad 0.2 aims to solve that problem by integrating with Consul to provide service discovery and health checks.

We are integrating Consul first to have an immediate service discovery solution in Nomad. A future version of Nomad will expose an API that anyone can use to integrate a custom service discovery solution.

Tasks within a job can now be augmented with a service block that will be registered with Consul.

group "cache" {
	count = 5

	task "redis" {
		...
		service {
			# name = "redis"
			tags = ["global", "cache"]
			port = "db"

			check {
				name = "alive"
				type = "tcp"
				interval = "10s"
				timeout = "2s"
			}
		}
		...
	}
}

With the simple service block above, we register a "redis" service that has a well-known name in Consul, is discoverable by other tasks in the cluster, and has a health check associated with it so that traffic is only routed to healthy instances of the task. Further, a Consul cluster can include applications running inside of Nomad as well as those managed outside of Nomad. This makes it very easy for applications to discover each other regardless of how they are orchestrated.

The Nomad Client is responsible for both registering and deregistering the service. Once it receives the task, it registers the service using the IP of the machine and the dynamic or reserved port used by the task. When the task is no longer needed or is moved to a different machine, the service is deregistered.
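
To make this concrete, the "db" value referenced by the service block above is a port label declared in the task's network resources. The sketch below assumes the dynamic_ports form of the network block, and the resource values are illustrative:

resources {
	cpu = 500
	memory = 256

	network {
		mbits = 10

		# Ask Nomad to allocate a port and label it "db". The Client
		# registers the Consul service with the node's IP and whatever
		# port number is bound to this label.
		dynamic_ports = ["db"]
	}
}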

More details are available in the service documentation.

»System Scheduler

The system scheduler is used to register jobs that should be run on all nodes in the cluster that meet the job's constraints. As the cluster expands, the system scheduler will place new instances of previously registered system jobs onto the new nodes.

The system scheduler is a great way to deploy monitoring and logging tools like Logstash or Nagios that should be present on every node in the cluster. When these are run as system jobs, they benefit from the many features Nomad provides including declarative deployment, rolling updates, service discovery, monitoring, and more.
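
As a minimal sketch, a system job looks like any other job with its type set to "system"; the job name, datacenter, and driver details below are illustrative:

job "node-monitor" {
	datacenters = ["dc1"]

	# The system scheduler places one instance of this job's task
	# groups on every node that meets the job's constraints.
	type = "system"

	group "monitor" {
		task "agent" {
			driver = "exec"
			...
		}
	}
}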

More details are available in the scheduler documentation.

»Restart Policies

Nomad will now restart failed tasks for all job types. A new restart block, introduced at the task group level, dictates how many times and at what frequency Nomad will restart a task. Very few tasks are immune to failure; restart policies recognize that and let users rely on Nomad to keep tasks running through transient failures.

restart {
	interval = "5m"
	attempts = 10
	delay = "25s"
}

For service and system workloads, Nomad ensures the task is kept alive and will keep restarting failed tasks. For batch workloads, the restart block interprets attempts as the maximum number of restarts allowed before the job is marked failed.
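
For context, the restart block sits at the task group level. The rough sketch of a batch job below (all names and values are illustrative) shows where it is placed and how attempts acts as a hard ceiling:

job "nightly-import" {
	type = "batch"

	group "import" {
		# After three failed attempts within the hour, the job is
		# marked failed rather than restarted again.
		restart {
			interval = "1h"
			attempts = 3
			delay = "15s"
		}

		task "import" {
			driver = "exec"
			...
		}
	}
}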

More details are available in the restart policy documentation.

»Improved Constraints

Nomad 0.2's constraint system is bolstered by the addition of regular expression, version, lexical ordering, and distinct host constraints.

The most interesting addition is the new distinct_hosts constraint, which can be specified at the job or task group level. The distinct_hosts constraint ensures that task groups within a job are placed on unique hosts. This enables a variety of applications in which colocation of task groups is unacceptable.
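
As a rough sketch, a version constraint and a distinct_hosts constraint could be written as follows; the attribute name, interpolation syntax, and values are illustrative, so consult the constraint documentation for the exact form:

# Only consider nodes running a sufficiently new kernel.
constraint {
	attribute = "$attr.kernel.version"
	operator = "version"
	value = ">= 3.19"
}

# Never place two task groups of this job on the same host.
constraint {
	operator = "distinct_hosts"
	value = "true"
}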

More details about each of the new constraints are available in the constraint documentation.

»Client Improvements

The Nomad Client has received many bug fixes and the following key improvements:

  • The ability to download and execute remote artifacts from a variety of sources, including Git, Mercurial, HTTP, and Amazon S3 (a sketch follows the driver configuration example below).
  • An improved ability to restore state when the Client restarts by reattaching to and monitoring previously run tasks. This allows in-place upgrades of the Nomad Client without the need to drain the node of all currently running tasks.
  • An improved driver configuration interface that allows rich configuration blocks, such as this Docker driver example:

config {
	image = "redis:latest"

	# Map the port labeled "db" to the container's port 6379.
	port_map {
		db = 6379
	}

	# Credentials for pulling the image from a private registry.
	auth {
		username = "username"
		password = "password"
	}
}
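
To illustrate the remote artifact support mentioned above, the sketch below assumes the exec driver's artifact_source option; the URL, binary name, and task name are illustrative:

task "worker" {
	driver = "exec"

	config {
		# Download the binary into the task directory before running it;
		# Git, Mercurial, and Amazon S3 sources are supported as well.
		artifact_source = "https://example.com/releases/worker"
		command = "worker"
	}
	...
}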

»Roadmap

Nomad 0.2 is a big release that adds many new features and improvements along with stability and bug fixes. As a result, we expect some new issues to surface, and they will be addressed in the point releases that follow.

While the roadmap for 0.3 is still being determined, there are a few things that we know will be included:

  • A cron specification to run jobs at a periodic rate.
  • The addition of job queuing to allow scheduling more jobs than the cluster currently has resources for, keeping Nomad available and resilient under high resource contention.
  • Affinities and anti-affinities to other jobs and nodes to enable data gravity, provide tenancy constraints, and minimize network latency between tasks.

Until then, we hope you enjoy Nomad 0.2 as much as we do! If you experience any issues, please report them on GitHub.

