HashiConf 2018 Keynote: CircleCI and Nomad
CircleCI runs 7.5K concurrent jobs across 750 clients, scheduling 3K every minute. This is a keynote preview of CircleCI's talk on how they use Nomad to achieve this scale.
Speakers
- Robert Zuber, CTO, CircleCI
Transcript
Thank you, Armon. Good morning, everybody. It’s a pleasure to be here. I’m Rob, I’m the CTO of CircleCI. We’ve been big fans of HashiCorp for a long time. I think we use every single tool that’s in the toolkit. Also, we share a lot of the same mission and vision as HashiCorp. We spend a lot of our time thinking about making sure that you, as developers, can focus on delivering your value to the market instead of spending your time on undifferentiated heavy lifting.
We’re focused on the delivery pipeline. We offer continuous integration and continuous deployment. We primarily do that in the cloud, although we also have a server-hosted offering that you can run yourself. Globally, within our cloud environment, we do that for 25,000-plus organizations, which drive, recently, 12 million builds per month and climbing.
What’s interesting about our use of Nomad is that it’s not about scheduling the services that we built, but we’ve baked Nomad into the heart of our product.
Within CircleCI, as an end user, you define a workflow, which is basically your definition of how you want your jobs on our platform to be executed. It’s effectively a DAG (a directed acyclic graph) of dependencies between the jobs that you want to run on our platform. When you’ve defined one of those CircleCI jobs, we then take that and convert it into one or more Nomad jobs to be executed.
We chose Nomad as the solution for this for a few reasons. First of all, performance. We needed something very fast. We’re in the delivery pipeline, and developers are waiting on their builds to be executed, so we need to get something scheduled onto our platform and executing as quickly as possible. And we wanted it to be simple. We need something that we can understand and operate within the core of our product, not something that’s wrapped around the whole product and brings with it a lot of complexity and baggage and layers of tools on top of tools in order to do that scheduling.
We just need something that’s going to do what we need done and do it quickly and be understandable. Also, as I mentioned, we deliver this as a server offering to customers who take it and run it themselves. Built into that environment, we need it to be something that they can manage. We found Nomad to be perfect for this environment. We deployed Nomad as part of that in our cloud environment 2 years ago, and as of about last week at peak load, we’re running about 7,500 concurrent jobs scheduled through Nomad. That runs across about 750 clients, which are the hosts that are registered and taking those jobs.
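Editor's illustration: the workflow model described in the talk (a DAG of jobs, each dispatched once its dependencies have finished) can be sketched in a few lines of Python. This is only a minimal sketch of ordering jobs by dependency, not CircleCI's implementation and not the Nomad API. The job names and the `schedule_in_waves` helper are hypothetical, and the step that would submit each job to a scheduler is left out.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each job maps to the set of jobs it depends on.
workflow = {
    "build": set(),
    "unit_tests": {"build"},
    "lint": {"build"},
    "deploy": {"unit_tests", "lint"},
}


def schedule_in_waves(deps):
    """Group jobs into waves; every job in a wave has all its dependencies done.

    Jobs within one wave are independent of each other, so a scheduler
    could place them on hosts at the same time.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished, unblocking dependents
    return waves


waves = schedule_in_waves(workflow)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {wave}")
# wave 1: ['build']
# wave 2: ['lint', 'unit_tests']
# wave 3: ['deploy']
```

In a real system each job would be dispatched as soon as its own dependencies finish rather than waiting for a whole wave; grouping into waves just keeps the sketch short.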
What’s interesting is those are not steady-state jobs. They’re turning very quickly, so we’re scheduling about 3,000 jobs every minute and packing them into that large cluster of hosts, and packing them in very efficiently because we spend a lot of money on that large cluster of hosts as well. If you’re interested in learning more about Nomad, about our use case of it, and about why these numbers all doubled yesterday, I’ll be talking about that at 2:35 this afternoon, right here. I hope you have a great show. Enjoy everything that you do, and thank you very much.
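Editor's illustration: the three figures quoted in the talk (about 7,500 concurrent jobs, about 750 clients, about 3,000 jobs scheduled per minute) imply two further numbers. The arithmetic below is a rough sketch that assumes all three figures describe the same peak period and that jobs finish at about the rate they are scheduled, so that Little's Law (concurrency = arrival rate × average time in system) applies. The derived values are estimates, not numbers stated by the speaker.

```python
# Figures quoted in the talk (all approximate).
concurrent_jobs = 7_500   # jobs running at peak
clients = 750             # Nomad client hosts
jobs_per_minute = 3_000   # jobs scheduled each minute

# Average packing density across the cluster.
jobs_per_client = concurrent_jobs / clients

# Little's Law rearranged: average duration = concurrency / arrival rate.
# Assumes a roughly steady flow of jobs in and out during the peak.
avg_job_minutes = concurrent_jobs / jobs_per_minute

print(f"about {jobs_per_client:.0f} jobs per client")
print(f"about {avg_job_minutes:.1f} minutes per job on average")
# about 10 jobs per client
# about 2.5 minutes per job on average
```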
See Rob Zuber's talk: Security and Scheduling Are Not My Core Competencies, And I Bet They Aren’t Yours Either