Background and Goals
With the advent of cluster schedulers we are leaving statically configured platforms behind. We no longer know upfront where our workloads will be running. Instead, the scheduler automatically distributes, scales and supervises the workloads to optimally utilize the available cluster resources. Nomad by HashiCorp is one of the latest additions to the scheduler space. It boasts some impressive features in terms of operational simplicity, scheduling speed and the ability to schedule different kinds of workloads. This opens up some interesting possibilities in the CI/CD space.
What if you could have an easy-to-operate continuous integration (CI) and deployment (CD) platform that will scale to any size and is capable of running heterogeneous workloads, while also being incredibly resource efficient and fast? Sounds too good to be true? Read on to find out how Jenkins and Nomad can be leveraged to make this a reality!
The high-level overview of the platform implemented in this blog looks like this:
The accompanying implementation of this platform and the instructions to do everything described in this blog can be found here.
It consists of a "static" part with a number of long-running services and a "dynamic" part where services and application environments might (dis)appear at any time. Keep in mind that this is only a logical separation and that all services are in fact running within the same Nomad cluster.
Jenkins is a well-known CI tool that has the ability to do distributed builds. This means that it will send out build jobs to slave agents on different machines, in order to run them within a specific environment or to speed things up by parallelization. The Jenkins master is responsible for running and distributing the build jobs across (dynamically scheduled) slaves. It is also used to deploy application environments onto the platform.
Selenium is a browser automation tool. It is frequently used to run browser-based regression automation suites and tests. Just like Jenkins it is also capable of scheduling tests across a distributed build grid. This means that you can speed things up significantly by running tests against different browsers in parallel.
The Selenium Hub is the central coordination point for any browser nodes that are scheduled within the platform. It will proxy the Selenium tests to the right node based on their browser requirements.
Docker Registry and Consul
Some auxiliary services are also running in the platform that enable build artifact storage and service discovery. The Docker registry is used to store and serve image artifacts produced by the build jobs. Consul is responsible for tracking services within the platform. This ensures that services can find each other without any upfront configuration, since the scheduler is going to determine where the workload will run.
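To make the service discovery concrete, here is a minimal sketch of what a long-running Nomad job with Consul registration could look like. The datacenter name, resource figures and health-check path are illustrative assumptions, and the exact stanza syntax depends on your Nomad version:

```hcl
# Illustrative sketch: a long-running Docker registry that Nomad
# registers in Consul, so other services can discover it without
# upfront configuration (e.g. as docker-registry.service.consul).
job "docker-registry" {
  datacenters = ["dc1"]
  type        = "service"

  group "registry" {
    task "registry" {
      driver = "docker"

      config {
        image = "registry:2"
        port_map {
          http = 5000
        }
      }

      resources {
        cpu    = 500
        memory = 256
        network {
          port "http" {
            static = 5000
          }
        }
      }

      # Nomad registers this service, plus a health check, in Consul.
      service {
        name = "docker-registry"
        port = "http"
        check {
          type     = "http"
          path     = "/v2/"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}
```

Once registered, any other workload in the cluster can resolve the registry through Consul DNS rather than a hard-coded address.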
The dynamic part of the cluster will host the Selenium nodes and the Jenkins slaves. Each of these components can be scaled up or down on-demand. The Jenkins master will spin up new slaves when the build queue starts filling up. To achieve this I have created a custom Nomad cloud plugin. This plugin enables dedicated slave configurations, which can be used as a restriction in the job configuration:
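Under the hood, the plugin asks Nomad to run a one-off job for each slave. The following is a rough sketch of what such a job could look like; the image name, environment variable and resource figures are assumptions for illustration:

```hcl
# Illustrative sketch of the kind of one-off (batch) job the Nomad
# cloud plugin could submit for a dedicated build slave.
job "jenkins-slave-java" {
  datacenters = ["dc1"]
  type        = "batch"

  group "slave" {
    task "slave" {
      driver = "docker"

      config {
        image = "my-registry/jenkins-slave-java:latest"
      }

      # The slave connects back to the master, which it can locate
      # through Consul service discovery (hostname is an assumption).
      env {
        JENKINS_URL = "http://jenkins.service.consul:8080"
      }

      resources {
        cpu    = 1000
        memory = 1024
      }
    }
  }
}
```

Because the job is of type batch, Nomad treats it as a run-to-completion workload: once the build finishes and the slave exits, the resources are freed again.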
When a new build slave is needed, Jenkins will send a request to Nomad to schedule an additional slave within the same cluster as where the Jenkins master is running.
The application tests are responsible for starting and stopping any required Selenium browser nodes. Again, Nomad will ultimately be responsible for the actual scheduling.
Different application environments can be started within the platform to actually use or test the application. These environments could be your traditional integration, test and acceptance environments or they might serve a different purpose altogether (feature-branch, demo, etc.). These environments are again just logical partitions within the cluster, as Nomad is actually using the same cluster resources for all of them. They are however fully independent from each other, meaning that a component in the test environment will not use a component in a different environment.
The pipeline described here is the workflow that is used in the accompanying platform. Please take a look at the Jenkins jobs for the exact implementation details:
Builds are performed either on the Jenkins master itself (for simple deployment scripts) or they are performed within a dedicated build container. Using a build container has big advantages for development teams as they can have full control over the build dependencies that their application needs. Jobs can be restricted to run in a specific build environment only. The build jobs are just calling a dedicated script in the application repository that takes care of all the build steps, thereby reducing complexity in the Jenkins configuration. Once the build is finished, the build container (Jenkins slave) will be removed, freeing up the resources and leaving the test results for further inspection.
The test job also calls out to a dedicated script in the application repository which takes care of spinning up the required browser nodes, installing dependencies and starting the Selenium tests. It sends a Selenium node job request to Nomad so that the same cluster resources can be utilized. After the browser tests have finished the nodes will be torn down, leaving only the test report for further inspection. Of course the Selenium Hub will remain available to process new tests.
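The node request sent to Nomad could be sketched as the job below. The image is the stock Selenium Chrome node; the hub hostname relies on Consul DNS, and the environment variable names depend on the Selenium image version, so treat them as assumptions:

```hcl
# Illustrative one-off Selenium browser node; the test script would
# submit (and later stop) jobs like this around a test run.
job "selenium-node-chrome" {
  datacenters = ["dc1"]
  type        = "batch"

  group "node" {
    task "chrome" {
      driver = "docker"

      config {
        image = "selenium/node-chrome:latest"
      }

      # The node locates the hub via Consul DNS and registers itself,
      # after which the hub can proxy matching tests to it.
      env {
        HUB_HOST = "selenium-hub.service.consul"
        HUB_PORT = "4444"
      }

      resources {
        cpu    = 500
        memory = 1024
      }
    }
  }
}
```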
There are two jobs available to start and stop an application environment. The start job asks for a target environment name and spins up the application components specifically for that environment. Again, the steps are defined in a dedicated application deploy script and not in the Jenkins job configuration; the script performs a simple substitution to set the target environment in the Nomad jobs. This reduces maintenance of the Jenkins configuration and puts the deploy logic closer to the application.
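As a sketch of that substitution, the application job file could carry a placeholder token that the deploy script replaces (e.g. with sed) before submitting the job to Nomad. The `__ENVIRONMENT__` token, application name and image are illustrative assumptions:

```hcl
# Illustrative template: __ENVIRONMENT__ is substituted by the deploy
# script (e.g. sed "s/__ENVIRONMENT__/test/g") before "nomad run".
job "myapp-__ENVIRONMENT__" {
  datacenters = ["dc1"]
  type        = "service"

  group "web" {
    task "web" {
      driver = "docker"

      config {
        image = "docker-registry.service.consul:5000/myapp:latest"
        port_map {
          http = 8080
        }
      }

      resources {
        cpu    = 500
        memory = 256
        network {
          port "http" {}
        }
      }

      # Registering per environment keeps the logical partitions
      # independent in Consul, e.g. myapp-test.service.consul.
      service {
        name = "myapp-__ENVIRONMENT__"
        port = "http"
      }
    }
  }
}
```

Because the environment name is baked into both the job ID and the Consul service name, components in one environment will only ever discover components registered under that same environment.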
The beauty of this platform is that everything is running within the same Nomad cluster. This gives us one consistent interface to run all of our services, whether they are long-running services or one-off (batch) jobs. Additionally, Nomad will schedule the workloads to optimally utilize the available resources and as an added bonus, it will also do it very fast. So to summarize, the key takeaways from this platform setup are:
- Nomad runs both the Jenkins master and the Jenkins slaves on the same cluster, using the Nomad cloud plugin
- The Jenkins slaves are in fact one-off containers that can be fully customized by the development teams to provide all the necessary build dependencies
- Nomad deploys both Consul for service discovery and a Docker Registry to host the images within the same cluster
- Nomad runs a fully distributed Selenium test grid within the same cluster
I hope this platform setup will be an inspiration to further explore the possibilities of scheduler-based CI/CD setups.
Together with my colleagues at Xebia, we'll be showcasing a full cloud-based, production-ready, setup of this platform at the HashiConf EU conference in Amsterdam, so come find me in the HashiCorner to learn more and discuss the possibilities!