In the last couple of years, Docker and other container technologies have seen a great deal of interest and adoption. They provide a simple interface and API for creating self-contained applications that, once built, run nearly anywhere. Conceptually, a container isn’t much different from a static binary or an uber jar: it’s a bundle of files and configuration necessary to run one or more processes, though as implemented today it also provides a fair bit of isolation to prevent accidental interference between applications. With this ease of packaging and running applications has come an increase in the speed at which developers expect to move, both when creating software and when deploying it. This desire to move faster in production has led to a number of cluster management systems focused on deploying Docker containers.
Before we dive deeper, if you’re just interested in seeing the code, head over to GitHub and check out ecs_state.
While there are many cluster management systems on the market today, I’ll be focusing on EC2 Container Service (ECS) from AWS. I’m very familiar with it, as I was a founding member of the team and wrote large portions of the cluster management and task placement systems. It is a “shared state” cluster management system informed by the lessons Google shared in their Omega paper. The core concept of a system built in this manner is that the cluster manager will share what is happening in the cluster with anyone or anything that asks. In ECS, this means that anyone or anything that calls the List and Describe APIs can inspect what is happening not just with their own resources, but with all resources available in the cluster. Armed with this information, a decision can then be made about how to modify that state, either by starting or stopping a Docker container somewhere in the cluster. A human performing these actions by hand is slow; a machine performing them is referred to as a “scheduler”. The scheduler reads in data about cluster state, and then stops and starts Docker containers based on a set of rules. In the case where multiple schedulers race for the same resources, the cluster manager must choose a winner and reject the other requests for placement in the cluster.
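To make that concrete, here is a minimal sketch of what “asking for the shared state” looks like with the AWS SDK for Go. The region and the cluster name "default" are assumptions for this sketch, and pagination is omitted:

```go
package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ecs"
)

func main() {
	// Region and cluster name are assumptions for this sketch.
	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-east-1")}))
	svc := ecs.New(sess)

	// List every container instance registered to the cluster.
	listOut, err := svc.ListContainerInstances(&ecs.ListContainerInstancesInput{
		Cluster: aws.String("default"),
	})
	if err != nil {
		log.Fatal(err)
	}

	// Describe them to see the shared state: what's running and what's left.
	descOut, err := svc.DescribeContainerInstances(&ecs.DescribeContainerInstancesInput{
		Cluster:            aws.String("default"),
		ContainerInstances: listOut.ContainerInstanceArns,
	})
	if err != nil {
		log.Fatal(err)
	}

	for _, ci := range descOut.ContainerInstances {
		fmt.Printf("%s is running %d tasks\n", *ci.ContainerInstanceArn, *ci.RunningTasksCount)
		for _, r := range ci.RemainingResources {
			if *r.Name == "CPU" || *r.Name == "MEMORY" {
				fmt.Printf("  remaining %s: %d\n", *r.Name, *r.IntegerValue)
			}
		}
	}
}
```

A scheduler is just this loop plus a decision: pick a machine whose remaining resources satisfy the task, then start the task there.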
The most fundamental way to place a task in an ECS cluster is to use the List and Describe APIs, apply some logic, and then call StartTask, which takes as arguments a TaskDefinition (a manifest containing the Docker image to run, the resources to use, and other configuration) as well as the identifier of the machine where it should be started. Because placement is direct, even applications with complex requirements that can only run on one very specific machine can pick that spot through this process. Many applications, however, don’t need this level of control, and ECS provides some concrete examples of what a scheduler might look like for them. ECS has an API called RunTask, which is described in the documentation as “Start a task using random placement and the default Amazon ECS scheduler.” When calling this API, we can see that it either returns quickly with details about the task that was placed, or provides us with an error like reason: RESOURCE:CPU, meaning it could not find a location for the task because the CPU resource constraint could not be satisfied. This is currently a simple scheduler that inspects the state of the cluster, finds which machines in the cluster can accept the task placement, and then randomly places up to 10 tasks on the available machines. It’s simple, but for a quick job like a build task or image processing, the location of placement may matter less than quickly finding somewhere for the task to run.
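For example, calling RunTask and inspecting what comes back might look like the following sketch, reusing the client and imports from the snippet above (the task definition name is made up):

```go
// runAndReport asks the default ECS scheduler to randomly place count copies
// of a task definition, printing successful placements and failures.
func runAndReport(svc *ecs.ECS, taskDef string, count int64) error {
	out, err := svc.RunTask(&ecs.RunTaskInput{
		Cluster:        aws.String("default"),
		TaskDefinition: aws.String(taskDef), // e.g. "my-task:1" (hypothetical family:revision)
		Count:          aws.Int64(count),
	})
	if err != nil {
		return err
	}
	for _, t := range out.Tasks {
		fmt.Println("placed:", *t.TaskArn)
	}
	// Placements that could not be satisfied come back as failures,
	// e.g. a Reason of "RESOURCE:CPU" when no machine has enough CPU left.
	for _, f := range out.Failures {
		fmt.Println("failed:", *f.Arn, *f.Reason)
	}
	return nil
}
```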
Another scheduler that is part of the ECS APIs is the Service Scheduler. It is built to run long-running tasks within the cluster, restart them if they stop (for example, if the machine they’re running on crashes), and optionally manage the lifecycle necessary to place those tasks behind an Elastic Load Balancer. It even allows for rolling zero-downtime updates to your containers while properly draining connections from the load balancers. There’s really no magic involved though, and nothing is happening that couldn’t be done by a user external to ECS. When starting a service, ECS expects to be provided a role that allows for describing and registering machines behind an Elastic Load Balancer. Currently, if you start a service with a load balancer and have CloudTrail logs enabled, you can see the DescribeInstanceHealth and RegisterInstancesWithLoadBalancer calls being made by the ECS scheduler as tasks are started and stopped. Just as with the RunTask API, the Service Scheduler is inspecting cluster state and then making decisions about where and when to start and stop tasks; the logic is more complex, but it is still just List, Describe, Start, and Stop.
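The control loop at the heart of that pattern can be approximated in a few lines. This is not the actual service scheduler, just a sketch of the same idea under stated assumptions: the startedBy tag and the pickInstance helper are hypothetical, and the client and imports are reused from the earlier snippets:

```go
// reconcile is a toy version of a service scheduler's loop: it keeps
// `desired` copies of taskDef running, starting or stopping tasks as needed.
func reconcile(svc *ecs.ECS, taskDef string, desired int) error {
	// Find the tasks this scheduler previously started.
	running, err := svc.ListTasks(&ecs.ListTasksInput{
		Cluster:   aws.String("default"),
		StartedBy: aws.String("toy-service-scheduler"), // hypothetical tag
	})
	if err != nil {
		return err
	}
	have := len(running.TaskArns)

	switch {
	case have < desired:
		// Too few: pick a machine (List/Describe placement logic elided
		// into a hypothetical helper) and start a task directly on it.
		instanceArn, err := pickInstance(svc)
		if err != nil {
			return err
		}
		_, err = svc.StartTask(&ecs.StartTaskInput{
			Cluster:            aws.String("default"),
			TaskDefinition:     aws.String(taskDef),
			ContainerInstances: []*string{instanceArn},
			StartedBy:          aws.String("toy-service-scheduler"),
		})
		return err
	case have > desired:
		// Too many: stop one.
		_, err = svc.StopTask(&ecs.StopTaskInput{
			Cluster: aws.String("default"),
			Task:    running.TaskArns[0],
		})
		return err
	}
	return nil // converged
}
```

Run on a timer, a loop like this restarts tasks that die; the real Service Scheduler layers the load balancer registration and draining on top of the same List, Describe, Start, and Stop primitives.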
One of the benefits of this shared state model is that each of these schedulers can be developed and released independently. In previous types of cluster management systems, the state was often stored in a single location, which would then accrete more and more specialized logic, culminating in a giant ball of spaghetti that was slow to modify as new customer requests arrived. One drawback, however, is that the logic to fetch and query state is now duplicated across many separate schedulers. To reduce this effort I’m releasing ecs_state, a small Go library that uses the ECS List and Describe APIs to store information about running tasks and available resources in an in-memory SQLite database. There is a set of APIs to control when to refresh state, as well as an API to search for machines with the resources available to accept a task. Further logic and filtering can then be applied in memory before finally calling the StartTask or StopTask APIs. The ECS forums have seen quite a few requests for schedulers that run a task once on every machine, or run tasks at specific times like cron; I’m hoping that with a little bit of a head start, these schedulers and others will become simpler and quicker to create.