This article aims to provide an overview of some of the problems encountered when running containers in production.
Containers isolate applications by providing separate user-spaces (rather than entirely separate operating system instances, as in full virtualisation). This can yield benefits in security, repeatability[1] and efficient resource utilisation.
I know about Docker, but what is an ‘orchestration system’, and why would I need one?
An orchestration system helps you run production services, in containers, as part of clusters. It can be thought of as the next layer up in the operational stack from manual container usage.
Service Registration and Health Checks
A service might consist of one or more container definitions[2], which specify the container images to run along with additional metadata such as CPU and memory limits and storage attachments.
Container orchestration systems allow you to register containers as part of a service, which acts as a logical unit for autoscaling and load balancing. A service is composed of a set of containers, and the goal is to keep a desired number of them running. The individual containers should be considered ephemeral (a good practice in general when running server applications[3]) as they can be terminated and replaced at any time. The adage ‘containers should be cattle, not pets’ encapsulates this philosophy.
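The idea of maintaining a desired number of containers can be sketched as a single reconciliation step. This is a minimal illustration rather than how any particular orchestrator implements it; the `Container` type and the `reconcile` function are hypothetical names introduced here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Container:
    """Hypothetical record of one running container."""
    name: str
    healthy: bool = True


def reconcile(running: list[Container], desired: int) -> tuple[list[Container], int]:
    """Return (containers to stop, number of new containers to start)."""
    # Unhealthy containers are replaced rather than repaired ("cattle, not pets").
    unhealthy = [c for c in running if not c.healthy]
    healthy = [c for c in running if c.healthy]
    # Healthy containers beyond the desired count are surplus.
    surplus = healthy[desired:]
    # Start enough new containers to make up any shortfall.
    to_start = max(0, desired - len(healthy))
    return unhealthy + surplus, to_start
```

A real system would run a step like this continuously, so that a crashed container or a changed desired count is corrected on the next pass.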
Exposing an interface for the orchestration system to check your containers’ health is crucial for many features to work effectively. A simple HTTP endpoint can be used to check if a container responds in a timely manner with a 200 OK, indicating it is able to service user requests.
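As a rough illustration, a health endpoint of this kind can be written with Python's standard library alone. The `/health` path, the default port and the `make_server` helper are assumptions made for this sketch; each orchestration system lets you configure which path and port it probes.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answers GET /health with 200 OK; everything else gets a 404."""

    def do_GET(self):
        if self.path == "/health":
            body = b"OK"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        # Silence per-request logging so frequent probes do not flood the logs.
        pass


def make_server(host: str = "0.0.0.0", port: int = 8080) -> HTTPServer:
    """Build (but do not start) a server exposing the health endpoint."""
    return HTTPServer((host, port), HealthHandler)
```

Calling `make_server().serve_forever()` starts the endpoint. In a real application the handler would also check whatever the service depends on (a database connection, for instance) before answering 200.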
Service discovery may also be integrated to allow your applications to find each other easily in the cluster without additional tooling.
Placement strategies allow schedulers to decide which servers[4] your containers will run on.
These can vary depending on the goals of your service. You may want to spread containers as diffusely as possible across the available server pool to minimise the impact of a crashed server. Or you might want to bin pack containers into as few servers as possible to reduce costs.
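The difference between spreading and bin packing can be shown with a toy scheduler. This is a sketch under simplifying assumptions: capacity is counted in whole containers rather than CPU and memory, and the `Server` type, the `place` function and the strategy names are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Server:
    name: str
    capacity: int      # maximum number of containers this server can hold
    running: int = 0   # containers currently placed on it


def place(servers: list[Server], strategy: str) -> Server:
    """Choose a server for one new container and record the placement."""
    candidates = [s for s in servers if s.running < s.capacity]
    if not candidates:
        raise RuntimeError("no server has free capacity")
    if strategy == "spread":
        # Least-loaded server first, to limit the impact of one server crashing.
        chosen = min(candidates, key=lambda s: s.running)
    elif strategy == "binpack":
        # Most-loaded server that still has room, to keep other servers empty.
        chosen = max(candidates, key=lambda s: s.running)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    chosen.running += 1
    return chosen
```

With three empty servers and three containers to place, "spread" puts one container on each server, while "binpack" puts all three on the first server and leaves the other two free to be shut down.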
Deployments & Upgrades
Real applications need to be deployed more than once. Container orchestration systems often provide mechanisms for:
- Automated blue/green[5] redeployments of services, including verifying that the new containers are working before terminating all the old ones by integrating with health checks.
- Automatic restarting of crashed containers (if a whole server has crashed, for example, Docker’s built-in restart is not sufficient).
- Connection draining from old containers to avoid interruptions to user sessions.
- Rapid rollbacks if needed.
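The way health checks gate a blue/green redeployment can be sketched as a single decision. This is a simplified illustration, not any orchestrator's actual deployment logic; the `blue_green_deploy` function and the container names in the usage note are hypothetical, and connection draining is reduced to returning the list of containers to retire.

```python
from __future__ import annotations

from typing import Callable, Sequence


def blue_green_deploy(
    blue: Sequence[str],
    green: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> tuple[list[str], list[str]]:
    """Return (containers that serve traffic, containers to drain and stop)."""
    # Switch over only if every new (green) container passes its health check.
    if green and all(is_healthy(c) for c in green):
        return list(green), list(blue)
    # Otherwise keep the old (blue) containers and discard green: a rollback.
    return list(blue), list(green)
```

For example, if one of the new containers `["v2-a", "v2-b"]` fails its check, the old containers `["v1-a", "v1-b"]` keep serving traffic and the new ones are stopped.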
One of the big advantages of cloud computing is the ability to elastically adjust capacity based on demand, bringing cost savings in troughs and meeting demand at peak times. For container clusters, this involves adding or removing containers as well as the underlying servers which provide the resources.
Automatic scaling actions may be defined based on:
- CPU/Memory Usage - what resources are the containers actually using?
- CPU/Memory Reservation - what do the container definitions say that the containers need?
- Time schedules - if your demand is predictable you can preemptively ‘warm up’ more containers to increase service capacity.
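A usage-based scaling rule can be sketched as a proportional calculation: scale the container count by the ratio of observed to target utilisation, then clamp it to configured limits. The function name, the 60% target and the bounds below are assumptions chosen for illustration, not defaults of any particular product.

```python
import math


def desired_count(
    current: int,
    avg_cpu_percent: float,
    target_cpu_percent: float = 60.0,
    min_count: int = 1,
    max_count: int = 10,
) -> int:
    """Suggest a container count that brings average CPU usage near the target."""
    # Proportional rule: if usage is 1.5x the target, run 1.5x the containers.
    raw = math.ceil(current * avg_cpu_percent / target_cpu_percent)
    # Clamp so the service never scales to zero or beyond its budget.
    return max(min_count, min(max_count, raw))
```

The same shape works for memory usage or for reservation-based metrics; only the input measurement changes.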
Grouping of Containers
It is often useful to group a set of containers with different definitions together to work as a whole, for example, having a web server container and a log drain container running side-by-side. A Kubernetes pod (services are collections of pods) and an Amazon ECS task definition can both group multiple container definitions.
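As a rough model of such a group, the sketch below bundles several container definitions and totals their resource reservations, which is the figure a scheduler would need when placing the group as one unit. The class names, fields and image names are hypothetical and do not mirror the actual Kubernetes or ECS schemas.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ContainerDefinition:
    name: str
    image: str        # hypothetical image reference
    cpu_units: int    # reserved CPU, in arbitrary units
    memory_mib: int   # reserved memory, in MiB


@dataclass
class ContainerGroup:
    """Containers that are scheduled and scaled together."""
    name: str
    containers: list[ContainerDefinition] = field(default_factory=list)

    def reserved(self) -> tuple[int, int]:
        """Total (cpu_units, memory_mib) the whole group needs on one server."""
        cpu = sum(c.cpu_units for c in self.containers)
        memory = sum(c.memory_mib for c in self.containers)
        return cpu, memory


# A web server with a log drain running beside it.
web_group = ContainerGroup(
    name="web",
    containers=[
        ContainerDefinition("web-server", "example/web:1.0", 256, 512),
        ContainerDefinition("log-drain", "example/log-drain:1.0", 64, 128),
    ],
)
```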
Notes on Software and Providers
I wrote this article as part of research into available options and am not intimately familiar with all of these products. If you spot anything I’ve written which seems incorrect, please let me know. I have used ECS most heavily out of the following.
| Software | Notes | Pricing | Based on |
|---|---|---|---|
| Kubernetes | At the heart of many other offerings; seems like a solid bet for portability and is probably the most popular tool in its class. | Open Source | Google Borg |
| Docker Swarm (part of Docker Engine as of 1.12) | | Open Source | |
| Google Container Engine | Hosted Kubernetes with additional integrations with Google Cloud. | Flat fee per cluster hour + compute | Kubernetes |
| Amazon ECS | Largely proprietary (open source ecs-agent); heavily integrated with other AWS products (ALB, IAM, ASG). | Compute Usage Hours (EC2) | Host agent is open-source (ecs-agent) |
| Microsoft Azure Container Service | | Compute Usage Hours | Docker Swarm, DC/OS, or Kubernetes |
| Apache Mesos | Not specific to containers; pitched as a ‘distributed systems kernel’ for co-ordinating compute resources generically. | Open Source | |
| Marathon | Container orchestration built on Mesos. | | |
| Mesosphere | Makers of DC/OS (Data Center Operating System), which uses Mesos. | Enterprise (support plans & deployment footprint based) | Apache Mesos |
| Rancher | Open source with multiple base options; seems to bear some similarity to a self-hosted Azure Container Service. | Open Source & Premium Support | Kubernetes, Swarm, Mesos |
I’ve outlined some of the problems that this plethora of tools (many of which you may have heard of) are trying to solve. The feature sets are broadly similar across several of them, so I would simply advise reading the docs thoroughly and evaluating the risk of vendor lock-in when choosing how to invest your time.
[1] The full runtime environment of your application is defined in one place, rather than being an accumulation of scripting and manual changes to servers over time.
[2] In Amazon ECS, these are called Task Definitions.
[3] https://12factor.net/disposability
[4] In Amazon ECS, these are called Container Instances.
[5] https://martinfowler.com/bliki/BlueGreenDeployment.html