A Brief Overview of Container Orchestration

4 March 2017

This article aims to provide an overview of some of the problems encountered when running containers in production.

Containers isolate applications by providing separate user-spaces (rather than entirely separate operating system instances, as in full virtualisation). This can yield benefits in security, repeatability [1] and efficient resource utilisation.

I know about Docker, but what is an ‘orchestration system’, and why would I need one?

An orchestration system helps you run production services, in containers, as part of clusters. It can be thought of as the next layer up in the operational stack from manual container usage.

Service Registration and Health Checks

A service might consist of one or more container definitions [2] which define the container images that are to be run as well as additional metadata such as CPU and memory limits and storage attachments.

Container orchestration systems allow registering containers as part of a service, which acts as a logical unit for autoscaling and load balancing. Services are composed of a set of containers, with the goal of maintaining a desired number of containers running. The individual containers should be considered ephemeral (a good practice in general when running server applications [3]) as they can be terminated and replaced at any time. The adage 'containers should be cattle, not pets' encapsulates this philosophy.
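For example, with Kubernetes you declare how many replicas of a container you want and the scheduler keeps that many running. A minimal sketch, assuming an existing cluster and a deployment called web (both names are placeholders):

# Ask the orchestrator to keep three copies of the 'web' container running; it
# will start or stop containers to converge on that number, replacing any that die.
kubectl scale deployment web --replicas=3

# Compare the desired count with what is actually running.
kubectl get deployment web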

Exposing an interface for the orchestration system to check your containers’ health is crucial for many features to work effectively. A simple HTTP endpoint can be used to check if a container responds in a timely manner with a 200 OK, indicating it is able to service user requests.
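Docker itself can run such a probe from version 1.12 onwards. A sketch, assuming your application serves /health on port 80 inside the container and the image contains curl (the image name is a placeholder):

# Start a container with a health check; Docker marks it 'unhealthy' when the
# probe fails, which orchestration systems can use as a signal to replace it.
docker run -d --name web \
  --health-cmd='curl -fsS http://localhost/health || exit 1' \
  --health-interval=30s --health-timeout=2s --health-retries=3 \
  example/web:latest

# Inspect the current health status.
docker inspect --format '{{ .State.Health.Status }}' web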

Service Discovery may also be integrated to allow your applications to find each other easily in the cluster without additional tooling.
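For instance, Kubernetes exposes services through cluster DNS, so containers can find each other by name. A sketch, assuming a service called web in the default namespace and a running pod called client whose image includes nslookup:

# Resolve the 'web' service by name from inside another container.
kubectl exec client -- nslookup web.default.svc.cluster.local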

Scheduling

Placement strategies allow schedulers to decide which servers [4] your containers will run on.

These can vary depending on the goals of your service. You may want to spread containers as diffusely as possible across the available server pool to minimise the impact of a crashed server. Or you might want to bin pack containers into as few servers as possible to reduce costs.
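Amazon ECS, for example, lets you choose a placement strategy per service. A rough sketch with the AWS CLI (cluster, service and task definition names are placeholders):

# Spread tasks across availability zones to limit the impact of losing a server.
aws ecs create-service --cluster my-cluster --service-name web \
  --task-definition web:1 --desired-count 4 \
  --placement-strategy type=spread,field=attribute:ecs.availability-zone

# Or bin pack on memory to use as few container instances as possible.
aws ecs create-service --cluster my-cluster --service-name batch \
  --task-definition batch:1 --desired-count 4 \
  --placement-strategy type=binpack,field=memory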

Deployments & Upgrades

Real applications need to be deployed more than once. Container orchestration systems often provide mechanisms for:

  • Automated blue/green [5] redeployments of services, integrating with health checks to verify that the new containers are working before terminating the old ones.
  • Automatic restarting of crashed containers (Docker’s built-in restart policy is not sufficient if, for example, a whole server has crashed).
  • Connection draining from old containers to avoid interruptions to user sessions.
  • Rapid rollbacks if needed (see the sketch after this list).
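As an illustration, a rolling update and rollback with Kubernetes might look like the following (the deployment name and image are placeholders):

# Roll out a new image version; old containers are only removed as new ones become ready.
kubectl set image deployment/web web=example/web:v2

# Watch the rollout progress.
kubectl rollout status deployment/web

# Something is wrong in production: roll back to the previous version.
kubectl rollout undo deployment/web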

Auto Scaling

One of the big advantages of cloud computing is the ability to elastically adjust capacity based on demand, bringing cost savings in troughs and meeting demand at peak times. For container clusters, this involves adding or removing containers as well as the underlying servers which provide the resources.

Automatic scaling actions may be defined based on:

  • CPU/Memory Usage - what resources are the containers actually using? (see the example after this list)
  • CPU/Memory Reservation - what do the container definitions say that the containers need?
  • Time schedules - if your demand is predictable you can preemptively ‘warm up’ more containers to increase service capacity.
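As an example of the first of these, Kubernetes can scale a deployment on observed CPU usage. A sketch, assuming resource metrics are available in the cluster (the deployment name and thresholds are placeholders):

# Keep average CPU usage around 70%, running between 2 and 10 containers.
kubectl autoscale deployment web --min=2 --max=10 --cpu-percent=70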

Grouping of Containers

It is often useful to group a set of containers with different definitions together to work as a whole, for example, having a web server container and a log drain container running side-by-side. A Kubernetes pod (services are collections of pods) and an Amazon ECS task definition can both group multiple container definitions.
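A rough sketch of the ECS flavour, registering a task definition that groups a web server with a log drain sidecar (the family, images and sizes are placeholders):

# Both containers are scheduled onto the same host and scale together as one unit.
aws ecs register-task-definition --cli-input-json '{
  "family": "web-with-log-drain",
  "containerDefinitions": [
    {
      "name": "web",
      "image": "nginx:1.11",
      "memory": 128,
      "essential": true,
      "portMappings": [{ "containerPort": 80 }]
    },
    {
      "name": "log-drain",
      "image": "example/log-drain:latest",
      "memory": 64,
      "essential": false
    }
  ]
}'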

Notes on Software and Providers

I wrote this article as part of research into available options and am not intimately familiar with all of these products. If you spot anything I’ve written which seems incorrect, please let me know. I have used ECS most heavily out of the following.

| Product | Notes | Billing | Related |
| --- | --- | --- | --- |
| Kubernetes | At the heart of many other offerings; probably the most popular tool in its class and seems like a solid bet for portability. | Open Source | Google Borg |
| Docker Swarm | Now part of the Docker engine as of 1.12. | Open Source | |
| Google Container Engine | Hosted Kubernetes with additional integrations with Google Cloud. | Flat fee per cluster hour + compute | Kubernetes |
| Amazon ECS | Largely proprietary; heavily integrated with other AWS products (ALB, IAM, ASG). | Compute Usage Hours (EC2) | Host agent is open source (ecs-agent) |
| Microsoft Azure Container Service | | Compute Usage Hours | Docker Swarm, DC/OS, or Kubernetes |
| Apache Mesos | Not specific to containers; pitched as a ‘distributed systems kernel’ for co-ordinating compute resources generically. | Open Source | |
| Marathon | Container orchestration built on Mesos. | | |
| Mesosphere | Makers of DC/OS (Data Center Operating System), which uses Mesos. | Enterprise (support plans & deployment footprint based) | Apache Mesos |
| Rancher | Open source with multiple base options; seems to bear some similarity to a self-hosted Azure Container Service. | Open Source & Premium Support | Kubernetes, Swarm, Mesos |

Conclusion

I’ve outlined some of the problems that this plethora of tools (many of which you may have heard of) are trying to solve. The feature sets are broadly similar across several of them, so I would simply advise reading the docs thoroughly and evaluating the risk of vendor lock-in when choosing how to invest your time.


  1. The full runtime environment of your application is defined in one place, rather than being an accumulation of scripting and manual changes to servers over time.
  2. In Amazon ECS, these are called Task Definitions.
  3. https://12factor.net/disposability
  4. In Amazon ECS, these are called Container Instances.
  5. https://martinfowler.com/bliki/BlueGreenDeployment.html


Enable remote SSH access to Ubuntu 14.04 LTS Live

18 October 2015

Steps to enable remote SSH access to a computer running Ubuntu 14.04 Live. Useful for helping non-technical people remotely:

# Press windows key or click the top left, type 'Term'. Open 'Terminal'

sudo -i

apt-get update -y && apt-get -y install openssh-server
passwd root

# Type a password, press enter. Retype it, press enter

sed -i 's/PermitRootLogin .*/PermitRootLogin yes/g' /etc/ssh/sshd_config

service ssh restart

# Get their IP
curl ifconfig.co

# Setup port forwarding on their router to get access
# ssh root@<their-ip>
# Enable public key auth only, create a new user and disable root login when you have gained access
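Once you have key-based access, a rough sketch of that lock-down (still as root; the username and key are placeholders):

# Create a normal user with sudo rights and install your public key
adduser helper
adduser helper sudo
mkdir -p /home/helper/.ssh
echo 'ssh-rsa AAAA...your-public-key... you@laptop' >> /home/helper/.ssh/authorized_keys
chown -R helper:helper /home/helper/.ssh
chmod 700 /home/helper/.ssh && chmod 600 /home/helper/.ssh/authorized_keys

# Disable root login and password authentication, then restart SSH
sed -i 's/PermitRootLogin .*/PermitRootLogin no/g' /etc/ssh/sshd_config
sed -i 's/#\?PasswordAuthentication .*/PasswordAuthentication no/g' /etc/ssh/sshd_config
service ssh restart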


OpenVPN with DNS AdBlocking using Docker

18 October 2015

OpenVPN with DNS ad blocking is a useful way to block ads on your smartphone without having to root it. This post describes how to set up such a service on your own server.

The idea is to set a DNS server in your OpenVPN DHCP options, which is then pushed to clients. The DNS server runs in another Docker container and uses hosts files to block ads, trackers etc.

  1. See https://www.digitalocean.com/community/tutorials/how-to-set-up-an-openvpn-server-on-ubuntu-14-04 as an example of how to set up an OpenVPN Docker container on an Ubuntu VPS. At the ovpn_genconfig step, set -n 8.8.8.8 so there is only a single placeholder DNS server to overwrite later on. Otherwise your settings will fall back to Google’s secondary DNS.

  2. Set up the DNS container; this uses dnsmasq to block the bad hosts:

    git clone https://github.com/arthurkay/sagittarius-A && cd sagittarius-A && ./build.sh
    
  3. Run the dnsmasq container:

    docker rm saga-dns; docker run --restart=always --name=saga-dns --expose 53 --cap-add=NET_ADMIN arthurkay/sagittarius-a &
    

We expose port 53 explicitly as the Dockerfile does not currently contain an EXPOSE directive.

  4. Run the OpenVPN container, linking to the saga-dns container:

    docker rm openvpn; docker run --restart=always --volumes-from ovpn-data --name openvpn --link saga-dns:saga-dns -p 1194:1194/udp --cap-add=NET_ADMIN kylemanna/openvpn bash -c 'sed -i -E "s/(push dhcp-option DNS).*/\1 $SAGA_DNS_PORT_53_TCP_ADDR/" /etc/openvpn/openvpn.conf && ovpn_run' &
    

This updates the saga-dns container’s IP in the OpenVPN config before running OpenVPN.
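To check which DNS server was pushed and that it actually blocks ad hosts, something like this should work from the Docker host (the domain is just an example of a commonly blocked host):

# Find the IP that was substituted into the OpenVPN config
docker inspect -f '{{ .NetworkSettings.IPAddress }}' saga-dns

# Ad/tracker domains in the hosts lists should resolve to a dead address
dig +short doubleclick.net @$(docker inspect -f '{{ .NetworkSettings.IPAddress }}' saga-dns)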

(Hopefully) enjoy much faster browsing and less tracking on your mobile devices.



Delete fdupes duplicates by directory

24 May 2015

A quick script to process fdupes output and allow interactive selection of files to delete. Differs from the built-in fdupes prompts in that you can select directories to condemn.
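As a flavour of the approach, here is a minimal non-interactive sketch that takes the condemned directory as an argument rather than prompting. It assumes fdupes -r output (groups of duplicate paths separated by blank lines) and that the directory is given with the same prefix style fdupes prints (e.g. ./photos_backup):

#!/usr/bin/env bash
# Sketch: remove duplicate copies that live under one 'condemned' directory,
# but only when at least one copy of each file survives somewhere else.
set -euo pipefail

condemned=$1    # e.g. ./photos_backup (same prefix style that fdupes prints)

fdupes -r . | awk -v dir="$condemned/" '
  function handle(   i, keep) {
    keep = 0
    for (i = 0; i < n; i++)               # does any copy live outside dir?
      if (index(group[i], dir) != 1) keep = 1
    if (keep)
      for (i = 0; i < n; i++)
        if (index(group[i], dir) == 1) print group[i]
    n = 0
  }
  /^$/ { handle(); next }                 # a blank line ends a duplicate group
  { group[n++] = $0 }
  END { handle() }
' | while IFS= read -r f; do
  echo "deleting: $f"
  rm -- "$f"
done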



Fixing php5-fpm and Apache hanging with WordPress

23 March 2015

I had issues with Apache periodically hanging (failing to deliver a response body to any requests) on all my vhosts. This turned out to be solved by restarting php5-fpm. I enabled the slowlog in php5-fpm to try and find out which scripts were stalling:

sudo mkdir -p /var/log/php5-fpm
sudo vim /etc/php5/fpm/pool.d/www.conf
; The log file for slow requests
; Default Value: not set
; Note: slowlog is mandatory if request_slowlog_timeout is set
slowlog = /var/log/php5-fpm/$pool.log.slow

; The timeout for serving a single request after which a PHP backtrace will be
; dumped to the 'slowlog' file. A value of '0s' means 'off'.
; Available units: s(econds)(default), m(inutes), h(ours), or d(ays)
; Default Value: 0
request_slowlog_timeout = 5s

After a day or so I read the logs and found lots of slow requests to xmlrpc.php for WordPress vhosts.
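Something like the following shows which scripts appear most often in the slowlog (the pool name in the file path depends on your configuration):

# Count slow requests per script
grep 'script_filename' /var/log/php5-fpm/www.log.slow | sort | uniq -c | sort -rn | head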

A crude but effective solution is to block requests to the XML-RPC and Trackback APIs. These features are sometimes targeted by bots for brute force login attempts. I do not use them so I don’t mind disabling them entirely.

Edit your Apache vhost configuration (or .htaccess if you don’t have access to this):

<FilesMatch "^(xmlrpc\.php|wp-trackback\.php)">
    Order Deny,Allow
    Deny from all
    #Allow from x.x.x.x
</FilesMatch>
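After reloading Apache you can check the block is in place (substitute your own domain):

sudo service apache2 reload
curl -sI http://example.com/xmlrpc.php | head -n 1
# Expect something like: HTTP/1.1 403 Forbidden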

I noticed considerably lower latency when serving requests to PHP pages after this change.

