WIP: Up the learning curve on containers

This is a Work In Progress (WIP) post, in which I explain what I’m working on that relates to automating software development, or to software development in general.

This post was edited to improve clarity. 2016-05-06.

The use cases usually cited to show why containers and container technologies matter are production ones: deployment and operations. But I’m interested in using container technologies for development and testing. I love the idea that I could spin up project-specific build and test environments using small, lightweight containers rather than slow, heavyweight virtual machines.

On the learning curve

I’m currently employed as a Windows developer and I use OS X for anything outside of that. Containers are not native to either of those platforms; they are native to Linux and a few other Unix variants, so a Linux virtual machine (VM) is required when using a Windows or OS X host. That is unfortunate, because it generally means the container host is constrained to fewer CPUs and less RAM than is available on my PC (Windows) or Mac (OS X). I use development systems with 32GB RAM and at least 8 CPU cores. In contrast, the Linux VMs I use to host containers are generally configured with 2 cores and 4GB RAM.

I’ve used Linux off and on for many years, but don’t have deep knowledge of the platform. So that means I’m on the container learning curve and the Linux learning curve.

Thankfully, the fine folks at Docker have developed the Docker Toolbox that makes it relatively easy for Windows and OS X developers to get a container environment hosted on a Linux VM and work with it from a Windows or OS X command line. So that’s what I’m currently doing.

Note: Docker Toolbox uses VirtualBox to provide VMs, but you can also use Parallels (on OS X) or Hyper-V (on Windows). I haven’t tried running a Docker VM in Parallels yet, but I do use Hyper-V VMs. See my article about how to get the Docker Toolbox working with Hyper-V.

Resources

I’m working through The Docker Book, by James Turnbull. It’s quite good.

Because I’m interested in using containers to host development tools and do development testing, I’m also looking at this walk-through of provisioning a Node.js development environment using containers: Lessons from Building a Node App in Docker.

Some progress

In Lessons from Building a Node App in Docker, Dr. Lees-Miller references a gentler introduction to containers and Docker, a slide deck that he put together a couple of years ago. I’ve started working through that introduction before delving into the Node App walk-through, and I thought I’d share a bit of that.

Here’s the Dockerfile used in the first example from the gentle introduction presentation (from the example code referenced in the slide deck):

FROM ubuntu:14.04

MAINTAINER overleaf <team@overleaf.io>

RUN apt-get update && apt-get -y upgrade

RUN apt-get install -y ruby

RUN gem install sinatra

RUN mkdir /app

ADD hello_world.rb /app/

ENV PORT 3000
ENV RACK_ENV production

A Dockerfile is a recipe for building a container image. See The Docker Book and the Docker documentation to learn about Dockerfiles.

If you build an image using that Dockerfile today, you’ll likely see errors and warnings.

Here is what I did to improve the Dockerfile so that it builds without warnings. The comments should be sufficient to see what I changed and why. This could be further optimized.

# Base images can be found here: https://hub.docker.com/_/ubuntu/
# Ubuntu 14.04 LTS (Trusty Tahr) is supported through 2019.
# Ubuntu 16.04 LTS (Xenial Xerus) has just been released and probably should
#   not be used yet (because dependent software may need to be updated and
#   made compatible).
#
FROM ubuntu:14.04.4

MAINTAINER  Dave Hein <jenesuispasdave@gmail.com>

# To invalidate the cache beyond the base image, change REFRESHED_AT to the current time
#
ENV REFRESHED_AT 2016-05-04T20:35-0500

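# Setting this variable prevents errors during package installs that look
# like the ones below. Declaring it with ARG (not ENV) means it applies only
# during the build and does not persist in the final image: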
#
# debconf: unable to initialize frontend: Dialog
# debconf: (TERM is not set, so the dialog frontend is not usable.)
# debconf: falling back to frontend: Readline
#
# As per: http://stackoverflow.com/a/35976127/1392864
#
ARG DEBIAN_FRONTEND=noninteractive

# Update apt package info and upgrade installed packages (base image
# has some packages installed)
#
RUN apt-get update && apt-get -y upgrade

# Install Ruby and Sinatra (latest of each)
#
# Note: need ruby-dev and make to build the native extensions for rdoc
# Note: need rdoc to avoid errors when installing sinatra, errors like:
#
#   unable to convert "\xC3" to UTF-8 in conversion from ASCII-8BIT to UTF-8 to US-ASCII for README.rdoc, skipping
#
# Alternatively we could "gem install sinatra --no-rdoc --no-ri". That would also
# take less time (sinatra is pretty big as it is).
#
RUN apt-get install -y ruby
RUN apt-get install -y ruby-dev
RUN apt-get install -y make
RUN gem update
RUN gem install rdoc && gem rdoc --all --overwrite
RUN gem install sinatra

# Add the web app to the image
#
RUN mkdir /app
ADD hello_world.rb /app/

# Set some environment variables
#
ENV PORT 8000
ENV RACK_ENV production
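
The comments in that Dockerfile mention an alternative (installing gems with --no-rdoc --no-ri) and the text notes that further optimization is possible. Here is a minimal, untested sketch of what that might look like. It assumes the RubyGems version shipped with Ubuntu 14.04 accepts the --no-rdoc and --no-ri flags (newer RubyGems releases replaced them with --no-document). It keeps ruby-dev and make in case a gem needs to compile native extensions; whether they can be dropped once rdoc is skipped is unverified. Combining the apt-get commands into a single RUN is a common layer-reduction practice, not something taken from the original example.

```dockerfile
FROM ubuntu:14.04.4

# (MAINTAINER line unchanged, omitted here)

# Change REFRESHED_AT to invalidate the cache beyond the base image
ENV REFRESHED_AT 2016-05-04T20:35-0500

# Build-time only; avoids the debconf "unable to initialize frontend" messages
ARG DEBIAN_FRONTEND=noninteractive

# One RUN (one layer) for all apt work; removing the apt package lists
# afterward keeps that layer smaller
RUN apt-get update \
 && apt-get -y upgrade \
 && apt-get install -y ruby ruby-dev make \
 && rm -rf /var/lib/apt/lists/*

# Skip rdoc/ri generation instead of installing rdoc first
# (assumption: the installed RubyGems still accepts these flags)
RUN gem update --no-rdoc --no-ri \
 && gem install sinatra --no-rdoc --no-ri

# Add the web app to the image; ADD creates /app/ if it does not exist,
# so the separate "RUN mkdir /app" step is not needed
ADD hello_world.rb /app/

ENV PORT 8000
ENV RACK_ENV production
```

The trade-off is that a change to any one package or gem rebuilds the whole combined layer, so the one-instruction-per-RUN version can be more convenient while still experimenting.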

Performance issue with image building

When I built the image using that Dockerfile, I found that the step to install sinatra took a very long time and used a lot of CPU. When I looked at the activity on my system, I saw that the Terminal window I was running the docker build command from was running at 100% CPU.

That was strange, because I’d have expected the VirtualBox VM process to be using all the CPU. Instead, it seemed that the docker client process was using it.

I want to investigate that CPU usage issue further. In particular, I will try building the image directly on the Docker VM, rather than using a remote docker client.

More when I know it …