Fixing the Problems with Docker Images

I recently wrote about the challenges in securing Docker container contents, and in particular with keeping up-to-date with security patches from all over the Internet.

Today I want to fix that.

Besides security, there is a second problem: the common way of running things in Docker pretends to provide a traditional POSIX API and environment, but really doesn’t. This is a big deal.

Before diving into that, I want to explain something: I have often heard it said that Docker provides single-process containers. This is unambiguously false in almost every case. Any time you have a shell script inside Docker that calls cp or even ls, you are running a second process. Web servers from Apache to whatever else use processes or threads of various types to service multiple connections at once. Many Docker containers are single-application, but a process is a core part of the POSIX API, and very little software would work if it were limited to a single process. So this is my little plea for more precise language. OK, soapbox mode off.
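
The claim about shell scripts is easy to check. The following is a small Python sketch, not taken from any real image: the script text and the `script_pids` helper are invented for illustration. It runs a tiny "entrypoint-style" script with `sh` and returns the PID of the shell and the PID of the `ls` it starts.

```python
import subprocess

# A tiny "entrypoint script" of the kind found in many images (illustrative only).
SCRIPT = """
echo $$                 # PID of the shell running this script
ls / > /dev/null &      # an ordinary external command, run in the background
echo $!                 # PID of that ls process
wait                    # collect the child before exiting
"""


def script_pids():
    """Run SCRIPT with sh and return (shell_pid, ls_pid) as integers."""
    result = subprocess.run(
        ["sh", "-c", SCRIPT], capture_output=True, text=True, check=True
    )
    shell_pid, ls_pid = (int(token) for token in result.stdout.split())
    return shell_pid, ls_pid


if __name__ == "__main__":
    shell_pid, ls_pid = script_pids()
    print(f"shell: {shell_pid}, ls: {ls_pid}")
```

The two numbers always differ: even this trivial script needs a second process, before counting whatever ran the script in the first place.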

Now then, in a traditional Linux environment, besides your application, there are other key components of the system. These are usually missing in Docker containers.

So today, I will fix this also.

In my docker-debian-base images, I have prepared a system that still has only 11MB RAM overhead, makes minimal changes on top of Debian, and yet provides a very complete environment and API. Here’s what you get:

  • A real init system, capable of running standard startup scripts without modification, and solving the nasty Docker zombie reaping problem.
  • Working syslog, which can either export all logs to Docker’s logging infrastructure, or keep them within the container, depending on your preferences.
  • Working real schedulers (cron, anacron, and at), plus at least the standard logrotate utility to help prevent log files inside the container from becoming huge.
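
For anyone who has not met the zombie reaping problem mentioned in the first item: a process that exits stays in the process table as a zombie until its parent collects its exit status, and orphaned processes are re-parented to PID 1, so whatever runs as PID 1 has to do that collecting. The Python sketch below shows only the mechanism. It is not how sysvinit is implemented, the function names are made up for illustration, and `process_state` assumes a Linux `/proc` filesystem.

```python
import os
import time


def spawn_short_lived_child():
    """Fork a child that exits immediately; return its PID."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit at once, skipping Python cleanup handlers
    return pid


def reap_children():
    """Collect every child that has already exited, without blocking.

    Returns the list of PIDs that were reaped.
    """
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped


def process_state(pid):
    """Return the one-letter state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        # Layout is "pid (comm) state ..."; comm may contain spaces,
        # so split on the last ')' before taking the state field.
        return f.read().rsplit(")", 1)[1].split()[0]


if __name__ == "__main__":
    child = spawn_short_lived_child()
    time.sleep(0.2)  # give the child time to exit
    print("state before reaping:", process_state(child))  # "Z" means zombie
    print("reaped:", reap_children())
```

A container whose PID 1 is an application that never calls `wait` on processes it did not start has nothing playing the role of `reap_children`, which is how zombies pile up.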

The above goes into my “minimal” image. Additional images add layers on top of it, and here are some of the features they add:

  • A real SMTP agent (exim4-daemon-light) so that cron and friends can actually send you mail
  • SSH client and server (optionally exposed to the Internet)
  • Automatic security patching via unattended-upgrades and needrestart

All of the above, including the optional features, has an 11MB overhead on start. Not bad for so much, right?

From here, you can layer on top all your usual Dockery things. You can still run one application per container. But you can now make sure your disk doesn’t fill up from logs, run your database vacuuming commands at will, have your blog download its RSS feeds every few minutes, etc — all from within the container, as it should be. Furthermore, you don’t have to reinvent the wheel, because Debian already ships with things to take care of a lot of this out of the box — and now those tools will just work.

There is some popular work done in this area already by phusion’s baseimage-docker. However, I made my own for these reasons:

  • I wanted something based on Debian rather than Ubuntu
  • By using sysvinit rather than runit, the OS default init scripts can be used unmodified, reducing the administrative burden on container builders
  • Phusion’s system is, for some reason, not auto-built on the Docker Hub. Mine is, so it is rebuilt automatically whenever the underlying Debian system or the GitHub repository changes.

Finally, a word on the choice to use sysvinit. It would have been simpler to use systemd here, since it is the default in Debian these days. Unfortunately, systemd requires you to poke some holes in the Docker security model, as well as mount a cgroups filesystem from the host. I didn’t consider this acceptable, and sysvinit ran without these workarounds, so I went with it.

With all this, Docker becomes a viable replacement for KVM for various services on my internal networks. I’ll be writing about that later.

5 thoughts on “Fixing the Problems with Docker Images”

  1. I just couldn’t get my head around what Docker is trying to do, providing tutorials on how to build images but not how to follow packaged security updates. Snip pages of ranting on “Worse is better”.

    I appreciate this showing all the tools you’re using. I need to have a look at debsecan and debian-security-support.

    There are several similar articles (including on Planet Debian). What would bother me is that in trying to colonize Docker Hub, you pit your own opinions against the opinions embodied in Docker. At best, you end up with a README entitled A Declaration of Independence. And the compromise about not being able to use systemd is a red flag for me.

    Myself, I ran out of steam trying to decide how to trigger re-building dependent images. (This was locally; I wasn’t building on hub.docker.com). More recently I’ve been playing with systemd-nspawn containers.

    Since you’re a Debian fan, have you looked at LXC containers?

    Are there specific features in Docker containers, that you like to take advantage of?

    John Goerzen Reply:

    I think Docker has made a very interesting tool, that has a lot of good to it. But I simultaneously think that many people using it are trying to throw away 40 years of engineering in the POSIX space. My point is that this throwing out isn’t necessary, and Docker can actually work well providing a full environment.

    I have tried LXC. I have found it generally somewhat buggy and half-baked. Docker has been very solid, carefully documented, and doing what it says it will do, precisely. Add in all the orchestration for building images and Docker is pretty nice. Fundamentally they build the same kinds of containers with the same kinds of tools, though.

  2. You may (or may not) be interested in my putative attempts to address some of joeyh’s initial concerns re docker (and their debian images): http://github.com/jmtd/debian-docker (joeyh blog post: http://joeyh.name/blog/entry/docker_run_debian/)

    However, I have not attempted what you are trying here wrt init, logging, rotation, cron etc. Personally I don’t think they belong inside every container but I appreciate that docker et al need to offer comprehensive solutions (within the microservices philosophy) to the problems that you are trying to address.

    John Goerzen Reply:

    Ah, very nice, debootstrapping directly. One other option I’ve thought of would be to download the official OpenStack images – but they’re much larger, I think.

    My complaint is that syslog, cron, logrotate, etc account for — all together — less than 10MB of RAM. So the cost of providing a full API is very small.

  3. Hello John,
    I must admit that I’m quite new to Docker, and therefore my question may not be adequate to your blog. However I trust in you to get a qualified response to it.
    So, my use case for Docker is to get scalable microservices; the first microservice I need in my environment is a reverse proxy with SSL termination. Currently I use HAProxy installed in LXC, and I want to “migrate” this service to Docker.
    Question:
    Do I need to create a container based on Debian as documented here: https://docs.docker.com/engine/examples/apt-cacher-ng/?
    I was assuming that I can run HAProxy as a service in Docker w/o any Debian or Ubuntu.
    THX
