Administering Dozens of Debian Servers

At work, we have quite a few Debian servers. We have a few physical machines, then a number of virtual machines running under Xen. These servers are split up mainly along task-oriented lines: DNS server, LDAP server, file server, print server, mail server, several web app servers, ERP system, and the like.

In the past, we had fewer virtual instances and combined more services into a single OS install. This led to some difficulties, especially with upgrades. If we wanted to upgrade the OS for, say, the file server, we’d have to upgrade the web apps and test them along with it at the same time. This was not a terribly sustainable approach, hence the heavier reliance on smaller virtual environments.

All these virtual environments have led to their own issues. One of them is getting security patches installed. At present, that’s a mainly manual task. In the past, I used cron-apt a bit, but it seemed to be rather fragile. I’m wondering what people are using to get security updates onto servers in an automated fashion these days.

The other issue is managing the configuration of these things. We have some bits of configuration that are pretty similar between servers (the mail system setup, for instance). Most of them are just simple SMTP clients that need to be able to send out cron reports and the like. We had tried using cfengine2 for this, but it didn’t work out well. I don’t know if it was our approach or not, but we found that hacking cfengine2 after making changes on systems was too time-consuming, so that task slipped and eventually cfengine2 wasn’t doing what it should anymore. And that was despite taking advantage of its ability to do things like put the local hostname in the right places.

I’ve thought a bit about perhaps providing some locally-built packages that establish these config files, or load them up with our defaults. That approach has worked out well for me before, though it also means that pushing out changes isn’t a simple hack of a config file somewhere anymore.

It seems like a lot of the cfengine2/bcfg2 tools are designed for environments where servers are more homogeneous than ours. bcfg2, in particular, goes down that road; it makes it difficult to log on to a web server, apt-get install a few PHP modules that we need for a random app, and just proceed.

Any suggestions?

19 thoughts on “Administering Dozens of Debian Servers”

  1. I’m a fan of FAI softupdates. While my FAI configuration includes some cfengine scripts, I can also use its native lists of packages to install, copy files verbatim with fcopy, etc. As an added bonus, if a system breaks, I can just reinstall it with FAI and the configuration will automatically be up-to-date again.

  2. Hi,

    At work we have about a hundred Debian boxes, and using packages for the default settings works very well.
    Using cron-apt for security upgrades also works very well, as long as you have some sort of staging.

    1. Puppet is the way I would go as well. It is pretty simple to set up and get going. After that it is simply a matter of taking one service at a time and making it standard.

  3. I’m going to add another plug for puppet. We’re using cfengine2 to maintain the configuration of nearly 40 servers, and it works, but the configuration is quite obscure. For example, we use it to upload scripts that do the dirty work instead of using cfengine2 directly. Puppet is as potent as cfengine2, minus the complexity (not that it isn’t complex too, be warned). It’s more user-friendly, so to speak. We’re starting to deploy it, first in the lab, but I think you should at least look into it. Have a look especially at how easy it is to keep certain packages installed or uninstalled on a system; that will probably hook you enough to try more complex things.

  4. We use our own package repository with meta-packages (a few of our servers have the same configuration). We use nagios to monitor our servers; one of the checks is an apt update checker that distinguishes between security updates and normal upgrades.

    If necessary, we then upgrade machines manually. This is not that hard, especially when using the cssh (cluster ssh) client. It’s a type-once-send-to-all ssh terminal.
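The kind of check comment 4 describes (telling security updates apart from ordinary upgrades) can be sketched roughly as follows. This is not the nagios plugin the commenter uses, only a minimal illustration. It assumes the text on stdin is the output of `apt-get -s upgrade`, and that security updates carry a `Debian-Security:` origin label on their `Inst` lines. The function name `count_updates` is made up for this example.

```shell
#!/bin/sh
# Sketch only: classify pending upgrades from simulated apt output.
# Assumption: stdin is the output of `apt-get -s upgrade`, where each
# package to be upgraded appears on a line starting with "Inst ".
# Assumption: security updates show "Debian-Security:" in that line.

count_updates() {
    # Prints "<security> <other>" for the text read from stdin.
    awk '
        /^Inst / {
            if ($0 ~ /Debian-Security:/) sec++
            else other++
        }
        END { printf "%d %d\n", sec, other }
    '
}

# Hypothetical use on a managed host:
#   apt-get -s upgrade | count_updates
```

A monitoring wrapper could then raise a critical alert when the first number is non-zero and only a warning when the second one is.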

  5. For a while I also had more and more trouble keeping our hundred or so Debian servers up to date, and I searched for a solution.

    I found “apt-dater”, which was the tool I needed (see ibh.de/apt-dater or sf.net..).

    I also packaged it for Debian, but since Lenny was frozen at the time, it is only in Sid. Bad luck.

    If you are interested I also have backports for Etch and Lenny, both i386 and amd64.

  6. You are using cfengine2 the wrong way if you hack cfengine after making changes. You should hack cfengine first and then run it to make the changes. The same applies to puppet as well. I made the same mistake in the beginning and found cfengine very frustrating to use. Now it is a breeze to manage 20+ physical servers (as a part time duty ;-) with FAI and cfengine.

    1. The problem with that is that it leads to a significant reduction in agility. If I go to a system, I can apt-get install a package, answer debconf questions, and it’s ready to use. If I use cfengine, I’d have to have a test environment first, install something there, see if it asks questions, figure out the correct way to handle it if it does, etc.

      That’s a lot of work for one-off packages. Maybe it makes sense for things like Exim that are everywhere… but then if it’s everywhere, that’s not the sort of problem I’m talking about.

  7. Hi,

    It depends on what you want to do. You can automate things with puppet (written in Ruby) or bcfg2 (in Python), or you can do some manual work with Python pexpect (and pxssh) and cssh (cluster ssh).

    You might also want to track the configuration files with an SCM like git.

  8. Here’s my solution:
    * One admin server, specially secured, with an ssh key with no passphrase (seems most of you have this setup already).
    * A list of managed servers on this admin server.
    * script: Every 5 minutes (cron), ssh (in parallel) to all servers in the list, execute a few commands (df, free, uptime, ..) and make a few remote checks (ping, http). If something changes, report this in ONE email for all servers.
    * script: Once a day (cron), ssh to all servers (sequentially) and perform an apt-get update && apt-get -d upgrade, parse the output, and only if updates are waiting report this in ONE email for all servers. Also execute a few other checks, like getting the Debian version, the location of the server, whether it’s on hardware or virtual, and so on. The results are saved in the server list.

    Script for manually updating all servers:
    For each server in the server list, do a “ssh -t $SERVER $UPDATE”, where $UPDATE is apt-get update, aptitude update or aptitude safe-upgrade depending on the debian version. With “ssh -t $SERVER” you can perform interactive things on each server in a row.

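    To manage the configuration or installation of special things, I first document them in a format I can turn into a script. Something like:
    aptitude install xyz
    CONFIG=/etc/xyz.conf
    # keep a pristine copy the first time the file is touched
    test -f "$CONFIG.original" || cp -a "$CONFIG" "$CONFIG.original"
    # normal-format diff replacing line 1; the old value OPTION=off is an assumed placeholder
    printf '1c1\n< OPTION=off\n---\n> OPTION=on\n' | patch "$CONFIG"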

    If I need this configuration frequently, I put it in a script that I can execute remotely in the form:
    scp $PATHTO$SCRIPT $SERVER:/tmp/ && ssh -t $SERVER /tmp/$SCRIPT

    With this method I have full control over heterogeneous installations: a lot is automated, but changes are only made while I’m sitting in front of the machines.
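The daily "one report for all servers" check from comment 8 might look something like the sketch below. It is not the commenter's script. It swaps `apt-get -d upgrade` for a simulated `apt-get -s upgrade` so that nothing is downloaded, it skips the `apt-get update` step, and it counts `Inst` lines instead of doing any fuller parsing. The function name `pending_report`, the `REMOTE` variable, and the one-hostname-per-line list format are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: print one line per server that has upgrades waiting.
# Assumption: the server list file holds one hostname per line.
# Assumption: REMOTE is the command used to reach a host (ssh by default);
# it can be overridden with a stub for dry runs.

REMOTE=${REMOTE:-ssh}

pending_report() {
    # $1: path to the server list
    while read -r host; do
        [ -n "$host" ] || continue
        # </dev/null keeps the remote command from consuming the server list.
        # An unreachable host yields no output and is counted as 0 here,
        # which is a limitation of this sketch.
        n=$($REMOTE "$host" 'apt-get -s upgrade' </dev/null 2>/dev/null | grep -c '^Inst ')
        if [ "$n" -gt 0 ]; then
            echo "$host: $n package(s) waiting"
        fi
    done < "$1"
}

# Hypothetical cron usage; the path and address are placeholders:
#   report=$(pending_report /path/to/servers.txt)
#   [ -n "$report" ] && printf '%s\n' "$report" | mail -s "pending updates" admin@example.com
```

Collecting the output first and mailing only when it is non-empty gives the "ONE email, and only if updates are waiting" behaviour described in the comment.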

  9. Bcfg2 is actually designed for non-homogeneous environments, but this has caused some of the processes to be somewhat heavyweight. One of the things that has changed in bcfg2 since the discussion last year is that bcfg2 has gotten a lot more capable at pulling changes from clients. This allows you to perform manual administration on clients, and let bcfg2 detect the changes. Then you can associate those configuration changes with the appropriate client or sets of clients.

  10. We use some Debian machines (virtual and physical) and use apticron for updates and a svn repository with shared configuration files and etckeeper for local changes. Some small manual work is needed from time to time but we are satisfied with the process.

  11. I’m running puppet for configuration management and capistrano for running commands on hosts. It may sound funny to use capistrano for system administration, but it does its job. Grab some keyboardcast fun too, when a few machines are involved ;)

  12. I was wondering if you’ve found a solution yet? I was talking with someone yesterday evening about config management, primarily for 40+ Debian boxes (primarily VMs), but with some RHEL 3-5, SLES, OpenSuSE, Solaris 7-10, and AIX boxes thrown in for flavoring, and they directed me here, saying “john is smart so i’d do whatever he does”.

    I’ve been contemplating puppet, cfengine, and/or bcfg2, but haven’t moved past the navel-gazing stage yet. I do know that the manual process and local “meta” packages on the Debian side are untenable going forward, as it’s already getting difficult to deal with updates and especially the etch -> lenny transition. A lot of the machines are web stuff, but like you, several have one-off packages needed for specific apps.

    So have you come to any decisions yet, or is it still a matter of nothing quite meeting your needs?
