Category Archives: Uncategorized

Beauty Breaks Through

IMG_8795_v1

Two years ago, I was in the middle of the forest in rural southern Indiana. It was a time of hope – of defeating racism, sexism, xenophobia. Hope for affordable health care, for peace, for care for the young and the old. Then I woke up, in that beautiful place, to the news that Donald Trump would be president. Trump. President.

A few days later, I wrote Morning In The Skies, which included, in part:

Not long after the election, I got in a plane, pushed in the throttle, and started the takeoff roll down a runway in the midst of an Indiana forest. The skies were the best kind of clear blue, and pretty soon I lifted off and could see for miles. Off in the distance, I could see the last cottony remnants of the morning’s fog, lying still in the valleys, surrounding the little farms and houses as if to give them a loving hug. Wow.

Sometimes the flight is bumpy. Sometimes the weather doesn’t cooperate, and it doesn’t happen at all. Sometimes you can fly across four large states and it feels as smooth as glass the whole way.

Whatever happens, at the end of the day, the magic flying carpet machine gets locked up again. We go home, rest our heads on our soft pillows, and if we so choose, remember the beauty we experienced that day.

Really, this post is not about being a pilot. This post is a reminder to pay attention to all that is beautiful in this world. It surrounds us; the smell of pine trees in the forest, the delight in the faces of children, the gentle breeze in our hair, the kind word from a stranger, the very sunrise.

I hope that more of us will pay attention to the moments of clear skies and wind at our back. Even at those moments when we pull the hangar door shut.

For two years, I have often reflected on the bittersweet memories of that trip to Indiana. But for some reason, I hadn’t shared that photo until today. That beautiful valley-hugging fog is what you see above.

These last two years have been — well, full. Full of hate, even of death in the wake of several racist murders. But that’s not all. These years have also been full of an awakening, a swelling of people that care. People that care enough to do something. All across the country, people have risen up to send the message: “Trumpism is not American.” My own family did something we never had before: joined a protest, against families being separated. I and many others knocked on doors and made phone calls for the first time. Millions of Americans care and are doing something. We have seen the true colors of what the GOP has become, and it’s ugly, but people care. What’s more, we’ve won the first battle. Here in what the media often calls “deep-red Kansas”, we will have a Democratic governor. Racism and vote suppression has been sent packing, here in Kansas.

We have a powerful reminder that part of what makes this world beautiful is its people. People that go knock on doors in the cold. People that drive people to voting places. People that care about health care for others, about food for others, about education, intact families, refugees, and the earth itself. People that know the fight has just begun and are going to be there fighting for what is right and just for years to come. People that make the world beautiful.

Syncing with a memory: a unique use of tar –listed-incremental

I have a Nextcloud instance that various things automatically upload photos to. These automatic folders sync to a directory on my desktop. I wanted to pull things out of that directory without deleting them, and only once. (My wife might move them out of the directory on her computer, and I might arrange them into targets on my end.)

In other words, I wanted to copy a file from a source to a destination, but remember what had been copied before so it only ever copies once.

rsync doesn’t quite do this. But it turns out that tar’s listed-incremental feature can do exactly that. Ordinarily, it would delete files that were deleted on the source. But if we make the tar file with the incremental option, but extract it without, it doesn’t try to delete anything at extract time.

Here’s my synconce script:

#!/bin/bash

set -e

if [ -z "$3" ]; then
    echo "Syntax: $0 snapshotfile sourcedir destdir"
    exit 5
fi

SNAPFILE="$(realpath "$1")"
SRCDIR="$2"
DESTDIR="$(realpath "$3")"

cd "$SRCDIR"
if [ -e "$SNAPFILE" ]; then
    cp "$SNAPFILE" "${SNAPFILE}.new"
fi
tar "--listed-incremental=${SNAPFILE}.new" -cpf - . | \
    tar -xf - -C "$DESTDIR"
mv "${SNAPFILE}.new" "${SNAPFILE}"

Just have the snapshotfile be outside both the sourcedir and destdir and you’re good to go!

Running Digikam inside Docker

After my recent complaint about AppImage, I thought I’d describe how I solved my problem. I needed a small patch to Digikam, which was already in Debian’s 5.9.0 package, and the thought of rebuilding the AppImage was… unpleasant.

I thought – why not just run it inside Buster in Docker? There are various sources on the Internet for X11 apps in Docker. It took a little twiddling to make it work, but I did.

My Dockerfile was pretty simple:

FROM debian:buster
MAINTAINER John Goerzen 

RUN apt-get update && \
    apt-get -yu dist-upgrade && \
    apt-get --install-recommends -y install firefox-esr digikam digikam-doc \
         ffmpegthumbs imagemagick minidlna hugin enblend enfuse minidlna pulseaudio \
         strace xterm less breeze && \
    apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
RUN adduser --disabled-password --uid 1000 --gecos "John Goerzen" jgoerzen && \
    rm -r /home/jgoerzen/.[a-z]*
RUN rm /etc/machine-id
CMD /usr/bin/docker

RUN mkdir -p /nfs/personalmedia /run/user/1000 && chown -R jgoerzen:jgoerzen /nfs /run/user/1000

I basically create the container and my account in it.

Then this script starts up Digikam:

#!/bin/bash

set -e

# This will be unnecessary with docker 18.04 theoretically....  --privileged see
# https://stackoverflow.com/questions/48995826/which-capabilities-are-needed-for-statx-to-stop-giving-eperm
# and https://bugs.launchpad.net/ubuntu/+source/docker.io/+bug/1755250

docker run -ti \
       -v /tmp/.X11-unix:/tmp/.X11-unix -v "/run/user/1000/pulse:/run/user/1000/pulse" -v /etc/machine-id:/etc/machine-id \
       -v /etc/localtime:/etc/localtime \
       -v /dev/shm:/dev/shm -v /var/lib/dbus:/var/lib/dbus -v /var/run/dbus:/var/run/dbus -v /run/user/1000/bus:/run/user/1000/bus  \
       -v "$HOME:$HOME" -v "/nfs/personalmedia/Pictures:/nfs/personalmedia/Pictures" \
     -e DISPLAY="$DISPLAY" \
     -e XDG_RUNTIME_DIR="$XDG_RUNTIME_DIR" \
     -e DBUS_SESSION_BUS_ADDRESS="$DBUS_SESSION_BUS_ADDRESS" \
     -e LANG="$LANG" \
     --user "$USER" \
     --hostname=digikam \
     --name=digikam \
     --privileged \
     --rm \
     jgoerzen/digikam "$@"  /usr/bin/digikam

The goal here was not total security isolation; if it had been, then all the dbus mounting and $HOME mounting was a poor idea. But as an alternative to AppImage — well, it worked perfectly. I could even get security updates if I wanted.

Remembering Tom Wallis, The System Administrator That Made The World Better

I never asked Tom why he hired me.

I was barely 17 at the time – already a Debian developer, full of more enthusiasm than experience, and Tom offered me a job. It was my first real sysadmin job, and to my delight, I got to work with Unix. For two years, I was the part-time assistant systems administrator for the Computer Science department at Wichita State University. And Tom was my boss, mentor, and old friend. Tom was walking proof that a system administrator can make the world a better place.

That amazing time was two decades ago now. And in the time since, every so often Tom and I would exchange emails. I enjoyed occasionally dropping by his office at work and surprising him.

So it was a shock to get an email this week that Tom had married for the first time at age 54, and passed away four days later due to a boating accident while on his honeymoon.

Tom was a man with a big laugh and an even bigger heart. When I started a Linux Users Group (LUG) on campus, there was Tom – helping to arrange a place to meet, Internet access when we needed it, and gave his evenings to simply be present and a supporter.

I had (and still have) a passion for Free/Open Source software. Linux was just new at the time, and was barely present in the department when I started. I was fortunate that CS was the “little dept. that could” back then, with wonderful people but not a lot of money, so a free operating system helped with a lot of problems. Tom supported me in my enthusiasm to introduce Debian all over the place. His trust meant much, and brought out the best in me.

I learned a lot from Tom, and more than just technology. A state university can be heavily bureaucratic place at times. Tom was friends with every “can-do” person on campus, it seemed, and they all managed to pull through and get things done – sometimes working around policies that were a challenge.

I have sometimes wondered if I am doing enough, making a big enough difference in the world. Does a programmer really make a difference in people’s lives?

Tom Wallis is proof that the answer is yes. From the stories I heard at his funeral today, I can only guess how many other lives he touched.

This week, Tom gave me one final gift: a powerful reminder that sysadmins and developers can make the world a better place, can touch people’s lives. I hope Tom knew how much I appreciated him. If I find a way to make a difference in someone’s life — maybe an intern I’ve hired, or someone I take flying — than I will have found a way to pass on Tom’s gift to another, and I hope I can.

tom-wallis-penguin

(This penguin was sitting out on the table of memorabilia from Tom today. I remember it from a shelf in his office.)

The Eclipse

Highway US-81 in northern Kansas and southern Nebraska is normally a pleasant, sleepy sort of drive. It was upgraded to a 4-lane road not too long ago, but as far as 4-lane roads go, its traffic is typically light. For drives from Kansas to South Dakota, it makes a pleasant route.

Yesterday was eclipse day. I strongly suspect that highway 81 had more traffic that day than it ever has before, or ever will again. For nearly the entire 3-hour drive to Geneva, NE, it was packed — though mostly still moving at a good speed. And for our entire drive back, highway 81 and every other southbound road we used was so full it felt like rush hour in Dallas. (Well, not quite. Traffic was still moving.) I believe scenes like this were played out across the continent.

I’ve been taking a lot of photos, and writing about our new baby Martha lately. Now it’s time to write a bit about some more adventures with Jacob and Oliver – they’re now in third and fifth grades in school.

We had been planning to fly, and airports I called were either full, or were planning to park planes in the grass, or even shut down some runways to use for parking. The airport in the little town of Beatrice, NE (which I had visited twice before) was even going to have a temporary FAA control tower. At the last minute, due to some storm activity near home at departure time, we unloaded the plane and drove instead.

The atmosphere at the fairgrounds in Geneva was festive. One family had brought bubbles for their kids — and extras to share.

IMG_20170821_113229

I had bought the boys a book about the eclipse, which they were reading before and during the event. They were both great, safe users of their eclipse glasses.

IMG_20170821_124809

Jacob caught a toad, and played with it for awhile. He wanted to bring it home with us, but I convinced him to let me take a picture of him with his toad friend instead.

IMG_20170821_124553

While we were waiting for totality, a number of buses from the local school district arrived. So by the time the big moment arrived, we could hear the distant roar of delight and applause from the school children gathered at the far end of the field, plus all the excitement nearby. Both boys were absolutely ecstatic to be witnessing it (and so was I!) “Wow!” “Awesome!” And simple cackles of delight were heard. On the drive home, they both kept talking about how amazing it was, and it was “once in a lifetime.”

We enjoyed our “eclipse neighbors” – the woman from San Antonio next to us, the surprise discovery of another family from just a few miles from us parked two cars down, even running into relatives at a restaurant on the way home. The applause from all around when it started – and when it ended. And the feeling, which is hard to describe, of awe and amazement at the wonders of our world and our universe.

There are many problems with the world right now, but somehow there’s something right about people coming together from all over to enjoy it.

First Experiences with Stretch

I’ve done my first upgrades to Debian stretch at this point. The results have been overall good. On the laptop my kids use, I helped my 10-year-old do it, and it worked flawlessly. On my workstation, I got a kernel panic on boot. Hmm.

Unfortunately, my system has to use the nv drivers, which leaves me with an 80×25 text console. It took some finagling (break=init in grub, then manually insmoding the appropriate stuff based on modules.dep for nouveau), but finally I got a console so I could see what was breaking. It appeared that init was crashing because it couldn’t find liblz4. A little digging shows that liblz4 is in /usr, and /usr wasn’t mounted. I’ve filed the bug on systemd-sysv for this.

I run root on ZFS, and further digging revealed that I had datasets named like this:

  • tank/hostname-1/ROOT
  • tank/hostname-1/usr
  • tank/hostname-1/var

This used to be fine. The mountpoint property of the usr dataset put it at /usr without incident. But it turns out that this won’t work now, unless I set ZFS_INITRD_ADDITIONAL_DATASETS in /etc/default/zfs for some reason. So I renamed them so usr was under ROOT, and then the system booted.

Then I ran samba not liking something in my bind interfaces line (to be fair, it did still say eth0 instead of br0). rpcbind was failing in postinst, though a reboot seems to have helped that. More annoying was that I had trouble logging into my system because resolv.conf was left empty (despite dns-* entries in /etc/network/interfaces and the presence of resolvconf). I eventually repaired that, and found that it kept removing my “search” line. Eventually I removed resolvconf.

Then mariadb’s postinst was silently failing. I eventually discovered it was sending info to syslog (odd), and /etc/init.d/apparmor teardown let it complete properly. It seems like there may have been an outdated /etc/apparmor.d/cache/usr.sbin.mysql out there for some reason.

Then there was XFCE. I use it with xmonad, and the session startup was really wonky. I had to zap my sessions, my panel config, etc. and start anew. I am still not entirely sure I have it right, but I at do have a usable system now.

Is there any way to truly secure Docker container contents?

There is much to like about Docker. Much has been written about it, and about how secure the containerization is.

This post isn’t about that. This is about keeping what’s inside each container secure. I believe we have a fundamental problem here.

Earlier this month, a study on security vulnerabilities on Docker Hub came out, and the picture isn’t pretty. One key finding:

Over 80% of the :latest versions of official images contained at least on high severity vulnerability!

And it’s not the only one raising questions.

Let’s dive in and see how we got here.

It’s hard to be secure, but Debian makes it easier

Let’s say you want to run a PHP application like WordPress under Apache. Here are the things you need to keep secure:

  • WordPress itself
  • All plugins, themes, customizations
  • All PHP libraries it uses (MySQL, image-processing, etc.)
  • MySQL
  • Apache
  • All libraries MySQL or Apache use: OpenSSL, libc, PHP itself, etc.
  • The kernel
  • All containerization tools

On Debian (and most of its best-known derivatives), we are extremely lucky to have a wonderful security support system. If you run a Debian system, the combination of unattended-updates, needrestart, debsecan, and debian-security-support will help one keep a Debian system secure and verify it is. When the latest OpenSSL bug comes out, generally speaking by the time I wake up, unattended-updates has already patched it, needrestart has already restarted any server that uses it, and I’m protected. Debian’s security team generally backports fixes rather than just say “here’s the new version”, making it very safe to automatically apply patches. As long as I use what’s in Debian stable, all layers mentioned above will be protected using this scheme.

This picture is much nicer than what we see in Docker.

Problems

We have a lot of problems in the Docker ecosystem:

  1. No built-in way to know when a base needs to be updated, or to automatically update it
  2. Diverse and complicated vendor security picture
  3. No way to detect when intermediate libraries need to be updated
  4. Complicated final application security picture

Let’s look at them individually.

Problem #1: No built-in way to know when a base needs to be updated, or to automatically update it

First of all, there is nothing in Docker like unattended-updates. Although a few people have suggested ways to run unattended-updates inside containers, there are many reasons that approach doesn’t work well. The standard advice is to update/rebuild containers.

So how do you know when to do that? It is not all that obvious. Theoretically, official OS base images will be updated when needed, and then other Docker hub images will detect the base update and be rebuilt. So, if a bug in a base image is found, and if the vendors work properly, and if you are somehow watching, then you could be protected. There is work in this area; tools such as watchtower help here.

But this can lead to a false sense of security, because:

Problem #2: Diverse and complicated vendor security picture

Different images can use different operating system bases. Consider just these official images, and the bases they use: (tracking latest tag on each)

  • nginx: debian:stretch-slim (stretch is pre-release at this date!)
  • mysql: debian:jessie
  • mongo: debian:wheezy-slim (previous release)
  • apache httpd: debian:jessie-backports
  • postgres: debian:jessie
  • node: buildpack-deps:jessie, eventually depends on debian:jessie
  • wordpress: php:5.6-apache, eventually depends on debian:jessie

And how about a few unofficial images?

  • oracle/openjdk: oraclelinux:latest
  • robotamer/citadel: debian:testing (dangerous, because testing is an alias for different distros at different times)
  • docker.elastic.co/kibana: ubuntu of some sort

The good news is that Debian jessie seems to be pretty popular here. The bad news is that you see everything from Oracle Linux, to Ubuntu, to Debian testing, to Debian oldstable in just this list. Go a little further, and you’ll see Alpine Linux, CentOS, and many more represented.

Here’s the question: what do you know about the security practices of each of these organizations? How well updated are their base images? Even if it’s Debian, how well updated is, for instance, the oldstable or the testing image?

The attack surface here is a lot larger than if you were just using a single OS. But wait, it gets worse:

Problem #3: No way to detect when intermediate libraries need to be updated

Let’s say your Docker image is using a base that is updated immediately when a security problem is found. Let’s further assume that your software package (WordPress, MySQL, whatever) is also being updated.

What about the intermediate dependencies? Let’s look at the build process for nginx. The Dockerfile for it begins with Debian:stretch-slim. But then it does a natural thing: it runs an apt-get install, pulling in packages from both Debian and an nginx repo.

I ran the docker build across this. Of course, the apt-get command brings in not just the specified packages, but also their dependencies. Here are the ones nginx brought in:

fontconfig-config fonts-dejavu-core gettext-base libbsd0 libexpat1 libfontconfig1 libfreetype6 libgd3 libgeoip1 libicu57 libjbig0 libjpeg62-turbo libpng16-16 libssl1.1 libtiff5 libwebp6 libx11-6 libx11-data libxau6 libxcb1 libxdmcp6 libxml2 libxpm4 libxslt1.1 nginx nginx-module-geoip nginx-module-image-filter nginx-module-njs nginx-module-xslt ucf

Now, what is going to trigger a rebuild if there’s a security fix to libssl1.1 or libicu57? (Both of these have a history of security holes.) The answer, for the vast majority of Docker images, seems to be: nothing automatic.

Problem #4: Complicated final application security picture

And that brings us to the last problem: Let’s say you want to run an application in Docker. exim, PostgreSQL, Drupal, or maybe something more obscure. Who is watching for security holes in it? If you’re using Debian packages, the Debian security team is. If you’re using a Docker image, well, maybe it’s the random person that contributed it, maybe it’s the vendor, maybe it’s Docker, maybe it’s nobody. You have to take this burden on yourself, to validate the security support picture for each image you use.

Conclusion

All this adds up to a lot of work, which is not taken care of for you by default in Docker. It is no surprise that many Docker images are insecure, given this picture. The unfortunate reality is that many Docker containers are running with known vulnerabilities that have known fixes, but just aren’t, and that’s sad.

I wonder if there are any practices people are using that can mitigate this better than what the current best-practice recommendations seem to be?

How git-annex replaces Dropbox + encfs with untrusted providers

git-annex has been around for a long time, but I just recently stumbled across some of the work Joey has been doing to it. This post isn’t about it’s traditional roots in git or all the features it has for partial copies of large data sets, but rather for its live syncing capabilities like Dropbox. It takes a bit to wrap your head around, because git-annex is just a little different from everything else. It’s sort of like a different-colored smell.

The git-annex wiki has a lot of great information — both low-level reference and a high-level 10-minute screencast showing how easy it is to set up. I found I had to sort of piece together the architecture between those levels, so I’m writing this all down hoping it will benefit others that are curious.

Ir you just want to use it, you don’t need to know all this. But I like to understand how my tools work.

Overview

git-annex lets you set up a live syncing solution that requires no central provider at all, or can be used with a completely untrusted central provider. Depending on your usage pattern, this central provider could require only a few MBs of space even for repositories containing gigabytes or terabytes of data that is kept in sync.

Let’s take a look at the high-level architecture of the tool. Then I’ll illustrate how it works with some scenarios.

Three Layers

Fundamentally, git-annex takes layers that are all combined in Dropbox and separates them out. There is the storage layer, which stores the literal data bytes that you are interested in. git-annex indexes the data in storage by a hash. There is metadata, which is for things like a filename-to-hash mapping and revision history. And then there is an optional layer, which is live signaling used to drive the real-time syncing.

git-annex has several modes of operation, and the one that enables live syncing is called the git-annex assistant. It runs as a daemon, and is available for Linux/POSIX platforms, Windows, Mac, and Android. I’ll be covering it here.

The storage layer

The storage layer simply is blobs of data. These blobs are indexed by a hash, and can be optionally encrypted at rest at remote backends. git-annex has a large number of storage backends; some examples include rsync, a remote machine with git-annex on it that has ssh installed, WebDAV, S3, Amazon Glacier, removable USB drive, etc. There’s a huge list.

One of the git-annex features is that each client knows the state of each storage repository, as well as the capability set of each storage repository. So let’s say you have a workstation at home and a laptop you take with you to work or the coffee shop. You’d like changes on one to be instantly recognized on another. With something like Dropbox or OwnCloud, every file in the set you want synchronized has to reside on a server in the cloud. With git-annex, it can be configured such that the server in the cloud only contains a copy of a file until every client has synced it up, at which point it gets removed. Think about it – that is often what you want anyhow, so why maintain an unnecessary copy after it’s synced everywhere? (This behavior is, of course, configurable.) git-annex can also avoid storing in the cloud entirely if the machines are able to reach each other directly at least some of the time.

The metadata layer

Metadata about your files includes a mapping from the file names to the storage location (based on hashes), change history, and information about the status of each machine that participates in the syncing. On your clients, git-annex stores this using git. This detail is very useful to some, and irrelevant to others.

Some of the git-annex storage backends can support only storage (S3, for instance). Some can support both storage and metadata (rsync, ssh, local drives, etc.) You can even configure a backend to support only metadata (more on why that may be useful in a bit). When you are working with a git-backed repository for git-annex, it can hold data, metadata, or both.

So, to have a working sync system, you must have a way to transport both the data and the metadata. The transport for the metadata is generally rsync or git, but it can also be XMPP in which Git changesets are basically wrapped up in XMPP presence messages. Joey says, however, that there are some known issues with XMPP servers sometimes dropping or reordering some XMPP messages, so he doesn’t encourage that method currently.

The live signaling layer

So once you have your data and metadata, you can already do syncs via git annex sync --contents. But the real killer feature here will be automatic detection of changes, both on the local and the remote. To do that, you need some way of live signaling. git-annex supports two methods.

The first requires ssh access to a remote machine where git-annex is installed. In this mode of operation, when the git-annex assistant fires up, it opens up a persistent ssh connection to the remote and runs the git-annex-shell over there, which notifies it of changes to the git metadata repository. When a change is detected, a sync is initiated. This is considered ideal.

A substitute can be XMPP, and git-annex actually converts git commits into a form that can be sent over XMPP. As I mentioned above, there are some known reliability issues with this and it is not the recommended option.

Encryption

When it comes to encryption, you generally are concerned about all three layers. In an ideal scenario, the encryption and decryption happens entirely on the client side, so no service provider ever has any details about your data.

The live signaling layer is encrypted pretty trivially; the ssh sessions are, of course, encrypted and TLS support in XMPP is pervasive these days. However, this is not end-to-end encryption; those messages are decrypted by the service provider, so a service provider could theoretically spy on metadata, which may include change times and filenames, though not the contents of files themselves.

The data layer also can be encrypted very trivially. In the case of the “dumb” backends like S3, git-annex can use symmetric encryption or a gpg keypair and all that ever shows up on the server are arbitrarily-named buckets.

You can also use a gcrypt-based git repository. This can cover both data and metadata — and, if the target also has git-annex installed, the live signalling layer. Using a gcrypt-based git repository for the metadata and live signalling is the only way to accomplish live syncing with 100% client-side encryption.

All of these methods are implemented in terms of gpg, and can support symmetric of public-key encryption.

It should be noted here that the current release versions of git-annex need a one-character patch in order to fix live syncing with a remote using gcrypt. For those of you running jessie, I recommend the version in jessie-backports, which is presently 5.20151208. For your convenience, I have compiled an amd64 binary that can drop in over /usr/bin/git-annex if you have this version. You can download it and a gpg signature for it. Note that you only need this binary on the clients; the server can use the version from jessie-backports without issue.

Putting the pieces together: some scenarios

Now that I’ve explained the layers, let’s look at how they fit together.

Scenario 1: Central server

In this scenario, you might have a workstation and a laptop that sync up with each other by way of a central server that also has a full copy of the data. This is the scenario that most closely resembles Dropbox, box, or OwnCloud.

Here you would basically follow the steps in the git-assistant screencast: install git-annex on a server somewhere, and point your clients to it. If you want full end-to-end encryption, I would recommend letting git-annex generate a gpg keypair for you, which you would then need to copy to both your laptop and workstation (but not the server).

Every change you make locally will be synced to the server, and then from the server to your other PC. All three systems would be configured in the “client” transfer group.

Scenario 1a: Central server without a full copy of the data

In this scenario, everything is configured the same except the central server is configured with the “transfer” transfer group. This means that the actual data synced to it is deleted after it has been propagated to all clients. Since git-annex can verify which repository has received a copy of which data, it can easily enough delete the actual file content from the central server after it has been copied to all the clients. Many people use something like Dropbox or OwnCloud as a multi-PC syncing solution anyhow, so once the files have been synced everywhere, it makes sense to remove them from the central server.

This is often a good ideal for people. There are some obvious downsides that are sometimes relevant. For instance, to add a third sync client, it must be able to initially copy down from one of the existing clients. Or, if you intend to access the data from a device such as a cell phone where you don’t intend for it to have a copy of all data all the time, you won’t have as convenient way to download your data.

Scenario 1b: Split data/metadata central servers

Imagine that you have a shell or rsync account on some remote system where you can run git-annex, but don’t have much storage space. Maybe you have a cheap VPS or shell account somewhere, but it’s just not big enough to hold your data.

The answer to this would be to use this shell or rsync account for the metadata, but put the data elsewhere. You could, for instance, store the data in Amazon S3 or Amazon Glacier. These backends aren’t capable of storing the git-annex metadata, so all you need is a shell or rsync account somewhere to sync up the metadata. (Or, as below, you might even combine a fully distributed approach with this.) Then you can have your encrypted data pushed up to S3 or some such service, which presumably will grow to whatever size you need.

Scenario 2: Fully distributed

Like git itself, git-annex does not actually need a central server at all. If your different clients can reach each other directly at least some of the time, that is good enough. Of course, a given client will not be able to do fully automatic live sync unless it can reach at least one other client, so changes may not propagate as quickly.

You can simply set this up by making ssh connections available between your clients. git-annex assistant can automatically generate appropriate ~/.ssh/authorized_keys entries for you.

Scenario 2a: Fully distributed with multiple disconnected branches

You can even have a graph of connections available. For instance, you might have a couple machines at home and a couple machines at work with no ability to have a direct connection between them (due to, say, firewalls). The two machines at home could sync with each other in real-time, as could the two machines at work. git-annex also supports things like USB drives as a transport mechanism, so you could throw a USB drive in your pocket each morning, pop it in to one client at work, and poof – both clients are synced up over there. Repeat when you get home in the evening, and you’re synced there. The USB drive’s repository can, of course, be of the “transport” type so data is automatically deleted from it once it’s been synced everywhere.

Scenario 3: Hybrid

git-annex can support LAN sync even if you have a central server. If your laptop, say, travels around but is sometimes on the same LAN as your PC, git-annex can easily sync directly between the two when they are reachable, saving a round-trip to the server. You can assign a cost to each remote, and git-annex will always try to sync first to the lowest-cost path that is available.

Drawbacks of git-annex

There are some scenarios where git-annex with the assistant won’t be as useful as one of the more traditional instant-sync systems.

The first and most obvious one is if you want to access the files without the git-annex client. For instance, many of the other tools let you generate a URL that you can email to people, and then they can download files without any special client software. This is not directly possible with git-annex. You could, of course, make something like a public_html directory be managed with git-annex, but it wouldn’t provide things like obfuscated URLs, password-protected sharing, time-limited sharing, etc. that you get with other systems. While you can share your repositories with others that have git-annex, you can’t share individual subdirectories; for a given repository, it is all or nothing.

The Android client for git-annex is a pretty interesting thing: it is mostly a small POSIX environment, providing a terminal, git, gpg, and the same web interface that you get on a standalone machine. This means that the git-annex Android client is fully functional compared to a desktop one. It also has a quick setup process for syncing off your photos/videos. On the other hand, the integration with the Android ecosystem is poor compared to most other tools.

Other git-annex features

git-annex has a lot to offer besides the git-annex assistant. Besides the things I’ve already mentioned, any given git-annex repository — including your client repository — can have a partial copy of the full content. Say, for instance, that you set up a git-annex repository for your music collection, which is quite large. You want some music on your netbook, but don’t have room for it all. You can tell git-annex to get or drop files from the netbook’s repository without deleting them remotely. git-annex has quite a few ways to automate and configure this, including making sure that at least a certain number of copies of a file exist in your git-annex ecosystem.

Conclusion

I initially started looking at git-annex due to the security issues with encfs, and the difficulty with setting up ecryptfs in this way. (I had been layering encfs atop OwnCloud). git-annex certainly ticks the box for me security-wise, and obviously anything encrypted with encfs wasn’t going to be shared with others anyhow. I’ll be using git-annex more in the future, I’m sure.

Update 2016-06-27: I had some issues with git-annex in this configuration.

Amtrak Airlines

I came downstairs this morning and found a surprise waiting for me. Chairs from all over had been gathered up and arranged in rows, airline style. Taped to the wall was a “food court” sign. At the front was a picture of an airplane, decked out with the Amtrak logo of all things, and a timetable taped to our dining room table.

IMG_6123

Jacob soon got out string to be seatbelts, too. And, using his copy machine, printed out a picture of a wing to tape to the side of the “airplane”.

IMG_6128

And here is the “food court” sign Oliver made:

IMG_6126

This plane was, according to the boys, scheduled to leave at 9:30. It left a fashionable 2 hours late or so. They told me I would be the pilot, and had me find headphones to be my “headset”. (I didn’t wear my real headset on the grounds that then I wouldn’t be able to hear them.) Jacob decided he would be a flight attendant, his grandma would be the co-pilot, and Oliver would be the food court worker. The food court somehow seemed to travel with the plane.

Oliver made up a menu for the food court. It consisted of, and I quote: “trail mix, banana, trail mix, half banana, trail mix, trail mix, trail mix”. He’s already got the limited selection of airport food down pat, I can see.

Jacob said the flight would be from Chicago to Los Angeles, and so it was. Since it was Amtrak Airlines, we were supposed to pretend to fly over the train tracks the whole way.

If it’s not Christmas yet, we just invent some fun, eh? Pretty clever.