Backing Up and Archiving to Removable Media: dar vs. git-annex

July 11, 2023Softwarearchiving, backup, backups, dar, git-annexJohn Goerzen

This is the fourth in a series about archiving to removable media (optical discs such as BD-Rs and DVD+Rs or portable hard drives). Here are the first three parts:

In part 1, I laid out my goals for the project, and considered a number of tools before determining dar and git-annex were my leading options.
In part 2, I took a deep dive into git-annex and simulated using it for this project.
In part 3, I did the same with dar.
And in this part, I want to put it together to come up with an initial direction to pursue.

I want to state at the outset that this is not a general review of dar or git-annex. This is an analysis of how those tools stack up to a particular use case. Neither tool focuses on this use case, and I note it is particularly far from the more common uses of git-annex. For instance, both tools offer support for cloud storage providers and special support for ssh targets, but neither of those are in-scope for this post.

Comparison Matrix

As part of this project, I made a comparison matrix which includes not just dar and git-annex, but also backuppc, bacula/bareos, and borg. This may give you some good context, and also some reference for other projects in this general space.

Reviewing the Goals

I identified some goals in part 1. They are all valid. As I have thought through the project more, I feel like I should condense them into a simpler ordered list, with the first being the most important. I omit some things here that both dar and git-annex can do (updates/incrementals, for instance; see the expanded goals list in part 1). Here they are:

The tool must not modify the source data in any way.
It must be simple to create or update an archive. Processes that require a lot of manual work, are flaky, or are difficult to do correctly, are unlikely to be done correctly and often. If it’s easy to do right, I’m more likely to do it. Put another way: an archive never created can never be restored.
The chances of a successful restore by someone that is not me, that doesn’t know Linux, and is at least 10 years in the future, should be maximized. This implies a simple toolset, solid support for dealing with media errors or missing media, etc.
Both a partial point-in-time restore and a full restore should be possible. The full restore must, at minimum, provide a consistent directory tree; that is, deletions, additions, and moves over time must be accurately reflected. Preserving modification times is a near-requirement, and preserving hard links, symbolic links, and other POSIX metadata is a significant nice-to-have.
There must be a strategy to provide redundancy; for instance, a way for one set of archive discs to be offsite, another onsite, and the two to be periodically swapped.
Use storage space efficiently.

Let’s take a look at how the two stack up against these goals.

Goal 1: Not modifying source data

With dar, this is accomplished. dar --create does not modify source data (and even has a mode to avoid updating atime) so that’s done.

git-annex normally does modify source data, in that it typically replaces files with symlinks into its hash-indexed storage directory. It can instead use hardlinks. In either case, you will wind up with files that have identical content (but may have originally been separate, non-linked files) linked together with git-annex. This would cause me trouble, as well as run the risk of modifying timestamps. So instead of just storing my data under a git-annex repo as is its most common case, I use the directory special remote with importtree=yes to sort of “import” the data in. This, plus my desire to have the repos sensible and usable on non-POSIX operating systems, accounts for a chunk of the git-annex complexity you see here. You wouldn’t normally see as much complexity with git-annex (though, as you will see, even without the directory special remote, dar still has less complexity).

Winner: dar, though I demonstrated a working approach with git-annex as well.

Goal 2: Simplicity of creating or updating an archive

Let us simply start by recognizing this:

Number of commands to create a first dar archive, including all splits: 1
Number of commands to create a first git-annex archive, with just the first two splits: 58
Number of commands to create a dar incremental: 1
Number of commands to update the last git-annex drive: 10
Number of commands to do a full restore of all slices and both archives with dar: 2 (1 if dar_manager used)
Number of commands to do a full restore of just the first two drive with git-annex: 9 (but my process may not be correct)

Both tools have a lot of power, but I must say, it is easier to wrap my head around what dar is doing than what git-annex is doing. Everything dar does is with files: here are the files to archive, here is an archive file, here is a detached (isolated) catalog. It is very straightforward. It took me far less time to develop my dar page than my git-annex page, despite having existing familiarity with both tools. As I pointed out in part 2, I still don’t fully understand how git-annex syncs metadata. Unsolved mysteries from that post include why the two git-annex drives had no idea what was on the other drives, and why the export operation silenty did nothing. Additionally, for the optical disc case, I had to create a restricted-size filesystem/dataset for git-annex to write into in order to get the desired size limit.

Looking at the optical disc case, dar has a lot of nice infrastructure built in. With –pause and –execute, it can very easily be combined with disc burning operations. –slice will automatically limit the size of a given slice, regardless of how much disk space is free, meaning that the git-annex tricks of creating smaller filesystems/datasets are unnecessary with dar.

To create an initial full backup with dar, you just give it the size of the device, and it will automatically split up the archive, with hooks to integrate for burning or changing drives. About as easy as you could get.

With git-annex, you would run the commands to have it fill up the initial filesystem, then burn the disc (or remove the drive), then run the commands to create another repo on the second filesystem, and so forth.

With hard drives, with git-annex you would do something similar; let it fill up a repo on a drive, and if it exits with a space error, swap in the next. With dar, you would slice as with an optical disk. Dar’s slicing is less convenient in this case, though, as it assumes every drive is the same size — and yours may not be. You could work around that by using a slice size no bigger than the smallest drive, and putting multiple slices on larger drives if need be. If a single drive is large enough to hold your entire data set, though, you need not worry about this with either tool.

Here’s a warning about git-annex: it won’t store anything beneath directories named .git. My use case doesn’t have many of those. If your use case does, you’re going to have to figure out what to do about it. Maybe rename them to something else while the backup runs? In any case, it is simply a fact that git-annex cannot back up git repositories, and this cuts against being able to back up things correctly.

Another point is that git-annex has scalability concerns. If your archive set gets into the hundreds of thousands of files, you may need to split it into multiple distinct git-annex repositories. If this occurs — and it will in my case — it may serve to dull the shine of some of git-annex’s features such as location tracking.

A detour down the update strategies path

Update strategies get a little more complicated with both. First, let’s consider: what exactly should our update strategy be?

For optical discs, I might consider doing a monthly update. I could burn a disc (or more than one, if needed) regardless of how much data is going to go onto it, because I want no more than a month’s data lost in any case. An alternative might be to spool up data until I have a disc’s worth, and then write that, but that could possibly mean months between actually burning a disc. Probably not good.

For removable drives, we’re unlikely to use a new drive each month. So there it makes sense to continue writing to the drive until it’s full. Now we have a choice: do we write and preserve each month’s updates, or do we eliminate intermediate changes and just keep the most recent data?

With both tools, the monthly burn of an optical disc turns out to be very similar to the initial full backup to optical disc. The considerations for spanning multiple discs are the same. With both tools, we would presumably want to keep some metadata on the host so that we don’t have to refer to a previous disc to know what was burned. In the dar case, that would be an isolated catalog. For git-annex, it would be a metadata-only repo. I illustrated both of these in parts 2 and 3.

Now, for hard drives. Assuming we want to continue preserving each month’s updates, with dar, we could just write an incremental to the drive each month. Assuming that the size of the incremental is likely far smaller than the size of the drive, you could easily enough do this. More fancily, you could look at the free space on the drive and tell dar to use that as the size of the first slice. For git-annex, you simply avoid calling drop/dropunused. This will cause the old versions of files to accumulate in .git/annex. You can get at them with git annex commands. This may imply some degree of elevated risk, as you are modifying metadata in the repo each month, which with dar you could chmod a-w or even chattr +i the archive files once written. Hopefully this elevated risk is low.

If you don’t want to preserve each month’s updates, with dar, you could just write an incremental each month that is based on the previous drive’s last backup, overwriting the previous. That implies some risk of drive failure during the time the overwrite is happening. Alternatively, you could write an incremental and then use dar to merge it into the previous incremental, creating a new one. This implies some degree of extra space needed (maybe on a different filesystem) while doing this. With git-annex, you would use drop/dropunused as I demonstrated in part 2.

The winner for goal 2 is dar. The gap is biggest with optical discs and more narrow with hard drives, thanks to git-annex’s different options for updates. Still, I would be more confident I got it right with dar.

Goal 3: Greatest chance of successful restore in the distant future

If you use git-annex like I suggested in part 2, you will have a set of discs or drives that contain a folder structure with plain files in them. These files can be opened without any additional tools at all. For sheer ability to get at raw data, git-annex has the edge.

When you talk about getting a consistent full restore — without multiple copies of renamed files or deleted files coming back — then you are going to need to use git-annex to do that.

Both git-annex and dar provide binaries. Dar provides a win64 version on its Sourceforge page. On the author’s releases site, you can find the win64 version in addition to a statically-linked x86_64 version for Linux. The git-annex install page mostly directs you to package managers for your distribution, but the downloads page also lists builds for Linux, Windows, and Mac OS X. The Linux version is dynamic, but ships most of its .so files alongside. The Windows version requires cygwin.dll, and all versions require you to also install git itself. Both tools are in package managers for Mac OS X, Debian, FreeBSD, and so forth. Let’s just say that you are likely to be able to run either one on a future Windows or Linux system.

There are also GUI frontends for dar, such as DARGUI and gdar. This can increase the chances of a future person being able to use the software easily. git-annex has the assistant, which is based on a different use case and probably not directly helpful here.

When it comes to doing the actual restore process using software, dar provides the easier process here.

For dealing with media errors and the like, dar can integrate with par2. While technically you could use par2 against the files git-annex writes, that’s more cumbersome to manage to the point that it is likely not to be done. Both tools can deal reasonably with missing media entirely.

I’m going to give the edge on this one to git-annex; while dar does provide the easier restore and superior tools for recovering from media errors, the ability to access raw data as plain files without any tools at all is quite compelling. I believe it is the most critical advantage git-annex has, and it’s a big one.

Goal 4: Support high-fidelity partial and full restores

Both tools make it possible to do a full restore reflecting deletions, additions, and so forth. Dar, as noted, is easier for this, but it is possible with git-annex. So, both can achieve a consistent restore.

Part of this goal deals with fidelity of the restore: preserving timestamps, hard and symbolic links, ownership, permissions, etc. Of these, timestamps are the most important for me.

git-annex can’t do any of that. dar does all of it.

Some of this can be worked around using mtree as I documented in part 2. However, that implies a need to also provide mtree on the discs for future users, and I’m not sure mtree really exists for Windows. It also cuts against the argument that git-annex discs can be used without any tools. It is true, they can, but all you will get is filename and content; no accurate date. Timestamps are often highly relevant for everything from photos to finding an elusive document or record.

Winner: dar.

Goal 5: Supporting backup strategies with redundancy

My main goal here is to have two separate backup sets: one that is offsite, and one that is onsite. Depending on the strategy and media, they might just always stay that way, or periodically rotate. For instance, with optical discs, you might just burn two copies of every disc and store one at each place. For hard drives, since you will be updating the content of them, you might swap them periodically.

This is possible with both tools. With both tools, if using the optical disc scheme I laid out, you can just burn two identical copies of each disc.

With the hard drive case, with dar, you can keep two directories of isolated catalogs, one for each drive set. A little identifier file on each drive will let you know which set to use.

git-annex can track locations itself. As I demonstrated in part 2, you can make each drive its own repo, add all drives from a given drive set to a git-annex group. When initializing a drive, you tell git-annex what group it’s a prt of. From then on, git-annex knows what content is in each group and will add whatever a given drive’s group needs to that drive.

It’s possible to do this with both, but the winner here is git-annex.

Goal 6: Efficient use of storage

Here are situations in which one or the other will be more efficient:

Lots of small files: dar, due to reduced filesystem overhead
Compressible data: dar (git-annex doesn’t support compression)
Renamed files: git-annex (it will detect the sha256 match and avoid storing a duplicate copy)
Identical files: git-annex, unless they are hardlinked already (again, detects the sha256 match)
Small modifications to files (eg, ID3 tags on MP3s, EXIF data on photos, etc): dar (it supports rsync-style binary deltas)

The winner depends on your particular situation.

Other notes

While not part of the goals above, dar is capable of using tapes directly. While not as common, they are often used in communities of people that archive lots of data.

Conclusions

Overall, dar is the winner for me. It is simpler in most areas, easier to get correct, and scales very well.

git-annex does, however, have some quite compelling points. Being able to access files as plain files is huge, and its location tracking is nicer than dar’s, even when using dar_manager.

Both tools are excellent and I recommend them both – and for more than the particular scenario shown here. Both have fantastic and responsive authors.

Using dar for Data Archiving

June 16, 2023Software2, archiving, backup, backups, darJohn Goerzen

This is the third post in a series about data archiving to removable media (optical discs and hard drives). In the first, I explained the difference between backing up and archiving, established goals for the project, and said I’d evaluate git-annex and dar. The second post evaluated git-annex, and now it’s time to look at dar. The series will conclude with a post comparing git-annex with dar.

What is dar?

I could open with the same thing I did with git-annex, just changing the name of the program: “[dar] is a fantastic and versatile program that does… well, it’s one of those things that can do so much that it’s a bit hard to describe.” It is, fundamentally, an archiver like tar or zip (makes one file representing a bunch of other files), but it goes far beyond that. dar’s homepage lays out a comprehensive list of features, which I will try to summarize here.

Dar itself is both a library (with C++ and Python bindings) for interacting with data, and a CLI tool (dar itself).
Alongside this, there is an ecosystem of tools around dar, including GUIs for multiple platforms, backup scripts, and FUSE implementations.
Dar is like tar in that it can read and write files sequentially if desired. Dar archives can be streamed, just like tar archives. But dar takes it further; if you have dar_slave on the remote end, random access is possible over ssh (dramatically speeding up certain operations).
Dar is like zip in that a dar archive contains a central directory (called a catalog) which permits random access to the contents of an archive. In other words, you don’t have to read an entire archive to extract just one file (assuming the archive is on disk or something that itself permits random access). Also, dar can compress each file individually, rather than the tar approach of compressing the archive as a whole. This increases archive performance (dar knows not to try to compress already-compressed data), boosts restore resilience (corruption of one part of an archive doesn’t invalidate the entire rest of it), and boosts restore performance (permitting random access).
Dar can split an archive into multiple pieces called slices, and it can even split member files among the slices. The catalog contains information allowing you to know which slice(s) a given file is saved in.
The catalog can also be saved off in a file of its own (dar calls this an “isolated catalog”). Isolated catalogs record just metadata about files archived.
dar_manager can assemble a database by reading archives or isolated catalogs, letting you know where files are stored and facilitating restores using the minimal number of discs.
Dar supports differential/incremental backups, which record changes since the last backup. These backups record not just additions, but also deletions. dar can optionally use rsync-style binary deltas to minimize the space needed to record changes. Dar does not suffer from GNU tar’s data loss bug with incrementals.
Dar can “slice and dice” archives like Perl does strings. The usage notes page shows how you can merge archives, create decremental archives (where the full backup always reflects the current state of the system, and incrementals go backwards in time instead of forwards), etc. You can change the compression algorithm on an existing archive, re-slice it, etc.
Dar is extremely careful about preserving all metadata: hard links, sparse files, symlinks, timestamps (including subsecond resolution), EAs, POSIX ACLs, resource forks on Mac, detecting files being modified while being read, etc. It makes a nice way to copy directories, sort of similar to rsync -avxHAXS.

So to tie this together for this project, I will set up a 400MB slice size (to mimic what I did with git-annex), and see how dar saves the data and restores it.

Isolated cataloges aren’t strictly necessary for this, but by using them (and/or dar_manager), we can build up a database of files and locations and thus directly compare dar to git-annex location tracking.

Walkthrough: Creating the first archive

As with the git-annex walkthrough, I’ll set some variables to make it easy to remember:

$SOURCEDIR is the directory being backed up
$DRIVE is the directory for backups to be stored in. Since dar can split by a specified size, I don’t need to make separate filesystems to simulate the separate drive experience as I did with git-annex.
$CATDIR will hold isolated catalogs
$DARDB points to the dar_manager database

OK, we can run the backup immediately. No special setup is needed. dar supports both short-form (single-character) parameters and long-form ones. Since the parameters probably aren’t familiar to everyone, I will use the long-form ones in these examples.

Here’s how we create our initial full backup. I’ll explain the parameters below:

$ dar \ --verbose \ --create $DRIVE/bak1 \ --on-fly-isolate $CATDIR/bak1 \ --slice 400M \ --min-digits 2 \ --pause \ --fs-root $SOURCEDIR

Let’s look at each of these parameters:

–verbose does what you expect
–create selects the operation mode (like tar -c) and gives the archive basename
–on-fly-isolate says to write an isolated catalog as well, right while making the archive. You can always create an isolated catalog later (which is fast, since it only needs to read the last bits of the last slice) but it’s more convenient to do it now, so we do. We give the base name for the isolated catalog also.
–slice 400M says to split the archive, and create slices 400MB each.
–min-digits 2 pertains to naming files. Without it, dar would create files named bak1.dar.1, bak1.dar.2, bak1.dar.10, etc. dar works fine with this, but it can be annoying in ls. This is just convenience for humans.
–pause tells dar to pause after writing each slice. This would let us swap drives, burn discs, etc. I do this for demonstration purposes only; it isn’t strictly necessary in this situation. For a more powerful option, dar also supports –execute, which can run commands after each slice.
–fs-root gives the path to actually back up.

This same command could have been written with short options as:

$ dar -v -c $DRIVE/bak1 -@ $CATDIR/bak1 -s 400M -9 2 -p -R $SOURCEDIR

What does it look like while running? Here’s an excerpt:

... Adding file to archive: /acrypt/no-backup/jgoerzen/testdata/[redacted] Finished writing to file 1, ready to continue ? [return = YES | Esc = NO] ... Writing down archive contents... Closing the escape layer... Writing down the first archive terminator... Writing down archive trailer... Writing down the second archive terminator... Closing archive low layer... Archive is closed.

-------------------------------------------- 581 inode(s) saved including 0 hard link(s) treated 0 inode(s) changed at the moment of the backup and could not be saved properly 0 byte(s) have been wasted in the archive to resave changing files 0 inode(s) with only metadata changed 0 inode(s) not saved (no inode/file change) 0 inode(s) failed to be saved (filesystem error) 0 inode(s) ignored (excluded by filters) 0 inode(s) recorded as deleted from reference backup -------------------------------------------- Total number of inode(s) considered: 581 -------------------------------------------- EA saved for 0 inode(s) FSA saved for 581 inode(s) -------------------------------------------- Making room in memory (releasing memory used by archive of reference)... Now performing on-fly isolation... ...

That was easy! Let’s look at the contents of the backup directory:

$ ls -lh $DRIVE total 3.7G -rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:27 bak1.01.dar -rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:27 bak1.02.dar -rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:27 bak1.03.dar -rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:27 bak1.04.dar -rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:28 bak1.05.dar -rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:28 bak1.06.dar -rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:28 bak1.07.dar -rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:28 bak1.08.dar -rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:29 bak1.09.dar -rw-r--r-- 1 jgoerzen jgoerzen 156M Jun 16 19:33 bak1.10.dar

And the isolated catalog:

$ ls -lh $CATDIR total 37K -rw-r--r-- 1 jgoerzen jgoerzen 35K Jun 16 19:33 bak1.1.dar

The isolated catalog is stored compressed automatically.

Well this was easy. With one command, we archived the entire data set, split into 400MB chunks, and wrote out the catalog data.

Walkthrough: Inspecting the saved archive

Can dar tell us which slice contains a given file? Sure:

$ dar --list $DRIVE/bak1 --list-format=slicing | less Slice(s)|[Data ][D][ EA ][FSA][Compr][S]|Permission| Filemane --------+--------------------------------+----------+----------------------------- ... 1 [Saved][ ] [-L-][ 0%][X] -rwxr--r-- [redacted] 1-2 [Saved][ ] [-L-][ 0%][X] -rwxr--r-- [redacted] 2 [Saved][ ] [-L-][ 0%][X] -rwxr--r-- [redacted] ...

This illustrates the transition from slice 1 to slice 2. The first file was stored entirely in slice 1; the second stored partially in slice 1 and partially in slice 2, and third solely in slice 2. We can get other kinds of information as well.

$ dar --list $DRIVE/bak1 | less [Data ][D][ EA ][FSA][Compr][S]| Permission | User | Group | Size | Date | filename --------------------------------+------------+-------+-------+---------+-------------------------------+------------ [Saved][ ] [-L-][ 0%][X] -rwxr--r-- jgoerzen jgoerzen 24 Mio Mon Mar 5 07:58:09 2018 [redacted] [Saved][ ] [-L-][ 0%][X] -rwxr--r-- jgoerzen jgoerzen 16 Mio Mon Mar 5 07:58:09 2018 [redacted] [Saved][ ] [-L-][ 0%][X] -rwxr--r-- jgoerzen jgoerzen 22 Mio Mon Mar 5 07:58:09 2018 [redacted]

These are the same files I was looking at before. Here we see they are 24MB, 16MB, and 22MB in size, and some additional metadata. Even more is available in the XML list format.

Walkthrough: updates

As with git-annex, I’ve made some changes in the source directory: moved a file, added another, and deleted one. Let’s create an incremental backup now:

$ dar \ --verbose \ --create $DRIVE/bak2 \ --on-fly-isolate $CATDIR/bak2 \ --ref $CATDIR/bak1 \ --slice 400M \ --min-digits 2 \ --pause \ --fs-root $SOURCEDIR

This command is very similar to the earlier one. Instead of writing an archive and catalog named bak1, we write one named bak2. What’s new here is --ref $CATDIR/bak1. That says, make an incremental based on an archive of reference. All that is needed from that archive of reference is the detached catalog. --ref $DRIVE/bak1 would have worked equally well here.

Here’s what I did to the $SOURCEDIR:

Renamed a file to file01-unchanged
Deleted a file
Copied /bin/cp to a file named cp

Let’s see if dar’s command output matches this:

... Adding file to archive: /acrypt/no-backup/jgoerzen/testdata/file01-unchanged Saving Filesystem Specific Attributes for /acrypt/no-backup/jgoerzen/testdata/file01-unchanged Adding file to archive: /acrypt/no-backup/jgoerzen/testdata/cp Saving Filesystem Specific Attributes for /acrypt/no-backup/jgoerzen/testdata/cp Adding folder to archive: [redacted] Saving Filesystem Specific Attributes for [redacted] Adding reference to files that have been destroyed since reference backup... ... -------------------------------------------- 3 inode(s) saved including 0 hard link(s) treated 0 inode(s) changed at the moment of the backup and could not be saved properly 0 byte(s) have been wasted in the archive to resave changing files 0 inode(s) with only metadata changed 578 inode(s) not saved (no inode/file change) 0 inode(s) failed to be saved (filesystem error) 0 inode(s) ignored (excluded by filters) 2 inode(s) recorded as deleted from reference backup -------------------------------------------- Total number of inode(s) considered: 583 -------------------------------------------- EA saved for 0 inode(s) FSA saved for 3 inode(s) -------------------------------------------- ...

Yes, it does. The rename is recorded as a deletion and an addition, since dar doesn’t directly track renames. So the rename plus the deletion account for the two deletions. The rename plus the addition of cp count as 2 of the 3 inodes saved; the third is the modified directory from which files were deleted and moved out.

Let’s see the files that were created:

$ ls -lh $DRIVE/bak2* -rw-r--r-- 1 jgoerzen jgoerzen 18M Jun 16 19:52 /acrypt/no-backup/jgoerzen/dar-testing/drive/bak2.01.dar $ ls -lh $CATDIR/bak2* -rw-r--r-- 1 jgoerzen jgoerzen 22K Jun 16 19:52 /acrypt/no-backup/jgoerzen/dar-testing/cat/bak2.1.dar

What does –list look like now?

Slice(s)|[Data ][D][ EA ][FSA][Compr][S]|Permission| Filemane --------+--------------------------------+----------+----------------------------- [ ][ ] [---][-----][X] -rwxr--r-- [redacted] 1 [Saved][ ] [-L-][ 0%][X] -rwxr--r-- file01-unchanged ... [--- REMOVED ENTRY ----][redacted] [--- REMOVED ENTRY ----][redacted]

Here I show an example of:

A file that was not changed from the initial backup. Its presence was simply noted, but because we’re doing an incremental, the data wasn’t saved.
A file that is saved in this incremental, on slice 1.
The two deleted files

Walkthrough: dar_manager

As we’ve seen above, the two archives (or their detached catalog) give us a complete picture of what files were present at the time of the creation of each archive, and what files were stored in a given archive. We can certainly continue working in that way. We can also use dar_manager to build a comprehensive database of these archives, to be able to find what media is necessary to restore each given file. Or, with dar_manager’s –when parameter, we can restore files as of a particular date.

Let’s try it out. First, we create our database:

$ dar_manager --create $DARDB $ dar_manager --base $DARDB --add $DRIVE/bak1 Auto detecting min-digits to be 2 $ dar_manager --base $DARDB --add $DRIVE/bak2 Auto detecting min-digits to be 2

Here we created the database, and added our two catalogs to it. (Again, we could have as easily used $CATDIR/bak1; either the archive or its isolated catalog will work here.) It’s important to add the catalogs in order.

Let’s do some quick experimentation with dar_manager:

$ dar_manager -v --base $DARDB --list Decompressing and loading database to memory...


dar path         :

dar options      :

database version : 6

compression used : gzip

compression level: 9
archive #   |    path      |    basename

------------+--------------+---------------

        1       /acrypt/no-backup/jgoerzen/dar-testing/drive    bak1

        2       /acrypt/no-backup/jgoerzen/dar-testing/drive    bak2

$ dar_manager --base $DARDB --stat archive # | most recent/total data | most recent/total EA --------------+-------------------------+----------------------- 1 580/581 0/0 2 3/3 0/0

The –list option shows the correlation between dar_manager archive number (1, 2) with filenames (bak1, bak2). It is coincidence here that 1/bak1 and 2/bak2 correlate; that’s not necessarily the case. Most dar_manager commands operate on archive number, while dar commands operate on archive path/basename.

Now let’s see just what files are saved in archive #2, the incremental:

$ dar_manager --base $DARDB --used 2 [ Saved ][ ] [redacted] [ Saved ][ ] file01-unchanged [ Saved ][ ] cp

Now we can also where a file is stored. Here’s one that was saved in the full backup and unmodified in the incremental:

$ dar_manager --base $DARDB --file [redacted] 1 Fri Jun 16 19:15:12 2023 saved absent 2 Fri Jun 16 19:15:12 2023 present absent

(The absent at the end refers to extended attributes that the file didn’t have)

Similarly, for files that were added or removed, they’ll be listed only at the appropriate place.

Walkthrough: Restoration

I’m not going to repeat the author’s full restoration with dar page, but here are some quick examples.

A simple way of doing everything is using incrementals for the whole series. To do that, you’d have bak1 be full, bak2 based on bak1, bak3 based on bak2, bak4 based on bak3, etc. To restore from such a series, you have two options:

Use dar to simply extract each archive in order. It will handle deletions, renames, etc. along the way.
Use dar_manager with the backup database to do manage the process. It may be somewhat more efficient, as it won’t bother to restore files that will later be modified or deleted.

If you get fancy — for instance, bak2 is based on bak1, bak3 on bak2, bak4 on bak1 — then you would want to use dar_manager to ensure a consistent restore is completed. Either way, the process is nearly identical. Also, I figure, to make things easy, you can save a copy of the entire set of isolated catalogs before you finalize each disc/drive. They’re so small, and this would let someone with just the most recent disc build a dar_manager database without having to go through all the other discs.

Anyhow, let’s do a restore using just dar. I’ll make a $RESTOREDIR and do it that way.

$ dar \ --verbose \ --extract $DRIVE/bak1 \ --fs-root $RESTOREDIR \ --no-warn \ --execute "echo Ready for slice %n. Press Enter; read foo"

This –execute lets us see how dar works; this is an illustration of the power it has (above –pause); it’s a snippet interpreted by /bin/sh with %n being one of the dar placeholders. If memory serves, it’s not strictly necessary, as dar will prompt you for slices it needs if they’re not mounted. Anyhow, you’ll see it first reading the last slice, which contains the catalog, then reading from the beginning.

Here we go:

Auto detecting min-digits to be 2 Opening archive bak1 ... Opening the archive using the multi-slice abstraction layer... Ready for slice 10. Press Enter ... Loading catalogue into memory... Locating archive contents... Reading archive contents... File ownership will not be restored du to the lack of privilege, you can disable this message by asking not to restore file ownership [return = YES | Esc = NO] Continuing... Restoring file's data: [redacted] Restoring file's FSA: [redacted] Ready for slice 1. Press Enter ... Ready for slice 2. Press Enter ... -------------------------------------------- 581 inode(s) restored including 0 hard link(s) 0 inode(s) not restored (not saved in archive) 0 inode(s) not restored (overwriting policy decision) 0 inode(s) ignored (excluded by filters) 0 inode(s) failed to restore (filesystem error) 0 inode(s) deleted -------------------------------------------- Total number of inode(s) considered: 581 -------------------------------------------- EA restored for 0 inode(s) FSA restored for 0 inode(s) --------------------------------------------

The warning is because I’m not doing the extraction as root, which limits dar’s ability to fully restore ownership data.

OK, now the incremental:

$ dar \ --verbose \ --extract $DRIVE/bak2 \ --fs-root $RESTOREDIR \ --no-warn \ --execute "echo Ready for slice %n. Press Enter; read foo" ... Ready for slice 1. Press Enter ... Restoring file's data: /acrypt/no-backup/jgoerzen/dar-testing/restore/file01-unchanged Restoring file's FSA: /acrypt/no-backup/jgoerzen/dar-testing/restore/file01-unchanged Restoring file's data: /acrypt/no-backup/jgoerzen/dar-testing/restore/cp Restoring file's FSA: /acrypt/no-backup/jgoerzen/dar-testing/restore/cp Restoring file's data: /acrypt/no-backup/jgoerzen/dar-testing/restore/[redacted directory] Removing file (reason is file recorded as removed in archive): [redacted file] Removing file (reason is file recorded as removed in archive): [redacted file]

This all looks right! Now how about we compare the restore to the original source directory?

$ diff -durN $SOURCEDIR $RESTOREDIR

No changes – perfect.

We could instead do this restore via a single dar_manager command, though annoyingly, we’d have to pass all top-level files/directories to dar_manager –restore. But still, it’s one command, and basically automates and optimizes the dar restores shown above.

Conclusions

Dar makes it extremely easy to just Do The Right Thing when making archives. One command makes a backup. It saves things in simple files. You can make an isolated catalog if you want, and it too is saved in a simple file. You can query what is in the files and where. You can restore from all or part of the files. You can simply play the backups forward, in order, to achieve a full and consistent restore. Or you can load data about them into dar_manager for an optimized restore.

A bit of scripting will be necessary to make incrementals; finding the most recent backup or catalog. If backup files are named with care — for instance, by date — then this should be a pretty easy task.

I haven’t touched on resiliency yet. dar comes with tools for recovering archives that have had portions corrupted or lost. It can also rebuild the catalog if it is corrupted or lost. It adds “tape marks” (or “escape sequences”) to the archive along with the data stream. So every entry in the catalog is actually stored in the archive twice: once alongside the file data, and once at the end in the collected catalog. This allows dar to scan a corrupted file for the tape marks and reconstruct whatever is still intact, even if the catalog is lost. dar also integrates with tools like sha256sum and par2 to simplify archive integrity testing and restoration.

This balances against the need to use a tool (dar, optionally with a GUI frontend) to restore files. I’ll discuss that more in the next post.

Using git-annex for Data Archiving

June 15, 2023Technology1, 2, archiving, backup, backups, git-annexJohn Goerzen

In my recent post about data archiving to removable media, I laid out the difference between backing up and archiving, and also said I’d evaluate git-annex and dar. This post evaluates git-annex. The next will look at dar, and then I’ll make a comparison post.

What is git-annex?

git-annex is a fantastic and versatile program that does… well, it’s one of those things that can do so much that it’s a bit hard to describe. Its homepage says:

git-annex allows managing large files with git, without storing the file contents in git. It can sync, backup, and archive your data, offline and online. Checksums and encryption keep your data safe and secure. Bring the power and distributed nature of git to bear on your large files with git-annex.

I think the particularly interesting features of git-annex aren’t actually included in that list. Among the features of git-annex that make it shine for this purpose, its location tracking is key. git-annex can know exactly which device has which file at which version at all times. Combined with its preferred content settings, this lets you very easily say things like:

“I want exactly 1 copy of every file to exist within the set #1 of backup drives. Here’s a drive in that set; copy to it whatever needs to be copied to satisfy that requirement.”
“Now I have another set of backup drives. Periodically I will swap sets offsite. Copy whatever is needed to this drive in the second set, making sure that there is 1 copy of every file within this set as well, regardless of what’s in the first set.”
“Here’s a directory I want to use to track the status of everything else. I don’t want any copies at all here.”

git-annex can be set to allow a configurable amount of free space to remain on a device, and it will fill it up with whatever copies are necessary up until it hits that limit. Very convenient!

git-annex will store files in a folder structure that mirrors the origin folder structure, in plain files just as they were. This maximizes the ability for a future person to access the content, since it is all viewable without any special tool at all. Of course, for things like optical media, git-annex will essentially be creating what amounts to incrementals. To obtain a consistent copy of the original tree, you would still need to use git-annex to process (export) the archives.

git-annex challenges

In my prior post, I related some challenges with git-annex. The biggest of them – quite poor performance of the directory special remote when dealing with many files – has been resolved by Joey, git-annex’s author! That dramatically improves the git-annex use scenario here! The fixing commit is in the source tree but not yet in a release.

git-annex no doubt may still have performance challenges with repositories in the 100,000+-range, but in that order of magnitude it now looks usable. I’m not sure about 1,000,000-file repositories (I haven’t tested); there is a page about scalability.

A few other more minor challenges remain:

git-annex doesn’t really preserve POSIX attributes; for instance, permissions, symlink destinations, and timestamps are all not preserved. Of these, timestamps are the most important for my particular use case.
If your data set to archive contains Git repositories itself, these will not be included.

I worked around the timestamp issue by using the mtree-netbsd package in Debian. mtree writes out a summary of files and metadata in a tree, and can restore them. To save:

mtree -c -R nlink,uid,gid,mode -p /PATH/TO/REPO -X <(echo './.git') > /tmp/spec

And, after restoration, the timestamps can be applied with:

mtree -t -U -e < /tmp/spec

Walkthrough: initial setup

To use git-annex in this way, we have to do some setup. My general approach is this:

There is a source of data that lives outside git-annex. I'll call this $SOURCEDIR.
I'm going to name the directories holding my data $REPONAME.
There will be a "coordination" git-annex repo. It will hold metadata only, and no data. This will let us track where things live. I'll call it $METAREPO.
There will be drives. For this example, I'll call their mountpoints $DRIVE01 and $DRIVE02. For easy demonstration purposes, I used a ZFS dataset with a refquota set (to observe the size handling), but I could have as easily used a LVM volume, btrfs dataset, loopback filesystem, or USB drive. For optical discs, this would be a staging area or a UDF filesystem.

Let's get started! I've set all these shell variables appropriately for this example, and REPONAME to "testdata". We'll begin by setting up the metadata-only tracking repo.

$ REPONAME=testdata $ mkdir "$METAREPO" $ cd "$METAREPO" $ git init $ git config annex.thin true

There is a sort of complicated topic of how git-annex stores files in a repo, which varies depending on whether the data for the file is present in a given repo, and whether the file is locked or unlocked. Basically, the options I use here cause git-annex to mostly use hard links instead of symlinks or pointer files, for maximum compatibility with non-POSIX filesystems such as NTFS and UDF, which might be used on these devices. thin is part of that.

Let's continue:

$ git annex init 'local hub' init local hub ok (recording state in git...) $ git annex wanted . "include=* and exclude=$REPONAME/*" wanted . ok (recording state in git...)

In a bit, we are going to import the source data under the directory named $REPONAME (here, testdata). The wanted command says: in this repository (represented by the bare dot), the files we want are matched by the rule that says eveyrthing except what's under $REPONAME. In other words, we don't want to make an unnecessary copy here.

Because I expect to use an mtree file as documented above, and it is not under $REPONAME/, it will be included. Let's just add it and tweak some things.

$ touch mtree $ git annex add mtree add mtree ok (recording state in git...) $ git annex sync git-annex sync will change default behavior to operate on --content in a future version of git-annex. Recommend you explicitly use --no-content (or -g) to prepare for that change. (Or you can configure annex.synccontent) commit [main (root-commit) 6044742] git-annex in local hub 1 file changed, 1 insertion(+) create mode 120000 mtree ok $ ls -l total 9 lrwxrwxrwx 1 jgoerzen jgoerzen 178 Jun 15 22:31 mtree -> .git/annex/objects/pX/ZJ/...

OK! We've added a file, and it got transformed into a symlink. That's the thing I said we were going to avoid, so:

git annex adjust --unlock-present adjust Switched to branch 'adjusted/main(unlockpresent)' ok $ ls -l total 1 -rw-r--r-- 2 jgoerzen jgoerzen 0 Jun 15 22:31 mtree

You'll notice it transformed into a hard link (nlinks=2) file. Great! Now let's import the source data. For that, we'll use the directory special remote.

$ git annex initremote source type=directory directory=$SOURCEDIR importtree=yes \ encryption=none initremote source ok (recording state in git...) $ git annex enableremote source directory=$SOURCEDIR enableremote source ok (recording state in git...) $ git config remote.source.annex-readonly true $ git config annex.securehashesonly true $ git config annex.genmetadata true $ git config annex.diskreserve 100M $ git config remote.source.annex-tracking-branch main:$REPONAME

OK, so here we created a new remote named "source". We enabled it, and set some configuration. Most notably, that last line causes files from "source" to be imported under $REPONAME/ as we wanted earlier. Now we're ready to scan the source.

$ git annex sync

At this point, you'll see git-annex computing a hash for every file in the source directory.

I can verify with du that my metadata-only repo only uses 14MB of disk space, while my source is around 4GB.

Now we can see what git-annex thinks about file locations:

$ git-annex whereis | less whereis mtree (1 copy) 8aed01c5-da30-46c0-8357-1e8a94f67ed6 -- local hub [here] ok whereis testdata/[redacted] (0 copies) The following untrusted locations may also have copies: 9e48387e-b096-400a-8555-a3caf5b70a64 -- [source] failed ... many more lines ...

So remember we said we wanted mtree, but nothing under testdata, under this repo? That's exactly what we got. git-annex knows that the files under testdata can be found under the "source" special remote, but aren't in any git-annex repo -- yet. Now we'll start adding them.

Walkthrough: removable drives

I've set up two 500MB filesystems to represent removable drives. We'll see how git-annex works with them.

$ cd $DRIVE01 $ df -h . Filesystem Size Used Avail Use% Mounted on acrypt/no-backup/annexdrive01 500M 1.0M 499M 1% /acrypt/no-backup/annexdrive01 $ git clone $METAREPO Cloning into 'testdata'... done. $ cd $REPONAME $ git config annex.thin true $ git annex init "test drive #1" $ git annex adjust --hide-missing --unlock adjust Switched to branch 'adjusted/main(hidemissing-unlocked)' ok $ git annex sync

OK, that's the initial setup. Now let's enable the source remote and configure it the same way we did before:

$ git annex enableremote source directory=$SOURCEDIR enableremote source ok (recording state in git...) $ git config remote.source.annex-readonly true $ git config remote.source.annex-tracking-branch main:$REPONAME $ git config annex.securehashesonly true $ git config annex.genmetadata true $ git config annex.diskreserve 100M

Now, we'll add the drive to a group called "driveset01" and configure what we want on it:

$ git annex group . driveset01 $ git annex wanted . '(not copies=driveset01:1)'

What this does is say: first of all, this drive is in a group named driveset01. Then, this drive wants any files for which there isn't already at least one copy in driveset01.

Now let's load up some files!

$ git annex sync --content

As the messages fly by from here, you'll see it mentioning that it got mtree, and then various files from "source" -- until, that is, the filesystem had less than 100MB free, at which point it complained of no space for the rest. Exactly like we wanted!

Now, we need to teach $METAREPO about $DRIVE01.

$ cd $METAREPO $ git remote add drive01 $DRIVE01/$REPONAME $ git annex sync drive01 git-annex sync will change default behavior to operate on --content in a future version of git-annex. Recommend you explicitly use --no-content (or -g) to prepare for that change. (Or you can configure annex.synccontent) commit On branch adjusted/main(unlockpresent) nothing to commit, working tree clean ok merge synced/main (Merging into main...) Updating d1d9e53..817befc Fast-forward (Merging into adjusted branch...) Updating 7ccc20b..861aa60 Fast-forward ok pull drive01 remote: Enumerating objects: 214, done. remote: Counting objects: 100% (214/214), done. remote: Compressing objects: 100% (95/95), done. remote: Total 110 (delta 6), reused 0 (delta 0), pack-reused 0 Receiving objects: 100% (110/110), 13.01 KiB | 1.44 MiB/s, done. Resolving deltas: 100% (6/6), completed with 6 local objects. From /acrypt/no-backup/annexdrive01/testdata * [new branch] adjusted/main(hidemissing-unlocked) -> drive01/adjusted/main(hidemissing-unlocked) * [new branch] adjusted/main(unlockpresent) -> drive01/adjusted/main(unlockpresent) * [new branch] git-annex -> drive01/git-annex * [new branch] main -> drive01/main * [new branch] synced/main -> drive01/synced/main ok

OK! This step is important, because drive01 and drive02 (which we'll set up shortly) won't necessarily be able to reach each other directly, due to not being plugged in simultaneously. Our $METAREPO, however, will know all about where every file is, so that the "wanted" settings can be correctly resolved. Let's see what things look like now:

$ git annex whereis | less whereis mtree (2 copies) 8aed01c5-da30-46c0-8357-1e8a94f67ed6 -- local hub [here] b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [drive01] ok whereis testdata/[redacted] (1 copy) b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [drive01]

The following untrusted locations may also have copies: 9e48387e-b096-400a-8555-a3caf5b70a64 -- [source] ok

If I scroll down a bit, I'll see the files past the 400MB mark that didn't make it onto drive01. Let's add another example drive!

Walkthrough: Adding a second drive

The steps for $DRIVE02 are the same as we did before, just with drive02 instead of drive01, so I'll omit listing it all a second time. Now look at this excerpt from whereis:

whereis testdata/[redacted] (1 copy) b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [drive01]


  The following untrusted locations may also have copies:

        9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]

ok

whereis testdata/[redacted] (1 copy)

        c4540343-e3b5-4148-af46-3f612adda506 -- test drive #2 [drive02]

The following untrusted locations may also have copies: 9e48387e-b096-400a-8555-a3caf5b70a64 -- [source] ok

Look at that! Some files on drive01, some on drive02, some neither place. Perfect!

Walkthrough: Updates

So I've made some changes in the source directory: moved a file, added another, and deleted one. All of these were copied to drive01 above. How do we handle this?

First, we update the metadata repo:

$ cd $METAREPO $ git annex sync $ git annex dropunused all

OK, this has scanned $SOURCEDIR and noted changes. Let's see what whereis says:

$ git annex whereis | less ... whereis testdata/cp (0 copies) The following untrusted locations may also have copies: 9e48387e-b096-400a-8555-a3caf5b70a64 -- [source] failed whereis testdata/file01-unchanged (1 copy) b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [drive01]

The following untrusted locations may also have copies: 9e48387e-b096-400a-8555-a3caf5b70a64 -- [source] ok

So this looks right. The file I added was a copy of /bin/cp. I moved another file to one named file01-unchanged. Notice that it realized this was a rename and that the data still exists on drive01.

Well, let's update drive01.

$ cd $DRIVE01/$REPONAME $ git annex sync --content

Looking at the testdata/ directory now, I see that file01-unchanged has been renamed, the deleted file is gone, but cp isn't yet here -- probably due to space issues; as it's new, it's undefined whether it or some other file would fill up free space. Let's work along a few more commands.

$ git annex get --auto $ git annex drop --auto $ git annex dropunused all

And now, let's make sure metarepo is updated with its state.

$ cd $METAREPO $ git annex sync

We could do the same for drive02. This is how we would proceed with every update.

Walkthrough: Restoration

Now, we have bare files at reasonable locations in drive01 and drive02. But, to generate a consistent restore, we need to be able to actually do an export. Otherwise, we may have files with old names, duplicate files, etc. Let's assume that we lost our source and metadata repos and have to restore from scratch. We'll make a new $RESTOREDIR. We'll begin with drive01 since we used it most recently.

$ mv $METAREPO $METAREPO.disabled $ mv $SOURCEDIR $SOURCEDIR.disabled $ git clone $DRIVE01/$REPONAME $RESTOREDIR $ cd $RESTOREDIR $ git config annex.thin true $ git annex init "restore" $ git annex adjust --hide-missing --unlock

Now, we need to connect the drive01 and pull the files from it.

$ git remote add drive01 $DRIVE01/$REPONAME $ git annex sync --content

Now, repeat with drive02:

$ git remote add drive02 $DRIVE02/$REPONAME $ git annex sync --content

Now we've got all our content back! Here's what whereis looks like:

whereis testdata/file01-unchanged (3 copies) 3d663d0f-1a69-4943-8eb1-f4fe22dc4349 -- restore [here] 9e48387e-b096-400a-8555-a3caf5b70a64 -- source b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [origin] ok ...

I was a little surprised that drive01 didn't seem to know what was on drive02. Perhaps that could have been remedied by adding more remotes there? I'm not entirely sure; I'd thought would have been able to do that automatically.

Conclusions

I think I have demonstrated two things:

First, git-annex is indeed an extremely powerful tool. I have only scratched the surface here. The location tracking is a neat feature, and being able to just access the data as plain files if all else fails is nice for future users.

Secondly, it is also a complex tool and difficult to get right for this purpose (I think much easier for some other purposes). For someone that doesn't live and breathe git-annex, it can be hard to get right. In fact, I'm not entirely sure I got it right here. Why didn't drive02 know what files were on drive01 and vice-versa? I don't know, and that reflects some kind of misunderstanding on my part about how metadata is synced; perhaps more care needs to be taken in restore, or done in a different order, than I proposed. I initially tried to do a restore by using git annex export to a directory special remote with exporttree=yes, but I couldn't ever get it to actually do anything, and I don't know why.

These two cut against each other. On the one hand, the raw accessibility of the data to someone with no computer skills is unmatched. On the other hand, I'm not certain I have the skill to always prepare the discs properly, or to do a proper consistent restore.

Recommendations for Tools for Backing Up and Archiving to Removable Media

May 29, 2023Linux, Online Life, Softwarearchiving, backup, backups, dar, git-annexJohn Goerzen

I have several TB worth of family photos, videos, and other data. This needs to be backed up — and archived.

Backups and archives are often thought of as similar. And indeed, they may be done with the same tools at the same time. But the goals differ somewhat:

Backups are designed to recover from a disaster that you can fairly rapidly detect.

Archives are designed to survive for many years, protecting against disaster not only impacting the original equipment but also the original person that created them.

Reflecting on this, it implies that while a nice ZFS snapshot-based scheme that supports twice-hourly backups may be fantastic for that purpose, if you think about things like family members being able to access it if you are incapacitated, or accessibility in a few decades’ time, it becomes much less appealing for archives. ZFS doesn’t have the wide software support that NTFS, FAT, UDF, ISO-9660, etc. do.

This post isn’t about the pros and cons of the different storage media, nor is it about the pros and cons of cloud storage for archiving; these conversations can readily be found elsewhere. Let’s assume, for the point of conversation, that we are considering BD-R optical discs as well as external HDDs, both of which are too small to hold the entire backup set.

What would you use for archiving in these circumstances?

Establishing goals

The goals I have are:

Archives can be restored using Linux or Windows (even though I don’t use Windows, this requirement will ensure the broadest compatibility in the future)
The archival system must be able to accommodate periodic updates consisting of new files, deleted files, moved files, and modified files, without requiring a rewrite of the entire archive dataset
Archives can ideally be mounted on any common OS and the component files directly copied off
Redundancy must be possible. In the worst case, one could manually copy one drive/disc to another. Ideally, the archiving system would automatically track making n copies of data.
While a full restore may be a goal, simply finding one file or one directory may also be a goal. Ideally, an archiving system would be able to quickly tell me which discs/drives contain a given file.
Ideally, preserves as much POSIX metadata as possible (hard links, symlinks, modification date, permissions, etc). However, for the archiving case, this is less important than for the backup case, with the possible exception of modification date.
Must be easy enough to do, and sufficiently automatable, to allow frequent updates without error-prone or time-consuming manual hassle

I would welcome your ideas for what to use. Below, I’ll highlight different approaches I’ve looked into and how they stack up.

Basic copies of directories

The initial approach might be one of simply copying directories across. This would work well if the data set to be archived is smaller than the archival media. In that case, you could just burn or rsync a new copy with every update and be done. Unfortunately, this is much less convenient with data of the size I’m dealing with. rsync is unavailable in that case. With some datasets, you could manually design some rsyncs to store individual directories on individual devices, but that gets unwieldy fast and isn’t scalable.

You could use something like my datapacker program to split the data across multiple discs/drives efficiently. However, updates will be a problem; you’d have to re-burn the entire set to get a consistent copy, or rely on external tools like mtree to reflect deletions. Not very convenient in any case.

So I won’t be using this.

tar or zip

While you can split tar and zip files across multiple media, they have a lot of issues. GNU tar’s incremental mode is clunky and buggy; zip is even worse. tar files can’t be read randomly, making it extremely time-consuming to extract just certain files out of a tar file.

The only thing going for these formats (and especially zip) is the wide compatibility for restoration.

dar

Here we start to get into the more interesting tools. Dar is, in my opinion, one of the best Linux tools that few people know about. Since I first wrote about dar in 2008, it’s added some interesting new features; among them, binary deltas and cloud storage support. So, dar has quite a few interesting features that I make use of in other ways, and could also be quite helpful here:

Dar can both read and write files sequentially (streaming, like tar), or with random-access (quick seek to extract a subset without having to read the entire archive)
Dar can apply compression to individual files, rather than to the archive as a whole, faciliting both random access and resilience (corruption in one file doesn’t invalidate all subsequent files). Dar also supports numerous compression algorithms including gzip, bzip2, xz, lzo, etc., and can omit compressing already-compressed files.
The end of each dar file contains a central directory (dar calls this a catalog). The catalog contains everything necessary to extract individual files from the archive quickly, as well as everything necessary to make a future incremental archive based on this one. Additionally, dar can make and work with “isolated catalogs” — a file containing the catalog only, without data.
Dar can split the archive into multiple pieces called slices. This can best be done with fixed-size slices (–slice and –first-slice options), which let the catalog regord the slice number and preserves random access capabilities. With the –execute option, dar can easily wait for a given slice to be burned, etc.
Dar normally stores an entire new copy of a modified file, but can optionally store an rdiff binary delta instead. This has the potential to be far smaller (think of a case of modifying metadata for a photo, for instance).

Additionally, dar comes with a dar_manager program. dar_manager makes a database out of dar catalogs (or archives). This can then be used to identify the precise archive containing a particular version of a particular file.

All this combines to make a useful system for archiving. Isolated catalogs are tiny, and it would be easy enough to include the isolated catalogs for the entire set of archives that came before (or even the dar_manager database file) with each new incremental archive. This would make restoration of a particular subset easy.

The main thing to address with dar is that you do need dar to extract the archive. Every dar release comes with source code and a win64 build. dar also supports building a statically-linked Linux binary. It would therefore be easy to include win64 binary, Linux binary, and source with every archive run. dar is also a part of multiple Linux and BSD distributions, which are archived around the Internet. I think this provides a reasonable future-proofing to make sure dar archives will still be readable in the future.

The other challenge is user ability. While dar is highly portable, it is fundamentally a CLI tool and will require CLI abilities on the part of users. I suspect, though, that I could write up a few pages of instructions to include and make that a reasonably easy process. Not everyone can use a CLI, but I would expect a person that could follow those instructions could be readily-enough found.

One other benefit of dar is that it could easily be used with tapes. The LTO series is liked by various hobbyists, though it could pose formidable obstacles to non-hobbyists trying to aceess data in future decades. Additionally, since the archive is a big file, it lends itself to working with par2 to provide redundancy for certain amounts of data corruption.

git-annex

git-annex is an interesting program that is designed to facilitate managing large sets of data and moving it between repositories. git-annex has particular support for offline archive drives and tracks which drives contain which files.

The idea would be to store the data to be archived in a git-annex repository. Then git-annex commands could generate filesystem trees on the external drives (or trees to br burned to read-only media).

In a post about using git-annex for blu-ray backups, an earlier thread about DVD-Rs was mentioned.

This has a few interesting properties. For one, with due care, the files can be stored on archival media as regular files. There are some different options for how to generate the archives; some of them would place the entire git-annex metadata on each drive/disc. With that arrangement, one could access the individual files without git-annex. With git-annex, one could reconstruct the final (or any intermediate) state of the archive appropriately, handling deltions, renames, etc. You would also easily be able to know where copies of your files are.

The practice is somewhat more challenging. Hundreds of thousands of files — what I would consider a medium-sized archive — can pose some challenges, running into hours-long execution if used in conjunction with the directory special remote (but only minutes-long with a standard git-annex repo).

Ruling out the directory special remote, I had thought I could maybe just work with my files in git-annex directly. However, I ran into some challenges with that approach as well. I am uncomfortable with git-annex mucking about with hard links in my source data. While it does try to preserve timestamps in the source data, these are lost on the clones. I wrote up my best effort to work around all this.

In a forum post, the author of git-annex comments that “I don’t think that CDs/DVDs are a particularly good fit for git-annex, but it seems a couple of users have gotten something working.” The page he references is Managing a large number of files archived on many pieces of read-only medium. Some of that discussion is a bit dated (for instance, the directory special remote has the importtree feature that implements what was being asked for there), but has some interesting tips.

git-annex supplies win64 binaries, and git-annex is included with many distributions as well. So it should be nearly as accessible as dar in the future. Since git-annex would be required to restore a consistent recovery image, similar caveats as with dar apply; CLI experience would be needed, along with some written instructions.

Bacula and BareOS

Although primarily tape-based archivers, these do also also nominally support drives and optical media. However, they are much more tailored as backup tools, especially with the ability to pull from multiple machines. They require a database and extensive configuration, making them a poor fit for both the creation and future extractability of this project.

Conclusions

I’m going to spend some more time with dar and git-annex, testing them out, and hope to write some future posts about my experiences.

Martha the Pilot

May 3, 2023Aviation, FamilyJohn Goerzen

Martha, now 5, can’t remember a time when she didn’t fly periodically. She’s come along in our airplane in short flights to a nearby restaurant and long ones to Michigan and South Dakota. All this time, she’s been riding in the back seat next to Laura.

Martha has been talking excitedly about riding up front next to me. She wants to “be my co-pilot”. I promised to give her an airplane wing pin when she did — one I got from a pilot of a commercial flight when I was a kid. Of course, safety was first, so I wanted to be sure she was old enough to fly there without being a distraction.

Last weekend, the moment finally arrived. She was so excited! She brought along her “Claire bear” aviator, one that I bought for her at an airport a little while back. She buckled in two of her dolls in the back seat.

Martha's dolls

And then up we went!

Martha in the airplane

Martha was so proud when we landed! We went to Stearman Field, just a short 10-minute flight away, and parked the plane right in front of the restaurant.

We flew back, and Martha thought we should get a photo of her standing on the wing by the door. Great idea!

Martha standing on the wing

She was happily jabbering about the flight all the way home. She told us several times about the pin she got, watching out the window, watching all the screens in the airplane, and also that she didn’t get sick at all despite some turbulence.

And, she says, “Now just you and I can go flying!”

Yes, that’s something I’m looking forward to!

Easily Accessing All Your Stuff with a Zero-Trust Mesh VPN

April 13, 2023Online Life, SoftwareInternet, mesh, tailscale, vpn, yggdrasilJohn Goerzen

Probably everyone is familiar with a regular VPN. The traditional use case is to connect to a corporate or home network from a remote location, and access services as if you were there.

But these days, the notion of “corporate network” and “home network” are less based around physical location. For instance, a company may have no particular office at all, may have a number of offices plus a number of people working remotely, and so forth. A home network might have, say, a PVR and file server, while highly portable devices such as laptops, tablets, and phones may want to talk to each other regardless of location. For instance, a family member might be traveling with a laptop, another at a coffee shop, and those two devices might want to communicate, in addition to talking to the devices at home.

And, in both scenarios, there might be questions about giving limited access to friends. Perhaps you’d like to give a friend access to part of your file server, or as a company, you might have contractors working on a limited project.

Pretty soon you wind up with a mess of VPNs, forwarded ports, and tricks to make it all work. With the increasing prevalence of CGNAT, a lot of times you can’t even open a port to the public Internet. Each application or device probably has its own gateway just to make it visible on the Internet, some of which you pay for.

Then you add on the question of: should you really trust your LAN anyhow? With possibilities of guests using it, rogue access points, etc., the answer is probably “no”.

We can move the responsibility for dealing with NAT, fluctuating IPs, encryption, and authentication, from the application layer further down into the network stack. We then arrive at a much simpler picture for all.

So this page is fundamentally about making the network work, simply and effectively.

How do we make the Internet work in these scenarios?

We’re going to combine three concepts:

A VPN, providing fully encrypted and authenticated communication and stable IPs
Mesh Networking, in which devices automatically discover optimal paths to reach each other
Zero-trust networking, in which we do not need to trust anything about the underlying LAN, because all our traffic uses the secure systems in points 1 and 2.

By combining these concepts, we arrive at some nice results:

You can ssh hostname, where hostname is one of your machines (server, laptop, whatever), and as long as hostname is up, you can reach it, wherever it is, wherever you are.
- Combined with mosh, these sessions will be durable even across moving to other host networks.
- You could just as well use telnet, because the underlying network should be secure.
You don’t have to mess with encryption keys, certs, etc., for every internal-only service. Since IPs are now trustworthy, that’s all you need. hosts.allow could make a comeback!
You have a way of transiting out of extremely restrictive networks. Every tool discussed here has a way of falling back on routing things via a broker (relay) on TCP port 443 if all else fails.

There might sometimes be tradeoffs. For instance:

On LANs faster than 1Gbps, performance may degrade due to encryption and encapsulation overhead. However, these tools should let hosts discover the locality of each other and not send traffic over the Internet if the devices are local.
With some of these tools, hosts local to each other (on the same LAN) may be unable to find each other if they can’t reach the control plane over the Internet (Internet is down or provider is down)

Some other features that some of the tools provide include:

Easy sharing of limited access with friends/guests
Taking care of everything you need, including SSL certs, for exposing a certain on-net service to the public Internet
Optional routing of your outbound Internet traffic via an exit node on your network. Useful, for instance, if your local network is blocking tons of stuff.

Let’s dive in.

Types of Mesh VPNs

I’ll go over several types of meshes in this article:

Fully decentralized with automatic hop routing

This model has no special central control plane. Nodes discover each other in various ways, and establish routes to each other. These routes can be direct connections over the Internet, or via other nodes. This approach offers the greatest resilience. Examples I’ll cover include Yggdrasil and tinc.
Automatic peer-to-peer with centralized control

In this model, nodes, by default, communicate by establishing direct links between them. A regular node never carries traffic on behalf of other nodes. Special-purpose relays are used to handle cases in which NAT traversal is impossible. This approach tends to offer simple setup. Examples I’ll cover include Tailscale, Zerotier, Nebula, and Netmaker.
Roll your own and hybrid approaches

This is a “grab bag” of other ideas; for instance, running Yggdrasil over Tailscale.

Terminology

For the sake of consistency, I’m going to use common language to discuss things that have different terms in different ecosystems:

Every tool discussed here has a way of dealing with NAT traversal. It may assist with establishing direct connections (eg, STUN), and if that fails, it may simply relay traffic between nodes. I’ll call such a relay a “broker”. This may or may not be the same system that is a control plane for a tool.
All of these systems operate over lower layers that are unencrypted. Those lower layers may be a LAN (wired or wireless, which may or may not have Internet access), or the public Internet (IPv4 and/or IPv6). I’m going to call the unencrypted lower layer, whatever it is, the “clearnet”.

Evaluation Criteria

Here are the things I want to see from a solution:

Secure, with all communications end-to-end encrypted and authenticated, and prevention of traffic from untrusted devices.
Flexible, adapting to changes in network topology quickly and automatically.
Resilient, without single points of failure, and with devices local to each other able to communicate even if cut off from the Internet or other parts of the network.
Private, minimizing leakage of information or metadata about me and my systems
Able to traverse CGNAT without having to use a broker whenever possible
A lesser requirement for me, but still a nice to have, is the ability to include others via something like Internet publishing or inviting guests.
Fully or nearly fully Open Source
Free or very cheap for personal use
Wide operating system support, including headless Linux on x86_64 and ARM.

Fully Decentralized VPNs with Automatic Hop Routing

Two systems fit this description: Yggdrasil and Tinc. Let’s dive in.

Yggdrasil

I’ll start with Yggdrasil because I’ve written so much about it already. It featured in prior posts such as:

Make the Internet Yours Again With an Instant Mesh Network, which described the tyranny of IP rigidity and using Yggdrasil as a global mesh overlay.
Using Yggdrasil As an Automatic Mesh Fabric to Connect All Your Docker Containers, VMs, and Servers is, in a significant sense, a more specific implementation of the ideas contained here; it’s a private Yggdrasil mesh providing the communications layer for dispersed Docker containers.
Recovering Our Lost Free Will Online: Tools and Techniques That Are Available Now features Yggdrasil.

Yggdrasil can be a private mesh VPN, or something more

Yggdrasil can be a private mesh VPN, just like the other tools covered here. It’s unique, however, in that a key goal of the project is to also make it useful as a planet-scale global mesh network. As such, Yggdrasil is a testbed of new ideas in distributed routing designed to scale up to massive sizes and all sorts of connection conditions. As of 2023-04-10, the main global Yggdrasil mesh has over 5000 nodes in it. You can choose whether or not to participate.

Every node in a Yggdrasil mesh has a public/private keypair. Each node then has an IPv6 address (in a private address space) derived from its public key. Using these IPv6 addresses, you can communicate right away.

Yggdrasil differs from most of the other tools here in that it does not necessarily seek to establish a direct link on the clearnet between, say, host A and host G for them to communicate. It will prefer such a direct link if it exists, but it is perfectly happy if it doesn’t.

The reason is that every Yggdrasil node is also a router in the Yggdrasil mesh. Let’s sit with that concept for a moment. Consider:

If you have a bunch of machines on your LAN, but only one of them can peer over the clearnet, that’s fine; all the other machines will discover this route to the world and use it when necessary.
All you need to run a broker is just a regular node with a public IP address. If you are participating in the global mesh, you can use one (or more) of the free public peers for this purpose.
It is not necessary for every node to know about the clearnet IP address of every other node (improving privacy). In fact, it’s not even necessary for every node to know about the existence of all the other nodes, so long as it can find a route to a given node when it’s asked to.
Yggdrasil can find one or more routes between nodes, and it can use this knowledge of multiple routes to aggressively optimize for varying network conditions, including combinations of, say, downloads and low-latency ssh sessions.

Behind the scenes, Yggdrasil calculates optimal routes between nodes as necessary, using a mesh-wide DHT for initial contact and then deriving more optimal paths. (You can also read more details about the routing algorithm.)

One final way that Yggdrasil is different from most of the other tools is that there is no separate control server. No node is “special”, in charge, the sole keeper of metadata, or anything like that. The entire system is completely distributed and auto-assembling.

Meeting neighbors

There are two ways that Yggdrasil knows about peers:

By broadcast discovery on the local LAN
By listening on a specific port (or being told to connect to a specific host/port)

Sometimes this might lead to multiple ways to connect to a node; Yggdrasil prefers the connection auto-discovered by broadcast first, then the lowest-latency of the defined path. In other words, when your laptops are in the same room as each other on your local LAN, your packets will flow directly between them without traversing the Internet.

Unique uses

Yggdrasil is uniquely suited to network-challenged situations. As an example, in a post-disaster situation, Internet access may be unavailable or flaky, yet there may be many local devices – perhaps ones that had never known of each other before – that could share information. Yggdrasil meets this situation perfectly. The combination of broadcast auto-detection, distributed routing, and so forth, basically means that if there is any physical path between two nodes, Yggdrasil will find and enable it.

Ad-hoc wifi is rarely used because it is a real pain. Yggdrasil actually makes it useful! Its broadcast discovery doesn’t require any IP address provisioned on the interface at all (it just uses the IPv6 link-local address), so you don’t need to figure out a DHCP server or some such. And, Yggdrasil will tend to perform routing along the contours of the RF path. So you could have a laptop in the middle of a long distance relaying communications from people farther out, because it could see both. Or even a chain of such things.

Yggdrasil: Security and Privacy

Yggdrasil’s mesh is aggressively greedy. It will peer with any node it can find (unless told otherwise) and will find a route to anywhere it can. There are two main ways to make sure you keep unauthorized traffic out: by restricting who can talk to your mesh, and by firewalling the Yggdrasil interface. Both can be used, and they can be used simultaneously.

I’ll discuss firewalling more at the end of this article. Basically, you’ll almost certainly want to do this if you participate in the public mesh, because doing so is akin to having a globally-routable public IP address direct to your device.

If you want to restrict who can talk to your mesh, you just disable the broadcast feature on all your nodes (empty MulticastInterfaces section in the config), and avoid telling any of your nodes to connect to a public peer. You can set a list of authorized public keys that can connect to your nodes’ listening interfaces, which you’ll probably want to do. You will probably want to either open up some inbound ports (if you can) or set up a node with a known clearnet IP on a place like a $5/mo VPS to help with NAT traversal (again, setting AllowedPublicKeys as appropriate). Yggdrasil doesn’t allow filtering multicast clients by public key, only by network interface, so that’s why we disable broadcast discovery. You can easily enough teach Yggdrasil about static internal LAN IPs of your nodes and have things work that way. (Or, set up an internal “gateway” node or two, that the clients just connect to when they’re local). But fundamentally, you need to put a bit more thought into this with Yggdrasil than with the other tools here, which are closed-only.

Compared to some of the other tools here, Yggdrasil is better about information leakage; nodes only know details, such as clearnet IPs, of directly-connected peers. You can obtain the list of directly-connected peers of any known node in the mesh – but that list is the public keys of the directly-connected peers, not the clearnet IPs.

Some of the other tools contain a limited integrated firewall of sorts (with limited ACLs and such). Yggdrasil does not, but is fully compatible with on-host firewalls. I recommend these anyway even with many other tools.

Yggdrasil: Connectivity and NAT traversal

Compared to the other tools, Yggdrasil is an interesting mix. It provides a fully functional mesh and facilitates connectivity in situations in which no other tool can. Yet its NAT traversal, while it exists and does work, results in using a broker under some of the more challenging CGNAT situations more often than some of the other tools, which can impede performance.

Yggdrasil’s underlying protocol is TCP-based. Before you run away screaming that it must be slow and unreliable like OpenVPN over TCP – it’s not, and it is even surprisingly good around bufferbloat. I’ve found its performance to be on par with the other tools here, and it works as well as I’d expect even on flaky 4G links.

Overall, the NAT traversal story is mixed. On the one hand, you can run a node that listens on port 443 – and Yggdrasil can even make it speak TLS (even though that’s unnecessary from a security standpoint), so you can likely get out of most restrictive firewalls you will ever encounter. If you join the public mesh, know that plenty of public peers do listen on port 443 (and other well-known ports like 53, plus random high-numbered ones).

If you connect your system to multiple public peers, there is a chance – though a very small one – that some public transit traffic might be routed via it. In practice, public peers hopefully are already peered with each other, preventing this from happening (you can verify this with yggdrasilctl debug_remotegetpeers key=ABC...). I have never experienced a problem with this. Also, since latency is a factor in routing for Yggdrasil, it is highly unlikely that random connections we use are going to be competitive with datacenter peers.

If you’re open to participating in the public mesh, this is one of the easiest things of all. Have your friend install Yggdrasil, point them to a public peer, give them your Yggdrasil IP, and that’s it. (Well, presumably you also open up your firewall – you did follow my advice to set one up, right?)

If your friend is visiting at your location, they can just hop on your wifi, install Yggdrasil, and it will automatically discover a route to you. Yggdrasil even has a zero-config mode for ephemeral nodes such as certain Docker containers.

Yggdrasil doesn’t directly support publishing to the clearnet, but it is certainly possible to proxy (or even NAT) to/from the clearnet, and people do.

Yggdrasil: DNS

There is no particular extra DNS in Yggdrasil. You can, of course, run a DNS server within Yggdrasil, just as you can anywhere else. Personally I just add relevant hosts to /etc/hosts and leave it at that, but it’s up to you.

Yggdrasil: Source code, pricing, and portability

Yggdrasil is fully open source (LGPLv3 plus additional permissions in an exception) and highly portable. It is written in Go, and has prebuilt binaries for all major platforms (including a Debian package which I made).

There is no charge for anything with Yggdrasil. Listed public peers are free and run by volunteers. You can run your own peers if you like; they can be public and unlisted, public and listed (just submit a PR to get it listed), or private (accepting connections only from certain nodes’ keys). A “peer” in this case is just a node with a known clearnet IP address.

Yggdrasil encourages use in other projects. For instance, NNCP integrates a Yggdrasil node for easy communication with other NNCP nodes.

Yggdrasil conclusions

Yggdrasil is tops in reliability (having no single point of failure) and flexibility. It will maintain opportunistic connections between peers even if the Internet is down. The unique added feature of being able to be part of a global mesh is a nice one. The tradeoffs include being more prone to need to use a broker in restrictive CGNAT environments. Some other tools have clients that override the OS DNS resolver to also provide resolution of hostnames of member nodes; Yggdrasil doesn’t, though you can certainly run your own DNS infrastructure over Yggdrasil (or, for that matter, let public DNS servers provide Yggdrasil answers if you wish).

There is also a need to pay more attention to firewalling or maintaining separation from the public mesh. However, as I explain below, many other options have potential impacts if the control plane, or your account for it, are compromised, meaning you ought to firewall those, too. Still, it may be a more immediate concern with Yggdrasil.

Although Yggdrasil is listed as experimental, I have been using it for over a year and have found it to be rock-solid. They did change how mesh IPs were calculated when moving from 0.3 to 0.4, causing a global renumbering, so just be aware that this is a possibility while it is experimental.

tinc

tinc is the oldest tool on this list; version 1.0 came out in 2003! You can think of tinc as something akin to “an older Yggdrasil without the public option.”

I will be discussing tinc 1.0.36, the latest stable version, which came out in 2019. The development branch, 1.1, has been going since 2011 and had its latest release in 2021. The last commit to the Github repo was in June 2022.

Tinc is the only tool here to support both tun and tap style interfaces. I go into the difference more in the Zerotier review below. Tinc actually provides a better tap implementation than Zerotier, with various sane options for broadcasts, but I still think the call for an Ethernet, as opposed to IP, VPN is small.

To configure tinc, you generate a per-host configuration and then distribute it to every tinc node. It contains a host’s public key. Therefore, adding a host to the mesh means distributing its key everywhere; de-authorizing it means removing its key everywhere. This makes it rather unwieldy.

tinc can do LAN broadcast discovery and mesh routing, but generally speaking you must manually teach it where to connect initially. Somewhat confusingly, the examples all mention listing a public address for a node. This doesn’t make sense for a laptop, and I suspect you’d just omit it. I think that address is used for something akin to a Yggdrasil peer with a clearnet IP.

Unlike all of the other tools described here, tinc has no tool to inspect the running state of the mesh.

Some of the properties of tinc made it clear I was unlikely to adopt it, so this review wasn’t as thorough as that of Yggdrasil.

tinc: Security and Privacy

As mentioned above, every host in the tinc mesh is authenticated based on its public key. However, to be more precise, this key is validated only at the point it connects to its next hop peer. (To be sure, this is also the same as how the list of allowed pubkeys works in Yggdrasil.) Since IPs in tinc are not derived from their key, and any host can assign itself whatever mesh IP it likes, this implies that a compromised host could impersonate another.

It is unclear whether packets are end-to-end encrypted when using a tinc node as a router. The fact that they can be routed at the kernel level by the tun interface implies that they may not be.

tinc: Connectivity and NAT traversal

I was unable to find much information about NAT traversal in tinc, other than that it does support it. tinc can run over UDP or TCP and auto-detects which to use, preferring UDP.

tinc has no special support for this, and the difficulty of configuration makes it unlikely you’d do this with tinc.

tinc: Source code, pricing, and portability

tinc is fully open source (GPLv2). It is written in C and generally portable. It supports some very old operating systems. Mobile support is iffy.

tinc does not seem to be very actively maintained.

tinc conclusions

I haven’t mentioned performance in my other reviews (see the section at the end of this post). But, it is so poor as to only run about 300Mbps on my 2.5Gbps network. That’s 1/3 the speed of Yggdrasil or Tailscale. Combine that with the unwieldiness of adding hosts and some uncertainties in security, and I’m not going to be using tinc.

Automatic Peer-to-Peer Mesh VPNs with centralized control

These tend to be the options that are frequently discussed. Let’s talk about the options.

Tailscale

Tailscale is a popular choice in this type of VPN. To use Tailscale, you first sign up on tailscale.com. Then, you install the tailscale client on each machine. On first run, it prints a URL for you to click on to authorize the client to your mesh (“tailnet”). Tailscale assigns a mesh IP to each system. The Tailscale client lets the Tailscale control plane gather IP information about each node, including all detectable public and private clearnet IPs.

When you attempt to contact a node via Tailscale, the client will fetch the known contact information from the control plane and attempt to establish a link. If it can contact over the local LAN, it will (it doesn’t have broadcast autodetection like Yggdrasil; the information must come from the control plane). Otherwise, it will try various NAT traversal options. If all else fails, it will use a broker to relay traffic; Tailscale calls a broker a DERP relay server. Unlike Yggdrasil, a Tailscale node never relays traffic for another; all connections are either direct P2P or via a broker.

Tailscale, like several others, is based around Wireguard; though wireguard-go rather than the in-kernel Wireguard.

Tailscale has a number of somewhat unique features in this space:

Funnel, which lets you expose ports on your system to the public Internet via the VPN.
Exit nodes, which automate the process of routing your public Internet traffic over some other node in the network. This is possible with every tool mentioned here, but Tailscale makes switching it on or off a couple of quick commands away.
Node sharing, which lets you share a subset of your network with guests
A fantastic set of documentation, easily the best of the bunch.

Funnel, in particular, is interesting. With a couple of “tailscale serve”-style commands, you can expose a directory tree (or a development webserver) to the world. Tailscale gives you a public hostname, obtains a cert for it, and proxies inbound traffic to you. This is subject to some unspecified bandwidth limits, and you can only choose from three public ports, so it’s not really a production solution – but as a quick and easy way to demonstrate something cool to a friend, it’s a neat feature.

Tailscale: Security and Privacy

With Tailscale, as with the other tools in this category, one of the main threats to consider is the control plane. What are the consequences of a compromise of Tailscale’s control plane, or of the credentials you use to access it?

Let’s begin with the credentials used to access it. Tailscale operates no identity system itself, instead relying on third parties. For individuals, this means Google, Github, or Microsoft accounts; Okta and other SAML and similar identity providers are also supported, but this runs into complexity and expense that most individuals aren’t wanting to take on. Unfortunately, all three of those types of accounts often have saved auth tokens in a browser. Personally I would rather have a separate, very secure, login.

If a person does compromise your account or the Tailscale servers themselves, they can’t directly eavesdrop on your traffic because it is end-to-end encrypted. However, assuming an attacker obtains access to your account, they could:

Tamper with your Tailscale ACLs, permitting new actions
Add new nodes to the network
Forcibly remove nodes from the network
Enable or disable optional features

Of note is that they cannot just commandeer an existing IP. I would say the riskiest possibility here is that could add new nodes to the mesh. Because they could also tamper with your ACLs, they could then proceed to attempt to access all your internal services. They could even turn on service collection and have Tailscale tell them what and where all the services are.

Therefore, as with other tools, I recommend a local firewall on each machine with Tailscale. More on that below.

Tailscale has a new alpha feature called tailnet lock which helps with this problem. It requires existing nodes in the mesh to sign a request for a new node to join. Although this doesn’t address ACL tampering and some of the other things, it does represent a significant help with the most significant concern. However, tailnet lock is in alpha, only available on the Enterprise plan, and has a waitlist, so I have been unable to test it.

Any Tailscale node can request the IP addresses belonging to any other Tailscale node. The Tailscale control plane captures, and exposes to you, this information about every node in your network: the OS hostname, IP addresses and port numbers, operating system, creation date, last seen timestamp, and NAT traversal parameters. You can optionally enable service data capture as well, which sends data about open ports on each node to the control plane.

Tailscale likes to highlight their key expiry and rotation feature. By default, all keys expire after 180 days, and traffic to and from the expired node will be interrupted until they are renewed (basically, you re-login with your provider and do a renew operation). Unfortunately, the only mention I can see of warning of impeding expiration is in the Windows client, and even there you need to edit a registry key to get the warning more than the default 24 hours in advance. In short, it seems likely to cut off communications when it’s most important. You can disable key expiry on a per-node basis in the admin console web interface, and I mostly do, due to not wanting to lose connectivity at an inopportune time.

Tailscale: Connectivity and NAT traversal

When thinking about reliability, the primary consideration here is being able to reach the Tailscale control plane. While it is possible in limited circumstances to reach nodes without the Tailscale control plane, it is “a fairly brittle setup” and notably will not survive a client restart. So if you use Tailscale to reach other nodes on your LAN, that won’t work unless your Internet is up and the control plane is reachable.

Assuming your Internet is up and Tailscale’s infrastructure is up, there is little to be concerned with. Your own comfort level with cloud providers and your Internet should guide you here.

Tailscale wrote a fantastic article about NAT traversal and they, predictably, do very well with it. Tailscale prefers UDP but falls back to TCP if needed. Broker (DERP) servers step in as a last resort, and Tailscale clients automatically select the best ones. I’m not aware of anything that is more successful with NAT traversal than Tailscale. This maximizes the situations in which a direct P2P connection can be used without a broker.

I have found Tailscale to be a bit slow to notice changes in network topography compared to Yggdrasil, and sometimes needs a kick in the form of restarting the client process to re-establish communications after a network change. However, it’s possible (maybe even probable) that if I’d waited a bit longer, it would have sorted this all out.

I touched on the funnel feature earlier. The sharing feature lets you give an invite to an outsider. By default, a person accepting a share can make only outgoing connections to the network they’re invited to, and cannot receive incoming connections from that network – this makes sense. When sharing an exit node, you get a checkbox that lets you share access to the exit node as well. Of course, the person accepting the share needs to install the Tailnet client. The combination of funnel and sharing make Tailscale the best for ad-hoc sharing.

Tailscale: DNS

Tailscale’s DNS is called MagicDNS. It runs as a layer atop your standard DNS – taking over /etc/resolv.conf on Linux – and provides resolution of mesh hostnames and some other features. This is a concept that is pretty slick.

It also is a bit flaky on Linux; dueling programs want to write to /etc/resolv.conf. I can’t really say this is entirely Tailscale’s fault; they document the problem and some workarounds.

I would love to be able to add custom records to this service; for instance, to override the public IP for a service to use the in-mesh IP. Unfortunately, that’s not yet possible. However, MagicDNS can query existing nameservers for certain domains in a split DNS setup.

Tailscale: Source code, pricing, and portability

Tailscale is almost fully open source and the client is highly portable. The client is open source (BSD 3-clause) on open source platforms, and closed source on closed source platforms. The DERP servers are open source. The coordination server is closed source, although there is an open source coordination server called Headscale (also BSD 3-clause) made available with Tailscale’s blessing and informal support. It supports most, but not all, features in the Tailscale coordination server.

Tailscale’s pricing (which does not apply when using Headscale) provides a free plan for 1 user with up to 20 devices. A Personal Pro plan expands that to 100 devices for $48 per year - not a bad deal at $4/mo. A “Community on Github” plan also exists, and then there are more business-oriented plans as well. See the pricing page for details.

As a small note, I appreciated Tailscale’s install script. It properly added Tailscale’s apt key in a way that it can only be used to authenticate the Tailscale repo, rather than as a systemwide authenticator. This is a nice touch and speaks well of their developers.

Tailscale conclusions

Tailscale is tops in sharing and has a broad feature set and excellent documentation. Like other solutions with a centralized control plane, device communications can stop working if the control plane is unreachable, and the threat model of the control plane should be carefully considered.

Zerotier

Zerotier is a close competitor to Tailscale, and is similar to it in a lot of ways. So rather than duplicate all of the Tailscale information here, I’m mainly going to describe how it differs from Tailscale.

The primary difference between the two is that Zerotier emulates an Ethernet network via a Linux tap interface, while Tailscale emulates a TCP/IP network via a Linux tun interface.

However, Zerotier has a number of things that make it be a somewhat imperfect Ethernet emulator. For one, it has a problem with broadcast amplification; the machine sending the broadcast sends it to all the other nodes that should receive it (up to a set maximum). I wouldn’t want to have a lot of programs broadcasting on a slow link. While in theory this could let you run Netware or DECNet across Zerotier, I’m not really convinced there’s much call for that these days, and Zerotier is clearly IP-focused as it allocates IP addresses and such anyhow. Zerotier provides special support for emulated ARP (IPv4) and NDP (IPv6). While you could theoretically run Zerotier as a bridge, this eliminates the zero trust principle, and Tailscale supports subnet routers, which provide much of the same feature set anyhow.

A somewhat obscure feature, but possibly useful, is Zerotier’s built-in support for multipath WAN for the public interface. This actually lets you do a somewhat basic kind of channel bonding for WAN.

Zerotier: Security and Privacy

The picture here is similar to Tailscale, with the difference that you can create a Zerotier-local account rather than relying on cloud authentication. I was unable to find as much detail about Zerotier as I could about Tailscale - notably I couldn’t find anything about how “sticky” an IP address is. However, the configuration screen lets me delete a node and assign additional arbitrary IPs within a subnet to other nodes, so I think the assumption here is that if your Zerotier account (or the Zerotier control plane) is compromised, an attacker could remove a legit device, add a malicious one, and assign the previous IP of the legit device to the malicious one. I’m not sure how to mitigate against that risk, as firewalling specific IPs is ineffective if an attacker can simply take them over. Zerotier also lacks anything akin to Tailnet Lock.

For this reason, I didn’t proceed much further in my Zerotier evaluation.

Zerotier: Connectivity and NAT traversal

Like Tailscale, Zerotier has NAT traversal with STUN. However, it looks like it’s more limited than Tailscale’s, and in particular is incompatible with double NAT that is often seen these days. Zerotier operates brokers (“root servers”) that can do relaying, including TCP relaying. So you should be able to connect even from hostile networks, but you are less likely to form a P2P connection than with Tailscale.

I was unable to find any special features relating to this in the Zerotier documentation. Therefore, it would be at the same level as Yggdrasil: possible, maybe even not too difficult, but without any specific help.

Zerotier: DNS

Unlike Tailscale, Zerotier does not support automatically adding DNS entries for your hosts. Therefore, your options are approximately the same as Yggdrasil, though with the added option of pushing configuration pointing to your own non-Zerotier DNS servers to the client.

Zerotier: Source code, pricing, and portability

The client ZeroTier One is available on Github under a custom “business source license” which prevents you from using it in certain settings. This license would preclude it being included in Debian. Their library, libzt, is available under the same license. The pricing page mentions a community edition for self hosting, but the documentation is sparse and it was difficult to understand what its feature set really is.

The free plan lets you have 1 user with up to 25 devices. Paid plans are also available.

Zerotier conclusions

Frankly I don’t see much reason to use Zerotier. The “virtual Ethernet” model seems to be a weird hybrid that doesn’t bring much value. I’m concerned about the implications of a compromise of a user account or the control plane, and it lacks a lot of Tailscale features (MagicDNS and sharing). The only thing it may offer in particular is multipath WAN, but that’s esoteric enough – and also solvable at other layers – that it doesn’t seem all that compelling to me. Add to that the strange license and, to me anyhow, I don’t see much reason to bother with it.

Netmaker

Netmaker is one of the projects that is making noise these days. Netmaker is the only one here that is a wrapper around in-kernel Wireguard, which can make a performance difference when talking to peers on a 1Gbps or faster link. Also, unlike other tools, it has an ingress gateway feature that lets people that don’t have the Netmaker client, but do have Wireguard, participate in the VPN. I believe I also saw a reference somewhere to nodes as routers as with Yggdrasil, but I’m failing to dig it up now.

The project is in a bit of an early state; you can sign up for an “upcoming closed beta” with a SaaS host, but really you are generally pointed to self-hosting using the code in the github repo. There are community and enterprise editions, but it’s not clear how to actually choose. The server has a bunch of components: binary, CoreDNS, database, and web server. It also requires elevated privileges on the host, in addition to a container engine. Contrast that to the single binary that some others provide.

It looks like releases are frequent, but sometimes break things, and have a somewhat more laborious upgrade processes than most.

I don’t want to spend a lot of time managing my mesh. So because of the heavy needs of the server, the upgrades being labor-intensive, it taking over iptables and such on the server, I didn’t proceed with a more in-depth evaluation of Netmaker. It has a lot of promise, but for me, it doesn’t seem to be in a state that will meet my needs yet.

Nebula

Nebula is an interesting mesh project that originated within Slack, seems to still be primarily sponsored by Slack, but is also being developed by Defined Networking (though their product looks early right now). Unlike the other tools in this section, Nebula doesn’t have a web interface at all. Defined Networking looks likely to provide something of a SaaS service, but for now, you will need to run a broker (“lighthouse”) yourself; perhaps on a $5/mo VPS.

Due to the poor firewall traversal properties, I didn’t do a full evaluation of Nebula, but it still has a very interesting design.

Nebula: Security and Privacy

Since Nebula lacks a traditional control plane, the root of trust in Nebula is a CA (certificate authority). The documentation gives this example of setting it up:

./nebula-cert sign -name "lighthouse1" -ip "192.168.100.1/24"
./nebula-cert sign -name "laptop" -ip "192.168.100.2/24" -groups "laptop,home,ssh"
./nebula-cert sign -name "server1" -ip "192.168.100.9/24" -groups "servers"
./nebula-cert sign -name "host3" -ip "192.168.100.10/24"

So the cert contains your IP, hostname, and group allocation. Each host in the mesh gets your CA certificate, and the per-host cert and key generated from each of these steps.

This leads to a really nice security model. Your CA is the gatekeeper to what is trusted in your mesh. You can even have it airgapped or something to make it exceptionally difficult to breach the perimeter.

Nebula contains an integrated firewall. Because the ability to keep out unwanted nodes is so strong, I would say this may be the one mesh VPN you might consider using without bothering with an additional on-host firewall.

You can define static mappings from a Nebula mesh IP to a clearnet IP. I haven’t found information on this, but theoretically if NAT traversal isn’t required, these static mappings may allow Nebula nodes to reach each other even if Internet is down. I don’t know if this is truly the case, however.

Nebula: Connectivity and NAT traversal

This is a weak point of Nebula. Nebula sends all traffic over a single UDP port; there is no provision for using TCP. This is an issue at certain hotel and other public networks which open only TCP egress ports 80 and 443.

I couldn’t find a lot of detail on what Nebula’s NAT traversal is capable of, but according to a certain Github issue, this has been a sore spot for years and isn’t as capable as Tailscale.

You can designate nodes in Nebula as brokers (relays). The concept is the same as Yggdrasil, but it’s less versatile. You have to manually designate what relay to use. It’s unclear to me what happens if different nodes designate different relays. Keep in mind that this always happens over a UDP port.

There is no particular support here.

Nebula: DNS

Nebula has experimental DNS support. In contrast with Tailscale, which has an internal DNS server on every node, Nebula only runs a DNS server on a lighthouse. This means that it can’t forward requests to a DNS server that’s upstream for your laptop’s particular current location. Actually, Nebula’s DNS server doesn’t forward at all. It also doesn’t resolve its own name.

The Nebula documentation makes reference to using multiple lighthouses, which you may want to do for DNS redundancy or performance, but it’s unclear to me if this would make each lighthouse form a complete picture of the network.

Nebula: Source code, pricing, and portability

Nebula is fully open source (MIT). It consists of a single Go binary and configuration. It is fairly portable.

Nebula conclusions

I am attracted to Nebula’s unique security model. I would probably be more seriously considering it if not for the lack of support for TCP and poor general NAT traversal properties. Its datacenter connectivity heritage does show through.

Roll your own and hybrid

Here is a grab bag of ideas:

Running Yggdrasil over Tailscale

One possibility would be to use Tailscale for its superior NAT traversal, then allow Yggdrasil to run over it. (You will need a firewall to prevent Tailscale from trying to run over Yggdrasil at the same time!) This creates a closed network with all the benefits of Yggdrasil, yet getting the NAT traversal from Tailscale.

Drawbacks might be the overhead of the double encryption and double encapsulation. A good Yggdrasil peer may wind up being faster than this anyhow.

Public VPN provider for NAT traversal

A public VPN provider such as Mullvad will often offer incoming port forwarding and nodes in many cities. This could be an attractive way to solve a bunch of NAT traversal problems: just use one of those services to get you an incoming port, and run whatever you like over that.

Be aware that a number of public VPN clients have a “kill switch” to prevent any traffic from egressing without using the VPN; see, for instance, Mullvad’s. You’ll need to disable this if you are running a mesh atop it.

Other

Combining with local firewalls

For most of these tools, I recommend using a local firewal in conjunction with them. I have been using firehol and find it to be quite nice. This means you don’t have to trust the mesh, the control plane, or whatever. The catch is that you do need your mesh VPN to provide strong association between IP address and node. Most, but not all, do.

Performance

I tested some of these for performance using iperf3 on a 2.5Gbps LAN. Here are the results. All speeds are in Mbps.

Tool	iperf3 (default)	iperf3 -P 10	iperf3 -R
Direct (no VPN)	2406	2406	2764
Wireguard (kernel)	1515	1566	2027
Yggdrasil	892	1126	1105
Tailscale	950	1034	1085
Tinc	296	300	277

You can see that Wireguard was significantly faster than the other options. Tailscale and Yggdrasil were roughly comparable, and Tinc was terrible.

IP collisions

When you are communicating over a network such as these, you need to trust that the IP address you are communicating with belongs to the system you think it does. This protects against two malicious actor scenarios:

Someone compromises one machine on your mesh and reconfigures it to impersonate a more important one
Someone connects an unauthorized system to the mesh, taking over a trusted IP, and uses the privileges of the trusted IP to access resources

To summarize the state of play as highlighted in the reviews above:

Yggdrasil derives IPv6 addresses from a public key
tinc allows any node to set any IP
Tailscale IPs aren’t user-assignable, but the assignment algorithm is unknown
Zerotier allows any IP to be allocated to any node at the control plane
I don’t know what Netmaker does
Nebula IPs are baked into the cert and signed by the CA, but I haven’t verified the enforcement algorithm

So this discussion really only applies to Yggdrasil and Tailscale. tinc and Zerotier lack detailed IP security, while Nebula expects IP allocations to be handled outside of the tool and baked into the certs (therefore enforcing rigidity at that level).

So the question for Yggdrasil and Tailscale is: how easy is it to commandeer a trusted IP?

Yggdrasil has a brief discussion of this. In short, Yggdrasil offers you both a dedicated IP and a rarely-used /64 prefix which you can delegate to other machines on your LAN. Obviously by taking the dedicated IP, a lot more bits are available for the hash of the node’s public key, making “collisions technically impractical, if not outright impossible.” However, if you use the /64 prefix, a collision may be more possible. Yggdrasil’s hashing algorithm includes some optimizations to make this more difficult. Yggdrasil includes a genkeys tool that uses more CPU cycles to generate keys that are maximally difficult to collide with.

Tailscale doesn’t document their IP assignment algorithm, but I think it is safe to say that the larger subnet you use, the better. If you try to use a /24 for your mesh, it is certainly conceivable that an attacker could remove your trusted node, then just manually add the 240 or so machines it would take to get that IP reassigned. It might be a good idea to use a purely IPv6 mesh with Tailscale to minimize this problem as well.

So, I think the risk is low in the default configurations of both Yggdrasil and Tailscale (certainly lower than with tinc or Zerotier). You can drive the risk even lower with both.

Final thoughts

For my own purposes, I suspect I will remain with Yggdrasil in some fashion. Maybe I will just take the small performance hit that using a relay node implies. Or perhaps I will get clever and use an incoming VPN port forward or go over Tailscale.

Tailscale was the other option that seemed most interesting. However, living in a region with Internet that goes down more often than I’d like, I would like to just be able to send as much traffic over a mesh as possible, trusting that if the LAN is up, the mesh is up.

I have one thing that really benefits from performance in excess of Yggdrasil or Tailscale: NFS. That’s between two machines that never leave my LAN, so I will probably just set up a direct Wireguard link between them. Heck of a lot easier than trying to do Kerberos!

Finally, I wrote this intending to be useful. I dealt with a lot of complexity and under-documentation, so it’s possible I got something wrong somewhere. Please let me know if you find any errors.

This blog post is a copy of a page on my website. That page may be periodically updated.

Using Yggdrasil As an Automatic Mesh Fabric to Connect All Your Docker Containers, VMs, and Servers

February 1, 2023Technologymesh, yggdrasilJohn Goerzen

Update 2023-04: The version of this page on my public website has some important updates, including how to use broadcast detection in Docker, Yggdrasil zero-config for ephemeral containers, and more. See it for the most current information.

Sometimes you might want to run Docker containers on more than one host. Maybe you want to run some at one hosting facility, some at another, and so forth.

Maybe you’d like run VMs at various places, and let them talk to Docker containers and bare metal servers wherever they are.

And maybe you’d like to be able to easily migrate any of these from one provider to another.

There are all sorts of very complicated ways to set all this stuff up. But there’s also a simple one: Yggdrasil.

My blog post Make the Internet Yours Again With an Instant Mesh Network explains some of the possibilities of Yggdrasil in general terms. Here I want to show you how to use Yggdrasil to solve some of these issues more specifically. Because Yggdrasil is always Encrypted, some of the security lifting is done for us.

Background

Often in Docker, we connect multiple containers to a single network that runs on a given host. That much is easy. Once you start talking about containers on multiple hosts, then you start adding layers and layers of complexity. Once you start talking multiple providers, maybe multiple continents, then the complexity can increase. And, if you want to integrate everything from bare metal servers to VMs into this – well, there are ways, but they’re not easy.

I’m a believer in the KISS principle. Let’s not make things complex when we don’t have to.

Enter Yggdrasil

As I’ve explained before, Yggdrasil can automatically form a global mesh network. This is pretty cool! As most people use it, they join it to the main Yggdrasil network. But Yggdrasil can be run entirely privately as well. You can run your own private mesh, and that’s what we’ll talk about here.

All we have to do is run Yggdrasil inside each container, VM, server, or whatever. We handle some basics of connectivity, and bam! Everything is host- and location-agnostic.

Setup in Docker

The installation of Yggdrasil on a regular system is pretty straightforward. Docker is a bit more complicated for several reasons:

It blocks IPv6 inside containers by default
The default set of permissions doesn’t permit you to set up tunnels inside a container
It doesn’t typically pass multicast (broadcast) packets

Normally, Yggdrasil could auto-discover peers on a LAN interface. However, aside from some esoteric Docker networking approaches, Docker doesn’t permit that. So my approach is going to be setting up one or more Yggdrasil “router” containers on a given Docker host. All the other containers talk directly to the “router” container and it’s all good.

Basic installation

In my Dockerfile, I have something like this:

FROM jgoerzen/debian-base-security:bullseye
RUN echo "deb http://deb.debian.org/debian bullseye-backports main" >> /etc/apt/sources.list && \
    apt-get --allow-releaseinfo-change update && \
    apt-get -y --no-install-recommends -t bullseye-backports install yggdrasil
...
COPY yggdrasil.conf /etc/yggdrasil/
RUN set -x; \
    chown root:yggdrasil /etc/yggdrasil/yggdrasil.conf && \
    chmod 0750 /etc/yggdrasil/yggdrasil.conf && \
    systemctl enable yggdrasil

The magic parameters to docker run to make Yggdrasil work are:

--cap-add=NET_ADMIN --sysctl net.ipv6.conf.all.disable_ipv6=0 --device=/dev/net/tun:/dev/net/tun

This example uses my docker-debian-base images, so if you use them as well, you’ll also need to add their parameters.

Note that it is NOT necessary to use --privileged. In fact, due to the network namespaces in use in Docker, this command does not let the container modify the host’s networking (unless you use --net=host, which I do not recommend).

The --sysctl parameter was the result of a lot of banging my head against the wall. Apparently Docker tries to disable IPv6 in the container by default. Annoying.

Configuration of the router container(s)

The idea is that the router node (or more than one, if you want redundancy) will be the only ones to have an open incoming port. Although the normal Yggdrasil case of directly detecting peers in a broadcast domain is more convenient and more robust, this can work pretty well too.

You can, of course, generate a template yggdrasil.conf with yggdrasil -genconf like usual. Some things to note for this one:

You’ll want to change Listen to something like Listen: ["tls://[::]:12345"] where 12345 is the port number you’ll be listening on.
You’ll want to disable the MulticastInterfaces entirely by just setting it to [] since it doesn’t work anyway.
If you expose the port to the Internet, you’ll certainly want to firewall it to only authorized peers. Setting AllowedPublicKeys is another useful step.
If you have more than one router container on a host, each of them will both Listen and act as a client to the others. See below.

Configuration of the non-router nodes

Again, you can start with a simple configuration. Some notes here:

You’ll want to set Peers to something like Peers: ["tls://routernode:12345"] where routernode is the Docker hostname of the router container, and 12345 is its port number as defined above. If you have more than one local router container, you can simply list them all here. Yggdrasil will then fail over nicely if any one of them go down.
Listen should be empty.
As above, MulticastInterfaces should be empty.

Using the interfaces

At this point, you should be able to ping6 between your containers. If you have multiple hosts running Docker, you can simply set up the router nodes on each to connect to each other. Now you have direct, secure, container-to-container communication that is host-agnostic! You can also set up Yggdrasil on a bare metal server or VM using standard procedures and everything will just talk nicely!

Security notes

Yggdrasil’s mesh is aggressively greedy. It will peer with any node it can find (unless told otherwise) and will find a route to anywhere it can. There are two main ways to make sure your internal comms stay private: by restricting who can talk to your mesh, and by firewalling the Yggdrasil interface. Both can be used, and they can be used simultaneously.

By disabling multicast discovery, you eliminate the chance for random machines on the LAN to join the mesh. By making sure that you firewall off (outside of Yggdrasil) who can connect to a Yggdrasil node with a listening port, you can authorize only your own machines. And, by setting AllowedPublicKeys on the nodes with listening ports, you can authenticate the Yggdrasil peers. Note that part of the benefit of the Yggdrasil mesh is normally that you don’t have to propagate a configuration change to every participatory node – that’s a nice thing in general!

You can also run a firewall inside your container (I like firehol for this purpose) and aggressively firewall the IPs that are allowed to connect via the Yggdrasil interface. I like to set a stable interface name like ygg0 in yggdrasil.conf, and then it becomes pretty easy to firewall the services. The Docker parameters that allow Yggdrasil to run are also sufficient to run firehol.

Naming Yggdrasil peers

You probably don’t want to hard-code Yggdrasil IPs all over the place. There are a few solutions:

You could run an internal DNS service
You can do a bit of scripting around Docker’s --add-host command to add things to /etc/hosts

Other hints & conclusion

Here are some other helpful use cases:

If you are migrating between hosts, you could leave your reverse proxy up at both hosts, both pointing to the target containers over Yggdrasil. The targets will be automatically found from both sides of the migration while you wait for DNS caches to update and such.
This can make services integrate with local networks a lot more painlessly than they might otherwise.

This is just an idea. The point of Yggdrasil is expanding our ideas of what we can do with a network, so here’s one such expansion. Have fun!

Note: This post also has a permanent home on my webiste, where it may be periodically updated.

Music Playing: Both Whole-House and Mobile

December 9, 2022Music, Softwareampache, jellyfin, lms, mopidy, music, streamingJohn Goerzen

It’s been nearly 8 years since I last made choices about music playing. At the time, I picked Logitech Media Server (LMS, aka Slimserver and Squeezebox server) for whole-house audio and Ampache with the DSub Android app.

It’s time to revisit that approach. Here are the things I’m looking for:

Whole-house audio: a single control point for all the speakers in the house, which are all connected to some form of Linux (Raspberry Pi or x86). The speakers should be reasonably in sync with each other, and the control point should be able to adjust volume on them centrally. I should be able to play albums, playlists, etc. on them, and skip tracks or seek within a track.
The ability to stream to an Android mobile device, ideally with downloading capabilities for offline use.
If multiple solutions are used, playlist syncing between them.
Ideally, bookmark support to resume playing a long track where it was left off.
Ideally, podcast support.

The current setup

Here are the current components:

Logitech Media Server, which serves the music library for whole-house synchronized audio
Squeezelite is the LMS client running on my Raspberry Pi and x86 systems
Squeezer is a nice Android client for LMS to control playback, adjust volume, etc. It doesn’t do any playback on the Android device, of course.
Ampache provides the server for streaming clients, both browser-based and mobile
DSub (F-Droid, Play Store) is a nice Android client for Ampache providing streaming and offline playback

LMS makes an excellent whole-house audio system. I can pull up the webpage (or use an Android app like Squeezer) to browse my music library, queue things up to play, and so forth. I can also create playlists, which it saves as m3u files.

This whole setup is boringly reliable. It just works, year in, year out.

The main problem with this is that LMS has no real streaming/offline mobile support. It is also a rather dated system, with a painful UI for playlist management, and in general doesn’t feel very modern. (It’s written largely in Perl also!)

So, I paired with it is Ampache. As a streaming player, Ampache is fantastic; I can access it from a web browser, and it will transcode my FLAC files to the quality I’ve set in my user prefs. The DSub app for Android is fantastic and remembers my last-play locations and such.

The problem is that Ampache doesn’t write its playlists back to m3u format, so I can’t use them with LMS. I have to therefore maintain all the playlists in LMS, and it has a smallish limit on the number of tracks per playlist. Ampache also doesn’t auto-update from LMS playlists, so I have to delete and recreate the playlists catalog periodically to get updates into Ampache. Not fun.

The new experiment

I’m trying out a new system based on these components:

Jellyfin is a media player. It supports not just music, but also video (in fact, the emphasis is more on video). Notably it supports controlling various devices. Its normal frontend is a web browser; Jellyfin’s server won’t output audio to a device itself.
Mopidy is a media player with a web interface that does output audio to a local device. In normal use, it displays an interface to your music, letting you select, queue up, etc.
Mopidy-Jellyfin (docs) is a plugin for Mopidy that enables two things: 1) Browsing the Jellyfin library within Mopidy, and 2) controlling Mopidy from within Jellyfin. Mode 1 barely works, but mode 2 works perfectly. Within Jellyfin, I can “cast to Mopidy” and queue up things, seek, skip tracks, etc.
Snapcast is a generic solution to take audio from some sort of source and distribute it throughout the house, syncing each device (and with better syncing than LMS, too!). The source can be just about anything, and the docs include an example of how to set it up with Mopidy.
Mopidy has selectable web interfaces, and the Mopidy-Muse interface has the added benefit of having integrated Snapcast control. (Mopidy-Iris does as well, though it wasn’t documented there.) Within it, I can adjust volume on devices, mute devices, etc. I could also use the Snapcast web interface for this purpose.
The default Jellyfin Android app lets me stream media to the mobile device, as well as control the Mopidy player.
Finamp (F-Droid, Play Store) is a very nice Android Jellyfin music playing client, which notably supports downloads for offline playing, a feature the stock app lacks.
The Snapcast Android app (F-Droid, Play Store) isn’t strictly necessary, since the Snapcast web app is so simple to use. But it provides near-instant control of speakers and volumes.

This looks a lot more complicated than what I had before, but in reality it only has one additional layer. Since Snapcast is a general audio syncing tool, and Jellyfin doesn’t itself output audio, Mopidy and its extensions is the “glue”.

There’s a lot to like about this setup. There is one single canonical source for music and playlists. Jellyfin can do a lot more besides music, and its mobile app gives me video access also. The setup, in general, works pretty well.

There are a few minor glitches, but nothing huge. For instance, Jellyfin fails to clear the play queue on the mopidy side.

But there is one problem, though: when playing a playlist, it is played out of order. Jellyfin itself has the same issue internally, so I’m unsure where the bug lies.

Rejected option: Jellyfin with jellycli

This could be a nice option; instead of mopidy with a plugin, just run jellycli in headless mode as a more “native” client. It also has the playlist ordering bug, and in addition, fails to play a couple of my albums which Mopidy-Jellyfin handles fine. But, if those bugs were addressed, it has a ton of promise as a simpler glue between Jellyfin and Snapcast than Mopidy.

Rejected option: Mopidy-Subidy Plugin with Ampache

Mopidy has a Subsonic plugin, and Ampache implements the Subsonic API. This would theoretically let me use a Mopidy client to play things on the whole-house system, coming from the same Ampache system.

Although I did get this connected with some trial and error (legacy auth on, API version 1.13.0), it was extremely slow. Loading the list of playlists took minutes, the list of albums and artists many seconds. It didn’t cache any answers either, so it was unusably slow.

Rejected option: Ampache localplay with mpd

Ampache has a feature called localplay which allows it to control a mpd server. I tested this out with mpd and snapcast. It works, but is highly limited. Basically, it causes Ampache to send a playlist — a literal list of URLs — to the mpd server. Unfortunately, seeking within a track is impossible from within the Ampache interface.

I will note that once a person is using mpd, snapcast makes a much easier whole-house solution than the streaming option I was trying to get working 8 years ago.

Building an Asynchronous, Internet-Optional Instant Messaging System

December 8, 2022Technologyasynchronous, nncpJohn Goerzen

I loaded up this title with buzzwords. The basic idea is that IM systems shouldn’t have to only use the Internet. Why not let them be carried across LoRa radios, USB sticks, local Wifi networks, and yes, the Internet? I’ll first discuss how, and then why.

How do set it up

I’ve talked about most of the pieces here already:

Delta Chat, which is an IM app that uses mail servers (SMTP and IMAP) as transport, and OpenPGP encryption for security.
One of the items I highlighted in Tools for Communicating Offline and in Difficult Circumstances, such as:
- NNCP, which I have talked about a lot. NNCP can run atop a network, over USB drives (sneakernet) (even airgapped), and over many other transports.
- Syncthing, which can form an ad-hoc mesh.
- Filespooler, which handles remote command execution, and can use a transport such as Syncthing or NNCP to get messages to their destination.
- Yggdrasil, which forms an auto-mesh network over things like ad-hoc wifi. It’s not asynchronous itself, but its properties may be used to build an asyncrhonous email network – email itself can be asynchronous across any carrier. Others such as Tor could also be used.
- And various other physical carriers such as LoRa and XBee SX radios.
Email servers. For instance, there are existing instructions for running Postfix or Exim over NNCP. These can be easily adapted to run across something like Filespooler instead. These can be run locally on a laptop, or, with a tool such as Termux, on Android.

So, putting this together:

All Delta Chat needs is access to a SMTP and IMAP server. This server could easily reside on localhost.
Existing email servers support transport of email using non-IP transports, including batch transports that can easily store it in files.
These batches can be easily carried by NNCP, Syncthing, Filespooler, etc. Or, if the connectivity is good enough, via traditional networking using Yggdrasil.
- Side note: Both NNCP and email servers support various routing arrangements, and can easily use intermediary routing nodes. Syncthing can also mesh. NNCP supports asynchronous multicast, letting your messages opportunistically find the best way to their destination.

OK, so why would you do it?

You might be thinking, “doesn’t asynchronous mean slow?” Well, not necessarily. Asynchronous means “reliability is more important than speed”; that is, slow (even to the point of weeks) is acceptable, but not required. NNCP and Syncthing, for instance, can easily deliver within a couple of seconds.

But let’s step back a bit. Let’s say you’re hiking in the wilderness in an area with no connectivity. You get back to your group at a campsite at the end of the day, and have taken some photos of the forest and sent them to some friends. Some of those friends are at the campsite; when you get within signal range, they get your messages right away. Some of those friends are in another country. So one person from your group drives into town and sits at a coffee shop for a few minutes, connected to their wifi. All the messages from everyone in the group go out, all the messages from outside the group come in. Then they go back to camp and the devices exchange messages.

Pretty slick, eh?

Note: this article also has a more permanent home on my website, where it may be periodically updated.

Flying Joy

November 28, 2022Aviation, FamilyJohn Goerzen

Wisdom from my 5-year-old: When flying in a small plane, it is important to give your dolls a headset and let them see out the window, too!

Moments like this make me smile at being a pilot dad.

A week ago, I also got to give 8 children and one adult their first ever ride in any kind of airplane, through EAA’s Young Eagles program. I got to hear several say, “Oh wow! It’s SO beautiful!” “Look at all the little houses!”

And my favorite: “How can I be a pilot?”

How do we make the Internet work in these scenarios?

Types of Mesh VPNs

Terminology

Evaluation Criteria

Fully Decentralized VPNs with Automatic Hop Routing

Yggdrasil

Yggdrasil can be a private mesh VPN, or something more

Meeting neighbors

Unique uses

Yggdrasil: Security and Privacy

Yggdrasil: Connectivity and NAT traversal

Yggdrasil: Sharing with friends

Yggdrasil: DNS

Yggdrasil: Source code, pricing, and portability

Yggdrasil conclusions

tinc

tinc: Security and Privacy

tinc: Connectivity and NAT traversal

tinc: Sharing with friends

tinc: Source code, pricing, and portability

tinc conclusions

Automatic Peer-to-Peer Mesh VPNs with centralized control

Tailscale

Tailscale: Security and Privacy

Tailscale: Connectivity and NAT traversal

Tailscale: Sharing with friends

Tailscale: DNS

Tailscale: Source code, pricing, and portability

Tailscale conclusions

Zerotier

Zerotier: Security and Privacy

Zerotier: Connectivity and NAT traversal

Zerotier: Sharing with friends

Zerotier: DNS

Zerotier: Source code, pricing, and portability

Zerotier conclusions

Netmaker

Nebula

Nebula: Security and Privacy

Nebula: Connectivity and NAT traversal

Nebula: Sharing with friends

Nebula: DNS

Nebula: Source code, pricing, and portability

Nebula conclusions

Roll your own and hybrid

Running Yggdrasil over Tailscale

Public VPN provider for NAT traversal

Other

Combining with local firewalls

Performance

IP collisions

Final thoughts

Background

Enter Yggdrasil

Setup in Docker

Basic installation

Configuration of the router container(s)

Configuration of the non-router nodes

Using the interfaces

Security notes

Naming Yggdrasil peers

Other hints & conclusion

How do set it up

OK, so why would you do it?