acha.ninja

Introducing Bupstash

You know backups, right? Those things you never got around to setting up… Well, no need to feel guilty any longer: today I’m excited to share my new backup tool, Bupstash!

So why is bupstash cool? Bupstash lets you make encrypted, deduplicated and secure backups. Bupstash is fast, focused, and places strong emphasis on both security and privacy. Backups can be stored on local drives or sent over ssh. Bupstash can back up files, directories, and even arbitrary command output.

Bupstash was designed to have:

  • Efficient deduplication - Bupstash can store hundreds of encrypted directory snapshots using a fraction of the space encrypted tarballs would require.

  • Offline decryption keys - Making a backup does not require the decryption key to be anywhere near an at-risk server or computer.

  • Key/value tagging with search - all while keeping the tags fully encrypted.

  • Great performance on slow networks - Bupstash really strives to work well on high-latency networks, such as cellular connections and links to far-off lands.

  • Secure access controls - Ransomware and disgruntled business partners will be powerless to delete your backups, even after a full system takeover.

  • Efficient incremental backups - Bupstash knows what it backed up last time and skips that work.

Sounds great, right? How does it work?

Encrypted deduplication

Bupstash stores data as a kind of Merkle tree with encrypted root nodes. Each time a file is saved, it is split into data chunks, which are encrypted (using crypto_box_curve25519xchacha20poly1305 from libsodium) and then sent to or saved in the backup repository if they have never been seen before.
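To make the chunk-and-store step concrete, here is a minimal Python sketch of a content-addressed tree. It is an illustration under simplifying assumptions, not bupstash's code: chunks are fixed-size (`CHUNK_SIZE` and `FANOUT` are hypothetical values), encryption is left out, and addresses are a plain SHA-256 stand-in for the keyed address described in the next paragraph. Each parent node stores its children's addresses, so the single root address identifies the whole file, and only objects not already in the store are saved.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical, tiny so the example is easy to follow
FANOUT = 2      # hypothetical number of child addresses per tree node


def address(obj: bytes) -> str:
    # Stand-in address: a plain SHA-256 of the unencrypted bytes.
    return hashlib.sha256(obj).hexdigest()


def put_tree(data: bytes, store: dict):
    """Chunk data, store only unseen objects, and build tree levels up to one root.

    Returns (root_address, number_of_newly_stored_objects).
    """
    new_objects = 0

    def save(obj: bytes) -> str:
        nonlocal new_objects
        addr = address(obj)
        if addr not in store:  # deduplication: skip anything already stored
            store[addr] = obj
            new_objects += 1
        return addr

    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)] or [b""]
    level = [save(chunk) for chunk in chunks]
    while len(level) > 1:
        # A parent node is the concatenation of its children's addresses.
        level = [save("".join(level[i:i + FANOUT]).encode())
                 for i in range(0, len(level), FANOUT)]
    return level[0], new_objects
```

With these settings, storing `b"hello world, hello bupstash"` creates 14 objects (7 leaves and 7 tree nodes); storing the same text with one extra byte appended afterwards adds only 4: the changed last leaf plus the three nodes on its path to the root.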

To identify encrypted chunks for deduplication, we use an HMAC of the decrypted contents. The HMAC key is a shared secret between the encryption key and the decryption key. This means that someone without the key cannot confirm a guess about a chunk’s contents from its deduplication address in the backup repository.
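A minimal sketch of keyed chunk addressing, using HMAC-SHA256 from Python's standard library as a stand-in (the post does not name the exact hash construction bupstash uses, and `chunk_address` is a hypothetical name). The same key and plaintext always give the same address, which is what lets identical chunks deduplicate, while someone without the key cannot compute the address of a guessed plaintext.

```python
import hashlib
import hmac
import secrets


def chunk_address(hash_key: bytes, plaintext: bytes) -> str:
    """Deduplication address: an HMAC of the *plaintext* chunk under a secret key."""
    return hmac.new(hash_key, plaintext, hashlib.sha256).hexdigest()


# Hypothetical stand-in for the secret shared by the encryption and decryption keys.
hash_key = secrets.token_bytes(32)

first = chunk_address(hash_key, b"my secret diary")
second = chunk_address(hash_key, b"my secret diary")
assert first == second  # identical chunks map to one stored copy

# Hashing a guess without the key yields an unrelated value.
guess = hashlib.sha256(b"my secret diary").hexdigest()
assert guess != first
```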

The end result is that many snapshots which share similar data also share encrypted chunks, resulting in very space-efficient backups. My personal backup repository contains over 400 snapshots of my 40 GB home directory and weighs in at around 50 GB total.
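For a rough sense of scale, assume (as an approximation) that every one of those snapshots is a full 40 GB. Storing each snapshot as a separate full copy would then need:

$$
400 \times 40\ \text{GB} = 16\,000\ \text{GB} = 16\ \text{TB},
\qquad
\frac{16\,000\ \text{GB}}{50\ \text{GB}} = 320
$$

so the deduplicated repository holds that history in roughly 1/320th of the space.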

Offline decryption keys

In bupstash, a key is either a ‘main key’ or a ‘put key’. Backups made with a put key can only be decrypted by the original main key. With this setup, you can send backups to a bupstash repository without ever putting your decryption key at risk.

You can think of Bupstash’s model like sending someone a GPG-encrypted email: you do not have access to their decryption key. Instead, you use asymmetric cryptography and encrypt the message to their public key, so only their private key can decrypt it. Another benefit of the email-like model is that the email provider does not need access to your decryption keys to store or forward your emails. Likewise, a remote bupstash repository never needs access to your backup decryption keys to operate.
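To show the put key / main key split, here is a toy Python sketch of public-key "sealed box" encryption. It is deliberately not secure: Diffie-Hellman modulo the prime 2^127 - 1 and a SHA-256 keystream stand in for libsodium's Curve25519 and XChaCha20-Poly1305, and there is no authentication. `keypair`, `seal` and `open_sealed` are hypothetical names. The point is the asymmetry: encrypting needs only the public half (roughly what a put key carries), while decrypting needs the secret half (what only the main key holds).

```python
import hashlib
import secrets

# TOY PARAMETERS, NOT SECURE: a 127-bit prime modulus stands in for Curve25519.
P = 2**127 - 1
G = 3


def keypair():
    """Return (secret, public)."""
    secret = secrets.randbelow(P - 3) + 2
    return secret, pow(G, secret, P)


def _keystream(shared: int, length: int) -> bytes:
    # Toy key derivation and cipher: SHA-256 of (shared secret, counter).
    seed = shared.to_bytes(16, "big")
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def seal(recipient_public: int, message: bytes):
    """Encrypt using only the recipient's public key (what a put key can do)."""
    eph_secret, eph_public = keypair()  # fresh ephemeral key for every message
    shared = pow(recipient_public, eph_secret, P)
    ciphertext = bytes(m ^ k for m, k in zip(message, _keystream(shared, len(message))))
    return eph_public, ciphertext


def open_sealed(recipient_secret: int, sealed) -> bytes:
    """Decrypt; this needs the recipient's secret (what only the main key has)."""
    eph_public, ciphertext = sealed
    shared = pow(eph_public, recipient_secret, P)
    return bytes(c ^ k for c, k in zip(ciphertext, _keystream(shared, len(ciphertext))))
```

A backup client would only ever call `seal` with the public value, so a compromised machine holding it can add data but cannot read anything back.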

Encrypted metadata search

Bupstash maintains all backups in a mostly append-only log of encrypted backup metadata. To allow searching that metadata, bupstash syncs the metadata log to the client, then decrypts and searches it client-side. This design trades some search and listing performance for fully encrypted metadata. In practice, bupstash can easily search hundreds to thousands of snapshots in seconds.
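Here is a minimal Python sketch of that design: the server only ever holds opaque ciphertext, and the client copies the log, decrypts each entry and filters the tags locally. The XOR keystream cipher is a toy stand-in for real authenticated encryption, and the query is simplified to a list of `(tag, glob)` pairs that must all match, loosely mirroring a query like `host=server1 and name=*.tar`; all names here are hypothetical.

```python
import fnmatch
import hashlib
import json


def _toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR with a SHA-256 counter-mode keystream. A TOY stand-in, NOT secure."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt_entry(key: bytes, nonce: bytes, tags: dict):
    """What the server stores: a nonce plus ciphertext it cannot read."""
    return nonce, _toy_cipher(key, nonce, json.dumps(tags).encode())


def search(key: bytes, synced_log, query):
    """Decrypt each synced entry locally and keep those matching every (tag, glob) pair."""
    matches = []
    for nonce, ciphertext in synced_log:
        tags = json.loads(_toy_cipher(key, nonce, ciphertext))  # client-side decryption
        if all(tag in tags and fnmatch.fnmatchcase(tags[tag], pattern)
               for tag, pattern in query):
            matches.append(tags)
    return matches
```

Every search touches every entry, which is the performance cost the paragraph above mentions; in exchange the server never sees a single tag in plaintext.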

Avoiding network round trips

Bupstash focuses on backup and restore streaming, avoiding network round trips as much as possible.

When a user requests an encrypted snapshot, the server walks the entire data structure server-side and pushes it to the client. The client is then able to verify the structure and restore your data without any further network round trips.
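The following Python sketch shows why no round trips are needed: the server does a depth-first walk and emits every object in order, and the client, which knows only the root address, checks each incoming object against the address its parent promised. The object format (`b"L"` leaves, `b"N"` nodes listing 64-character hex child addresses) and all function names are assumptions for illustration, and encryption is omitted.

```python
import hashlib


def h(obj: bytes) -> str:
    return hashlib.sha256(obj).hexdigest()


def children(node: bytes):
    """Split a node body into its 64-character hex child addresses."""
    body = node[1:].decode()
    return [body[i:i + 64] for i in range(0, len(body), 64)]


def build(store: dict, chunks, fanout=2) -> str:
    """Store leaves and nodes for a non-empty list of chunks; return the root address."""
    level = []
    for chunk in chunks:
        obj = b"L" + chunk
        store[h(obj)] = obj
        level.append(h(obj))
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), fanout):
            obj = b"N" + "".join(level[i:i + fanout]).encode()
            store[h(obj)] = obj
            parents.append(h(obj))
        level = parents
    return level[0]


def server_stream(store: dict, root: str):
    """Server: depth-first walk that yields every object, with no client requests."""
    stack = [root]
    while stack:
        obj = store[stack.pop()]
        yield obj
        if obj[:1] == b"N":
            stack.extend(reversed(children(obj)))  # visit the first child next


def client_restore(root: str, stream) -> bytes:
    """Client: verify each object against the address its parent promised."""
    expected = [root]
    data = b""
    for obj in stream:
        if not expected:
            raise ValueError("unexpected extra object")
        if h(obj) != expected.pop():
            raise ValueError("object does not match its address")
        if obj[:1] == b"N":
            expected.extend(reversed(children(obj)))
        else:
            data += obj[1:]
    if expected:
        raise ValueError("stream ended early")
    return data
```

Because client and server walk the tree in the same order, a single one-way stream is enough, and any tampered or reordered object is caught as soon as it arrives.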

The bupstash ‘garbage collector’ works entirely server-side and never needs to decrypt any data, so it avoids shipping data back and forth to the client, which would be slow and costly.
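Garbage collection of this kind can be sketched as mark-and-sweep over opaque addresses. This Python sketch assumes, for illustration, that the server can read which addresses each tree node points to (the `child_addrs` map) while chunk contents stay encrypted; the function and argument names are hypothetical.

```python
def gc(store: dict, live_roots, child_addrs: dict) -> int:
    """Delete every object not reachable from a live root; return how many were freed.

    store:       address -> encrypted bytes (never decrypted here)
    live_roots:  root addresses of backups that have not been removed
    child_addrs: address -> child addresses (assumed readable server-side)
    """
    reachable = set()
    stack = list(live_roots)
    while stack:  # mark phase
        addr = stack.pop()
        if addr not in reachable:
            reachable.add(addr)
            stack.extend(child_addrs.get(addr, []))
    garbage = [addr for addr in store if addr not in reachable]
    for addr in garbage:  # sweep phase
        del store[addr]
    return len(garbage)
```

A chunk shared between a removed backup and a surviving one stays, because it is still reachable from the surviving root.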

What does this all mean? In some benchmarks restoring backups from a distant remote server, I measured a 6x speedup over the backup tool I was previously using (benchmarks coming soon).

Secure access controls

Bupstash is designed to work with ssh forced commands, which control whether an ssh key has permission to make new backups, list backups or remove backups.

In my personal backup setup, my laptop can only add new backups via a VPN and can never remove old backups from my backup repository. If my laptop were stolen, my backups would still be safe from deletion.
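The idea can be modelled in a few lines of Python. This is a hypothetical sketch, not bupstash's configuration format: each ssh key maps to a fixed permission set, standing in for what that key's forced command would allow. Because sshd runs a key's forced command regardless of the command the client requests, a stolen laptop key cannot grant itself `REMOVE`.

```python
from enum import Flag, auto


class Perm(Flag):
    PUT = auto()
    LIST = auto()
    GET = auto()
    REMOVE = auto()


# Hypothetical: the permissions each ssh key's forced command would grant.
# The client cannot change these; they are fixed on the server side.
FORCED = {
    "laptop": Perm.PUT,  # append-only: can add backups, never remove them
    "admin": Perm.PUT | Perm.LIST | Perm.GET | Perm.REMOVE,
}


def handle(ssh_key: str, request: Perm) -> str:
    allowed = FORCED.get(ssh_key, Perm(0))  # unknown keys get no permissions
    if request not in allowed:
        raise PermissionError(f"{ssh_key} may not {request.name}")
    return f"{request.name} ok"
```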

When bupstash is configured with append-only access controls, making a backup is analogous to sending someone an email: the recipient of the message decides when it can be deleted.

You can see how to set this up in the access controls tutorial.

Efficient network use

Bupstash has the notion of a ‘send log’ for saving bandwidth and speeding up snapshots. Bupstash remembers which data chunks were sent in the previous snapshot, so the client does not pointlessly resend them.
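A minimal sketch of a send log in Python, with hypothetical names: the client keeps the set of chunk addresses from the previous snapshot and only uploads chunks missing from it. One plausible reason to trust only the previous snapshot, rather than every snapshot ever sent, is that older chunks may since have been removed by garbage collection; that rationale is an assumption.

```python
class SendLog:
    """Hypothetical client-side record of chunks sent in the previous snapshot."""

    def __init__(self):
        self.previous = set()

    def snapshot(self, chunk_addrs, send):
        """Call send(addr) for chunks not sent last time; return how many were sent."""
        current = set()
        sent = 0
        for addr in chunk_addrs:
            if addr not in self.previous and addr not in current:
                send(addr)  # new since the last snapshot: upload it
                sent += 1
            current.add(addr)
        self.previous = current  # only the latest snapshot is trusted next time
        return sent


uploaded = []
log = SendLog()
log.snapshot(["a", "b", "c"], uploaded.append)  # first run sends everything
log.snapshot(["a", "b", "d"], uploaded.append)  # second run only sends "d"
```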

Example usage

Below is a simple bupstash session:

$ export BUPSTASH_REPOSITORY=ssh://bupstash.io/backups

$ bupstash put host="$(hostname)" name=home-backup.tar /home
ebb66f3baa5d432e9f9a28934888a23d

$ bupstash list host="$(hostname)" and 'name=*.tar'
...
id="dbca49b072c0f94b9e72bf81e7716ff9" name="home-backup.tar" host="server1" timestamp="2020/08/03 15:47:32"

$ bupstash list-contents id="dbca*"
drwxr-xr-x 0     2020/10/30 13:32:04 .
-rw-r--r-- 1967  2020/10/30 13:32:04 data.txt
...

$ bupstash get id="dbca*" | tar -C ./restore -xvf -
...
$ bupstash get --pick data.txt id="dbca*"
data!

$ bupstash rm name=home-backup.tar and older-than 30d
$ bupstash gc

My goal with the bupstash user interface is to let people get on with their day while having a simple and excellent backup insurance policy.

Conclusion

Bupstash is currently in alpha and ready for people to play with at their own risk.

For more information and help:

Stay tuned for benchmarks and more interesting developments, and thank you for your time :).