Borgbackup

Borgbackup is great. You can run a simple command to back up directories from one machine to another, as long as ssh between the two machines is available. I'll show some sample commands below, assuming you have a remote borg repo already set up. I decided to go for the smallest storage box that Hetzner do. This is where I could plug a referral link, but my account at Hetzner is currently too new, so oh well.

Assuming we have that in place, a borg repo needs to be initialised on that storage box. We'll probably also want to put an authorized_keys file there so we can do key-based login.

That can be done by doing something like:

# Copies authorized_keys to the storage box
# <storage-box-username> This is your hetzner storage box username
$ echo -e "mkdir .ssh \n chmod 700 .ssh \n put authorized_keys .ssh/authorized_keys \n chmod 600 .ssh/authorized_keys" | sftp <storage-box-username>@<storage-box-username>.your-storagebox.de

# To initialise a repo on the storage box
# <borg_repo_name> This is the name you want to give to your repo
$ borg init --encryption=repokey ssh://<storage-box-username>@<storage-box-username>.your-storagebox.de:23/./<borg_repo_name>/

# To create your first backup
# <NAME> This is the name for the current backup. I normally add a timestamp here.
# <path-to-backup> The local files or directories you want in the archive
$ borg create ssh://<storage-box-username>@<storage-box-username>.your-storagebox.de:23/./<borg_repo_name>/::<NAME> <path-to-backup>

Cool. So with that done, we can create offsite backups with a single command. Kinda. There are a few steps we need to take prior to running this command. As this borg repo requires ssh access, we need to make sure the ssh private key is loaded into an agent.

$ eval $(ssh-agent)
$ ssh-add ~/.ssh/key

If we truly want there to be a single command, we also need to set an environment variable with the repo password.

$ export BORG_PASSPHRASE=""

Storing this password in an environment variable felt like a bad idea at first. However, this password is used to access a repo. If an attacker HAD penetrated the rest of my infrastructure, and they got access to this machine to pull the password, well, no biggy. The vm will already have the data it needs to back up, so losing that password isn't a huge issue. I think.

Anyway, with those things done, we can now actually back up the required files off-site with a single command. Cool.
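
Putting it all together, it really is one line. Something like this, with the archive name timestamped via date (the exact date format and which paths get backed up are up to you):

# Back up <path-to-backup> to the remote repo, naming the archive with the current date and time
$ borg create ssh://<storage-box-username>@<storage-box-username>.your-storagebox.de:23/./<borg_repo_name>/::backup-$(date +%Y-%m-%d_%H-%M) <path-to-backup>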

So how do we check a remote backup? Well, you could just sftp into your remote borg repo, pull the repo down, and mount it. Like so:

# sftp to the repo
$ sftp -P 23 <storage-box-username>@<storage-box-username>.your-storagebox.de

# This will download the repo to the directory you call sftp from
sftp> get -r <borg_repo_name>/

# Once downloaded, create another directory called borg-mounted and mount the downloaded repo to it
$ mkdir borg-mounted
$ borg mount ./<borg_repo_name> ./borg-mounted
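
Once you're done poking around the mounted archives, clean up. Borg ships its own umount wrapper for this:

# Browse the mounted archives, then unmount when finished
$ ls borg-mounted
$ borg umount borg-mounted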

Great, so now we can do single-command backups. However, I don't like the fact that I need to specify the current timestamp manually. Also, if this is being automated, I probably want to notify myself somehow that a backup has failed so I don't need to babysit it. And I'm being economical with the 100GB storage box, so I can't have a full offsite backup. I decided to back up two small vms, which come to about 6GB total, plus 30GB of misc data. I wanted to back up my gitlab instance too, but that vm's chonky at 64GB, so I decided to just keep a local cached copy of all my repos. Guess that too should be automated.

Anyway, here's the script.

https://gitlab.com/whyitnowork/offsite-backup/-/blob/main/roles/offsite-backup-provision/files/borg-backup.py

The logic is pretty simple.

  1. Load vars from .env
  2. Get the current timestamp
  3. Notify my gotify server that the backup is starting (see the curl sketch after this list)
  4. Poll the local gitlab API for all repos belonging to my username
  5. Check the local repo cache, which in my case is /mnt/offsite-backup (a samba share), to see how many repos it contains
  6. If it contains none, perform a full clone of every repo returned by the API
  7. If the count matches the API, perform a git pull in each repo
  8. Otherwise, purge all local repos and perform a full clone
  9. Check the mounted directories, using the DIRS_TO_CHECK var, to see if they contain data. This is a lazy check to confirm the samba share is actually mounted. If not, send an alert to gotify and quit.
  10. If the directories have data, create a borg backup of the /mnt/ directory, with a timestamp appended, and send it off-site
  11. Send a gotify alert to signify success along with the completion timestamp
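
For what it's worth, the gotify notifications boil down to a single HTTP POST. Roughly like this (a sketch only; the server URL and token variable are placeholders, and the script itself may well do this through a python library rather than curl):

# Hypothetical notification call to a gotify server
$ curl -s -X POST "https://<gotify-server>/message?token=${GOTIFY_TOKEN}" \
    -F "title=offsite-backup" \
    -F "message=Backup starting" \
    -F "priority=5"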

Right. Easy enough. Except I couldn't find an easy way to load ssh keys for this script. Also, I needed an easy way to ensure the borg passphrase environment variable was set. So I made a wrapper script for the python script.

That can be found here.

This feels hacky, but I'm keeping it. Anyway, we add this to a cron task to run daily at a specified time (both sketched below), and we're done.
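
For illustration, the wrapper amounts to something like the below. The paths and filenames here are placeholders, not the exact ones from my repo:

#!/usr/bin/env bash
# Hypothetical wrapper: load the ssh key and borg passphrase, then hand off to the python script
set -euo pipefail

eval "$(ssh-agent -s)"
ssh-add /home/backup/.ssh/key

# In my setup, ansible templates the real passphrase in here from vault
export BORG_PASSPHRASE="<borg_repo_password>"

python3 /opt/offsite-backup/borg-backup.py

# Don't leave the agent running once the backup is done
ssh-agent -k

And the cron side is just a daily entry pointing at the wrapper, for example:

# Run the wrapper every day at 02:00
0 2 * * * /opt/offsite-backup/borg-backup-wrapper.sh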

Challenges

Borg backup to a Hetzner storage box was a new process for me. Pretty easy though, as their docs covered all the important bits.

Python was chosen purely because I need to be more familiar with it for work. Also because it's super easy to script with.

Ansible

Actually, no. You see, we need to be able to deploy this to a machine. We can't just use a generic vm because my script has a number of dependencies. The full setup requires the following:

  1. Install a bunch of apt packages we need
  2. Install all the pip packages we need
  3. Copy ssh keys over
  4. Mount the samba shares
  5. Create a cron job for the backup script.

Seems like effort if I ever want to redeploy this vm. Also, consistency between dev and production will help with the good old "it works on my machine". Ansible seems like a solid choice.

I created a role for that here.

https://gitlab.com/whyitnowork/offsite-backup/-/blob/main/roles/offsite-backup-provision/tasks/main.yml

I'm making use of HashiCorp's Vault to store all sensitive values so I don't need to keep them with the repo. Take, for example, the following credential lookup in the wrapper script template. Ansible will copy this template into the correct directory and replace the expression with the password retrieved from vault.

"{{ lookup('hashi_vault', 'secret=ansible-secrets/data/borg-backup:borg_repo_password') }}"

Great, except, to configure a vm to run our script, we now need to run 3 commands.

$ export VAULT_ADDR="<VAULT_URL>"
$ export VAULT_TOKEN="<VAULT_TOKEN>"
$ ansible-playbook -i inventory playbook.yml

Ok, that's not bad. Once that playbook is run, the vm will append to our online borg_repo, assuming we set the vars across the git repo correctly. Yes. Across my repo. I need to refactor this codebase so the vars are all in one file. For reference, the following files contain variables.

Are we done? Lolno.

Challenges

A challenge here was trying to figure out how to set and persist environment variables for cron tasks. Solved that by using the wrapper script.

Terraform

This all still requires creating the initial vm to run the playbook on and setting the basic info like networking. Easy enough. We can just clone one of our template vms using Terraform.

Here's the terraform config.

https://gitlab.com/whyitnowork/offsite-backup/-/blob/main/terraform/main.tf

The tl;dr for this file is that it clones a vm using the vsphere provider to work with my VMware base infra. I would like to move the hardcoded vars to a central location accessible to the entire codebase. That's one for the future list.

Now, to create the vm based on this terraform file, we need to run the following commands:

# First plan out the vm. Notice that we're using an environment var here for the vault token, as the terraform config is also pulling secrets from vault.
$ terraform plan -out=deployvm -var "vault_token=${VAULT_TOKEN}"

# Apply the plan
$ terraform apply "deployvm"

# Now run the playbook
$ ansible-playbook -i inventory playbook.yml

Great. Except we're now using 3 commands. That's far too much for my delicate fingies. Let's wrap these commands into a Makefile. I picked this trick up from one of the resources linked at the end.

The Makefile is here:

https://gitlab.com/whyitnowork/offsite-backup/-/blob/main/Makefile
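
For reference, the relevant target boils down to something like this (a sketch; the directory layout is an assumption, the real Makefile is at the link above, and the recipe lines are tab-indented as make insists). Handily, make picks up exported environment variables, so $(VAULT_TOKEN) resolves to the env var we set earlier:

# Hypothetical target chaining the three commands from above
create-vm-and-run-playbook:
	cd terraform && terraform plan -out=deployvm -var "vault_token=$(VAULT_TOKEN)"
	cd terraform && terraform apply "deployvm"
	ansible-playbook -i inventory playbook.yml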

So, once we set the vault url and token as env vars, we can bring up a full fresh vm just by running the following command:

$ make create-vm-and-run-playbook

Nice. Except... where does the template we're cloning come from?

Challenges

Terraforming for vsphere is easy with the official provider. The Proxmox one was a bit more involved, but not particularly complicated overall.

Packer

Ok look. I know we can clone vms with terraform, and that's fine when your infrastructure is managed by someone else, but I like to self-host. Which means I need to create my own "golden images".

Packer, another tool by HashiCorp, allows you to do this. However, packer requires access to the vm it's instantiating, and on linux it normally does this over ssh. Which means you need to have a somewhat prepped vm that can at least accept ssh connections during the install.

This took a fair bit of time to get right. To do this with Debian 11, we can use a preseed.cfg and then a cloud config file. The preseed file contains the minimum to get things working, and I totally didn't just duckduckgo around until I found one that looked like it would work. The cool thing is that packer sets up an http server from which it serves the preseed.cfg file to the os being installed.
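
In packer template terms that part looks roughly like this. It's a sketch rather than my exact builder config; the boot command in particular differs a bit between the vsphere and proxmox builders:

"http_directory": "http",
"boot_command": [
  "<esc><wait>",
  "install auto=true priority=critical ",
  "preseed/url=http://{{ .HTTPIP }}:{{ .HTTPPort }}/preseed.cfg ",
  "<enter>"
]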

Anyway, with that done, packer can get a basic vm set up. We then need to provision it with things that are consistent across my vms. SSH keys, ldap configs, certificates, etc. That sorta thing. Handily, packer has a built-in ansible provisioner that can run playbooks. Seeing as I already had a nice golden provision playbook, I just adapted it to work here.

Except, I now also have a proxmox cluster, so I should probably create a golden template there as well.

Which I did.

The entire codebase for my packer-golden repo can be found here.

https://gitlab.com/whyitnowork/packer-golden

This also includes a Makefile, so I can create a new template pretty easily with a single command per hypervisor:

$ make pkr-deb11-esxi
$ make pkr-deb11-pv

Challenges

An interesting challenge with this project was passing a variable from packer to the ansible playbook. This was needed because I wanted ansible to install either open-vm-tools or qemu-guest-agent depending on which hypervisor the template was destined for.

This was solved by adding extra vars to the packer ansible provisioner:

{
  "type": "ansible",
  "playbook_file": "{{ user `playbook_file` }}",
  "extra_arguments": [
    "--extra-vars", "hypervisor=pve"
  ]
}
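
On the ansible side, the playbook can then branch on that variable. A rough sketch of what those tasks could look like (the "esxi" value and task names are my guesses here, not lifted from the repo):

- name: Install open-vm-tools on the vmware template
  ansible.builtin.apt:
    name: open-vm-tools
    state: present
  when: hypervisor == "esxi"

- name: Install qemu-guest-agent on the proxmox template
  ansible.builtin.apt:
    name: qemu-guest-agent
    state: present
  when: hypervisor == "pve"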

Critiques

The codebase is very specific to my infrastructure.

The remote repo isn't initialised by this codebase. The assumption made is that the initial offsite-upload will be a manual process. All future backups are automated.

My python script has a number of dependencies. I could probably cut some out by coding smarter.

The packer codebase needs a debian 10 addition for both hypervisors.

I should look into pre-existing hardening playbooks for my golden image.

I trimmed some identifying info like my internal domain name from these public repos. Aside from that, they are identical to what I'm currently using.

None of my code has any unit tests. For my next project, I should make this a key learning point.

Resources

https://borgbackup.readthedocs.io/en/stable/index.html
https://www.youtube.com/watch?v=VUKFsmZSEiM
https://codereviewvideos.com/course/installing-kubernetes-rancher-2-terraform
https://github.com/KryptionX/packer-debian-proxmox-template
https://github.com/romantomjak/packer-proxmox-template
https://www.hashicorp.com/