Using rsync on Ubuntu Linux to back up some Windows shares

I use a combination of standard tools to back up some Windows files

A while ago I set up a machine to back up some files and folders on our network for my Real Job™. It was running an old release of Ubuntu Linux, and I had a slow day, so I figured I'd update to the latest point release. I didn't take a backup first even though I knew better1. The upgrade failed and left the machine unbootable. I was able to get it booting again, but it was in a broken state. Could it have been fixed? Maybe. But it was faster to just blow it away and re-do the whole thing.2

Downloading and preparing the media

For this install I went with the current LTS version of Ubuntu Server, 20.04. I chose this for two reasons: 1. The server was running Ubuntu Server before I trashed it, and 2. I'm most familiar with Debian-based distributions. This how-to can be adapted to just about any Unix-like OS, but you'll have to do some legwork if you want to go that route.

I'm not going to go into writing the .iso to a disk or walk through the installer, otherwise this will be way too long. Use whatever tools you have available to prepare your installation media and install it on your computer. I used essentially all the default choices. You do want to make sure that you have the SSH server installed if you think you'll be accessing this thing remotely; if not, don't worry about it. You'll also want to create a user account. It doesn't matter much what the account is called, but you may want to make sure that the user's Full Name is filled out appropriately, since we'll be sending email from that account later on.

Installing additional packages

The only additional packages I installed were sendmail and mutt.

apt install sendmail mutt

This is so that the server can send me notifications when it's done3. This is all handled over the internal network, so sendmail's default configuration is fine. I did have to contact our network team to have them allow this computer to be able to relay mail through their mail server.
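
Before wiring mail into the backup script, it's worth confirming that messages actually go through. A quick smoke test looks something like this (user@example.com is a placeholder; substitute your own address):

echo "If you can read this, mail works" | mutt -s "Mail test" user@example.com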

Laying the groundwork

The first thing to do is to create some mountpoints. These are going to be the places where we're going to attach the Windows shared folders. I also have two USB drives that I'm going to use to back up the data to4, and I need an additional mountpoint for each of the Windows folders that I'm backing up. Ubuntu generally seems to want to put mounted drives in the /media folder, so let's oblige.

either sudo or su to root
# mkdir /media/external
# mkdir /media/external2
# mkdir /media/dir1
# mkdir /media/dir2
# mkdir /media/dir3

dir1, dir2, and dir3 aren't really what I named them. I gave them meaningful names that mirror the folders they're backing up. external and external2 are for the external USB drives that I'm going to use.

The server is a relatively ancient Windows Server, and it still serves out shared files using the old, insecure, deprecated version of SMB/CIFS5 that cannot be upgraded6.

For some background, the folders that we have exposed on this server hold temporary video files. We have users who go outside the office and come back with video files that they need to edit. The files don't usually need to stick around more than a few days, but sometimes accidents happen: files get deleted before the project is done, someone accidentally deletes someone else's project, or a project needs to be redone days after it was originally declared finished because an error was discovered or a late change needed to be made. Our goal is to back up everything in the work folders so that if something is deleted it can be restored.

Since these folders are on a Windows machine and password protected, we'll need a place to put our credentials so that we can reference them later. Log in as your user and create a file to hold the credentials. You can call the file anything you want, but I called mine .smbcredentials. Edit the file using your favorite text editor and add the following, substituting a real username and password that have access to the folders you're working with:

~/.smbcredentials
user=username
password=mYRe@llySecur3p@ssw0rd!*
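
Since this file holds a plaintext password, it's a good idea to lock the permissions down so that only your user can read it:

chmod 600 ~/.smbcredentials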

Now that we can authenticate to the Windows server, it's time to connect to some folders!

We'll start by modifying your /etc/fstab file7 with your favorite text editor. We'll add one line. It looks complicated8, but we'll break it down.

/etc/fstab
//127.0.0.1/dir1 /media/dir1 cifs iocharset=utf8,credentials=/home/user/.smbcredentials,dir_mode=0555,gid=1000,vers=2.0 0 0
  • //127.0.0.1/dir1 - The location of the folder we're trying to access
  • /media/dir1 - The location where we're going to attach the folder to our local file system
  • cifs - The filesystem we're connecting to. Since this is a CIFS share, we can specify 'cifs'
  • iocharset=utf8 - This is the character set to use when connecting to the server, basically so that filenames don't get mangled. You can read more about it in a lot of places online, so I won't go into it here.
  • credentials=/home/user/.smbcredentials - This is the path to the file where we stored our credentials to connect to the server. You want to change the path to wherever it is that you saved the credentials to (i.e. put in the correct user name)
  • dir_mode=0555 - Specifies the permissions on the mounted directories. 0555 means read and execute permissions for everyone, but no write permission. This is so that we can't accidentally delete anything on the remote filesystem (say, if we misconfigure something or make a typo).
  • gid=1000 - The Group ID that will be assigned to this mapped directory. There are lots of ways to figure out which group your user is in. If you don't know the ID number of your group, run the id command and it will list the groups your user is in along with the corresponding group ID numbers (see the example after this list).
  • vers=2.0 - Specifies that we're using version 2.0 of the protocol. If you have to use the older version (vers=1.0), you might also need to add sec=ntlm.
  • 0 0 - Options for Dump and Pass. In short, this tells the system to ignore the directory if we do a backup with the dump command, and to ignore the directory if we have to do an fsck for some reason.
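
For reference, id output looks something like this (the user name and numbers here are illustrative; yours will differ):

$ id
uid=1000(user) gid=1000(user) groups=1000(user),4(adm),27(sudo)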

Whew!

Once the line is added, it's time to test it by attempting to mount the folder we just defined.

sudo mount /media/dir1

If there are errors, you'll have to check the logs (using something like dmesg), read the error messages, correct the problems, and try again. Assuming that there were no errors, you can check that /media/dir1 actually has what you would expect in it by doing something like ls /media/dir1 and inspecting the output. If it all checks out, add lines to define the rest of the folders you want to connect to. Once that's done, you can mount them individually or do something like sudo mount -a to mount everything. I'd recommend against rebooting to mount all the drives, just in case one or more of them doesn't work for some reason. It's not a fatal error, and it can be fixed, but it will slow your boot way down while it waits for these filesystems to time out.
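
If you want a quick overview of which shares actually mounted, the findmnt utility (part of util-linux, which ships with Ubuntu) can filter by filesystem type:

findmnt -t cifs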

Preparing the backup drives

For my purposes, a couple of large, cheap(ish) USB drives is sufficient for backing up this data9. Since this is going into a Windows shop, it makes sense to format these as NTFS, which can be done with any Windows computer I can get my hands on10. Once they're formatted, we can plug them into the backup computer.

We're not going to delve too deep into it, but when you plug a drive into a Linux or Unix-like system it gets assigned a designation like /dev/sda1. There is a problem, though: sometimes the device names move around depending on where a device is plugged in and when it's activated in the boot process. Ubuntu has settled on using UUIDs, which should theoretically never change. To find the UUIDs of your USB devices under Ubuntu, you can use the blkid command, and you'll see output similar to the following:

# blkid
/dev/sda2: UUID="fa340e2e-8649-4fc6-bb70-ca1922d6f8b4" TYPE="ext4" PARTUUID="b44a0c90-bda1-4ac4-b1ea-bab8f2089e83"
/dev/sdb1: LABEL="My Passport" UUID="10E6A2CCE6A2B200" TYPE="ntfs" PTTYPE="atari" PARTLABEL="My Passport" PARTUUID="09942f7a-e315-454f-be40-a588781edad3"
/dev/sdc1: LABEL="My Book" UUID="C6B04293B0428A3F" TYPE="ntfs" PTTYPE="atari" PARTUUID="00021365-01"

In this listing, /dev/sda2 is my system drive, so I don't need to worry about that for now. /dev/sdb1 is the first partition on one of my USB drives and /dev/sdc1 is the first partition on my other USB drive. We'll note the UUID of each of the drives, edit /etc/fstab, and add two more lines:

/etc/fstab
UUID=10E6A2CCE6A2B200 /media/external  ntfs defaults,nofail 0 0
UUID=C6B04293B0428A3F /media/external2 ntfs defaults,nofail 0 0

The options are a little bit different here:

  • UUID=XXXXXXXXXXXXXXXX - The UUID of the device identified above
  • /media/external - Where we want to attach the storage in our filesystem
  • ntfs - The filesystem of the drives. In this case, ntfs, which is included with a lot of Linux distributions these days, and should be safe to write to
  • defaults,nofail - Use the 'defaults' options (the default mount options as defined by your distribution's maintainer; they may need tweaking by you, but for my needs they're sufficient). nofail tells the system not to treat it as an error if the device isn't present at mount time, i.e. if the drive isn't plugged in when the system is turned on

Check that the external drives mount by issuing mount commands for each of the new drives: sudo mount /media/external and sudo mount /media/external2. At this point I thought about changing ownership of the folders to my 'user' user, but when I went to check the mounted folders it turned out that this wasn't necessary. Linux can read and write NTFS, but it can't change the permissions around, so that's fine.
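
Before trusting the drives with backups, it's also worth confirming that they're actually writable from this machine. A quick, safely reversible test (the file name here is arbitrary):

touch /media/external/write-test && rm /media/external/write-test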

Backup scripts

If you don't know what a shell script is, it's basically a file that has a list of commands to run in order to do a thing. I'm admittedly not much of a scripter11, and I'm sure that these could be written better.

I broke these up into three scripts (which seemed like a good idea at the time): backup.sh, purge.sh, and copy.sh.

backup.sh
#!/bin/sh
# Note: the paths are relative, so this assumes the scripts all live in the
# same directory. cron runs jobs from the user's home directory.
rm results.txt
touch results.txt
./copy.sh >> results.txt
./purge.sh >> results.txt
mutt -s "Backup Results" user@example.com < results.txt

Let's run through the steps:

  • rm results.txt - Remove results.txt from the last time the backup ran
  • touch results.txt - Recreate results.txt
  • ./copy.sh >> results.txt - Run copy.sh in the current folder and put the output in the file results.txt
  • ./purge.sh >> results.txt - Run purge.sh in the current folder and put the output in the file results.txt
  • mutt -s "Backup Results" user@example.com < results.txt - Take results.txt and use mutt to email it to user@example.com using the subject line "Backup Results"

copy.sh
#!/bin/sh
rsync -vruWh /media/dir1/ /media/external/dir1/
...
df -h /media/external

The copy.sh actually contains more lines, one for each directory that we're backing up, but all of them are basically identical. They use rsync with the following options:

    • -v - Increases verbosity. It outputs a list of files and directories that it works on
    • -r - Recursive. Recursively works down the filesystem and copies all of the files and folders inside other folders to make sure that we get everything in the source
    • -u - Update. This option skips any file that already exists on the destination with a modification time at least as new as the source's; otherwise it copies the file. Useful so that we don't waste resources copying over files that haven't changed
    • -W - Transfer the whole file. I had ended up with some corrupted files, but it may have been for unrelated reasons. So instead of copying file deltas, I just had it copy the whole file again if it needed to be updated
    • -h - Output everything in a human-readable format. Useful for logs so that I see things like 1.07G instead of 1073741824
  • df -h /media/external - Uses df -h to generate a human-readable report of how much space is left on our backup drive. Once that fills up I know it's time to rotate to the other drive (I do this by editing the backup scripts to start backing up to /media/external2/dir1/ and changing the last line to df -h /media/external2).
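
If you're unsure what a given rsync invocation is going to do, the -n (--dry-run) flag makes rsync report what it would transfer without actually copying anything. Handy for testing changes to copy.sh:

rsync -vruWhn /media/dir1/ /media/external/dir1/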

The purge.sh script handles a folder with special circumstances. The folder it watches contains a lot of temp files that it doesn't really make sense to keep a copy of long term (think temporary project files generated by an editor). purge.sh contains the following:

purge.sh
#!/bin/sh
find /media/external/dir1/tempfiles/ -type f -mtime +15 -exec rm {} \;

What this does is use the find command to search /media/external/dir1/tempfiles/ for any regular file that was last modified more than 15 days ago, and then run the rm command on each one. Note that it works on the backup copy, not the live share; the share is mounted read-only (dir_mode=0555), so we couldn't delete from it even if we wanted to.
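
If you want to preview what a purge would delete before letting it loose, swap the -exec rm {} \; for -print and eyeball the list:

find /media/external/dir1/tempfiles/ -type f -mtime +15 -print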

Once the scripts are done, I tested them by making them executable, chmod +x backup.sh copy.sh purge.sh, and then running them one at a time to make sure that they do what I want them to do: first copy.sh, then purge.sh, and then backup.sh. It might be wise to touch results.txt to create the results file first, so that backup.sh doesn't complain when it tries to remove a file that isn't there.
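
Put together, the test sequence looks something like this:

touch results.txt
chmod +x backup.sh copy.sh purge.sh
./copy.sh
./purge.sh
./backup.sh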

Scheduling

Assuming that everything worked, it's time to run it on a regular schedule so I don't have to remember to log in to the server every time I want to refresh the backup. To do that, we'll use cron12. If you don't know, cron is a utility that runs things at intervals you define. In my case I want to run the backup scripts hourly. That should give enough time for the scripts to finish running before they're started up again, and it also gives enough coverage that I should be able to restore a file that was accidentally deleted even if it only existed for an hour. Editing the cron schedule for a user works as follows: log in as your non-root user (in this case, user) and enter the command crontab -e to edit your own cron configuration. I added the following line to mine.

crontab
@hourly /home/user/backup.sh

This line runs the backup.sh backup script once an hour13. The backup scripts put all their results into a file and then email that file to me, so I can keep tabs on what they're doing and spend my time worrying about something else.
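
If you want to double-check that cron is actually firing the job, Ubuntu logs cron activity to the syslog, so something like this will show recent runs:

grep CRON /var/log/syslog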

Footnotes

  1. Because what could go wrong?
  2. Technically I was able to SSH into the old box and retrieve all the configuration files, so the process was actually fairly quick, but for now we'll pretend that I did everything from scratch.
  3. Sendmail is the mail server that does the actual sending, and mutt is the user agent that prepares the email messages. You can swap out either piece if you want to, but you're on your own if you do that.
  4. It will hopefully make more sense later, but when one drive fills up, I like to start the other one and rotate them. Each rotation takes about six months in our current setup.
  5. SMB stands for 'Server Message Block' and is how most Windows servers' 'shared folders' work. It's also sometimes known as CIFS, the Common Internet File System.
  6. At least, it can't be updated by me. This machine I'm using is technically owned by a vendor and to hear them tell it, I'm very lucky they let me have login credentials to it all.
  7. Warning: editing your /etc/fstab file is potentially dangerous, and can make your system unbootable if you make a mistake. Be careful!
  8. Because it is
  9. You have to be careful about going too cheap. Some of the really cheap drives have write speeds that are so slow that they're almost useless for this kind of activity. I'm looking at you, WD Easystore
  10. You can also create NTFS volumes under Linux. It's pretty easy if you use something like gparted, but that's beyond the scope of this document.
  11. There are books and courses dedicated to shell scripting, and I highly encourage you to check them out if you want to tell me how badly I scripted these things
  12. Yes, I know that I could use systemd timers for this, but I didn't do that.
  13. Cron is a very powerful tool and I'm barely scratching the surface of what it can do here. I encourage you to learn about what it can do so you can tell me a better way to do this.

