A while ago I set up a machine to back up some files and folders on our network for my Real Job™. It was running an old release of Ubuntu Linux, and I had a slow day, so I figured I'd update to the latest point release. I didn't take a backup first, even though I knew better[^1]. The upgrade failed and left the machine unbootable. I was able to get it booting again, but it was in a broken state. Could it have been fixed? Maybe. But it was faster to just blow it away and redo the whole thing.[^2]
Downloading and preparing the media
For this install I went with the current LTS version, Ubuntu Server 20.04. I chose it for two reasons: 1. the server ran Ubuntu Server before I trashed it, and 2. I'm most familiar with Debian-based distributions. This howto can be adapted to just about any Unix-like OS, but you'll have to do some legwork if you want to go that route.
I'm not going to go into writing the .iso to a disk or walk through the installer; otherwise this would get way too long. Use whatever tools you have available to prepare your installation media and install the OS on your computer. I used essentially all the default choices. Do make sure the SSH server is installed if you think you'll be accessing this thing remotely; if not, don't worry about it. You'll also want to create a user account. It doesn't matter much what you call it, but make sure the user's Full Name is filled out appropriately, since we'll be sending email from that account later on.
Installing additional packages
The only additional packages I installed were sendmail and mutt:
```
apt install sendmail mutt
```
This is so that the server can send me notifications when it's done[^3]. This is all handled over the internal network, so sendmail's default configuration is fine. I did have to contact our network team to allow this computer to relay mail through their mail server.
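Before moving on, it's worth confirming that mail actually makes it out. A one-liner like this (with the address swapped for your own) should land a test message in your inbox:

```
echo "Test from the backup server" | mutt -s "Mail test" user@example.com
```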
Laying the groundwork
The first thing to do is to create some mountpoints. These are the places where we're going to attach the Windows shared folders. I also have two USB drives that I'm going to use to back up the data to[^4], and I need an additional mountpoint for each of the Windows folders that I'm backing up. Ubuntu generally seems to want mounted drives in the `/media` folder, so let's oblige. Either `sudo` these commands or `su` to root first:

```
# mkdir /media/external
# mkdir /media/external2
# mkdir /media/dir1
# mkdir /media/dir2
# mkdir /media/dir3
```

`dir1`, `dir2`, and `dir3` aren't really what I named them; I gave them meaningful names that mirror the folders they're backing up. `external` and `external2` are for the external USB drives that I'm going to use.
The server is a relatively ancient Windows Server, and it still serves out shared files using the old, insecure, deprecated version of SMB/CIFS[^5] that cannot be upgraded[^6].
For some background, the folders exposed on this server hold temporary video files. We have users who go outside the office and come back with video files that they need to edit. The files don't usually need to stick around more than a few days, but sometimes accidents happen: files get deleted before a project is done, someone accidentally deletes someone else's project, or a project needs to be redone days after it was declared finished because an error was discovered or a late change was needed. Our goal is to back up everything in the work folders so that if something is deleted, it can be restored.
Since these folders are on a Windows machine and password protected, we'll need a place to put our credentials so that we can reference them later. Log in as your user and create a file to hold them. You can call the file anything you want; I called mine `.smbcredentials`. Edit the file with your favorite text editor and add the following, substituting a real username and password that have access to the folders you're working with:
`~/.smbcredentials`:

```
user=username
password=mYRe@llySecur3p@ssw0rd!*
```
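One step I'd suggest that isn't strictly required: since this file holds a password in plain text, make it readable only by your own user:

```
chmod 600 ~/.smbcredentials
```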
Now that we can authenticate to the Windows server, it's time to connect to some folders!
We'll start by modifying your `/etc/fstab` file[^7] with your favorite text editor. We'll add one line. It looks complicated[^8], but we'll break it down.

`/etc/fstab`:

```
//127.0.0.1/dir1 /media/dir1 cifs iocharset=utf8,credentials=/home/user/.smbcredentials,dir_mode=0555,gid=1000,vers=2.0 0 0
```
- `//127.0.0.1/dir1` - The location of the folder we're trying to access
- `/media/dir1` - The location where we're going to attach the folder in our local filesystem
- `cifs` - The filesystem type we're connecting to. Since this is a CIFS share, we specify `cifs`
- `iocharset=utf8` - The character set to use when connecting to the server, basically so that filenames don't get mangled. You can read more about it in a lot of places online, so I won't go into it here
- `credentials=/home/user/.smbcredentials` - The path to the file where we stored our credentials for the server. Change the path to wherever you saved the credentials (i.e. put in the correct user name)
- `dir_mode=0555` - The permissions on the mounted directories. `0555` means we have read and execute permissions, but not write permission. This is so we can't accidentally delete anything on the remote filesystem (say, if we misconfigure something or make a typo)
- `gid=1000` - The group ID that will be assigned to the mapped directory. If you don't know the ID number of the group your user is in, run the `id` command; it lists the groups your user belongs to along with the corresponding group ID numbers
- `vers=2.0` - Specifies that we're using version 2.0 of the protocol. If you have to use the older version (`vers=1.0`), you might also need to add `sec=ntlm`
- `0 0` - Options for dump and pass. In short, this tells the system to ignore the filesystem if we do a backup with the `dump` command, and to skip it if we have to run `fsck` for some reason
Whew!
Once the line is added, it's time to test it by attempting to mount the folder we just defined:

```
mount /media/dir1
```
If there are errors, you'll have to check the logs (with something like `dmesg`), read the error messages, correct the problems, and try again. Assuming there were no errors, you can check that `/media/dir1` actually has what you expect in it by doing something like `ls /media/dir1` and inspecting the output. If it all checks out, add lines to define the rest of the folders you want to connect to. Once that's done, you can mount them individually or do something like `mount -a` to mount everything. I'd recommend against rebooting to mount all the drives, just in case one or more of them doesn't work for some reason. It's not a fatal error, and it can be fixed, but it will slow your boot way down while the system waits for those filesystems to time out.
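If an entry refuses to mount and the log messages aren't helpful, it can be worth running the equivalent mount by hand, which sometimes produces a clearer error. This is just the fstab entry spelled out as a command (run it as root or via `sudo`, with paths adjusted to your setup):

```
mount -t cifs //127.0.0.1/dir1 /media/dir1 -o iocharset=utf8,credentials=/home/user/.smbcredentials,dir_mode=0555,gid=1000,vers=2.0
```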
Preparing the backup drives
For my purposes, a couple of large, cheap(ish) USB drives are sufficient for backing up this data[^9]. Since this is going into a Windows shop, it makes sense to format them as NTFS, which can be done with any Windows computer I can get my hands on[^10]. Once they're formatted, we can plug them into the backup computer.
We're not going to delve too deep into it, but when you plug a drive into a Linux or Unix-like system it gets assigned a designation like `/dev/sda1`. There is a problem, though: sometimes the device names move around depending on where a device is plugged in and when it's activated in the boot process. Ubuntu has settled on using UUIDs, which should theoretically never change. To find the UUIDs of your USB devices under Ubuntu, you can use the `blkid` command, and you'll see output similar to the following:
```
# blkid
/dev/sda2: UUID="fa340e2e-8649-4fc6-bb70-ca1922d6f8b4" TYPE="ext4" PARTUUID="b44a0c90-bda1-4ac4-b1ea-bab8f2089e83"
/dev/sdb1: LABEL="My Passport" UUID="10E6A2CCE6A2B200" TYPE="ntfs" PTTYPE="atari" PARTLABEL="My Passport" PARTUUID="09942f7a-e315-454f-be40-a588781edad3"
/dev/sdc1: LABEL="My Book" UUID="C6B04293B0428A3F" TYPE="ntfs" PTTYPE="atari" PARTUUID="00021365-01"
```
In this listing, `/dev/sda2` is my system drive, so I don't need to worry about that for now. `/dev/sdb1` is the first partition on one of my USB drives, and `/dev/sdc1` is the first partition on the other. We'll note the UUID of each drive, then edit `/etc/fstab` and add two more lines:
`/etc/fstab`:

```
UUID=10E6A2CCE6A2B200 /media/external ntfs defaults,nofail
UUID=C6B04293B0428A3F /media/external2 ntfs defaults,nofail
```
The options are a little bit different here:
- `UUID=XXXXXXXXXXXXXXXX` - The UUID of the device, as identified above
- `/media/external` - Where we want to attach the storage in our filesystem
- `ntfs` - The filesystem of the drives. In this case NTFS, support for which is included with a lot of Linux distributions these days and should be safe to write to
- `defaults,nofail` - Use the default options (as defined by your distribution maintainer; they may need tweaking, but for my needs they're sufficient). `nofail` tells the system not to treat it as an error if the device isn't present at mount time, i.e. if the drive isn't plugged in when the system is turned on
Check that the external drives mount by issuing `mount` commands for each of the new entries: `mount /media/external` and `mount /media/external2`. At this point I thought about changing ownership of the folders to my 'user' user, but when I went to check the mounted folders it turned out this wasn't necessary. Linux can read and write NTFS, it just can't change the permissions around, and that's fine for our purposes.
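A quick way to convince yourself that the backup drives really are writable (the file name here is just a throwaway):

```
touch /media/external/write-test && rm /media/external/write-test
touch /media/external2/write-test && rm /media/external2/write-test
```

If either command complains, check `dmesg`; the NTFS driver can refuse to mount a drive read-write if it wasn't cleanly ejected from Windows.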
Backup scripts
If you don't know what a shell script is, it's basically a file that has a list of commands to run in order to do a thing. I'm admittedly not much of a scripter[^11], and I'm sure that these could be written better.
I broke these up into three scripts (which seemed like a good idea at the time): `backup.sh`, `purge.sh`, and `copy.sh`.
`backup.sh`:

```
#!/bin/sh
rm results.txt
touch results.txt
./copy.sh >> results.txt
./purge.sh >> results.txt
mutt -s "Backup Results" user@example.com < results.txt
```
Let's run through the steps:
- `rm results.txt` - Remove `results.txt` from the last time the backup ran
- `touch results.txt` - Recreate `results.txt`
- `./copy.sh >> results.txt` - Run `copy.sh` in the current folder and append its output to `results.txt`
- `./purge.sh >> results.txt` - Run `purge.sh` in the current folder and append its output to `results.txt`
- `mutt -s "Backup Results" user@example.com < results.txt` - Take `results.txt` and use `mutt` to email it to `user@example.com` with the subject line "Backup Results"
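As I said, I'm sure this could be written better. If you want something a bit more defensive, a sketch like the following (assuming the scripts live in `/home/user`; adjust to taste) won't complain about a missing `results.txt` and doesn't care what directory it's started from:

```
#!/bin/sh
# run from the directory the scripts live in, no matter where we were started
cd /home/user || exit 1
rm -f results.txt                 # -f: don't complain if the file isn't there
touch results.txt
./copy.sh >> results.txt 2>&1     # capture any errors in the report too
./purge.sh >> results.txt 2>&1
mutt -s "Backup Results" user@example.com < results.txt
```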
`copy.sh`:

```
#!/bin/sh
rsync -vruWh /media/dir1/ /media/external/dir1/
# ...one rsync line like the above for each folder being backed up...
df -h /media/external
```

The real `copy.sh` contains more lines, one for each directory that we're backing up, but all of them are basically identical. They use `rsync` with the following options:
- `-v` - Increase verbosity. It outputs a list of the files and directories it works on
- `-r` - Recursive. Work down through the filesystem and copy all of the files and folders inside other folders, to make sure we get everything in the source
- `-u` - Update. Skip any file that already exists on the backup side with a newer modification time; otherwise copy it. Useful so we don't waste resources copying files that haven't changed
- `-W` - Transfer whole files. I had ended up with some corrupted files (though it may have been for unrelated reasons), so instead of copying file deltas, I have it copy the whole file again whenever one needs to be updated
- `-h` - Output everything in a human-readable format. Useful for logs, so I see things like `1.0G` instead of `1073741824`
- `df -h /media/external` - Use `df -h` to generate a human-readable report of how much space is left on the backup drive. Once it fills up, I know it's time to rotate to the other drive (I do this by editing the backup scripts to back up to `/media/external2/dir1/` and so on, and changing the last line to `df -h /media/external2`)
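Before trusting the copy script with real data, a dry run is cheap insurance; `rsync`'s `-n` flag shows what would be transferred without actually copying anything:

```
rsync -vruWhn /media/dir1/ /media/external/dir1/
```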
The `purge.sh` script handles a folder with special circumstances. The folder it watches contains a lot of temp files that it doesn't really make sense to keep a copy of long term (think temporary project files generated by an editor). `purge.sh` contains the following:

`purge.sh`:

```
#!/bin/sh
find /media/external/dir1/tempfiles/ -type f -mtime +15 -exec rm {} \;
```
This uses the `find` command to search the backed-up `tempfiles` folder for any file that hasn't been modified in more than 15 days, and then runs the `rm` command on each match. (It works on the backup copy; the CIFS mounts are read-only by design, so the purge has to happen on our side.)
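If you want a preview of what a purge would delete, run the same `find` without the `-exec` action; it only lists the matching files:

```
find /media/external/dir1/tempfiles/ -type f -mtime +15
```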
Once the scripts were done, I tested them by making them executable, `chmod +x backup.sh copy.sh purge.sh`, and then running them one at a time to make sure they do what I want: first `copy.sh`, then `purge.sh`, and then `backup.sh`. It might be wise to `touch results.txt` first to create the results file, just in case `backup.sh` complains when it can't remove a file that isn't there.
Scheduling
Assuming that everything worked, it's time to run the backup on a regular schedule so I don't have to remember to log in to the server every time I want to refresh it. To do that, we'll use `cron`[^12]. If you don't know, `cron` is a utility that runs things at intervals you define. In my case I want to run the backup scripts hourly. That should give the scripts enough time to finish before they're started again, and it gives enough coverage that I should be able to restore an accidentally deleted file even if it only existed for an hour. To edit the `cron` schedule for a user, log in as your non-root user (in this case, `user`) and enter the command `crontab -e` to edit your own cron configuration. I added the following line to mine:
`crontab`:

```
@hourly /home/user/backup.sh
```
This line runs the `backup.sh` backup script once an hour. The backup scripts put all their results into a file and then email that file to me, so I can keep tabs on what they're doing and spend my time worrying about something else.
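To double-check that the job is registered and actually firing, `crontab -l` prints the current user's schedule, and on Ubuntu each cron run is noted in the syslog:

```
crontab -l
grep CRON /var/log/syslog
```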