An rsync conundrum

rsync kept copying files that I knew hadn't changed, but I wasn't sure why.

A couple of weeks ago I accidentally destroyed and then rebuilt a server that is responsible for using rsync to back up some video clips from a Windows file server[1]. If you don't want to read that article, the gist is that rsync runs once an hour, syncs up some folders, and then emails me a list of the files it synced and how full the backup drive is, so I know when to rotate it out.

I don't usually study the list of files in the email because they don't mean all that much and the list changes every time the script runs anyway. But over the last few days I noticed something unusual: the top of the list always had the same folder in it, a folder with the date '2-22' in the name[2].

I started with the usual suspects. The script is supposed to delete a text file, recreate it, and then add a bunch of stuff to it and email it to me. I checked the timestamp on the text file and that had today's date on it as expected.

Next I ran the backup script by hand instead of letting cron handle it. As expected, the same handful of files were transferred over to the backup drive. And if I ran the script again immediately? The files were transferred again!

Now I'm starting to get confused[3]. Maybe the drive is full? The hourly report shows that there's plenty of space on the drive, but maybe it's looking at the wrong drive or something?

df -h /media/external/
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdc1       1.9T  1.1T  741G  61% /media/oldexternal

Nope, there's plenty of room left on the drive, and if I cd into the backup folder I can see that all the files are there (pathnames are obfuscated):

ls -l /media/external/somefolder
...snip...
-rwxrwxrwx 1 root root  26443776 Feb 22 14:00 WAFF1211_01.MOV
-rwxrwxrwx 1 root root  23035904 Feb 22 14:00 WAFF1212_01.MOV
-rwxrwxrwx 1 root root  19759104 Feb 22 14:00 WAFF1213_01.MOV
...snip...

There are 35 files in total affected, but they all seem to be accounted for in both places. So I cd into the original folder and cp * /media/external/somefolder to make quadruple sure that the files are copied. They are. I run the rsync script again and it dutifully copies 35 files that already exist in two places over again.

This is getting silly.

Next, I look at the rsync options I'm using: vruWh. I'm starting to think that the culprit is u, which skips any file that is newer on the destination (so a file only gets copied when the source looks newer), combined with W, which copies the whole file instead of just the changed bits. But for this to be true, something would have to be modifying these 35 files constantly, and I'm not seeing any evidence of that.
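Another way to poke at this, which is not what I actually ran, is to ask rsync itself what it thinks is different. The sketch below wraps the same vruWh options in a hypothetical helper called preview_sync and adds two standard rsync flags: --dry-run, so nothing gets copied, and --itemize-changes, which prints a short change code next to each file. The two paths are whatever you pass in.

```shell
# preview_sync is a hypothetical helper name, not part of rsync.
# $1 = source directory, $2 = destination directory.
preview_sync() {
    # --dry-run:          report what would happen without copying anything
    # --itemize-changes:  print a change-summary code in front of each item
    rsync -vruWh --dry-run --itemize-changes "$1" "$2"
}

# Example call (paths as used in this article):
#   preview_sync /media/originalfolder/ /media/external/somefolder/
```

According to the rsync man page, a t or T in the change code means the modification time differs, which is a strong hint that timestamps rather than file contents are behind the repeat transfers.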

To test my theory, I go to the source folder and run ls -l on it to see whether the files are indeed being constantly changed, or whether something else bizarre is going on.

ls -l /media/originalfolder
...snip...
-rwxrwxrwx 1 root user  26443776 Mar 22 14:00 WAFF1211_01.MOV
-rwxrwxrwx 1 root user  23035904 Mar 22 14:00 WAFF1212_01.MOV
-rwxrwxrwx 1 root user  19759104 Mar 22 14:00 WAFF1213_01.MOV
...snip...

And there it is! The file sizes weren't changing. Somehow the timestamps on the source files had been set to March 22, 2021, i.e. two weeks from now (I'm guessing a camera with the wrong internal date set). rsync kept determining that they were newer than the local copies, which had a modification date of Feb 22, 2021, and so it kept copying them. So now I needed to update the dates on all the files.
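Spotting future-dated files by eyeballing ls -l output works for 35 files, but it can also be scripted. Here's a minimal sketch, assuming a find that supports -maxdepth (GNU and BSD find both do). The helper name find_future_files is made up for illustration. It creates a temporary reference file, which gets the current time as its modification time, and then lists everything in the folder that is newer than that.

```shell
# find_future_files is a hypothetical helper name.
# $1 = directory to check (top level only).
find_future_files() {
    dir=$1
    # A freshly created temp file carries "now" as its modification time.
    ref=$(mktemp) || return 1
    # -newer matches files modified more recently than the reference file,
    # i.e. files whose modification time lies in the future.
    find "$dir" -maxdepth 1 -type f -newer "$ref" -print
    rm -f "$ref"
}

# Example call (path as used in this article):
#   find_future_files /media/originalfolder
```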

Okay, sure, I could have just logged in as root and done something like touch *, but that wouldn't have eaten up enough time or article space, so I decided on a much more search-engine-keywordy solution.
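For the record, a slightly more careful version of the touch approach might look like the sketch below. It only touches files whose modification time is in the future and leaves everything else alone. The helper name reset_future_mtimes is hypothetical, and this assumes you have write access to the files from the Linux side.

```shell
# reset_future_mtimes is a hypothetical helper name.
# $1 = directory to fix (top level only).
reset_future_mtimes() {
    dir=$1
    # Reference file whose modification time is "now".
    ref=$(mktemp) || return 1
    # touch with no time option sets access and modification time to now;
    # only files newer than the reference (future-dated) are touched.
    find "$dir" -maxdepth 1 -type f -newer "$ref" -exec touch {} +
    rm -f "$ref"
}

# Example call (path as used in this article):
#   reset_future_mtimes /media/originalfolder
```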

Since this is a Windows shop, I was able to map the folder that I'm backing up to a local drive letter on my machine. We'll call it R:. I need to update the 'file modified' time on these files, and you can use the built-in Windows copy command to do that by issuing the command copy /b filename.ext +,, (that's a plus sign followed by two commas). It's a little bit weird to look at, but it basically just copies a file onto itself, which has the side effect of resetting the 'modified' time to whatever time you ran the command. This is great for one file, but I have 35 files, and I don't want to do that much typing[4].

So I need a list of all the files in the directory without all that extraneous info like dates and sizes. The solution: dir /b for a 'bare' directory listing. And we can redirect that to a text file by using dir /b > files.txt. So we now have a text file with the names of all the files in the folder, each on its own line.

The next step is to convert this into a batch file[5]. I used Notepad++ for this, but you could use whatever you want. First I opened Search → Replace, checked the regular expression option, searched for ^ (which is the start of the line), and replaced it with copy /b followed by a space, which puts copy /b at the beginning of every line. Then I changed the Search Mode to Extended, searched for \r\n to match each line ending[6], and replaced it with a space followed by +,,\r\n, which swaps each CRLF for the trailing arguments plus a fresh CRLF. That leaves us with a file containing 35 commands, each updating the timestamp on one file:

files.txt
...snip...
copy /b WAFF1211_01.MOV +,,
copy /b WAFF1212_01.MOV +,,
copy /b WAFF1213_01.MOV +,,
...snip...
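If you'd rather not click through a search-and-replace dialog, the same transformation can be sketched on the Linux side with sed. This is an illustration, not what I did. It assumes GNU sed (which understands \r in patterns and replacements) and file names without spaces, since names with spaces would need quoting in the batch file. The helper name make_touch_batch is hypothetical.

```shell
# make_touch_batch is a hypothetical helper name.
# $1 = bare listing (one file name per line), $2 = batch file to write.
make_touch_batch() {
    # 1. strip the trailing carriage return left by Windows line endings
    # 2. drop blank lines
    # 3. prefix each name with "copy /b "
    # 4. append " +,," and put the carriage return back
    sed -e 's/\r$//' \
        -e '/^$/d' \
        -e 's/^/copy \/b /' \
        -e 's/$/ +,,\r/' "$1" > "$2"
}

# Example call:
#   make_touch_batch files.txt files.bat
```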

Now we just need to change the file to a batch file: ren files.txt files.bat, run it, and watch the magic.

R:\files.bat
...snip...
R:\>copy /b WAFF1200_01.MOV.ewc2 +,,
        1 file(s) copied.
...snip...
(Repeat for 34 other affected files)

Now running ls -l or dir (depending on which side you're checking from) shows that the files have today's date on them, and rsync no longer thinks that they're newer and has stopped copying them.

Success!

Footnotes

  1. Using rsync on Ubuntu Linux to back up some Windows shares
  2. It only took me two weeks to notice it
  3. Some may think that I exist in a perpetual state of confusion. I will neither confirm nor deny that at this time.
  4. I mean, yeah, I'm taking the long way to get to the solution, but I do have some standards
  5. Yes, I know you could have done this with a PowerShell one-liner. I'm suitably impressed.
  6. Windows ends each line with both a carriage return and a linefeed, rather than the single linefeed that Linux uses. If you don't know what a carriage (not the horse-drawn kind) or a linefeed is, go find your nearest old geek and ask. Or do a web search, I guess, but that's less fun.

