<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="../articles/article.xsl"?> 

<articles>
  <article date="8 Mar 2021">
    <pagetitle>Dates in mirror may be closer than they appear</pagetitle>
    <articleheader>An rsync conundrum</articleheader>
    <articleabstract>rsync kept copying files that I knew hadn't changed, but I wasn't sure why.</articleabstract>
    <articlebody>

<p>A couple of weeks ago I accidentally destroyed and then rebuilt a server that is responsible for using <code>rsync</code> to back up some video clips from a Windows file server<sup class="inlinefootnote">1</sup>. If you don't want to read that article, the gist is that <code>rsync</code> runs once an hour, syncs up some folders, and then emails me a list of the files it synced and how full the backup drive is, so I know when to rotate it out.</p>

<p>I don't usually study the list of files in the email. The names don't mean all that much, and the list changes every time the script runs anyway, so there's usually no point. But over the last few days I noticed something unusual: the top of the list always had the same folder in it, a folder with the date '2-22' in the name<sup class="inlinefootnote">2</sup>.</p>

<p>I started with the usual suspects. The script is supposed to delete a text file, recreate it, add a bunch of stuff to it, and email it to me. I checked the timestamp on the text file, and it had today's date on it as expected.</p>

<p>Next I ran the backup script by hand instead of letting <code>cron</code> handle it. As expected, the same handful of files were transferred over to the backup drive. And if I ran the script again immediately? The files were transferred again!</p>

<p>Now I'm starting to get confused<sup class="inlinefootnote">3</sup>. Maybe the drive is full? The hourly report shows that there's plenty of space on the drive, but maybe it's looking at the wrong drive or something?</p>

<pre><code>df -h /media/external/
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdc1       1.9T  1.1T  741G  61% /media/oldexternal
</code></pre>

<p>Nope, there's plenty of room left on the drive, and if I list the backup folder on the external drive, I can see that all the files are there (pathnames are obfuscated):</p>

<pre><code>ls -l /media/external/somefolder
...snip...
-rwxrwxrwx 1 root root  26443776 Feb 22 14:00 WAFF1211_01.MOV
-rwxrwxrwx 1 root root  23035904 Feb 22 14:00 WAFF1212_01.MOV
-rwxrwxrwx 1 root root  19759104 Feb 22 14:00 WAFF1213_01.MOV
...snip...
</code></pre>

<p>In total, 35 files are affected, but they all seem to be accounted for in both places. So I <code>cd</code> into the original folder and run <code>cp * /media/external/somefolder</code> to make quadruple sure that the files are copied. They are. I run the <code>rsync</code> script again, and it dutifully copies the 35 files, which already exist in two places, all over again.</p>

<p>This is getting silly.</p>

<p>Next, I look at the <code>rsync</code> options I'm using: <code>vruWh</code>. I'm starting to suspect <code>u</code> (<code>--update</code>), which skips a file only when the copy on the receiving side is newer than the source, and <code>W</code> (<code>--whole-file</code>), which copies the whole file instead of just the changed bits. But for that to explain what I'm seeing, something would have to be modifying these 35 files constantly, and I'm not seeing any evidence of that.</p>

<p>To test my theory, I go to the source folder and run <code>ls -l</code> on it to see whether the files really are changing constantly, or whether something else bizarre is going on.</p>

<pre><code>ls -l /media/originalfolder
...snip...
-rwxrwxrwx 1 root user  26443776 Mar 22 14:00 WAFF1211_01.MOV
-rwxrwxrwx 1 root user  23035904 Mar 22 14:00 WAFF1212_01.MOV
-rwxrwxrwx 1 root user  19759104 Mar 22 14:00 WAFF1213_01.MOV
...snip...
</code></pre>

<p>And there it is! The file sizes weren't changing, but somehow the timestamps on the files had been set to March 22, 2021, i.e. two weeks from now (I'm guessing a camera with the wrong internal date set). Every time it ran, <code>rsync</code> compared them against the local copies, which had a modification date of Feb 22, 2021, decided that the source files were newer, and copied them again. So now I needed to update the dates on all the files.</p>
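<p>Why would a future timestamp cause a copy on every run? Below is a rough model of the per-file decision, based on how the rsync man page describes <code>-u</code> and the default "quick check" (compare size and modification time, unless <code>-c</code> is given). It is a simplified sketch, not rsync's actual code: <code>would_transfer</code> is a hypothetical helper, it ignores options such as <code>--modify-window</code>, and it assumes GNU <code>stat</code> and <code>touch</code>.</p>

<pre><code>#!/usr/bin/env bash
# Simplified model (hypothetical helper, not rsync's own code) of how
# rsync -u decides whether one file needs sending when -c is not used.
# Assumes GNU coreutils: stat -c and touch -d.
would_transfer() {
  local src=$1 dst=$2
  # No copy on the receiver yet: always sent.
  if [ ! -e "$dst" ]; then
    echo transfer
    return
  fi
  local src_mtime dst_mtime src_sig dst_sig
  src_mtime=$(stat -c %Y "$src")
  dst_mtime=$(stat -c %Y "$dst")
  # -u (--update): skip when the receiver's copy is newer than the sender's.
  if [ "$dst_mtime" -gt "$src_mtime" ]; then
    echo skip
    return
  fi
  # Default quick check: identical size and mtime means "unchanged".
  src_sig="$(stat -c %s "$src"):$src_mtime"
  dst_sig="$(stat -c %s "$dst"):$dst_mtime"
  if [ "$src_sig" = "$dst_sig" ]; then
    echo skip
  else
    echo transfer
  fi
}

# Demo: identical contents, but the source is stamped in the future.
demo=$(mktemp -d)
printf 'same bytes' > "$demo/src.MOV"
cp "$demo/src.MOV" "$demo/dst.MOV"
touch -d '2021-03-22 14:00' "$demo/src.MOV"
touch -d '2021-02-22 14:00' "$demo/dst.MOV"
would_transfer "$demo/src.MOV" "$demo/dst.MOV"   # transfer
# Giving the copy the same mtime as the source makes the quick check pass.
touch -r "$demo/src.MOV" "$demo/dst.MOV"
would_transfer "$demo/src.MOV" "$demo/dst.MOV"   # skip
</code></pre>

<p>With the source stamped March 22 and the copy stamped earlier, every run lands on the <code>transfer</code> branch, however many copies already exist. Resetting the source dates, which is what the rest of this article does, is one way back to <code>skip</code>.</p>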

<p>Okay, sure, I could have just logged in as root and done something like <code>touch *</code>, but that wouldn't have eaten up enough time or article space, so I decided on a much more search-engine-keywordy solution.</p>
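<p>For comparison, the skipped <code>touch</code> route can be narrowed so that only the future-dated files get a new timestamp, and the same test doubles as a way to spot the problem without eyeballing <code>ls -l</code> output. A minimal sketch, assuming GNU <code>find</code> (for <code>-newermt</code>), GNU <code>date</code> and <code>touch</code>, and write access to the files through the mount. <code>list_future_files</code> and <code>reset_future_mtimes</code> are hypothetical helper names, and the demo uses a throwaway directory rather than the real share:</p>

<pre><code>#!/usr/bin/env bash
# Find (and optionally fix) files whose modification time is in the future.
# Assumes GNU find (-newermt), GNU date, and GNU touch.

# Print every regular file under $1 whose mtime is later than "now".
list_future_files() {
  local dir=$1 now
  now=$(date '+%Y-%m-%d %H:%M:%S')
  find "$dir" -type f -newermt "$now"
}

# Set the mtime of those files, and only those, to the current time.
reset_future_mtimes() {
  local dir=$1 now
  now=$(date '+%Y-%m-%d %H:%M:%S')
  find "$dir" -type f -newermt "$now" -exec touch -- {} +
}

# Demo in a throwaway directory instead of the real share.
demo=$(mktemp -d)
touch -d '2099-01-01 00:00' "$demo/future.MOV"
touch -d '2021-02-22 14:00' "$demo/past.MOV"
list_future_files "$demo"     # only future.MOV is listed
reset_future_mtimes "$demo"
# future.MOV now carries the current time; past.MOV is left alone.
stat -c '%y %n' "$demo"/*.MOV
</code></pre>

<p>On the real machine you would point these at the mounted source folder (such as <code>/media/originalfolder</code> from the listing above). Because the reference time only has one-second resolution, a file modified during the current second can be counted as "in the future"; that edge case doesn't matter for timestamps that are weeks off.</p>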

<p>Since this is a Windows shop, I was able to map the folder I'm backing up to a drive letter on my local machine. We'll call it <code>R:</code>. I need to update the 'file modified' time on these files, and the built-in Windows <code>copy</code> command can do that with <code>copy /b filename.ext +,,</code>. It's a little bit weird to look at, but it basically just copies a file onto itself, which has the side effect of resetting the file's 'modified' time to whenever you ran the command. This is great for one file, but I have 35 files, and I don't want to do that much typing<sup class="inlinefootnote">4</sup>.</p>

<p>So I need a list of all the files in the directory without all that extraneous info like dates and sizes. The solution: <code>dir /b</code> for a 'bare' directory listing. And we can redirect that to a text file with <code>dir /b > files.txt</code>. So we now have a text file with all the files in the folder, each on its own line.</p>

<p>The next step is to convert this into a batch file<sup class="inlinefootnote">5</sup>. I used <code>Notepad++</code> for this, but you could use whatever you want. First I opened <code>Search &#8594; Replace</code>, checked the <code>Regular expression</code> option, searched for <code>^</code> (which matches the start of a line), and replaced it with <code>copy /b &#160;</code> (note the extra space at the end), which puts <code>copy /b </code> at the beginning of every line. Then I changed the Search Mode to <code>Extended</code>, searched for <code>\r\n</code>, which matches each line ending<sup class="inlinefootnote">6</sup>, and replaced it with <code>&#160;+,,\r\n</code> (note the preceding space), which appends <code>&#160;+,,</code> to the end of every line. That leaves us with a file containing 35 commands, each updating the timestamp on one file:</p>

<pre><code>files.txt
...snip...
copy /b WAFF1211_01.MOV +,,
copy /b WAFF1212_01.MOV +,,
copy /b WAFF1213_01.MOV +,,
...snip...
</code></pre>
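<p>The two Notepad++ substitutions can also be scripted. Here is a minimal bash sketch that reads a bare listing on standard input and writes one <code>copy /b ... +,,</code> line per name, with CRLF line endings so <code>cmd.exe</code> reads it as an ordinary batch file. <code>make_touch_batch</code> is a hypothetical name, and unlike the hand-edited file it wraps each name in double quotes so that names containing spaces still work:</p>

<pre><code>#!/usr/bin/env bash
# Turn a bare file listing (one name per line, as produced by "dir /b")
# into cmd.exe lines of the form: copy /b "NAME" +,,
# Output uses CRLF line endings.
make_touch_batch() {
  local name
  # "|| [ -n ... ]" keeps a last line that has no trailing newline.
  while IFS= read -r name || [ -n "$name" ]; do
    name=${name%$'\r'}        # strip the CR left over from CRLF input
    if [ -z "$name" ]; then
      continue                # ignore blank lines
    fi
    printf 'copy /b "%s" +,,\r\n' "$name"
  done
}

# Example: feed it a CRLF listing like the one dir /b writes.
demo=$(mktemp -d)
printf 'WAFF1211_01.MOV\r\nWAFF1212_01.MOV\r\n' | make_touch_batch > "$demo/files.bat"
cat -A "$demo/files.bat"     # each line ends in ^M$ (CR, then LF)
</code></pre>

<p>Apart from the quotes, the output matches the hand-edited file line for line, so it can be renamed and run the same way.</p>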

<p>Now we just need to change the file to a batch file: <code>ren files.txt files.bat</code>, run it, and watch the magic.</p>

<pre><code>R:\files.bat
...snip...
R:\>copy /b WAFF1200_01.MOV.ewc2 +,,
        1 file(s) copied.
...snip...</code>
(Repeat for 34 other affected files)
</pre>

<p>Now running <code>ls -l</code> or <code>dir</code> (depending on which side you're checking from) shows that the files have today's date on them, and <code>rsync</code> no longer thinks they're newer and has stopped copying them.</p>

<p>Success!</p>

</articlebody>
    <footnotes>
      <footnote><a href="http://wyrm.org/howdid/rsync-cifs.xml">Using rsync on Ubuntu Linux to back up some Windows shares</a></footnote>
      <footnote>It only took me two weeks to notice it.</footnote>
      <footnote>Some may think that I exist in a perpetual state of confusion. I will neither confirm nor deny that at this time.</footnote>
      <footnote>I mean, yeah, I'm taking the long way to get to the solution, but I <em>do</em> have some standards.</footnote>
      <footnote>Yes, I know you could have done this with a PowerShell one-liner. I'm suitably impressed.</footnote>
      <footnote>Windows ends each line with a carriage return followed by a linefeed, rather than the single linefeed that Unix-like systems use. If you don't know what a carriage return (not the horse-drawn kind of carriage) or a linefeed is, go find your nearest old geek and ask. Or do a web search, I guess, but that's less fun.</footnote>
   </footnotes>
  </article>
</articles>
