Making a sitemap

Your website should probably have a sitemap

When I switched the guts of this site out for org mode, one of the things I liked was that it would automatically generate a sitemap without me having to do much.

Unfortunately, it generates an HTML sitemap, which is great and easy for humans with a browser to read, but a sitemap meant for search engine crawlers really should be an XML document [1].

I looked around for ways to make an XML sitemap and couldn't find any good solutions, so I decided to roll my own. It took me six lines of scripting:

#!/usr/local/bin/bash

# Write the XML header and open the urlset element (this overwrites any old sitemap.xml)
echo '<?xml version="1.0" encoding="UTF-8"?>' > sitemap.xml
echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' >> sitemap.xml

# .org sources in the listed directories only, minus drafts: .org becomes .html and ./ becomes the site URL
find -E . -regex '\./(articles|blockedsites|blog|books|errors|freebsd|freebsdhardware|howdid|howtointernet|inventory|uses)/.*\.org' -not -name 'draft*' | sed -E 's/\.org$/.html<\/loc><\/url>/' | sed -E 's/^\.\//<url><loc>https:\/\/wyrm.org\//' >> sitemap.xml

# Same again for the .xml files, which keep their extension
find -E . -regex '\./(articles|blockedsites|blog|books|errors|freebsd|freebsdhardware|howdid|howtointernet|inventory|uses)/.*\.xml' -not -name 'draft*' | sed -E 's/\.xml$/.xml<\/loc><\/url>/' | sed -E 's/^\.\//<url><loc>https:\/\/wyrm.org\//' >> sitemap.xml

echo '</urlset>' >> sitemap.xml

I run this from the root of the directory where the source files for this website live on my hard drive. After writing the required header to the top of the file, it searches for any .org files it can find, then uses sed to change .org to .html and to replace the local file paths with URLs. Then the process repeats for the XML files that are still hanging around (except it doesn't change their extensions). The directory list in the regex is there to leave out projects and asides that I don't want published yet, and the -not -name test skips article drafts that aren't ready for production.
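To make the two sed rewrites concrete, here is a minimal sketch that runs them on a single path. The sample file name is made up for illustration, and the patterns are anchored to the start and end of the line so that only the leading ./ and the trailing .org are touched.

```shell
# Hypothetical sample path, as find prints it when started from "."
sample='./blog/making-a-sitemap.org'

# The two rewrites: .org at the end of the line becomes .html plus the closing
# tags, and the leading ./ becomes the opening tags plus the site URL.
to_url() {
  sed -E -e 's/\.org$/.html<\/loc><\/url>/' \
         -e 's/^\.\//<url><loc>https:\/\/wyrm.org\//'
}

printf '%s\n' "$sample" | to_url
# prints: <url><loc>https://wyrm.org/blog/making-a-sitemap.html</loc></url>
```

The order matters: the extension is rewritten before the URL is added, so the .org in wyrm.org is never seen by the first expression.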

I run it just before kicking off org-publish, which then sees the new sitemap and uploads it to the site. I could [2] probably integrate this into the org-publish workflow.
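Short of wiring this into org-publish itself (its :preparation-function project property looks like the natural hook for that), a small wrapper can at least make the two steps a single command. This is only a sketch under assumptions: make-sitemap.sh and publish.el are hypothetical file names, and publish.el is assumed to load ox-publish and define the project list.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper. make-sitemap.sh and publish.el are assumed file names:
# the first holds the sitemap script, the second is assumed to load ox-publish
# and define org-publish-project-alist.

publish_site() {
  # Run in a subshell so the caller's working directory is left alone.
  (
    cd "$1" &&
    ./make-sitemap.sh &&   # regenerate sitemap.xml first
    emacs --batch --load publish.el --funcall org-publish-all
  )
}

# Usage (the path is a placeholder): publish_site "$HOME/site-source"
```

Because the steps are chained with &&, publishing is skipped if the sitemap step fails.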

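If you want to use this, you'll probably want to modify it: at a minimum, swap in your own domain and your own list of directories. Also note that find -E is the BSD spelling; GNU find wants -regextype posix-extended instead.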
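As a starting point for that kind of modification, here is a hedged sketch of a parameterized variant. BASE_URL, SECTIONS and make_sitemap are names invented for this example, not part of the script above, and it uses plain -name tests instead of find -E so it should behave the same with BSD and GNU find.

```shell
#!/usr/bin/env bash
# Sketch of a parameterized sitemap generator. BASE_URL and SECTIONS are
# placeholder names for illustration; set them to your own domain and
# your own space-separated list of published directories.
BASE_URL="${BASE_URL:-https://example.com}"
SECTIONS="${SECTIONS:-articles blog}"

make_sitemap() {
  echo '<?xml version="1.0" encoding="UTF-8"?>'
  echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
  for section in $SECTIONS; do
    # Skip listed directories that do not exist in this tree.
    [ -d "./$section" ] || continue
    # Collect .org and .xml files, leaving out anything named draft*.
    find "./$section" -type f \( -name '*.org' -o -name '*.xml' \) ! -name 'draft*' |
      sort |
      sed -E -e 's/\.org$/.html/' \
             -e "s|^\./(.*)\$|<url><loc>${BASE_URL}/\1</loc></url>|"
  done
  echo '</urlset>'
}

# Usage, from the root of the source tree: make_sitemap > sitemap.xml
```

This sketch does no URL or XML escaping, so it assumes file names (and BASE_URL) free of spaces, ampersands and pipe characters.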

This article was posted on 31 Mar 26 and I haven't looked at it since.

Footnotes:

[1] Weird how I keep hearing that XML is dying, but it keeps hanging around anyway.

[2] read: 'should'