Wednesday, July 14, 2010

A quick and dirty bash script to periodically clean out a static html mirror of a dynamic site

I use nginx to flatten my site to HTML files on disk so that, if Varnish ever crashes, I can warm the cache back up with the last known good copies of everything. To make sure we don't keep serving old pages, I move the files from a "fresh" folder to a "stale" folder every so often. I do some tricky stuff in my nginx config to check the upstream server first. If that fails (for example, because it is overloaded), nginx checks the "fresh" folder, and THEN, if worst comes to worst, it checks the "stale" folder and serves from there. At least the user doesn't see an error page, just slightly older content.
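
The real fallback lives in the nginx config, but the lookup order on disk can be illustrated with a small shell sketch. The function name `lookup_cached`, the single cache root holding `fresh` and `stale` subfolders, and the rule that directory-style URLs are stored as `index.html` are assumptions made for this illustration only, not part of my actual setup.

```shell
#!/bin/bash
# Hypothetical helper: print the cached file that would answer a URL path,
# trying the fresh tree first and the stale tree second.
lookup_cached() {
    local root=$1 url_path=$2 tier candidate
    # Assumption: directory-style URLs are stored on disk as .../index.html
    case $url_path in
        */) url_path=${url_path}index.html ;;
    esac
    for tier in fresh stale; do
        candidate=$root/$tier$url_path
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1   # nothing cached in either tree
}
```

Calling `lookup_cached /cache /page.php/1/` would print the path under `/cache/fresh` if that copy exists, fall back to the one under `/cache/stale` otherwise, and return a non-zero status when neither is present.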

Anyway, I will post the nginx config soon. For now, here is the new pure bash version of the cleanup script. I posted a Perl version a little while back; I think this version is faster.

#!/bin/bash
MINS=30 #the age threshold in minutes at which point the file is moved to stale
DIR=/cache/fresh #starting directory
NEWDIR=/cache/stale #directory to move to

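# bail out if the cd fails, otherwise find would run (and files would be removed) in whatever directory we started in
cd "$DIR" || exit 1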
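# read NUL-separated names from find so that file names containing spaces are not split into several words
find . -type f -mmin +"$MINS" -print0 | while IFS= read -r -d '' file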
do

# get the directory holding our file and strip the leading ./ so that ./foo/bar becomes foo/bar
backup_dir=$(dirname "$file" | sed 's/^\.\///')

# we don't need the basename of the file unless we want to move it to a different directory inside the stale directory than the one it had inside the fresh directory
#file_name=$(basename "$file")

# files at the top level of the fresh directory have "." as their dirname, so use an empty path for them
if [ "$backup_dir" = "." ] ; then
backup_dir=""
fi

# check whether the target directory already exists in the stale folder as a "file" rather than a directory.
# with pretty urls, or a mix of pretty and non-pretty urls, we can end up with files that should be directories.
# just because a file ends in .php on disk doesn't mean it has no virtual "subdirectories" under it when viewed via the web,
# ex: /page.php/1/ which in our flat html version should be /page.php/1/index.html
# so delete the file and replace it with a directory.
if [ -f "$NEWDIR/$backup_dir" ] ; then
rm -f "$NEWDIR/$backup_dir"
fi
mkdir -p "$NEWDIR/$backup_dir"

# if we were moving our stale cache to another server, we could create the directory there with a remote ssh command, ex:
#ssh testaccount@192.168.10.15 mkdir -p "$NEWDIR/$backup_dir"

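# mv was coughing on files with spaces in the name. that comes from unquoted variables rather than from mv itself, so every path is quoted here.
#mv "$file" "$NEWDIR/$backup_dir/"
# --remove-source-files is the current name of the deprecated --remove-sent-files. -a already implies --times, -o and -g.
# the rm only runs if rsync succeeded, so a failed copy never costs us the fresh file.
rsync --stats -auvz --remove-source-files "$file" "$NEWDIR/$backup_dir/" && rm -f "$file"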
done
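
For comparison, here is a minimal sketch of the same move done with plain mv and consistent quoting, which copes with spaces in file names. It is an illustration under assumptions rather than a drop-in replacement: the function name `move_stale` and its three arguments (source tree, destination tree, age in minutes) are hypothetical, the source path is expected without a trailing slash, and it relies on `find -print0` and bash's `read -d ''`.

```shell
#!/bin/bash
# Sketch: move files older than $3 minutes from tree $1 to tree $2,
# keeping their relative paths. The function name is hypothetical.
move_stale() {
    local src=$1 dst=$2 mins=$3 file rel target_dir
    # -print0 with read -d '' keeps names containing spaces intact
    find "$src" -type f -mmin +"$mins" -print0 |
    while IFS= read -r -d '' file; do
        rel=${file#"$src"/}            # path relative to the source tree
        target_dir=$dst
        case $rel in
            */*) target_dir=$dst/${rel%/*} ;;   # file lives in a subdirectory
        esac
        # a plain file sitting where a directory is needed gets replaced
        [ -f "$target_dir" ] && rm -f -- "$target_dir"
        mkdir -p -- "$target_dir" && mv -f -- "$file" "$target_dir/"
    done
}
```

With the values from the script above, the call would be `move_stale /cache/fresh /cache/stale 30`. Unlike the rsync version, mv does not print transfer statistics, and it only avoids a copy when both trees are on the same filesystem.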