An occasional outlet for my thoughts on life, technology, motorcycles, backpacking, kayaking, skydiving...

Wednesday, June 11, 2008

cron-safe bash script for checking URLs (pages) for updates

I needed to monitor a government data source for updates. Of course, being a government institution, they haven't caught on to the RSS/Atom concept, so the normal tools for this won't work. (There really should be a "Google Alert" option for this.) This is the simple bash script I whipped up to serve the purpose. It caches a copy of each page, diffs each new fetch against the cache, and prints the diff only when something changed, so cron stays quiet (and sends no mail) until there is actually news.

#!/usr/bin/env bash
## - intended to be called via crontab (call with -v for verbose) ##

# If you want to override the standard temp dir, define it here.
export TMPDIR=$HOME/tmp
mkdir -p "$TMPDIR"

# List the pages you wish to check below, one per line, between the EOFs.
# (The URL shown is a placeholder; substitute the pages you care about.)
pages=$(cat << EOF
http://example.com/some/page.html
EOF
)

name=$(basename "$0")
for page in $pages; do
    # Hash the URL into a safe, stable cache filename.
    hash=$(echo "$page" | md5sum | sed 's/[^a-z0-9]//gi')
    file="$TMPDIR/$name.$hash"
    diff="$file.diff"
    [[ "$1" = "-v" ]] && echo "Checking $page"
    if [[ ! -f "$file" ]]; then
        echo "Creating cache $file ..."
        curl --stderr /dev/null "$page" > "$file"
    else
        [[ "$1" = "-v" ]] && echo "Checking against cache file $file ..."
        # diff exits non-zero when the fetched page differs from the cache.
        curl --stderr /dev/null "$page" | diff -u "$file" - > "$diff" || {
            echo -e "Found a change in $page\n"
            cat "$diff"
            echo ""
            patch "$file" "$diff" > /dev/null  # bring the cache up to date
        }
        rm -f "$diff"
    fi
done
[[ "$1" = "-v" ]] && echo "Done!"

# vim:ft=sh

And to run this script every hour, I modify my crontab (via "crontab -e") and make it look like so:
#min hour day/m month day/w command
0 */1 * * * /home/rbronosky/bin/pagecheck

# vi:syntax=crontab:ts=8:tw=0
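If you ever want to know which cache file a given URL maps to (say, to delete a stale cache so the next cron run re-seeds it), you can run the hashing step by hand. This is just a sketch; the URL and the "pagecheck" name are placeholders:

```shell
#!/usr/bin/env bash
# Reproduce the script's cache-key derivation for a single (placeholder) URL.
url="http://example.com/some/page.html"

# md5sum prints "<hash>  -" for stdin input; the sed strips everything
# non-alphanumeric, leaving just the 32 hex digits of the hash.
hash=$(echo "$url" | md5sum | sed 's/[^a-z0-9]//gi')

echo "cache file: ${TMPDIR:-/tmp}/pagecheck.$hash"
```

The same URL always hashes to the same filename, which is what lets the cache survive between cron runs.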

Enjoy! I hope someone else finds this useful.