An occasional outlet for my thoughts on life, technology, motorcycles, backpacking, kayaking, skydiving...

Wednesday, June 11, 2008

Cron-safe bash script for checking URLs (pages) for updates

I needed to monitor a government data source for updates. Being a government institution, they haven't caught on to the RSS/Atom concept, so the usual feed-based tools won't work. (There really should be a "Google Alert" option for this.) This is the simple bash script I whipped up to do the job.

#!/usr/bin/env bash
# pagecheck.sh - intended to be called via crontab (call with -v for verbose)

# If you want to override the standard temp dir, define it here.
export TMPDIR=$HOME/tmp
# Make sure the cache directory exists before anything is written to it.
mkdir -p "$TMPDIR"

# List the pages you wish to check below, one per line, between the EOFs
pages=$(cat <<EOF
http://www.gadoe.org/pea_communications.aspx?ViewMode=0
http://www.gadoe.org/pea_communications.aspx?ViewMode=1&obj=1635
EOF
)
basename=$(basename "$0")
diff=$TMPDIR/$basename.diff
for page in $pages; do
    # Derive a stable cache file name from the MD5 hash of the URL.
    hash=$(echo "$page" | md5sum | sed 's/[^a-z0-9]//gi')
    file=$TMPDIR/$basename.cache.$hash
    [[ "$1" = "-v" ]] && echo "Checking $page"
    if [[ ! -f "$file" ]]; then
        # First run for this page: seed the cache at a fixed path so the
        # next run can find it again.
        echo "Creating cache $file ..."
        curl --stderr /dev/null "$page" > "$file"
    else
        [[ "$1" = "-v" ]] && echo "Checking against cache file $file ..."
        # diff exits non-zero when the page differs from the cached copy.
        if ! curl --stderr /dev/null "$page" | diff -u "$file" - > "$diff"; then
            echo -e "Found a change in $page\n"
            cat "$diff"
            echo ""
            # Update the cache so the same change is only reported once.
            patch "$file" "$diff"
        fi
    fi
done
[[ "$1" = "-v" ]] && echo "Done!"

# vim:ft=sh
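
The cache file name is really the only clever bit in there, so here is a minimal sketch of that step pulled out on its own. Note that cache_name is a hypothetical helper name I made up for illustration (it is not part of the script above), the "pagecheck" default prefix is likewise an assumption, and it assumes md5sum is installed, as it is on most Linux systems.

```shell
#!/usr/bin/env bash
# Sketch only: cache_name is a hypothetical helper, not part of pagecheck.sh.
# It maps a URL to a stable, filesystem-safe cache file name.
cache_name() {
    local url="$1" prefix="${2:-pagecheck}" hash
    # md5sum prints "<32 hex digits>  -"; keep only the hex digits.
    hash=$(printf '%s\n' "$url" | md5sum | cut -d' ' -f1)
    printf '%s.cache.%s\n' "$prefix" "$hash"
}

# The same URL always yields the same name, so later runs find the cache.
cache_name "http://www.gadoe.org/pea_communications.aspx?ViewMode=0"
```

Hashing the URL sidesteps the problem of slashes, question marks and ampersands in file names, at the cost of names that are not human readable.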


To run this script every hour, I edit my crontab (via "crontab -e") so it looks like this:

MAILTO=me@mydomain.com
#min hour day/m month day/w command
0 */1 * * * /home/rbronosky/bin/pagecheck

# vi:syntax=crontab:ts=8:tw=0
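
Cron mails whatever a job prints to the MAILTO address, which is why the script has to stay quiet unless something actually changed. Here is a rough sketch of that idea in isolation. It is not the script itself: check_against_cache is a hypothetical helper name, it reads the fresh page from stdin instead of calling curl (so you can try it without a network), and it swaps in the new copy with mv rather than using patch.

```shell
#!/usr/bin/env bash
# Sketch only: check_against_cache is a hypothetical helper name.
# Reads fresh content on stdin and compares it with a cache file.
# Prints nothing when the content is unchanged, so cron sends no mail.
check_against_cache() {
    local label="$1" cache="$2" fresh
    fresh=$(mktemp) || return 2
    cat > "$fresh"
    if [ ! -f "$cache" ]; then
        mv "$fresh" "$cache"            # first run: seed the cache quietly
        return 0
    fi
    if diff -u "$cache" "$fresh" > "$fresh.diff"; then
        rm -f "$fresh" "$fresh.diff"    # identical: stay silent
        return 0
    fi
    printf 'Found a change in %s\n\n' "$label"
    cat "$fresh.diff"
    rm -f "$fresh.diff"
    mv "$fresh" "$cache"                # report each change only once
    return 1
}

# Example (not run here):
#   curl -s "$url" | check_against_cache "$url" "$HOME/tmp/page.cache"
```

The non-zero return on a change is just a convenience for callers; cron itself only cares about the output.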


Enjoy! I hope someone else finds this useful.
