urlwatch - a tool for monitoring webpages for updates

This script is intended to help you watch URLs and get notified (via email or in your terminal) of any changes. The change notification will include the URL that has changed and a unified diff of what has changed.
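To illustrate what such a notification might contain, here is a minimal sketch that builds a unified diff from two snapshots of a page using Python's standard difflib module. The function name make_diff and the "(old)" / "(new)" labels are assumptions made for this example and are not taken from the urlwatch source.

```python
import difflib


def make_diff(old_text, new_text, url):
    """Return a unified diff between two snapshots of the same URL."""
    diff_lines = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=url + " (old)",  # label for the previous snapshot (assumed format)
        tofile=url + " (new)",    # label for the current snapshot (assumed format)
        lineterm="",              # input lines carry no newline, so add none here
    )
    return "\n".join(diff_lines)


old_snapshot = "Welcome\nVersion 1.8\n"
new_snapshot = "Welcome\nVersion 1.9\n"
print(make_diff(old_snapshot, new_snapshot, "http://example.com/"))
```

An empty result means the two snapshots are identical, so no notification would be needed.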

The script supports the use of a filtering hook function to strip trivially-varying elements of a webpage.
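As a rough sketch of what such a hook could look like, the function below receives the URL and the downloaded text and returns the text that should be compared against the previous snapshot. The name filter_content, its two-argument signature, and the example patterns are assumptions for illustration; check the example hooks shipped with urlwatch for the actual interface.

```python
import re


def filter_content(url, data):
    """Hypothetical filter hook: strip parts of a page that change on every fetch."""
    if url.startswith("http://example.com/"):
        # Drop a "Generated at HH:MM:SS" stamp that only this (made-up) site emits.
        data = re.sub(r"Generated at \d{2}:\d{2}:\d{2}", "", data)
    # Drop HTML comments, which often carry build stamps or cache markers.
    data = re.sub(r"<!--.*?-->", "", data, flags=re.DOTALL)
    return data
```

Keeping site-specific rules behind a URL check means a filter written for one page does not accidentally hide real changes on another.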

Basic features

  • Simple configuration (text file, one URL per line)
  • Easily hackable (clean Python implementation)
  • Can run as a cronjob and mail changes to you
  • Always outputs only plaintext - no HTML mails :)
  • Supports removing noise (always-changing website parts)
  • Example hooks to filter content in Python
  • Uses If-Modified-Since header to save bandwidth (new in 1.9)
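The If-Modified-Since feature mentioned in the list above can be sketched with the standard library of a current Python 3. This is not urlwatch's own code; the function names and the idea of passing the last fetch time as a Unix timestamp are assumptions for this example.

```python
import urllib.error
import urllib.request
from email.utils import formatdate


def build_conditional_request(url, last_fetch_timestamp):
    """Build a request that asks the server to skip the body if nothing changed."""
    request = urllib.request.Request(url)
    # formatdate(..., usegmt=True) yields an HTTP date such as
    # "Thu, 01 Jan 1970 00:00:00 GMT".
    request.add_header(
        "If-Modified-Since", formatdate(last_fetch_timestamp, usegmt=True)
    )
    return request


def fetch_if_modified(url, last_fetch_timestamp):
    """Return the body as bytes, or None if the server answers 304 Not Modified."""
    request = build_conditional_request(url, last_fetch_timestamp)
    try:
        with urllib.request.urlopen(request) as response:
            return response.read()
    except urllib.error.HTTPError as error:
        if error.code == 304:
            # Unchanged since the last fetch: no body was transferred.
            return None
        raise
```

A caller would store the time of each successful fetch and pass it in on the next run, so unchanged pages cost only a short header exchange.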

Download

Official Debian package (by Franck Joncourt)

Package information: http://packages.debian.org/urlwatch

If you have sid repositories enabled, you can install urlwatch via:

    apt-get install urlwatch

Source tarball

You can download the source tarball of urlwatch here:

Advanced features

  • Clean up "bad" HTML (long lines, etc.) with python-utidylib
  • Convert iCalendar files (*.ics) to plaintext using ical2text
  • Convert HTML to plaintext using lynx, html2text or a regex
  • Watch output of shell commands (new in 1.9)
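For the regex-based HTML conversion mentioned in the list above, a minimal sketch might look like the following. The function name html_to_text and the specific patterns are assumptions for illustration, not the expressions urlwatch uses; a regex approach is only a rough approximation and will mishandle unusual markup that lynx or html2text would cope with.

```python
import html
import re


def html_to_text(markup):
    """Crude regex-based HTML-to-plaintext conversion (illustrative only)."""
    # Remove script and style elements together with their contents.
    text = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", "", markup)
    # Replace every remaining tag with a newline so adjacent elements stay apart.
    text = re.sub(r"(?s)<[^>]+>", "\n", text)
    # Decode entities such as &amp; into plain characters.
    text = html.unescape(text)
    # Trim each line and drop the empty ones left behind by removed tags.
    lines = [line.strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


page = "<html><body><h1>News</h1><p>A &amp; B</p></body></html>"
print(html_to_text(page))
```

Because every tag becomes a line break, inline elements such as bold text end up on their own lines; that keeps the diff output line-oriented at the cost of some readability.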

3rd party patches / Contributions

License

urlwatch is released under the terms of the BSD license.

Code repository

You can follow development of urlwatch here (please get in touch if this link is broken, as this is not a permanent repository URL):

    git clone http://khan.thpinfo.com/~thp/urlwatch.git

Information about the User-Agent

Since version 1.3, urlwatch sends a better User-Agent string. More information about this User-Agent string can be found on this page.

Thomas Perl (thp at this domain), jabber: thp@jabber.org