I originally wrote Clay Pigeon as a quick exercise in turning feeds into data. Getting what I need from Atom is thirty-six lines of Python, using libxml2 to parse and evaluate xpath. Throw in a few tests, then move on to the storage and front end.
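The Atom extraction can be sketched in a few lines. This is not the Clay Pigeon code itself: it uses the standard library's ElementTree rather than libxml2, and the function name is mine. But the idea is the same, evaluating a path expression against a well-formed feed:

```python
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

def entry_dates(xml_text):
    """Return the timestamp of each entry in a well-formed Atom feed,
    preferring <published> and falling back to <updated>."""
    root = ET.fromstring(xml_text)
    dates = []
    for entry in root.findall("atom:entry", ATOM_NS):
        node = entry.find("atom:published", ATOM_NS)
        if node is None:
            node = entry.find("atom:updated", ATOM_NS)
        if node is not None:
            dates.append(node.text)
    return dates
```

With the dates in hand, plotting a posting schedule is just a matter of bucketing them by day or month.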
I wanted to see how far that design decision would take me: a simple implementation limited to well-formed Atom feeds. I was shocked (shocked!) to find non-well-formed feeds, and tweaked the logging to act with a little less surprise.
But then, there are still sites out there without Atom feeds. So an extra thirty-one lines of code and a quick import of the Universal Feed Parser and it’s up to speed again, this time with RSS support but sadly no Hot RSS. Take a look if you want to visualise your posting schedule.
There’s a nice bit of duck typing in feedparser.py: as the argument name url_file_stream_or_string to open_resource suggests, you can pass in pretty much anything as a source. Anything with a read method is used directly; otherwise it’s tried as a URL, a file, and then raw data. If it was opened from a URL, the stream object will have a url property. I’ve got my content in a string, so:
    stream = StringIO.StringIO(s)
    stream.url = base
    d = feedparser.parse(stream)
This means I can manage my own file handling but also get relative URLs resolved against the correct base. Compare with Java’s InputSource, which pulls the same trick with a static type system.
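For illustration, the dispatch that open_resource performs can be sketched roughly like this. This is a simplified, hypothetical version, not feedparser's actual code, and it's written against the modern standard library:

```python
import io
import os
import urllib.request

def open_resource(url_file_stream_or_string):
    # Anything with a read() method is already a stream: use it directly.
    # This is the hook that lets a caller attach their own url attribute.
    if hasattr(url_file_stream_or_string, "read"):
        return url_file_stream_or_string
    # Try it as a URL; urlopen's result carries the resolved URL with it.
    if url_file_stream_or_string.startswith(("http://", "https://")):
        return urllib.request.urlopen(url_file_stream_or_string)
    # Then as a file on disk...
    if os.path.exists(url_file_stream_or_string):
        return open(url_file_stream_or_string, "rb")
    # ...and finally treat the argument as the raw feed data itself.
    return io.BytesIO(url_file_stream_or_string.encode("utf-8"))
```

The attraction is that the dispatch costs the caller nothing: a plain string, a path, a URL, and a hand-built stream all go through the same entry point.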