kafsemo.org: 2015-12-06: Using HTTP caching libraries

Efficient use of HTTP used to require a lot of custom code. However, libraries can dramatically reduce the amount of code you’re maintaining, and also allow sharing any improvements.

Efficient use of HTTP/1.1 means caching (RFC 7234), conditional requests (RFC 7232) and compression via Content-Encoding, amongst other things. For an RSS reader, I previously wrote a bulk feed downloader that took advantage of all these things in around 700 lines of Perl.
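To make the conditional-request part concrete, here is a minimal sketch of the headers a client might send when it already holds a copy of a feed. The function name and its parameters are hypothetical, not taken from the original downloader; the header names are the standard ones from RFC 7232.

```python
def conditional_headers(etag=None, last_modified=None):
    """Build request headers that let the server reply 304 Not Modified.

    `etag` and `last_modified` are the validators saved from an earlier
    response (hypothetical parameters, for illustration only).
    """
    # Ask for a compressed body as well as a conditional response.
    headers = {"Accept-Encoding": "gzip"}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers
```

If the server answers 304, the body saved earlier is still good and nothing needs to be downloaded again.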

A large part of that logic is included in httplib2, a great Python library that acts as an in-process caching proxy. To allow use of Python’s de facto standard Requests library, I’ve been using cachecontrol: “The httplib2 caching algorithms packaged up for use with requests.”

Instead of:

    import requests
    sess = requests.session()
    response = sess.get('http://example.com/')

write:

    from cachecontrol import CacheControl
    from cachecontrol.caches import FileCache
    sess = CacheControl(requests.session(), cache=FileCache('.web_cache'))
    response = sess.get('http://example.com/')

Hey presto: all your HTTP calls are now backed by a persisted cache that obeys HTTP’s rules for cache freshness. On the Java side, Apache HttpComponents does a pretty good job of the same thing.
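For a feel of what “rules for cache freshness” involves, here is a deliberately simplified sketch of the max-age check from RFC 7234. It is not cachecontrol’s actual code: the function names are made up, and it ignores Expires, s-maxage, heuristic freshness and everything else a real cache has to handle.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"')
    return directives


def is_fresh(cache_control, age_seconds):
    """Return True if a stored response may be reused without revalidation.

    Simplified: only looks at no-store, no-cache and max-age.
    """
    directives = parse_cache_control(cache_control)
    if "no-store" in directives or "no-cache" in directives:
        return False
    max_age = directives.get("max-age", "")
    # Fresh while the response's age is below its freshness lifetime.
    return max_age.isdigit() and age_seconds < int(max_age)
```

A response stored two minutes ago with `Cache-Control: public, max-age=3600` counts as fresh here; the same response after two hours does not, and would need revalidating with a conditional request.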

What’s the result? Firstly, I’m down to under 400 lines (now, of Python). Secondly, and more importantly, much of my code is now in a common open source library. I can benefit from others’ fixes, and contribute my own as well.

The switch from Perl threads to Python’s concurrent.futures is also welcome.
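As a rough illustration of the concurrent.futures side, a bulk fetch might look something like the sketch below. `fetch_all` and its `fetch` argument are hypothetical names, not the code from my downloader; `fetch` stands in for whatever function retrieves a single URL.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_all(urls, fetch, max_workers=8):
    """Call fetch(url) for each URL on a thread pool.

    Returns a dict mapping each URL to its result, or to the exception
    raised while fetching it, so one bad feed doesn't abort the batch.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                results[url] = exc
    return results
```

With the session from earlier, `fetch` could be something like `sess.get`, though whether one session is safe to share between threads is worth checking first.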

I’ve lost some functionality from my own code. I no longer have a summary of how much bandwidth was saved due to compression at the end. However, I don’t miss it, and this kind of rewrite is a great chance to throw away behaviour that I’m not actually using.

The moral of the story is: treat self-maintained code as a liability to be reduced where possible. Unless there’s good reason, prefer de facto standard libraries, and architectures that allow small, well-defined libraries to be introduced.

(Music: Motörhead, “R.A.M.O.N.E.S.”)