Efficient use of HTTP used to require a lot of custom code. However, libraries can dramatically reduce the amount of code you’re maintaining, and also allow sharing any improvements.
Efficient use of HTTP/1.1 means caching (RFC 7234), conditional requests (RFC 7232) and compressed Content-Encoding, amongst other things. For an RSS reader, I previously wrote a bulk feed downloader that took advantage of all these things in around 700 lines of Perl.
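To make those mechanisms concrete, here is a minimal sketch of the bookkeeping a hand-rolled client ends up doing for conditional requests and compression. It is not the original Perl code: the function names `conditional_headers` and `resolve_body` are hypothetical, headers are assumed to be plain dicts with canonical-case keys, and only gzip is handled.

```python
import gzip


def conditional_headers(cached_headers):
    """Build request headers that let the server reply 304 Not Modified.

    cached_headers: headers stored from the previous response (assumed to be
    a plain dict with canonical-case keys such as "ETag").
    """
    headers = {"Accept-Encoding": "gzip"}  # ask for a compressed body
    etag = cached_headers.get("ETag")
    if etag:
        headers["If-None-Match"] = etag  # validator from RFC 7232
    last_modified = cached_headers.get("Last-Modified")
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def resolve_body(status, response_headers, raw_body, cached_body):
    """Pick the body to use once the response has arrived."""
    if status == 304:
        # Nothing changed on the server: reuse the stored copy.
        return cached_body
    if response_headers.get("Content-Encoding") == "gzip":
        return gzip.decompress(raw_body)
    return raw_body
```

Even this leaves out freshness calculation (Cache-Control, Expires, Age), which is where most of the remaining complexity lives.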
A large part of that logic is included in httplib2, a great Python library that acts as an in-process caching proxy. To allow use of Python’s de facto standard Requests library, I’ve been using cachecontrol: “The httplib2 caching algorithms packaged up for use with requests.”
Instead of:
    import requests

    sess = requests.session()
    response = sess.get('http://example.com/')
write:
    from cachecontrol import CacheControl
    from cachecontrol.caches.file_cache import FileCache

    sess = CacheControl(requests.session(), cache=FileCache('.web_cache'))
    response = sess.get('http://example.com/')
Hey presto: all your HTTP calls are backed by a persisted cache that obeys HTTP’s rules for cache freshness. On the Java side, Apache HttpComponents does a pretty good job of the same thing.
What’s the result? Firstly, I’m down to under 400 lines (now, of Python). Secondly, and more importantly, much of my code is now in a common open source library. I can benefit from others’ fixes, and contribute my own as well.
The switch from Perl threads to Python’s concurrent.futures is also welcome.
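As a rough illustration of what a bulk download looks like with concurrent.futures, here is a minimal sketch. The helper name `fetch_all`, its `max_workers` default and the `fetch` callable are assumptions made for the example, not the downloader’s actual code.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_all(urls, fetch, max_workers=8):
    """Call fetch(url) for every URL on a thread pool.

    Returns (results, errors): two dicts keyed by URL, so that one broken
    feed does not abort the whole batch.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the URL it was submitted for.
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                errors[url] = exc
    return results, errors
```

Here `fetch` might be something as small as `lambda url: sess.get(url)`; whether a single session should be shared between threads is a separate question that this sketch does not settle.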
I’ve lost some functionality from my own code. I no longer have a summary of how much bandwidth was saved due to compression at the end. However, I don’t miss it, and this kind of rewrite is a great chance to throw away behaviour that I’m not actually using.
The moral of the story is: treat self-maintained code as a liability to be reduced where possible. Unless there’s good reason, prefer de facto standard libraries, and architectures that allow small, well-defined libraries to be introduced.