kafsemo.org: 2004-12-15: URIs Are The New Strings

Remember the good old days of C++, when any system of moderate size would have at least three distinct string classes, each with its own assumptions and failings? Converting between them was never fun, but invariably essential, as each API required a specific type. It’s reassuring to see these problems repeat: for a very small Java project I’ve brought in a single class library, and now have four distinct URI classes. (That’s not a contrived example, and I’m not including internal classes. During development, that would have pushed the total past eight.) Of course, they all have limitations: java.net.URL doesn’t handle opaque URIs, java.net.URI gets the rules for normalisation, equality and relative resolution wrong, and breaks some cases that java.net.URL used to cover. Custom implementations tend to ignore the less interesting parts of the spec – IPv6 addresses, maybe Unicode – and everyone has their own opinion about file URIs.
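Two of those limitations are easy to see with nothing but the JDK. The following is a minimal sketch, not an exhaustive test: the class name `UriLimits`, its helper methods, and the example URIs are made up for illustration. It assumes a stock JDK with no custom protocol handler registered for `urn`.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

public class UriLimits {
    /** True if java.net.URL will parse the string at all. */
    @SuppressWarnings("deprecation") // the URL constructors are deprecated in recent JDKs
    static boolean urlAccepts(String s) {
        try {
            new URL(s);
            return true;
        } catch (MalformedURLException e) {
            // Thrown for schemes with no registered protocol handler.
            return false;
        }
    }

    /** Compares two URI strings with java.net.URI's own normalize() and equals(). */
    static boolean uriEquals(String a, String b) {
        return URI.create(a).normalize().equals(URI.create(b).normalize());
    }

    public static void main(String[] args) {
        String urn = "urn:isbn:0451450523";

        // java.net.URL rejects this opaque URI outright.
        System.out.println("URL accepts urn:  " + urlAccepts(urn));

        // java.net.URI parses it and reports it as opaque.
        System.out.println("URI is opaque:    " + URI.create(urn).isOpaque());

        // "~" is an unreserved character, so the URI spec treats %7E and "~"
        // as equivalent; java.net.URI compares them as different strings.
        System.out.println("%7E equals ~:     "
                + uriEquals("http://example.org/%7Euser", "http://example.org/~user"));
    }
}
```

The percent-encoding case is only one instance of the equality problem; `normalize()` removes dot segments but does not decode escaped unreserved characters.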

The only real way to convert between them is via strings, throwing away static typing in the process and increasing the risk of falling foul of a substandard implementation. This makes RDF, based as it is around URI equivalence, harder than it needs to be.
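A rough sketch of what that string bridge tends to look like. `LibraryUri` here is a hypothetical stand-in for a third-party library's URI type, not a real class, and `StringBridge` is likewise invented for the example.

```java
import java.net.URI;

public class StringBridge {
    /** Hypothetical third-party URI type: constructed from, and printed as, a string. */
    static final class LibraryUri {
        private final String text;

        LibraryUri(String text) {
            this.text = text;
        }

        @Override
        public String toString() {
            return text;
        }
    }

    /** java.net.URI to the library type: the compiler only ever sees a String. */
    static LibraryUri toLibrary(URI uri) {
        return new LibraryUri(uri.toString());
    }

    /**
     * Library type back to java.net.URI. Any disagreement about syntax between
     * the two implementations shows up here, at run time, as an
     * IllegalArgumentException from URI.create.
     */
    static URI fromLibrary(LibraryUri uri) {
        return URI.create(uri.toString());
    }

    public static void main(String[] args) {
        URI original = URI.create("http://example.org/a%20b");
        URI roundTripped = fromLibrary(toLibrary(original));
        System.out.println(original.equals(roundTripped));
    }
}
```

Nothing stops a caller passing an arbitrary string to `new LibraryUri(...)`, which is exactly the static typing that gets thrown away.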

RFC 2396bis is apparently now final, but I doubt that’s going to stop the spider authors who think that URI processing is just concatenation and wishing. The tag URI spec is currently going through channels, and provides one of the best-written foundations for universal addressing that I’ve read.

Bad Concert Photography

Fifteen-second exposure:

Move around, you freaks!

(Music: Twilight Singers, “Cloudbusting”)