kafsemo.org

Large File Reminder

2005-04-02

As “large” files (that is, files larger than 2^31, or sometimes 2^32, bytes) become more common, it’s increasingly important to write code that handles them. (Another 32-bit limit, Unix’s time_t expiring in a little over thirty years, is a more complicated case. For a revealing misinteraction: Solaris for SPARC has a 64-bit kernel but, at least up to version 8, had most of its userspace built as 32-bit. Give a file a datestamp after 2038 and then try to rm it without ‘-f’.)

Writing large-file-aware code in Java requires a little attention to detail, too: most I/O APIs use 64-bit longs for lengths and offsets, but some use ints. Even if you simply narrow all longs to ints, everything works without a problem right up until you hit your first large file. At that point? Impossible to say in general. Maybe you get a nonsensical progress indicator, or maybe it just breaks.
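As a rough illustration of the “nonsensical progress indicator” case, here is a minimal sketch. The class and method names are made up for this example, and the 3 GiB length is just an arbitrary value above Integer.MAX_VALUE; no real file is involved.

```java
// Hypothetical demo: names and the 3 GiB length are illustrative assumptions.
class NarrowingDemo {
    /** Percentage complete, computed entirely in long arithmetic. */
    static long percentDone(long done, long total) {
        return total == 0 ? 100 : (done * 100) / total;
    }

    /** The same calculation after carelessly narrowing both values to int. */
    static int percentDoneNarrowed(long done, long total) {
        int d = (int) done;   // keeps only the low 32 bits
        int t = (int) total;  // a 3 GiB length becomes a negative number here
        return t == 0 ? 100 : (d * 100) / t;  // d * 100 can also wrap
    }

    public static void main(String[] args) {
        long total = 3L * 1024 * 1024 * 1024; // 3 GiB, larger than Integer.MAX_VALUE
        long done = total / 2;                // halfway through
        System.out.println("long arithmetic: " + percentDone(done, total) + "%");
        System.out.println("int arithmetic:  " + percentDoneNarrowed(done, total) + "%");
    }
}
```

For anything under 2^31 bytes (and small enough that the multiplication does not wrap) the two methods agree, which is exactly why this sort of bug survives testing.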

Mixing integral types highlights one quirk of Java’s type system:

int count = 0;
long r;
...
r = skip(x);
...
count = count + r; /* This line will not compile */
...
count += r;        /* This compiles without warning */

It’s an easy mistake to assume that a += b is a literal shorthand for a = a + b. In Java, as the spec says, it is shorthand for a = (T)(a + b), where T is the type of a (and a is evaluated only once). In this case, that means the long sum is silently narrowed to an int.
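To see the hidden cast in action, here is a small self-contained sketch. The class and method names are invented for the example, and 2^32 stands in for whatever a skip() call might return on a large file.

```java
// Hypothetical demo of the implicit narrowing cast in compound assignment.
class CompoundAssignDemo {
    /** Adds r to an int counter using +=, which hides a cast back to int. */
    static int addCompound(int count, long r) {
        count += r;           // equivalent to: count = (int) (count + r);
        return count;
    }

    /** Keeps the running total in a long, so nothing is narrowed. */
    static long addWide(long count, long r) {
        count += r;
        return count;
    }

    public static void main(String[] args) {
        long r = 1L << 32;    // 2^32 bytes "skipped"
        System.out.println(addCompound(5, r)); // prints 5: the high bits are discarded
        System.out.println(addWide(5, r));     // prints 4294967301
    }
}
```

The fix, in this sketch at least, is simply to declare the counter as a long; the compound assignment then has nothing to narrow.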

One C# feature not yet available in Java is the ‘checked’ keyword, which turns integer overflow in arithmetic and narrowing conversions into an exception, rather than wrapping silently. This identifies the problem, rather than fixing it, but seems like a much better alternative to silent corruption. With 128-bit file systems now shipping, maybe it would be simpler to just change language?
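Java still has no ‘checked’ keyword, but releases from Java 8 onward offer per-call equivalents in java.lang.Math: addExact and toIntExact throw ArithmeticException instead of wrapping. The sketch below assumes such a release; the class and method names are made up for illustration, and it is opt-in per expression rather than a block-level switch like C#’s.

```java
// Hypothetical demo; requires Java 8 or later for Math.addExact / Math.toIntExact.
class ExactDemo {
    /** Adds r to an int counter, throwing ArithmeticException instead of wrapping. */
    static int addChecked(int count, long r) {
        long sum = Math.addExact((long) count, r); // throws on long overflow
        return Math.toIntExact(sum);               // throws if sum does not fit in an int
    }

    public static void main(String[] args) {
        System.out.println(addChecked(5, 10));
        try {
            addChecked(5, 1L << 32);               // 2^32 + 5 does not fit in an int
        } catch (ArithmeticException e) {
            System.out.println("overflow detected: " + e.getMessage());
        }
    }
}
```

As with ‘checked’, this only identifies the problem; the counter still needs to be a long to handle the large file.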

(Music: Millencolin, “No Cigar”)