kafsemo.org

Atom almost baked?

2005-04-26

(Via Tim and Sam.) The Atom working group has submitted the latest Atom Syndication Format draft as a proposed standard. This is the ninth draft under the IETF process, since July of last year. The changes show how the format has evolved, and the Atom Wiki and mailing list have kept the discussion open to the public.

One of the aims of Atom was to create a syndication format that would cope with all kinds of data. But, in doing so, it’s important that it still scales down to really simple cases. To see how it shapes up, I transliterated my current RSS feed into the format described by draft-ietf-atompub-format-08 (kafsemo.atom).

What’s different? A number of fields that are optional in RSS 2.0 are mandatory here – specifically, atom:author and atom:updated. (A link to the feed’s location is strongly recommended.) I’m not sure what atom:author should be for feeds of, say, event logs, but see NumberOfAuthorsDiscussion for the debate. atom:updated seems to duplicate the HTTP Last-Modified header but, like a number of Atom’s decisions, this is an acknowledgement that it’s easy to lose external metadata in a processing pipeline.

Individual entries have more mandatory fields: atom:published, atom:updated and atom:title. Again, there are cases where the value of atom:title isn’t immediately obvious. Of course, it’s always possible to derive pseudo-titles from content, but I worry that encouraging this kind of form-filling approach may devalue data.
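To make the list of mandatory fields concrete, here is a minimal sketch that builds a one-entry feed with Python's standard ElementTree module. It is an illustration under stated assumptions, not the contents of kafsemo.atom: the namespace URI and the atom:id and feed-level atom:title elements are taken from the later RFC 4287 rather than from draft-08, and every value, URL and helper name (atom, add_text, build_minimal_feed) is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI from the final RFC 4287; the drafts may differ.
ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)


def atom(tag):
    """Return the namespace-qualified name for an Atom element."""
    return f"{{{ATOM_NS}}}{tag}"


def add_text(parent, tag, text):
    """Append a child element holding plain text and return it."""
    child = ET.SubElement(parent, atom(tag))
    child.text = text
    return child


def build_minimal_feed():
    """Build a feed carrying only the fields discussed above (values are made up)."""
    feed = ET.Element(atom("feed"))
    add_text(feed, "title", "Example feed")            # assumption: per RFC 4287
    add_text(feed, "id", "http://example.org/")        # assumption: per RFC 4287
    add_text(feed, "updated", "2005-04-26T00:00:00Z")  # mandatory
    author = ET.SubElement(feed, atom("author"))       # mandatory
    add_text(author, "name", "Example Author")
    # Strongly recommended: a link back to the feed's own location.
    ET.SubElement(feed, atom("link"), rel="self",
                  href="http://example.org/feed.atom")

    entry = ET.SubElement(feed, atom("entry"))
    add_text(entry, "title", "Atom almost baked?")
    add_text(entry, "id", "http://example.org/2005/04/26/atom")  # assumption
    add_text(entry, "published", "2005-04-26T00:00:00Z")
    add_text(entry, "updated", "2005-04-26T00:00:00Z")
    return feed


if __name__ == "__main__":
    print(ET.tostring(build_minimal_feed(), encoding="unicode"))
```

Even at this size the feed stays readable, which is the scaling-down property I was looking for.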

Specifying ‘published’ and ‘updated’ as two (potentially) distinct dates is a great decision. The values chosen, and whether or not a change is considered significant, are left to the judgement of the producer. However, Atom’s only date format is a full ISO 8601 datestamp; for feeds with infrequent updates, I prefer the option of specifying only a date. Of course, I can add “T00:00:00Z” for the time part, and I appreciate the aim of requiring a point in time, rather than making consumers deal with a mix of points in time and intervals.
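The padding step is small enough to sketch. This assumes that pinning a date-only value to midnight UTC is acceptable for the feed in question; the function name date_to_atom_timestamp is my own invention, not anything from the draft.

```python
from datetime import date, datetime, time, timezone


def date_to_atom_timestamp(day):
    """Pin a date-only value to midnight UTC and format it as a full timestamp."""
    moment = datetime.combine(day, time(0, 0, 0), tzinfo=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


print(date_to_atom_timestamp(date(2005, 4, 26)))  # 2005-04-26T00:00:00Z
```

The cost is that consumers can no longer tell a deliberate midnight from a padded date, which is the trade-off for always having a point in time.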

The ‘type’ attribute on many fields addresses one of the longest-running unresolved issues with RSS – namely, whether content is plain text or double-escaped HTML. (Embedded XHTML is also allowed, currently wrapped in a semantically null xhtml:div element.) There are also separate elements for content and summary. Although there were RSS extensions that resolved these ambiguities, having them covered in the core is really going to improve the quality of feeds.
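A rough sketch of the three content flavours, again using Python's ElementTree. The attribute values "text", "html" and "xhtml" and the Atom namespace URI are assumptions taken from the later RFC 4287 and may not match draft-08 exactly; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"   # assumption: final RFC 4287 namespace
XHTML_NS = "http://www.w3.org/1999/xhtml"


def text_content(value):
    """type="text": the value is plain text; any markup characters are literal."""
    el = ET.Element(f"{{{ATOM_NS}}}content", type="text")
    el.text = value
    return el


def html_content(markup):
    """type="html": the value is HTML, carried as escaped character data."""
    el = ET.Element(f"{{{ATOM_NS}}}content", type="html")
    el.text = markup  # the serializer escapes < > & when writing the element
    return el


def xhtml_content(paragraph):
    """type="xhtml": real XHTML child elements inside a wrapper div."""
    el = ET.Element(f"{{{ATOM_NS}}}content", type="xhtml")
    div = ET.SubElement(el, f"{{{XHTML_NS}}}div")
    p = ET.SubElement(div, f"{{{XHTML_NS}}}p")
    p.text = paragraph
    return el


if __name__ == "__main__":
    for el in (text_content("1 < 2"),
               html_content("<b>bold</b>"),
               xhtml_content("A paragraph")):
        print(ET.tostring(el, encoding="unicode"))
```

The point is that a consumer reads the attribute instead of guessing: escaped markup under "html" is to be unescaped and rendered, while the same characters under "text" are shown as-is.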

Tim has called for Atom Now, rather than more bikeshedding. (Verity Stob’s code walk-through survival guide is also insightful for presenting a proposal to a group.) If you’ve been waiting for the specification to settle before looking at it, now’s a great time: looks good to me.

(Music: Millencolin, “Hard Times”)