As the W3C investigates compound document formats, ways of combining multiple formats into a single document, it’s worth bearing in mind Tim Bray’s complaint that XML specifications are often timid in scope and Norman Walsh’s argument that XML needs to become a richer, more featureful language.
The use cases and requirements state that “CDR MUST exploit existing specifications [...] wherever possible.” What could be a better spec to base this on than the well-understood Hot Comments design pattern? Most TrackBack implementations already use it to embed RDF in HTML, so building further on this precedent seems wise.
One of the unfortunate side-effects of HTML 4.0 mandating the use of XML is that presentation is no longer included in the same file. The attempt to solve this with external CSS files is admirable, but makes for an administrative nightmare as content and presentation must be edited separately. My humble proposal? XML+Data. It’s built on XML, guaranteeing extensibility and widespread acceptance, but also includes ideas from MIME to ensure that everyone can find something to enjoy.
It’s so simple, so easy to understand, that documenting it almost seems patronising: each comment may optionally contain a MIME-style entity, which may later be referred to using its declared content ID as a fragment identifier (this requires some change to the semantics of fragment identifiers, but changing as minor a spec as the URI one shouldn’t be difficult).
Let’s take a look at a simple example:
<?xml version='1.0' charset='windows-1252'?>
<!--
Content-Type: text/css
Content-ID: style

doc { background: white; color: black; }
item { display: block; }
-->
<?xml-stylesheet href='#style'?>
<doc xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns='http://www.w3.org/1999/xhtml'>
<!--
Content-Type: text/xml
Content-ID: transformation

<?xml version='1.1'?>
<xsl:stylesheet>
<!- -
Content-Type: text/css
Content-ID: style

html { background: gray; color: black; }
- ->
  <xsl:template match='/doc'>
    <html xmlns='http://www.w3.org/1999/xhtml'>
      <head>
        <link rel='stylesheet' href='#style'/>
      </head>
      <body>
        <xsl:apply-templates select='item'/>
      </body>
    </html>
  </xsl:template>
  <xsl:template match='item'>
    <h1 xmlns='http://www.w3.org/1999/xhtml'><xsl:value-of select='.'/></h1>
  </xsl:template>
</xsl:stylesheet>
-->
  <item>Test</item>
  <item>Second item</item>
</doc>
These commented blocks may nest, as shown above, where the embedded XSL transformation includes its own CSS presentation. Of course, embedding in a comment block means certain sequences of characters (most notably, ‘--’) cannot appear. No problem – break them up with spaces. When decoding, ‘- -’ should be replaced with ‘--’. This means there’s no way to encode an actual ‘- -’, but don’t worry about that. (Additionally, there’s no reason that identifiers shouldn’t nest: referring to ‘#transformation%23style’ in the outer document does exactly what one would expect.)
As a convenience, any XML namespace prefixes defined in the enclosing document are inherited by the embedded document. This might cause problems with default namespaces being inherited by non-namespace-aware documents; again, don’t worry about that. Oh, and I suppose white space rules should be sorted out, too.
One downside is that, until they’re all rewritten to support XML+Data, most editors won’t be able to highlight syntax. (Although, as a workaround, tools could split the embedded data into separate files for editing, recombining them for publishing. If you start with a great idea like this, you can usually rely on such elegant solutions to just fall into place.)
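The recombining half of that workaround might look something like the following Python sketch. The names `escape` and `embed_entity` are hypothetical, and the layout (headers, a blank line, then the body) is an assumption rather than anything this proposal pins down.

```python
def escape(text):
    """Make text safe inside an XML comment by breaking up every '--'."""
    # Repeat until no '--' remains: a run such as '---' needs two passes.
    while "--" in text:
        text = text.replace("--", "- -")
    return text


def embed_entity(content_type, content_id, body):
    """Wrap a separately edited file as a MIME-style entity in a comment."""
    return (
        "<!--\n"
        f"Content-Type: {content_type}\n"
        f"Content-ID: {content_id}\n"
        "\n"
        f"{escape(body)}\n"
        "-->"
    )
```

A publishing tool would call `embed_entity("text/css", "style", css_text)` for each split-out file and paste the result back into the document. Runs of three or more hyphens do not survive a round trip through the decoder, but don’t worry about that.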
As with any standard, it’s important to get the specification clear before distracting oneself with code – very often, writing code will lead you to think about things in a different way, creating a moving target that can stand in the way of the real business of getting a spec finalised.
Although I haven’t carried out a full patent search yet, I would be happy to license this work under reasonable and non-discriminatory terms.