These are my notes on authoring notes in HTML, authored in HTML. As usual, I am following my web design checklist, but this time I am also targeting the semantic web.


HTML 5 with RDFa and XML syntax (since HTML 5 ate XHTML, turning it into an alternative concrete syntax) is a particularly machine-friendly combination: thanks to RDF, it is rich with metadata; thanks to XML (and XSLT), it is pretty good for processing; thanks to HTML 5, it is fine for semantic document structuring. Authoring directly in it, as opposed to using DITA or other markup languages, provides the most flexibility and control.
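To illustrate the machine-friendliness, here is a minimal Python sketch that reads a small XHTML 5 document with a generic XML parser and lists its RDFa `property` attributes. The sample document, the choice of the Dublin Core prefix, and the function name `rdfa_properties` are assumptions made for illustration, not taken from this page's source. A real RDFa processor does considerably more (prefix resolution, subject tracking, and so on).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, not this page's actual source.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      prefix="dc: http://purl.org/dc/terms/" lang="en">
  <head>
    <title property="dc:title">HTML notes</title>
    <meta property="dc:creator" content="Example Author" />
  </head>
  <body>
    <p>Written on <time property="dc:created" datetime="2019-01-01">January 1</time>.</p>
  </body>
</html>
"""


def rdfa_properties(xml_text):
    """Return (property, value) pairs in document order.

    The value is taken from the content attribute, then from the
    datetime attribute, and otherwise from the element's text.
    """
    root = ET.fromstring(xml_text)
    pairs = []
    for element in root.iter():
        prop = element.get("property")
        if prop is None:
            continue
        value = element.get("content")
        if value is None:
            value = element.get("datetime")
        if value is None:
            value = "".join(element.itertext()).strip()
        pairs.append((prop, value))
    return pairs


if __name__ == "__main__":
    for prop, value in rdfa_properties(SAMPLE):
        print(prop, "=", value)
```

Because the document is well-formed XML, no HTML-specific parser is needed; the same file could equally be fed to an XSLT processor.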

As for drawbacks, paragraphs are still annoying to deal with (and so are hyperlinks, though that's common), as is inherent in SGML. Also, the correspondence between HTML 5 sectioning and RDF vocabularies is not obvious: I consider this HTML document to be a note (or an article) and define it as such, yet HTML 5 suggests using the article element for complete chunks of information, implying that only a part of an HTML document is an article, which may lead to contradictory semantics. Besides, it would involve additional nesting, which makes editing even more awkward (html, body, section, and p are already there most of the time, which is quite a lot of wrapping just to place regular text in a document).



One should be careful with CURIEs: that is, actually read that section. I spent some time debugging a document, mostly because I had skipped it. Other than that, it's pretty simple, as can be seen in this page's source: multiple vocabularies can be used rather easily, and there is a choice of terms. Document metadata goes into the head element, and the rest gets embedded via RDFa attributes.
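As a rough illustration of how CURIEs resolve, the Python sketch below parses an RDFa `prefix` attribute value and expands a CURIE by concatenating the mapped IRI with the reference part. The function names `parse_prefixes` and `expand_curie` are hypothetical, and returning `None` for an undeclared prefix is a choice made for this sketch so that mistakes are easy to spot. It covers only explicitly prefixed CURIEs, not terms, default vocabularies, or absolute IRIs.

```python
def parse_prefixes(prefix_attr):
    """Parse a prefix attribute value such as 'dc: http://purl.org/dc/terms/'.

    The value is a whitespace-separated sequence of 'name: IRI' pairs.
    """
    tokens = prefix_attr.split()
    if len(tokens) % 2 != 0:
        raise ValueError("prefix attribute must hold 'name: IRI' pairs")
    mapping = {}
    for name, iri in zip(tokens[0::2], tokens[1::2]):
        if not name.endswith(":"):
            raise ValueError(f"prefix name must end with a colon: {name!r}")
        mapping[name[:-1]] = iri
    return mapping


def expand_curie(curie, prefixes):
    """Expand 'prefix:reference' to a full IRI.

    Returns None if there is no colon or the prefix is not declared
    (a simplification chosen for this sketch).
    """
    prefix, separator, reference = curie.partition(":")
    if not separator or prefix not in prefixes:
        return None
    return prefixes[prefix] + reference


if __name__ == "__main__":
    declared = parse_prefixes(
        "dc: http://purl.org/dc/terms/ foaf: http://xmlns.com/foaf/0.1/"
    )
    for curie in ("dc:title", "foaf:name", "schema:name"):
        print(curie, "->", expand_curie(curie, declared))
```

Running every `property`, `typeof`, and `rel` value of a document through a check like this is one way to catch a misspelled or undeclared prefix early.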


Some of the HTML-specific metadata (see standard metadata names) is redundant when RDFa is present, yet some software may rely on it, so it might be worth covering.

Mixing duplicate attributes such as rel and property leads to strange results, so it's probably better to avoid that. In some cases, though, the HTML attributes have to be used together with the RDFa ones: for instance, link elements must have href attributes, so one is limited to single plain URIs without prefixes in those, even though link elements are the primary way to set document metadata. The property attribute is still handy to use there.


DOCTYPE is reduced to a "legacy string" now, but it's still there. And I haven't even found DTDs for HTML 5, so apparently one can't set those even if one wants to.



Explicit and semantic sectioning is neat, but leads to a couple of issues. Firstly, as mentioned above, the correspondence of those semantics to the RDFa ones is tricky to establish: this document is an article, with article metadata defined for it, so it doesn't make much sense to add an article element into its body, or to turn it into a wrapper document. Secondly, editing becomes more awkward with additional nesting: for instance, this text is indented with 10 spaces, with 2 spaces per level. Not a big deal, but SGML editing is relatively poor as it is, which does not encourage introducing more nesting.

Those issues, combined with the apparent lack of software that handles sectioning, make me wonder whether it's worth using at all. But sectioning elements should still be handy for software processing, so I'll try to use them for now.

Header, footer, nav

Those also look neat at first sight: one can put creation and modification dates there, license information, navigational links (just a "home" link would be sufficient for this website). But those are common enough for client software to handle them itself; otherwise it's like bloating the documents but marking the bloat, so that it can be removed.

A header is still handy for a title and a foreword though, so I'm using it here. The footer, meanwhile, is not for conclusions, but for mostly unrelated bits and metadata.

On the bright side, I can finally continue an outer section after closing an inner one, or even between inner ones, without adding dummy sections. Web browsers without CSS are not likely to make that visually distinguishable, though.

Editing and preprocessing

Hyperlinks make HTML editing awkward, so I've hacked together the html-wysiwyg minor mode.

There is duplicate data in the documents, too much to write it manually each time. A skeleton document can be used, but it may get tricky to introduce global changes into the resulting documents then (though still possible to do reliably, since the data is structured). So I've composed an XSLT to translate a simpler XML into the resulting files, and published it in my homepage repository, along with XSLTs to produce indexes and Atom feeds. Working with file paths gets a bit awkward with those.
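The idea of generating the repetitive parts from a simpler source can be sketched as follows. This is not the XSLT from the repository: it is a Python stand-in using only the standard library (which has no XSLT processor), and the input format (a `note` element with `title` and `created` attributes and plain-text `p` children), the Dublin Core terms, and the name `build_page` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical simplified source format, assumed for this sketch.
SIMPLE = """<note title="HTML notes" created="2019-01-01">
  <p>First paragraph.</p>
  <p>Second paragraph.</p>
</note>
"""


def q(tag):
    """Qualify a tag name with the XHTML namespace."""
    return f"{{{XHTML_NS}}}{tag}"


def build_page(simple_xml):
    """Expand the simplified note into an XHTML document with RDFa metadata."""
    note = ET.fromstring(simple_xml)
    # Serialize XHTML as the default namespace (global registry side effect).
    ET.register_namespace("", XHTML_NS)

    html = ET.Element(q("html"), {"prefix": "dc: http://purl.org/dc/terms/"})
    head = ET.SubElement(html, q("head"))
    title = ET.SubElement(head, q("title"), {"property": "dc:title"})
    title.text = note.get("title")
    ET.SubElement(
        head, q("meta"), {"property": "dc:created", "content": note.get("created")}
    )

    body = ET.SubElement(html, q("body"))
    header = ET.SubElement(body, q("header"))
    ET.SubElement(header, q("h1")).text = note.get("title")
    # Copy paragraphs; inline markup inside them is ignored in this sketch.
    for child in note.findall("p"):
        ET.SubElement(body, q("p")).text = child.text
    return ET.tostring(html, encoding="unicode")


if __name__ == "__main__":
    print(build_page(SIMPLE))
```

The title appears once in the source and twice in the output, which is the kind of duplication such a preprocessing step removes from manual editing.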

Paragraphs are annoying to compose, but I am not sure whether there's a reliable way to detect and mark them automatically. Inserting them is easy in the Emacs html-mode, though: likely because of the annoyance, their insertion is bound to C-c RET by default. "Skeleton commands" in general are handy when there is repetition.
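For comparison, a naive automatic approach is sketched below in Python: it assumes that paragraphs in the input are separated by blank lines (a convention borrowed from lightweight markup, not something this site uses) and wraps each block in a p element. The function name `mark_paragraphs` is hypothetical. The heuristic breaks as soon as the text contains lists, headings, or preformatted blocks, and it escapes all markup, so inline hyperlinks would not survive, which is part of why reliability is doubtful.

```python
import re
from xml.sax.saxutils import escape


def mark_paragraphs(text):
    """Wrap blank-line-separated blocks of plain text in <p> elements.

    Line breaks inside a block are collapsed into single spaces, and
    the characters &, <, and > are escaped.
    """
    blocks = re.split(r"\n\s*\n", text.strip())
    paragraphs = []
    for block in blocks:
        collapsed = " ".join(block.split())
        if collapsed:
            paragraphs.append(f"<p>{escape(collapsed)}</p>")
    return "\n".join(paragraphs)


if __name__ == "__main__":
    print(mark_paragraphs("First line\ncontinues.\n\nSecond & last.\n"))
```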

Setting fill-column to 80 in .dir-locals.el helps to compensate for the nesting-caused indentation.

Static and compiler-assisted code highlighting based on Emacs major modes, as it is in org-mode's HTML export, would be tricky to get.


As with other technologies, it is useful to inspect HTML's history in order to understand it better. A history of HTML, the www-talk archive, the CERN 2019 WorldWideWeb Rebuild, the initial HTML tags, the HTML specification draft, RFC 1866 (HTML 2.0), and the HTML history on Wikipedia are helpful for that, though unfortunately the period when it was getting shaped (particularly when images and forms were introduced, between 1992 and 1995) is missing from the www-talk archive.