These are my notes on authoring notes in HTML, authored in HTML. As usual, I am following my web design checklist, but I am also targeting the semantic web here.
HTML 5 with RDFa and XML syntax (since HTML 5 ate XHTML, turning it into an alternative concrete syntax) is a particularly machine-friendly combination: thanks to RDF, it is rich in metadata; thanks to XML (and XSLT), it is pretty good for processing; thanks to HTML 5, it is fine for semantic document structuring. Authoring directly in it, as opposed to using DITA or other markup languages, provides the most flexibility and control.
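To illustrate the processing side: since the XML syntax makes every such page well-formed XML, even a general-purpose XML parser can pull RDFa-annotated values out of it. Below is a minimal Python sketch using the standard xml.etree.ElementTree module rather than XSLT. The sample document and the collect_properties function are made up for illustration, and the extraction is deliberately naive (no CURIE expansion, no subjects), so it is not an RDFa processor.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# A hypothetical note in the XML syntax of HTML 5, with RDFa attributes.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      prefix="dc: http://purl.org/dc/terms/">
  <head>
    <title property="dc:title">Notes on HTML</title>
    <meta property="dc:created" content="2020-01-01"/>
  </head>
  <body>
    <p>Written by <span property="dc:creator">someone</span>.</p>
  </body>
</html>"""


def collect_properties(source: str) -> list[tuple[str, str]]:
    """Return (property, value) pairs in document order.

    A naive approximation of RDFa: the value is the content attribute if
    present, otherwise the element's text. CURIEs are left unexpanded and
    subjects are ignored, so this is not a conforming RDFa processor.
    """
    root = ET.fromstring(source)  # requires well-formed XML
    pairs = []
    for element in root.iter():  # root and all descendants, in document order
        prop = element.get("property")
        if prop is None:
            continue
        value = element.get("content")
        if value is None:
            value = "".join(element.itertext()).strip()
        pairs.append((prop, value))
    return pairs


if __name__ == "__main__":
    for prop, value in collect_properties(DOC):
        print(prop, "=", value)
```

Run as a script, it prints dc:title = Notes on HTML, dc:created = 2020-01-01 and dc:creator = someone, one pair per line. A real pipeline would rather use an RDFa library or XSLT, but the point is that no HTML-specific parser is needed.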
As for drawbacks, paragraphs are still annoying to deal with (and so are hyperlinks, though that's common), as is inherent in SGML. Also, the correspondence between HTML 5 sectioning and RDF vocabularies is not obvious: while I consider this HTML document to be a note (or an article) and define it as such, HTML 5 suggests using the article element for complete, self-contained chunks of information, implying that only a part of an HTML document is an article, which may lead to contradictory semantics. Besides, it would involve additional nesting, which makes editing even more awkward (html, body, section, and p are already there most of the time, which is quite a lot of wrapping just to place regular text in a document).
One should be careful with CURIEs, that is, read the section on them: I spent some time debugging a document, mostly because I had skipped it. Other than that, it's pretty simple, as can be seen in this page's source: multiple vocabularies can be used rather easily, and there is a choice of terms. Document metadata goes into the head element; the rest gets embedded via RDFa attributes.
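As an illustration of where CURIEs can go wrong, here is a simplified Python sketch of prefix declaration parsing and CURIE expansion, assuming the RDFa 1.1 prefix attribute syntax ("name: IRI" pairs, with whitespace after the colon). The helper names and the ex: vocabulary are hypothetical, and the sketch ignores the RDFa initial context (default prefixes), safe CURIEs and terms.

```python
from __future__ import annotations

import re

# In RDFa 1.1, the prefix attribute holds whitespace-separated "name: IRI"
# pairs; the colon must be followed by whitespace.
PREFIX_PAIR = re.compile(r"(\S+):\s+(\S+)")


def parse_prefixes(attribute: str) -> dict[str, str]:
    """Parse a prefix attribute value into a name -> IRI mapping.

    Malformed pairs (e.g. no space after the colon) are skipped without
    any warning in this sketch, which is an easy mistake to overlook.
    """
    return dict(PREFIX_PAIR.findall(attribute))


def expand_curie(value: str, prefixes: dict[str, str]) -> str:
    """Expand "prefix:reference" with the declared prefixes.

    Simplified: a value with an undeclared prefix is returned unchanged,
    i.e. taken as if it were an absolute IRI.
    """
    prefix, sep, reference = value.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + reference
    return value


if __name__ == "__main__":
    # "ex" is a hypothetical vocabulary; its declaration lacks the space.
    prefixes = parse_prefixes(
        "dc: http://purl.org/dc/terms/ ex:http://example.org/ns#")
    print(prefixes)
    print(expand_curie("dc:title", prefixes))
    print(expand_curie("ex:topic", prefixes))
```

Here the ex declaration is malformed, so it is dropped silently and ex:topic comes back unexpanded, while dc:title becomes http://purl.org/dc/terms/title. A mistyped or undeclared prefix tends to yield a bogus IRI rather than an error, which is the kind of thing that makes such bugs slow to find.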
Some of the HTML-specific metadata (see standard metadata names) is redundant when RDFa is present, yet some software may rely on it, so it might be worth covering as well.
Mixing attributes with overlapping roles, such as rel and property, leads to strange results, so it's probably better to avoid. In some cases, though, the HTML attributes have to be used together with the RDFa ones: for instance, link elements must have href attributes, which limits their targets to single plain URIs without prefixes, even though link elements are the primary way to set document metadata. The property attribute is still handy to use on them.
Explicit and semantic sectioning is neat, but leads to a couple of issues. Firstly, as mentioned above, the correspondence of those semantics to the RDFa ones is tricky to establish: this document is an article, with article metadata defined for it, so it doesn't make much sense to add an article element to its body, or to turn it into a wrapper document. Secondly, editing becomes more awkward with additional nesting: for instance, this text is indented by 10 spaces, with 2 spaces per level. Not a big deal, but SGML editing is relatively poor as it is, so it doesn't encourage introducing more nesting.
Those issues, combined with the apparent lack of software support, make me wonder whether explicit sectioning is worth using at all. But the sectioning elements should still be handy for software processing, so I'll try to use them for now.
Headers and footers also look neat at first sight: one can put creation and modification dates, license information, and navigational links there (just a "home" link would be sufficient for this website). But those are common enough for client software to deal with them itself; adding them to every document amounts to bloating it, merely marking the bloat so that it can be removed.
A header is still handy for a title and a foreword, though, so I'm using it here. A footer, on the other hand, is not for conclusions, but for mostly unrelated bits and metadata.
On the bright side, I can finally continue an outer section after closing an inner one, or even between inner ones, without adding dummy sections. Web browsers without CSS are unlikely to make this visually distinguishable, though.
Hyperlinks make HTML editing awkward, so I've hacked together the html-wysiwyg minor mode.
There is duplicated data across the documents, too much to write manually each time. A skeleton document could be used, but then it may get tricky to introduce global changes into the resulting documents (though still possible to do reliably, since the data is structured). So I've composed an XSLT to translate a simpler XML format into the resulting files, and published it in my homepage repository, along with XSLTs to produce indexes and Atom feeds. Working with file paths gets a bit awkward with those.
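The stylesheets themselves are not reproduced here; as a rough analogue of the idea, the Python sketch below expands a small source file into a complete XHTML document, so the shared boilerplate lives in one function instead of in every page. The note input format, its created attribute and the expand and to_xhtml functions are hypothetical and do not reflect the format used in the repository.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
# Serialize XHTML elements with a default xmlns (a process-wide registration).
ET.register_namespace("", XHTML)

# Hypothetical simplified source: only what differs between notes.
SOURCE = """<note created="2020-01-01">
  <title>Notes on HTML</title>
  <body><p>Some <em>text</em>.</p></body>
</note>"""


def q(tag: str) -> str:
    """Qualify a tag name with the XHTML namespace."""
    return f"{{{XHTML}}}{tag}"


def to_xhtml(element: ET.Element) -> ET.Element:
    """Copy an un-namespaced element tree into the XHTML namespace."""
    copy = ET.Element(q(element.tag), dict(element.attrib))
    copy.text, copy.tail = element.text, element.tail
    for child in element:
        copy.append(to_xhtml(child))
    return copy


def expand(source: str) -> str:
    """Wrap a simplified note into a complete XHTML document.

    The boilerplate lives in one place, so a global change only requires
    re-running the conversion over all sources.
    """
    note = ET.fromstring(source)
    title = note.findtext("title", default="")

    html = ET.Element(q("html"), {"prefix": "dc: http://purl.org/dc/terms/"})
    head = ET.SubElement(html, q("head"))
    ET.SubElement(head, q("title"), {"property": "dc:title"}).text = title
    ET.SubElement(head, q("meta"),
                  {"property": "dc:created", "content": note.get("created", "")})

    body = ET.SubElement(html, q("body"))
    ET.SubElement(ET.SubElement(body, q("header")), q("h1")).text = title
    content = note.find("body")
    if content is not None:
        for child in content:
            body.append(to_xhtml(child))
    return ET.tostring(html, encoding="unicode")


if __name__ == "__main__":
    print(expand(SOURCE))
```

Generating from a smaller source, rather than copying a skeleton, means that changing the head or header layout is an edit to one function followed by a rebuild, instead of a structured search-and-replace over existing pages.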
Paragraphs are annoying to compose, but I'm not sure there's a reliable way to detect and mark them automatically. Inserting them is easy in the Emacs html-mode, though: likely because of the annoyance, paragraph insertion is bound to C-c RET by default. "Skeleton commands" in general are handy when there is repetition.
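To see why automatic detection is shaky, consider the most naive heuristic, which treats blank-line-separated chunks of plain text as paragraphs (as many lightweight markup languages do). A Python sketch with a hypothetical wrap_paragraphs function follows; it escapes everything, so lists, headings or existing inline markup would all be flattened into plain paragraphs.

```python
import html
import re


def wrap_paragraphs(text: str) -> str:
    """Wrap blank-line-separated chunks of plain text in p elements.

    Naive heuristic: one or more blank lines end a paragraph. The text is
    escaped, so any markup already present is not preserved.
    """
    chunks = re.split(r"\n\s*\n", text.strip())
    paragraphs = []
    for chunk in chunks:
        # Join the chunk's lines into a single line of text.
        joined = " ".join(line.strip() for line in chunk.splitlines())
        if joined:
            paragraphs.append(f"<p>{html.escape(joined)}</p>")
    return "\n".join(paragraphs)


if __name__ == "__main__":
    print(wrap_paragraphs("First line\ncontinues.\n\nSecond & last."))
```

For that input it produces two p elements, the second with the ampersand escaped as &amp;amp;. Anything that is not prose (a list, a code listing, a quotation) would need its own rules, which is where the heuristic stops being reliable.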
Setting fill-column to 80 in .dir-locals.el helps to compensate for the indentation caused by nesting.
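The effect of nesting on line width is simple arithmetic: with 2 spaces per level, text inside an element at depth d (counting the root as 0) is indented by 2(d + 1) spaces. The Python sketch below computes this for a hypothetical html, body, section, section, p structure and compares the remaining width at Emacs's default fill-column of 70 with a fill-column of 80; the document and function names are made up for illustration.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

INDENT_PER_LEVEL = 2  # spaces per nesting level
XHTML_P = "{http://www.w3.org/1999/xhtml}p"

# Hypothetical structure: a subsection inside a section of the body.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
<body><section><section><p>Some text.</p></section></section></body>
</html>"""


def paragraph_text_indents(source: str) -> list[int]:
    """Return the indentation, in spaces, of the text inside each p element.

    The root element is at depth 0, so the content of an element at depth d
    is indented as depth d + 1.
    """
    indents = []

    def walk(element: ET.Element, depth: int) -> None:
        if element.tag == XHTML_P:
            indents.append((depth + 1) * INDENT_PER_LEVEL)
        for child in element:
            walk(child, depth + 1)

    walk(ET.fromstring(source), 0)
    return indents


def text_width(indent: int, fill_column: int) -> int:
    """Columns left for text when filling lines up to fill_column."""
    return fill_column - indent


if __name__ == "__main__":
    for indent in paragraph_text_indents(DOC):
        # 70 is Emacs's default fill-column; 80 is the value set above.
        print(indent, text_width(indent, 70), text_width(indent, 80))
```

For this structure it prints 10 60 70: ten spaces of indentation leave 60 columns of text at the default width, and raising fill-column to 80 brings that back to 70.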
Static and compiler-assisted code highlighting based on Emacs major modes, as in org-mode's HTML export, would be tricky to achieve.
As with other technologies, it is useful to inspect HTML's history in order to understand it better. A history of HTML, the www-talk archive, the CERN 2019 WorldWideWeb Rebuild, the initial HTML tags, the HTML specification draft, RFC 1866 (HTML 2.0), and HTML history in Wikipedia are helpful for that, though unfortunately the period when HTML was taking shape (particularly when images and forms were introduced, between 1992 and 1995) is missing from the www-talk archive.