Lightweight web browsers

While huge web browsers such as Firefox handle web applications reasonably well, they also tend to consume a lot of resources even on a modern computer (and to be just unusable on older hardware), to keep the same bugs for decades, and to regularly acquire new ones. Using simple web browsers for simple HTML documents is one solution; here are my notes on such browsers, as well as on the technologies used to build them.

Early browsers

HTML was pretty simple initially (and perhaps by itself, without JS and CSS, it's still manageable), and so were web browsers. HTML history covers some of it.

Textual browsers

These can be relatively simple, since they offload much of the work (i.e., text rendering) to terminal emulators and/or Emacs. But there are often drawbacks too, such as poor or no support for embedded images (inherent in mostly-textual environments) and for HTML forms (particularly in XML-based documents, such as XHTML and HTML5; often due to poor ad hoc parsing). In a world of non-bloated technologies that would be fine for most browsing, but illustrations and structured input are still useful at times. The best support for those I've seen is perhaps in eww; other common textual browsers are lynx and w3m, and I wrote pancake, a pandoc-based one.

Graphical browsers

links is an interesting project, and there is links project documentation. Many bits of it seem to be quite ad hoc, including parsing (which also fails on XHTML/HTML5 forms, and has special cases for a few websites), and some optimisations look like overkill, sacrificing convenience and flexibility (e.g., using hardcoded glyphs). Yet it appears to work faster over the Internet (with 10-20 ms network latency) than major browsers do even with local files. It can use one of several drivers (terminal, X, SVGAlib, DirectFB, SDL, etc.) to work in different environments.

NetSurf

NetSurf is another interesting project. Much of its functionality is split into reusable libraries. Like links, it uses a framebuffer abstraction (LibNSFB) and a faulty hand-written parser (Hubbub), and has Unicode-related warts. Unlike links, it uses native GUI widgets for menus and such when built for GTK, and it uses libcurl. It has a bunch of additional issues (partially broken TLS, poor CSS support with no way in sight to disable or override it, poor word wrapping, etc.), but it is still fun to poke at, and its sources, which are not in particularly bad shape, are fun to read.

Other graphical browsers

There aren't many others that I know of: there was dillo, but its domain name had expired, and its SourceForge mirror is broken. Then there are browsers that reuse huge bloated engines, and standalone HTML renderers such as tkHTML (though that one is discontinued; it was used in the Html Viewer 3 web browser). KHTML and GtkHTML seem to be quite large, and I'm not sure whether they are used in any maintained web browsers (not counting WebKit-based ones), or whether they are still active themselves.

Technologies

HTTP can be handled by libcurl, and proper streaming parsing of XML-based documents, as well as of potentially messy SGML-based HTML ones, can be achieved with libxml2 (or another parser); the most challenging part seems to be the GUI.

GUI toolkits such as GTK+ 3 can handle text rendering (via Pango in the case of GTK, which even handles, on its own, markup similar to the initial version of HTML), inputs, and image rendering, but not quite in the way a web browser needs: one can, for instance, just place labels and other standard widgets in a window (perhaps forgetting about right-to-left scripts), but then the text won't be selectable across them as one would normally expect. Drawing inputs on top of a single label may be viable, but it would still be hard to leave gaps that aren't counted in text selection (and that would likely require forking the underlying libraries, which are pretty large too).
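
One way to picture the selection problem: if inline content is kept as a sequence of boxes, some holding text and some holding inputs or images, then copying a selection amounts to concatenating the text of the selected text boxes only, so an input takes up space on screen but contributes nothing to the copied text. A minimal C sketch under that assumption; the box structure and `selection_text` are hypothetical, not part of GTK or of any browser mentioned here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical kinds of inline boxes. */
enum box_kind { BOX_TEXT, BOX_INPUT, BOX_IMAGE };

struct box {
    enum box_kind kind;
    const char *text; /* only meaningful for BOX_TEXT */
};

/* Copies the text of boxes [first, last] into out (out_size bytes),
   skipping inputs and images: they occupy space on screen, but not in
   the selected text. Returns the number of bytes written, excluding
   the terminating NUL; the result is truncated if out is too small. */
size_t selection_text(const struct box *boxes, size_t first, size_t last,
                      char *out, size_t out_size)
{
    size_t len = 0;
    assert(out != NULL && out_size > 0);
    out[0] = '\0';
    for (size_t i = first; i <= last; i++) {
        if (boxes[i].kind != BOX_TEXT)
            continue; /* not part of the selectable text */
        size_t n = strlen(boxes[i].text);
        if (n > out_size - 1 - len)
            n = out_size - 1 - len; /* truncate to fit */
        memcpy(out + len, boxes[i].text, n);
        len += n;
        out[len] = '\0';
    }
    return len;
}
```

For a line like "Search: [input] then press [button].", selecting everything yields only the text parts. A real implementation would also track partial selections within the first and last boxes (by byte or grapheme offsets), which this sketch leaves out.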

Then there's Xlib, with limited and dated support for text rendering (and less portability than GTK). If one is going to render text separately anyway, that gets pretty close to using a framebuffer abstraction (on top of X, SDL, the Linux framebuffer, etc.), as graphical web browsers tend to do.
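
A framebuffer abstraction in this sense is a small set of drawing operations with one implementation per backend, so the rendering code never talks to X, SDL, or the Linux framebuffer directly. The following C sketch is an assumption of what a minimal one could look like, with hypothetical names (`fb_driver`, `mem_fb`, `draw_input_box`); it is not the API of LibNSFB or of links' drivers. It has a single operation and an in-memory backend, which is also handy for testing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical driver interface: each backend (X11, SDL, the Linux
   framebuffer, ...) would supply its own implementation. */
struct fb_driver {
    void *ctx;
    void (*fill_rect)(void *ctx, int x, int y, int w, int h, uint32_t rgb);
};

/* In-memory backend: pixels stored row by row. */
struct mem_fb {
    int width, height;
    uint32_t *pixels;
};

static void mem_fill_rect(void *ctx, int x, int y, int w, int h, uint32_t rgb)
{
    struct mem_fb *fb = ctx;
    /* Clip the rectangle to the buffer bounds. */
    int x0 = x < 0 ? 0 : x, y0 = y < 0 ? 0 : y;
    int x1 = x + w > fb->width ? fb->width : x + w;
    int y1 = y + h > fb->height ? fb->height : y + h;
    for (int row = y0; row < y1; row++)
        for (int col = x0; col < x1; col++)
            fb->pixels[row * fb->width + col] = rgb;
}

struct fb_driver mem_driver(struct mem_fb *fb)
{
    struct fb_driver d = { fb, mem_fill_rect };
    return d;
}

/* Rendering code only uses the driver. Draws a hypothetical input
   box: a one-pixel black border around a white interior. */
void draw_input_box(const struct fb_driver *d, int x, int y, int w, int h)
{
    assert(w >= 2 && h >= 2);
    d->fill_rect(d->ctx, x, y, w, h, 0x000000);                 /* border */
    d->fill_rect(d->ctx, x + 1, y + 1, w - 2, h - 2, 0xFFFFFF); /* interior */
}
```

An X or SDL backend would supply its own `fill_rect` (and, in practice, operations for blitting glyph bitmaps and images), while `draw_input_box` stays unchanged.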

The libraries relevant to text rendering are HarfBuzz (text shaping, needed for complex scripts), Fontconfig (font selection), FreeType (glyph rendering), and Pango (mostly bookkeeping and ease of use on top of those). Unicode itself can be quite a pain to deal with without such libraries.
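
As a small taste of that pain: byte counts, code point counts, and user-perceived character counts all differ, so byte-based handling (for truncation, cursor movement, and so on) breaks on non-ASCII text. Below is a minimal UTF-8 decoding sketch in C, following the well-formedness rules of RFC 3629 (no overlong forms, no surrogates, nothing above U+10FFFF); `utf8_decode` and `utf8_count` are hypothetical helpers, not functions from the libraries above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decodes one UTF-8 sequence from s (len bytes available). On success,
   stores the code point in *cp and returns the sequence length (1-4);
   returns 0 for malformed, overlong, surrogate, or truncated input. */
size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    size_t n;
    uint32_t c, min;
    assert(len > 0);
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { n = 2; c = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { n = 3; c = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { n = 4; c = s[0] & 0x07; min = 0x10000; }
    else return 0; /* stray continuation byte or invalid lead byte */
    if (len < n) return 0; /* truncated sequence */
    for (size_t i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0; /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0; /* overlong, out of range, or surrogate */
    *cp = c;
    return n;
}

/* Counts code points in a NUL-terminated UTF-8 string; returns
   (size_t)-1 if the string is not valid UTF-8. */
size_t utf8_count(const char *str)
{
    const unsigned char *s = (const unsigned char *)str;
    size_t len = strlen(str), count = 0, i = 0;
    while (i < len) {
        uint32_t cp;
        size_t n = utf8_decode(s + i, len - i, &cp);
        if (n == 0) return (size_t)-1;
        i += n;
        count++;
    }
    return count;
}
```

Even with valid decoding, "e" followed by U+0301 COMBINING ACUTE ACCENT counts as two code points but displays as a single "é"; grapheme clusters, shaping, and bidirectional text are where libraries such as HarfBuzz and Pango become necessary.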

With Pango and GTK (and cairo), text rendering without labels would involve something like the GTK custom drawing example (and/or a custom widget) with the addition of Pango-related functions such as those used in gtklabel.c, e.g.:

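/* In a GtkDrawingArea "draw" handler, given widget, cr, and x, y: */
PangoLayout *layout = gtk_widget_create_pango_layout(widget, "foo bar");
GtkStyleContext *styleCtx = gtk_widget_get_style_context(widget);
gtk_render_layout(styleCtx, cr, x, y, layout);
g_object_unref(layout); /* the caller owns the layout */
/* Outside the handler (e.g., when the text changes), request a redraw: */
gtk_widget_queue_draw(widget);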

Then one would have to implement selection, carefully splitting text into boxes and mixing those with boxes for inputs and images (NetSurf source code seems to be a fine example of that, apart from its word wrapping issues), and perhaps try to handle screen readers and bidirectional text. Pango can handle word wrapping, among other things. Alternatively, with GTK, regular labels and a custom layout widget for inline grouping can be used. Generally, implementing a nice and reasonably complete web browser GUI is not trivial, but it seems doable.
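
The word wrapping part can be sketched as greedy line breaking over boxes with already measured widths (words measured with Pango or FreeType, inputs and images with their own sizes): each box goes onto the current line if it fits, otherwise it starts a new one. This is a minimal C sketch with hypothetical names, assuming break opportunities only between boxes; it ignores bidirectional text and hyphenation, and it is not how Pango or NetSurf actually implement wrapping.

```c
#include <assert.h>
#include <stddef.h>

/* A hypothetical inline box (a word, an input, or an image) whose
   width in pixels has already been measured. */
struct inline_box {
    int width;
};

/* Greedily assigns boxes to lines no wider than max_width, with
   space_width pixels between adjacent boxes on a line. Stores each
   box's line index in line_of[] and returns the number of lines. A box
   wider than max_width gets a line of its own and overflows it. */
size_t break_lines(const struct inline_box *boxes, size_t count,
                   int max_width, int space_width, size_t *line_of)
{
    size_t line = 0;
    int x = 0;     /* width used on the current line so far */
    int empty = 1; /* no box placed on the current line yet */
    assert(max_width > 0 && space_width >= 0);
    for (size_t i = 0; i < count; i++) {
        int w = boxes[i].width;
        if (!empty && x + space_width + w > max_width) {
            line++; /* box doesn't fit: start a new line */
            x = 0;
            empty = 1;
        }
        x += (empty ? 0 : space_width) + w;
        empty = 0;
        line_of[i] = line;
    }
    return count == 0 ? 0 : line + 1;
}
```

With widths 30, 20, 50, 10, 100, and 5, a 60-pixel line, and 5-pixel spaces, the first two boxes share a line and each of the rest gets its own, the 100-pixel one overflowing. Greedy breaking can place each box as soon as it is measured; a total-fit algorithm such as Knuth-Plass gives more even lines at a higher cost.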

Complexity sources

Much of the complexity (everything around text rendering and processing) comes from writing systems, so on a large scale it's accidental complexity. HTML (and typographic) elements themselves seem to augment the language much like punctuation does, and perhaps wouldn't be as useful with a constructed language akin to Lojban (though parsing, highlighting, and special rendering of certain constructs would still help humans skim and scan documents). Even such a language with a simplified writing system likely wouldn't be an optimal way to convey information from a two-dimensional surface to a human, and it's far from viable.