Distributed systems

There are quite a few definitions of "distributed" and "decentralized" in use; in this note I'm using the following ones:

Centralized
Clients interacting with a single server (either physical or controlled by the same entity).
Decentralized
Clients interacting with multiple servers (controlled by different entities), which often build a federated network.
Distributed
Clients interacting with other clients directly, acting as servers themselves.

Distributed systems are useful for various purposes, but the common/achievable niceties are:

These are mostly shared with federated systems, but distributed systems take them further.

The common advantages of centralized systems over these seem to be search/discovery, often sort-of-free hosting for end users, greater UI uniformity in some cases, and easier/faster introduction of new features.

Usable systems

Actually usable (reliably working, specified, having users and decent software) systems so far are usually federated/decentralized; those can, in principle, be quite close to distributed systems (simply by setting up their servers on user machines). So, generally it seems more useful to focus on those if the intention is to get things done: SMTP (email), NNTP (Usenet), XMPP (jabber), and HTTP (World Wide Web) are relatively well-supported, standardized, and usable for various kinds of communication.

Sometimes even centralized but non-commercial projects and services are okay: OpenStreetMap, The Internet Archive, Wikimedia Foundation projects (Wikipedia, Wiktionary, Wikidata, Wikibooks, etc), arXiv, FLOSS projects, possibly LibGen and Sci-Hub (though they infringe copyright). As long as they are easy (and legal, and free) to fork and aren't in a position to extort users, centralization can be fine. Conversely, there can be technically distributed systems effectively controlled by a single entity (e.g., a distributed PKI with a single root, or anything legally restricted). While this note is mostly about distributed network protocols, they are neither necessary nor sufficient for community control over a system; rather, they may just be a useful tool to achieve it.

Existing systems

There are quite a few of them; I am going to write mostly about those which work over the Internet. There's also a related Wikipedia category.

Generic networks

Tor and I2P: both support "hidden services", on top of which many regular protocols can be used, but they are more about privacy (and a bit about routing) than about decentralisation: they provide NAT traversal, encryption, and static addresses, but that's it, so the benefits for decentralisation (excluding privacy) are basically the same as those of IPv6 with IPsec. Tor documentation is relatively nice, and there are I2P docs. Tor provides a nice C client, while I2P uses Java.

Mesh networks

Some mesh networks, like Telehash, provide routing as well, though their advantages for decentralisation seem to be similar to those of Tor and I2P; they are just better in that they extend it beyond the Internet. Telehash documentation is also pretty nice and full of references.

Cjdns (or its name, at least) seems to be relatively well-known, but it relies on node.js. Netsukuku and B.A.T.M.A.N. are two more protocols whose names are relatively well known.

Those would be nice to have someday, but they would require quite a lot of users to function, and various government restrictions seem to complicate their usage (this varies from jurisdiction to jurisdiction and from year to year, but seems to be pretty bad in Russia in 2018).

IM and other social services

See also: Distributed state and network topologies in chat systems.

File sharing and websites

Web crawling

YaCy and a few more (some of which are dead by now) distributed search engines exist. I have only tried YaCy, and it works, though I haven't managed to find its technical documentation, so it's not clear how it works.

Other information

These networks include search for files, but by their names rather than by content addresses, so the results can't be easily verified, which brings additional challenges.
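
For contrast, here is a minimal sketch of content addressing, where a file's identifier is a hash of its contents, so whatever is retrieved under that identifier can be verified locally; the names are purely illustrative.

    import hashlib

    def content_address(data):
        # The identifier is simply the SHA-256 digest of the content.
        return hashlib.sha256(data).hexdigest()

    def verify(data, address):
        # Verification requires no trust in whichever peer served the data.
        return content_address(data) == address

    blob = b"any shared file"
    addr = content_address(blob)
    assert verify(blob, addr)
    assert not verify(blob + b"tampered", addr)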

Related papers:

Cryptocurrencies

Plenty of those popped up recently. Bitcoin-like ones (usually with proof of work and block chaining) look like quite a waste of resources (and perhaps a pyramid scheme) to me, though the idea itself is interesting. I was rather interested in "digital cash" payment systems before, but those haven't quite taken off so far.

As of 2021, bitcoin-like cryptocurrencies seem to be eating other distributed projects: many of those are merged with their custom cryptocurrencies, or occasionally piggyback on existing ones, but either way they become more complicated and commercialized. As of 2022, the "crypto" clipping seems to be associated more widely with cryptocurrencies and related technologies than with cryptography in general.

General P2P networking tools

GNUnet

Not sure how to classify it, but here are some links: gnunet.org, wiki://GNUnet, A Secure and Resilient Communication Infrastructure for Decentralized Networking Applications. Seems promising, but tricky to build, to figure out how it all works, and to do anything with it now (a lack of documentation seems to be the primary issue, though probably there's more).

Taler and secushare (using PSYC) are getting built on top of it, but it's not clear how it's going, how abandoned or alive it all is, etc. Their documentation also seems to be outdated/abandoned/incomplete. Update (January 2018): apparently the secushare prototype won't be released this year.

libp2p

libp2p apparently provides common primitives needed for peer-to-peer networking in the presence of NATs and other obstructions. At the time of writing there's no C API (so it's only usable from a few languages) and its website is quite broken. At the same time worldwide IPv6 adoption is above 32%, so possibly NATs will disappear before such workarounds become usable.

Generic protocols

There are more or less generic network protocols that may be used, possibly together with Tor, to get working and secure peer-to-peer services.

SSH

SSH is quite nice and layered. Apparently its authentication is not designed for distributed systems (such as distributed IMs or file sharing), its connection layer looks rather bloated, and generally it's not particularly simple. Those are small bits of a large protocol, but they seem to make it not quite usable for peer-to-peer communication.

TLS

TLS may provide mutual authentication, and there are readily available tools to work with it.
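
As an example, here is a minimal sketch of mutually authenticated TLS between two peers, using Python's ssl module; the certificate and key file names ("my.crt", "my.key", "peer.crt") are placeholders for pre-exchanged self-signed certificates.

    import socket, ssl

    def peer_context(role):
        # "my.crt"/"my.key" hold this peer's certificate and private key,
        # "peer.crt" holds the other peer's certificate.
        ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER if role == "server"
                             else ssl.PROTOCOL_TLS_CLIENT)
        ctx.load_cert_chain("my.crt", "my.key")
        ctx.load_verify_locations("peer.crt")
        ctx.verify_mode = ssl.CERT_REQUIRED  # the other side must authenticate too
        ctx.check_hostname = False           # identity is the pinned certificate, not a hostname
        return ctx

    def serve(port=5556):
        # Listening peer: accept one connection and greet the authenticated client.
        with socket.create_server(("", port)) as srv:
            conn, _ = srv.accept()
            with peer_context("server").wrap_socket(conn, server_side=True) as tls:
                tls.sendall(b"hello\n")

    def connect(host, port=5556):
        # Connecting peer: the handshake fails unless both certificates check out.
        with socket.create_connection((host, port)) as sock:
            with peer_context("client").wrap_socket(sock) as tls:
                print(tls.recv(1024).decode())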

IPsec

It has uses similar to TLS, but is generally a better way to solve the same problems. IPv6 (or, rather, individually routable addresses) is needed for its wide deployment though, and IPv6 adoption is slow. Once computers become individually addressable (again) and transport-layer encryption is there by default, plenty of contemporary higher-level network protocols should become obsolete.

Ad hoc messaging

Pretty much every distributed IM tries to reinvent everything, and virtually none are satisfactory, but at least some of the problems are already solved separately, and there are:

Combinations such as Tor + TLS + rlwrap may serve as ad hoc IMs rather easily. Well, if participants are willing and able to use those, which often isn't the case. In an attempt to facilitate things like that, I wrote TLSd and, as one of the examples, a P2P IM with a libpurple plugin.
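
For illustration, a rough sketch of the chat loop such an ad hoc IM needs, assuming an already established TLS socket (as in the TLS section above). Exposing the listening side as a Tor hidden service is a matter of torrc configuration (HiddenServiceDir and HiddenServicePort pointing at the local port), the connecting side reaches the .onion address through Tor's SOCKS proxy (e.g. via torsocks), and rlwrap around the client adds line editing and history.

    import sys, threading

    def chat(tls_sock):
        # Print incoming data in the background; send typed lines until EOF.
        def receive():
            while (data := tls_sock.recv(4096)):
                sys.stdout.write(data.decode(errors="replace"))
                sys.stdout.flush()
        threading.Thread(target=receive, daemon=True).start()
        for line in sys.stdin:
            tls_sock.sendall(line.encode())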

Users

Distributed systems, particularly when used for social activities, require users, so that there would be somebody to send messages to in the case of an IM. That's quite a problem, since even when sticking to federated networks it is easy to lose or reduce contact with most people one knows, even tech-savvy ones; apparently it's even easier when moving to distributed and less common systems.

In my experience so far, for personal communication it works like this:

Apart from not being backed by a huge company's marketing and not already having a lot of users, probably the primary issue for non-centralised systems is that there's usually little focus on finding new contacts and establishing new relationships.

Though XMPP MUCs and the microblogging fediverse (see OStatus above) may be suitable for that; not distributed, but at least closer to it. A nicer approach would be to use distributed search, but I'm not aware of generic/suitable systems of that kind: they tend either to be an easy target for spammers, or to require existing social connections in a system (although some may still be suitable).

Search, FOAF, and the rest of RDF

Some kind of a distributed search/directory may connect small peer-to-peer islands into a usable network. While it is hard to decide on an algorithm, lists of known and/or somewhat trusted nodes are common for both structured and unstructured networks, as well as for approaches based on social graphs: if those lists were provided by peers, a client could decide by itself which algorithm to apply. This reduces the task to just including known nodes into local directory entries, which can be served over other protocols (e.g., HTTP, possibly over Tor).

Knowledge representation, which is needed for a generic directory structure, is tricky, but there is RDF (the Resource Description Framework) already. There is FOAF (the friend-of-a-friend ontology), specifically for describing persons, their relationships (including links to the persons they know), and other social things. A basic FOAF search engine should be fairly straightforward to set up: basically a triple store filled with FOAF data, as sketched below. See also: Semantic Web.
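
A minimal sketch of such a triple store, using the rdflib Python library; the profile URL is a placeholder, and a real crawler would follow links between profiles rather than load a single document.

    import rdflib
    from rdflib.namespace import FOAF

    g = rdflib.Graph()
    # A crawler would start from known profiles and follow rdfs:seeAlso links;
    # here a single (hypothetical) profile document is loaded.
    g.parse("https://example.net/profile.rdf")

    # Find people by name, along with the names of those they claim to know.
    query = """
        SELECT ?name ?friend WHERE {
            ?person a foaf:Person ; foaf:name ?name .
            OPTIONAL { ?person foaf:knows/foaf:name ?friend . }
        }
    """
    for row in g.query(query, initNs={"foaf": FOAF}):
        print(row.name, "knows", row.friend)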

Weather data

Apart from common messaging and file sharing, one of the distributed (or at least federated) system applications I keep considering is weather data sharing: it'd be useful, and it's quite different from those other applications.

Weather data is commonly of interest to people, and it's right out there, not encumbered by patents or copyright laws; it just has to be measured and distributed. But commercial organizations working on that try to extract some profit, so they don't simply share the data with anyone for free. There are state agencies too, paid out of taxes, but at least in Russia apparently you can't easily get weather data out of them either, only a lot of bureaucracy; and even if it were possible, there are many awkward custom formats and ways to access the data, which wouldn't make for a reliable system. People sharing this data with each other would solve that problem.

Though there's at least one nice exception: the Norwegian Meteorological Institute shares weather data freely and for the whole globe.

The challenges/requirements also differ from those of messaging or file sharing, since there's a lot of data regularly updated by many people, and potentially requested many times, but confidentiality isn't needed. There already are protocols somewhat suitable for that: NNTP (which is occasionally used for weather broadcasts, just in a free form), DNS, and IRC explicitly aim at relaying; SMTP (with mailing lists) and XMPP (with pubsub) may be suitable too, possibly with ad hoc relaying.
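
As a rough illustration of how little is needed on top of an existing relaying protocol, here is a sketch of posting a free-form report over NNTP with Python's nntplib (part of the standard library up to 3.12); the server, newsgroup, and field names are made up.

    import io
    import nntplib

    article = (
        b"From: station@example.net\r\n"
        b"Newsgroups: local.weather.example-city\r\n"
        b"Subject: 2022-06-01T12:00Z example-city\r\n"
        b"\r\n"
        b"temperature_c: 18.3\r\n"
        b"pressure_hpa: 1013\r\n"
        b"humidity_percent: 64\r\n"
    )

    # Anyone subscribed to the group receives the report as servers relay it.
    with nntplib.NNTP("news.example.net") as nntp:
        nntp.post(io.BytesIO(article))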

For reference, as of 2022 there are about 1200 cities with a population of more than 500 thousand people; individual hourly measurements from each of those would amount to a message every 3 seconds on average. It wouldn't hurt to have more than one weather station per city, to cover smaller cities, and so on, but the order of magnitude seems manageable even with modest resources and without much caching or relaying, assuming that there are not too many clients receiving all the data just as it arrives.
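
The back-of-the-envelope arithmetic behind that estimate, with an assumed message size added just to gauge bandwidth:

    cities = 1200                        # cities over 500 thousand people, as of 2022
    messages_per_hour = cities           # one measurement per city per hour
    interval = 3600 / messages_per_hour  # 3.0 seconds between messages on average
    size = 1024                          # assumed ~1 KiB per report, signature included
    bandwidth = messages_per_hour * size / 3600
    print(f"{interval:.1f} s between messages, ~{bandwidth:.0f} bytes/s to receive all of it")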

The links/peering can be set up manually, and/or the data can be signed (DNSSEC, OpenPGP, etc.) and verified by end users with a PKI/WOT; the former may just be simpler, and appears to work in practice.
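
For the signing option, end-user verification can stay fairly simple; a sketch below shells out to gpg to check a detached OpenPGP signature, assuming the signer's key has already been imported into the local keyring; file names are placeholders.

    import subprocess

    def verified(data_path, signature_path):
        # gpg exits with status 0 only if the detached signature is valid
        # and made by a key from the local keyring.
        result = subprocess.run(["gpg", "--verify", signature_path, data_path],
                                capture_output=True)
        return result.returncode == 0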

Collaboration/coordination/organization is likely to be tricky, though possible: plenty of people contribute their computing resources to BOINC projects, OONI, file sharing networks, and so on. But weather collection is different in requiring special equipment (at least a temperature sensor) set up outside, which complicates contribution.

See also

Not quite about collaborative protocols like those listed above, but about distributed computing in general (including software design aimed at multiple servers controlled by a single entity), there's a nice paper, "A Note on Distributed Computing".