Distributed systems

There are quite a few definitions of "distributed" and "decentralized" in use; in this note I'm using the following ones:

Centralized
Clients interacting with a single server (either physical or controlled by the same entity).
Decentralized
Clients interacting with multiple servers (controlled by different entities), which often build a federated network.
Distributed
Clients interacting with other clients directly, acting as servers themselves.

Distributed systems are useful for various purposes; their common/achievable niceties are mostly shared with federated systems, but distributed systems take them further.

The common advantages of centralized systems over these seem to be search/discovery, often sort-of-free hosting for end users, greater UI uniformity in some cases, and easier/faster introduction of new features.

Usable systems

Actually usable (reliably working, specified, having users and decent software) systems so far are usually federated/decentralized; those can, in principle, be quite close to distributed systems (simply by running their servers on user machines). So it generally seems more useful to focus on those if the intention is to get things done: SMTP (email), NNTP (Usenet), XMPP (Jabber), and HTTP (World Wide Web) are relatively well-supported, standardized, and usable for various kinds of communication.
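
To illustrate how readily usable these protocols are, here is a minimal sketch of sending a message over SMTP with Python's standard smtplib; the addresses and the server name are made-up placeholders.

    # Sending a plain message over SMTP; the server and addresses
    # are hypothetical placeholders.
    import smtplib
    from email.message import EmailMessage

    msg = EmailMessage()
    msg["From"] = "alice@example.org"
    msg["To"] = "bob@example.net"
    msg["Subject"] = "hello"
    msg.set_content("Plain old federated messaging.")

    with smtplib.SMTP("mail.example.org") as s:
        s.send_message(msg)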

Sometimes even centralized but non-commercial projects and services are okay: OpenStreetMap, The Internet Archive, Wikimedia Foundation projects (Wikipedia, Wiktionary, Wikidata, Wikibooks, etc.), arXiv, FLOSS projects, possibly LibGen and Sci-Hub (though they infringe copyright). As long as they are easy (and legal, and free) to fork and aren't in a position to extort users, centralization can be fine. Conversely, there can be technically distributed systems effectively controlled by a single entity (e.g., a distributed PKI with a single root, or anything legally restricted). While this note is mostly about distributed network protocols, they are neither necessary nor sufficient for community control over a system; they are merely a tool that may help to achieve it.

Existing systems

There are quite a few of them; I am going to write mostly about those which work over the Internet. There's also a related Wikipedia category.

Generic networks

Tor and I2P: both support "hidden services", on top of which many regular protocols can be used, but they are more about privacy (and a bit about routing) than about decentralisation: they provide NAT traversal, encryption, and static addresses, but that's it; the benefits for decentralisation (privacy aside) are basically the same as with IPv6 and IPsec. Tor documentation is relatively nice, and there are I2P docs. Tor provides a nice C client, while I2P uses Java.
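
Since hidden services are plain TCP endpoints behind a local SOCKS proxy, using them from any program is straightforward; a rough sketch with the third-party PySocks library, assuming Tor listens on its default SOCKS port 9050 (the .onion address is a placeholder):

    # Fetching a page from a hidden service through Tor's SOCKS proxy.
    import socks  # pip install PySocks

    s = socks.socksocket()
    # rdns=True makes Tor resolve the .onion name itself.
    s.set_proxy(socks.SOCKS5, "127.0.0.1", 9050, rdns=True)
    s.connect(("exampleaddress.onion", 80))
    s.sendall(b"GET / HTTP/1.0\r\nHost: exampleaddress.onion\r\n\r\n")
    print(s.recv(4096).decode(errors="replace"))
    s.close()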

Mesh networks

Some mesh networks, like Telehash, provide routing as well, though their advantages for decentralisation seem similar to those of Tor and I2P; they are just better in that they extend beyond the Internet. Telehash documentation is also pretty nice and full of references.

Cjdns (or its name, at least) seems to be relatively well-known, but it relies on node.js. Netsukuku and B.A.T.M.A.N. are two more protocols whose names come up.

One of the large Wi-Fi mesh networking projects is Freifunk, but apparently it's only widespread in the DACH countries.

Those would be nice to have someday, but they would require quite a lot of users to function, and various government restrictions seem to complicate their usage (this varies from jurisdiction to jurisdiction and from year to year, but it seemed pretty bad in Russia in 2018, and even worse by 2023).

IM and other social services

See also: Distributed state and network topologies in chat systems.

File sharing and websites

Web crawling

YaCy and a few more distributed search engines exist (some of which are dead by now). I have only tried YaCy, and it works, though I haven't managed to find its technical documentation, so it's not clear how it works internally.
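
It does expose an HTTP search API, though; a quick sketch of querying a local instance (the default port is 8090; the endpoint name and response shape here are from memory and may differ between versions):

    # Querying a local YaCy instance over its HTTP API.
    import json
    import urllib.parse
    import urllib.request

    params = urllib.parse.urlencode({"query": "distributed systems"})
    with urllib.request.urlopen("http://localhost:8090/yacysearch.json?" + params) as r:
        results = json.load(r)
    for item in results["channels"][0]["items"]:
        print(item["link"], "-", item["title"])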

Other information

These networks include search for files, but by their names: they are not content-addressable, so the files cannot be easily verified, which brings additional challenges.
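
For contrast, content addressing derives a file's address from its contents, so anything retrieved can be verified by rehashing; a minimal sketch:

    # Content addressing: the address is a hash of the data itself,
    # so a retrieved file can be checked against its address.
    import hashlib

    data = b"example file contents"
    address = hashlib.sha256(data).hexdigest()

    def verify(received: bytes, expected: str) -> bool:
        return hashlib.sha256(received).hexdigest() == expected

    assert verify(data, address)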

Related papers:

Cryptocurrencies

Plenty of those have popped up recently. Bitcoin-like ones (usually with proof of work and block chaining) look like quite a waste of resources (and perhaps a pyramid scheme) to me, though the idea itself is interesting. I was rather interested in "digital cash" payment systems before, but those haven't quite taken off so far.
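
Incidentally, the proof-of-work idea itself is simple; a toy hashcash-style sketch (not any particular cryptocurrency's actual scheme): find a nonce such that the hash of the data plus the nonce falls below a target, so that finding a solution is expensive while checking it is cheap.

    # Toy proof of work: require the first zero_bits bits of the
    # SHA-256 hash to be zero.
    import hashlib
    from itertools import count

    def pow_nonce(data: bytes, zero_bits: int) -> int:
        target = 1 << (256 - zero_bits)
        for nonce in count():
            h = hashlib.sha256(data + str(nonce).encode()).digest()
            if int.from_bytes(h, "big") < target:
                return nonce

    nonce = pow_nonce(b"block header", 16)  # ~65536 hashes on average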

As of 2021, bitcoin-like cryptocurrencies seem to be eating other distributed projects: many of them get merged with custom cryptocurrencies, or occasionally piggyback on existing ones; either way they become more complicated and commercialized. As of 2022, the "crypto" clipping seems to be associated more widely with cryptocurrencies and related technologies than with cryptography in general.

General P2P networking tools

GNUnet

Not sure how to classify it, but here are some links: gnunet.org, wiki://GNUnet, "A Secure and Resilient Communication Infrastructure for Decentralized Networking Applications". It seems promising, but tricky to build, to figure out how it all works, and to do anything with it now (a lack of documentation seems to be the primary issue, though probably there's more to it).

Taler and secushare (using PSYC) are getting built on top of it, but it's not clear how that is going, or how abandoned or alive it all is. Their documentation also seems to be obsolete/outdated/abandoned/incomplete. Update (January 2018): apparently the secushare prototype won't be released this year.

libp2p

libp2p apparently provides the common primitives needed for peer-to-peer networking in the presence of NATs and other obstructions. At the time of writing there's no C API (so it's only usable from a few languages) and its website is quite broken. At the same time, worldwide IPv6 adoption exceeds 32%, so possibly NATs will disappear before the workarounds become usable.

Generic protocols

There are more or less generic network protocols that may be used, possibly together with Tor, to get working and secure peer-to-peer services.

SSH is quite nice and layered. Apparently its authentication is not designed for distributed systems (such as distributed IMs or file sharing), its connection layer looks rather bloated, and generally it's not particularly simple. Those are small bits of a large protocol, but they seem to make it not quite usable for peer-to-peer communication.

TLS may provide mutual authentication, and there are readily available tools to work with it.
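
For instance, the client side of a mutually authenticated connection with Python's standard ssl module might look as follows (the certificate file names and the host are placeholders); the server side would similarly use PROTOCOL_TLS_SERVER with verify_mode set to CERT_REQUIRED.

    # Mutual TLS: we verify the peer against a chosen CA, and present
    # our own certificate for the peer to verify.
    import socket
    import ssl

    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.load_verify_locations("peer-ca.pem")          # whom we trust
    ctx.load_cert_chain("my-cert.pem", "my-key.pem")  # who we are

    with socket.create_connection(("peer.example.org", 8443)) as sock:
        with ctx.wrap_socket(sock, server_hostname="peer.example.org") as tls:
            tls.sendall(b"hello")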

IPsec uses mechanisms similar to those of TLS, but is generally a better way to solve the same problems. Individual addresses (which IPv6 should bring) are needed to use it widely for P2P, though. IPv6 is getting adopted, but slowly. Once computers become individually addressable (again) and transport-layer encryption is there by default, that may render plenty of contemporary higher-level network protocols obsolete.

Pretty much every distributed IM tries to reinvent everything, and virtually none are satisfactory, but at least some of the problems are already solved separately: one can use dynamic DNS, Tor, or a VPN to obtain reachable addresses (even if the involved IP addresses change and/or are behind NAT), and then use any basic/common communication protocol on top. Or even set up a VM and rely on SSH access, communicating inside that system.

Users

Distributed systems, particularly when used for social activities, require users, so that there is somebody to send messages to in the case of an IM. That's quite a problem: even sticking to federated protocols, it is easy to lose or reduce contact with people.

Search in particular is tricky in such systems, though usually some form of communication with strangers and self-organization (e.g., via multi-user chats or web pages) is possible, so that people can find groups with shared interests. Perhaps being sociable is easier and more useful than technical solutions.

Search, FOAF, and the rest of RDF

Some kind of a distributed search/directory could connect small peer-to-peer islands into a usable network. While it is hard to decide on an algorithm, lists of known and/or somewhat trusted nodes are common to both structured and unstructured networks, as well as to uses of social graphs: if those lists were provided by peers, a client could decide by itself which algorithm to apply. This reduces the task to just including known nodes into local directory entries, which can be shipped over any other protocols (e.g., HTTP, possibly over Tor).
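
A sketch of that reduction, with a made-up list format and URL: each node serves a plain list of the node addresses it knows, and a client merges those lists into its local directory, applying whatever traversal or trust policy it prefers.

    # Merging peer-provided node lists into a local directory; the
    # /known-nodes.txt endpoint and its format are hypothetical.
    import urllib.request

    def fetch_known_nodes(peer_url: str) -> set[str]:
        with urllib.request.urlopen(peer_url + "/known-nodes.txt") as r:
            lines = r.read().decode().splitlines()
        return {line.strip() for line in lines if line.strip()}

    directory = {"http://peer-a.example.org"}
    for peer in list(directory):  # one round of discovery
        directory |= fetch_known_nodes(peer)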

Knowledge representation, which is needed for a generic directory structure, is tricky, but RDF (the resource description framework) already exists. There is FOAF (the friend-of-a-friend ontology), specifically for describing persons, their relationships (including links to the persons they know), and other social things. A basic FOAF search engine should be fairly straightforward to set up: basically a triple store filled with FOAF data. See also: Semantic Web.
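
For instance, with the third-party rdflib library the experiment fits into a few lines (the FOAF document URL is a placeholder):

    # Loading a FOAF document into a triple store and listing
    # acquaintance claims.
    from rdflib import Graph, Namespace

    FOAF = Namespace("http://xmlns.com/foaf/0.1/")
    g = Graph()
    g.parse("http://example.org/alice.foaf.rdf")

    for person, known in g.subject_objects(FOAF.knows):
        print(g.value(person, FOAF.name), "knows", g.value(known, FOAF.name))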

Weather data

Apart from common messaging and file sharing, one of the distributed (or at least federated) system applications I keep considering is weather data sharing: it'd be useful, and it's quite different from those other applications.

Weather data is commonly of interest to people, and it's right out there, not encumbered by patents or copyright laws; it just has to be measured and distributed. But commercial organizations working on that try to extract a profit, so they don't simply share the data with anyone for free. There are state agencies too, paid out of taxes, but at least in Russia you apparently can't easily get weather data out of them either: just a lot of bureaucracy, and even if it were possible, there are many awkward custom formats and ways to access the data, which won't make for a reliable system. People sharing this data with each other would solve that problem.

Though there's at least one nice exception: the Norwegian Meteorological Institute shares weather data freely and for the whole globe.

The challenges/requirements also differ from those of messaging or file sharing: there's a lot of data, regularly updated by many people and potentially requested many times, but confidentiality isn't needed. There already are protocols somewhat suitable for that: NNTP (which is occasionally used for weather broadcasts, just in a free form), DNS, and IRC explicitly aim at relaying; SMTP (with mailing lists) and XMPP (with pubsub) may be suitable too, possibly with ad hoc relaying.
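
As a sketch of the NNTP option, posting a free-form measurement with the standard nntplib module (deprecated in Python 3.11 and removed in 3.13; the server, newsgroup, and report format here are made up):

    # Posting a weather report to a newsgroup.
    import io
    import nntplib

    report = (
        "From: station-1@example.org\r\n"
        "Newsgroups: weather.reports\r\n"
        "Subject: 2022-06-01T12:00Z station-1\r\n"
        "\r\n"
        "lat=59.93 lon=30.31 t=18.4C p=1013hPa rh=62%\r\n"
    )

    with nntplib.NNTP("news.example.org") as srv:
        srv.post(io.BytesIO(report.encode()))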

For reference, as of 2022 there are about 1200 cities with a population of more than 500 thousand people; individual hourly measurements from each of those would amount to a message every 3 seconds. It wouldn't hurt to have more than one weather station per city, to cover smaller cities, and so on, but the order of magnitude seems manageable even with modest resources and without much caching or relaying, assuming there are not too many clients receiving all the data just as it arrives.

The links/peering can be set up manually, and/or the data can be signed (DNSSEC, OpenPGP, etc.) and verified by end users with a PKI/WOT; the former may just be simpler, and appears to work in practice.

Collaboration/coordination/organization is likely to be tricky, though possible: plenty of people contribute their computing resources to BOINC projects, OONI, file sharing networks, and so on. But weather collection is different in requiring special equipment (at least a temperature sensor) set up outside, which complicates contribution.

See also

Not quite about collaborative protocols like those listed above, but about distributed computing in general (including software designed for multiple servers controlled by a single entity), there's the nice "A Note on Distributed Computing" paper.