Distributed systems

Distributed systems are useful for various purposes, but the common/achievable niceties are:

These are mostly shared with federated systems, but take it a bit further.

1 Existing systems

There are quite a few of them; I am going to write mostly about those which work over the Internet, though mesh networks which are based on lower levels are interesting, too. Grid computing systems, like BOINC, are nice as well, but this is not about them. There's also a related Wikipedia category.

1.1 Generic networks

Tor and I2P: both support "hidden services", on top of which regular protocols can be used, but they are more about privacy (and a bit about routing) than about decentralisation: they provide NAT traversal and static addresses, but that's it. Tor documentation is relatively nice, and there are plenty of I2P docs. Tor provides a nice C client, while I2P uses Java, which makes Tor considerably easier to install.

1.2 Mesh networks

Some mesh networks, like Telehash, provide routing as well, though their advantages for decentralisation seem to be similar to those of Tor and I2P; they are just better in that they extend it beyond the Internet. Telehash documentation is also pretty nice and full of references.

Cjdns (or its name, at least) seems to be relatively well-known, but it relies on node.js.

Netsukuku and B.A.T.M.A.N. are two more protocols whose names are fairly well known.

1.3 IM and other social services

  • Tox implements its own network (DHT, onion routing, NAT traversal, etc.), and has some documentation. It works, though it is not particularly easy to build, and toxic (a command-line client built on toxcore, the reference implementation) ceases to work after a few days here, requiring a restart.
  • Rival Messenger and Bleep are based on Telehash and BitTorrent, respectively. Have not tried those.
  • RetroShare provides a bunch of things, but with a web-based UI, and I gave up on building it.
  • Matrix seems to be getting relatively popular, but uses HTTP APIs, the specification is not available without JS, there are SDKs (I wonder whether it's ever a useful thing to provide an SDK instead of a single documented library; usually it's just additional pain to work with), web-based clients, etc – seems to be pretty unpleasant overall, following poor practices. And it's rather federated, not strictly distributed; functionality-wise, it's like XMPP with a few XEPs crammed into the core.
  • Ricochet reuses the Tor network, its protocol is documented and doesn't seem to be bloated. Unfortunately, it's bundled with a GUI, apparently there is no separate library, and it's in C++ anyway, which would make bindings harder if there was one. Probably it wouldn't be that hard to reimplement (or to extract the non-GUI code bits and make C bindings, to get a reusable library).
  • XMPP is quite nice and is supported relatively widely (with quite a choice of servers, clients, and libraries), but also not strictly distributed – though can be used as such, see below ("ad hoc messaging").
  • Email: likewise, but using it in a distributed fashion wouldn't be interoperable with common deployments in most cases, and some software may assume a federated setting.
  • ActivityPub: federated, replaces OStatus, supported by Mastodon (which seems to be getting popular); used for both microblogging and private messaging. RDF-based, a W3C recommendation. Hence a good specification, and generally it doesn't look bad, but the specification doesn't include authentication and authorization as of now (January 2018), and the existing implementations all seem to be awkward: rather poor web UIs, languages such as JS.
  • Other IMs: there is a nice comparison of privacy-oriented IMs, file sharing services, and social networks on the secushare website.
  • Other social networking tools: there is a wiki comparison of those.

1.4 File sharing and websites

  • BitTorrent, of course, with Mainline DHT.
  • IPFS seems to be getting, well, maybe not popular, but mentioned here and there. There are papers and it is documented, but the implementations are currently in Go (reference), JS (incomplete), and Python (started). So trying it would involve setting up the whole Go toolchain, but the IPFS whitepaper looks nice. There's plenty of documentation, and a few separate parts (which can be and are isolated into libraries; though it would be more helpful if they were actually reusable C libraries), but they are still parts of a single project, which is not small or simple.
  • Freenet is a distributed data store, apparently not very interactive. Or maybe it is; it's in Java, and I didn't try it myself.
  • ZeroNet: haven't tried it, and it's in Python, but apparently it's popular enough to at least mention. Apparently it doesn't care much about security (a HN thread).
  • HTTP/rsync/Gopher/whatever over onion (or otherwise a static address): Tor allows anyone to host a server accessible from outside by a fixed address, and there are plenty of tools to work with common protocols.
  • Gnutella: see below.
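
BitTorrent and IPFS both address data by a cryptographic hash, which lets a client check what it fetched from an untrusted peer. Here is a minimal sketch of that idea, assuming SHA-256 and a hypothetical in-memory `store` dictionary standing in for the network; neither system uses exactly this scheme, and the function names are made up for illustration:

```python
import hashlib


def address_of(data: bytes) -> str:
    """Derive an address from the content itself (SHA-256, hex-encoded)."""
    return hashlib.sha256(data).hexdigest()


def put(store: dict, data: bytes) -> str:
    """Store data under its own hash and return that address."""
    addr = address_of(data)
    store[addr] = data
    return addr


def get_verified(store: dict, addr: str) -> bytes:
    """Fetch data and reject it if it does not hash to the requested address."""
    data = store[addr]
    if address_of(data) != addr:
        raise ValueError("content does not match its address")
    return data
```

The point is that the address doubles as a checksum: whoever serves the data, a mismatch is detectable locally, which is what name-based lookups lack.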

1.5 Search

1.5.1 Web crawling

YaCy and a few more distributed search engines (some of which are dead by now) exist. I have only tried YaCy, and it works, though I haven't managed to find its technical documentation, so it's not clear how it works.

1.5.2 Other information

These networks include search for files, but by their names rather than by content addresses (so the results can't be easily verified, which brings additional challenges).

  • Gnutella again: used for file sharing, with query-based search (an unstructured system, as opposed to DHT-based and content-addressable structured ones). Somewhat limited and hardly secure/reliable for search, but seemed to work in practice.
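
Query-based search in an unstructured network can be sketched as a flood with a hop limit. This is a simplified model rather than the Gnutella wire protocol: the `graph` and `files` dictionaries are hypothetical stand-ins for peers and their shared file names, and routing of hits back to the requester is omitted.

```python
from collections import deque


def flood_query(graph, files, start, keyword, ttl):
    """Flood a keyword query from `start`, breadth-first, for at most `ttl` hops.

    graph: node -> list of neighbour nodes
    files: node -> list of file names shared by that node
    Returns a sorted list of (node, file name) hits.
    """
    seen = {start}
    queue = deque([(start, ttl)])
    hits = []
    while queue:
        node, remaining = queue.popleft()
        # Each node matches the query against its own file names only.
        hits.extend((node, name) for name in files.get(node, [])
                    if keyword.lower() in name.lower())
        if remaining == 0:
            continue  # TTL exhausted: do not forward further
        for peer in graph.get(node, []):
            if peer not in seen:  # a node handles the same query only once
                seen.add(peer)
                queue.append((peer, remaining - 1))
    return sorted(hits)
```

Nothing here verifies that a reported name corresponds to the promised content, which is the weakness mentioned above.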

Related papers:

1.6 Cryptocurrencies

Plenty of those popped up recently. Looks like quite a waste of electricity and hardware to me, yet the idea itself is interesting.

1.7 GNUnet

Not sure how to classify it, but here are some links: gnunet.org, wiki://GNUnet, A Secure and Resilient Communication Infrastructure for Decentralized Networking Applications. Seems promising, but tricky to build, to figure out how it all works, and to do anything with it now (a lack of documentation seems to be the primary issue, though probably there's more).

Taler and secushare (using PSYC) are getting built on top of it, but it's not clear how it's going, how abandoned or alive it is, etc. Their documentation also seems to be obsolete/outdated/abandoned/incomplete. Update (January 2018): apparently the secushare prototype won't be released this year.

1.8 Generic protocols

There are more or less generic network protocols that may be used on top of e.g. Tor, to get usable distributed services.

1.8.1 Plan 9's 9P

9P looks nice: it's simple, documented, and generic. Security in Plan 9 is also nice. It's somewhat future-proof, though it has certain limitations (though it's a rabbit hole to try to make something distant-future-proof). There's not much software that supports it, and hacking authentication usable for distributed services into it might be tricky.
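
To give an idea of how simple the wire format is, here is a sketch that encodes the version negotiation request (Tversion) of 9P2000: little-endian integers, length-prefixed strings, and a size field that counts itself. The function names and the default `msize` of 8192 are assumptions made for this example, not something a particular implementation prescribes.

```python
import struct

TVERSION = 100   # message type of Tversion in 9P2000
NOTAG = 0xFFFF   # tag value reserved for version negotiation


def encode_string(s: str) -> bytes:
    """9P strings: a 2-byte little-endian length followed by UTF-8 bytes."""
    raw = s.encode("utf-8")
    return struct.pack("<H", len(raw)) + raw


def encode_tversion(msize: int = 8192, version: str = "9P2000") -> bytes:
    """Build size[4] type[1] tag[2] msize[4] version[s]."""
    body = struct.pack("<BHI", TVERSION, NOTAG, msize) + encode_string(version)
    # The leading size field includes its own four bytes.
    return struct.pack("<I", 4 + len(body)) + body
```

The remaining message types follow the same pattern, so a basic encoder and decoder stay small.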

1.8.2 SSH

SSH is quite nice and layered. But apparently its authentication is not designed for distributed systems (such as distributed IMs or file sharing), its connection layer looks rather bloated, and generally it's not particularly simple. Those are small bits of a large protocol, but they seem to make it not quite usable for peer-to-peer communication.

1.8.3 TLS

TLS may provide mutual authentication, and there are plenty of tools to work with it.
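
A minimal sketch of mutual authentication with Python's standard `ssl` module, assuming each peer holds its own (possibly self-signed) certificate and a file with the certificates of the peers it trusts. The function names and file parameters are hypothetical; the paths are optional so that the contexts can be inspected without any files at hand.

```python
import ssl


def server_context(cert=None, key=None, peer_certs=None):
    """Listening side: present our certificate and demand one from the client."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.verify_mode = ssl.CERT_REQUIRED  # reject clients without a certificate
    if cert:
        ctx.load_cert_chain(cert, key)
    if peer_certs:
        ctx.load_verify_locations(cafile=peer_certs)
    return ctx


def client_context(cert=None, key=None, peer_certs=None):
    """Connecting side: verify the server and offer our own certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)  # CERT_REQUIRED by default
    # Assumption: peers are identified by their certificates, not DNS names.
    ctx.check_hostname = False
    if cert:
        ctx.load_cert_chain(cert, key)
    if peer_certs:
        ctx.load_verify_locations(cafile=peer_certs)
    return ctx
```

Either context is then applied to a connected socket with `wrap_socket` (passing `server_side=True` on the listening side), and `getpeercert()` on the resulting socket tells which peer is on the other end.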

2 Ad hoc messaging

Pretty much every distributed IM tries to reinvent everything, and virtually none are satisfactory, but at least some of the problems are already solved separately, and there are:

  • iptables and plain TCP or Tor (which supports transparent proxying, does encryption, and helps to preserve privacy) for routing.
  • TLS/SSH/OpenPGP/SASL/GSSAPI/OTR/Noise/etc for encryption and authentication. TLS and OpenPGP are perhaps the most easily usable in an ad hoc setting.
  • netcat, socat, plumber, pipes, etc. for sorting/composing/testing those; also rlwrap to make CLI programs such as netcat usable for chatting.
  • IRC, PSYC, XMPP, SMTP, 9P for authentication/messaging/data transfer.
  • Whole multi-user and distributed operating systems for all kinds of interaction. And QEMU to isolate those a bit (though VM escaping is not unheard of, so that's just for friend-to-friend activities).

Combinations such as Tor + TLS + rlwrap may serve as ad hoc IMs rather easily. Well, if participants are willing and able to use those, which often isn't the case. In an attempt to try and facilitate things like that, I wrote TLSd and a P2P IM with a libpurple plugin as one of the examples.

Another attempt of mine to get distributed messaging with existing tools is XMPP over Tor.
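
For programs that cannot rely on transparent proxying, reaching a peer through Tor comes down to a short SOCKS5 handshake. The sketch below assumes a local Tor client listening on its default SOCKS port (127.0.0.1:9050); the function names are made up for this example, and error handling is kept to a minimum.

```python
import socket
import struct


def socks5_greeting() -> bytes:
    """Version 5, one method offered: 0x00 (no authentication)."""
    return b"\x05\x01\x00"


def socks5_connect_request(host: str, port: int) -> bytes:
    """CONNECT by domain name (address type 0x03), so the proxy resolves it."""
    name = host.encode("ascii")
    return b"\x05\x01\x00\x03" + bytes([len(name)]) + name + struct.pack(">H", port)


def recv_exact(sock, n: int) -> bytes:
    """Read exactly n bytes or fail."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("proxy closed the connection")
        buf += chunk
    return buf


def connect_via_tor(host: str, port: int, proxy=("127.0.0.1", 9050)):
    """Return a socket connected to host:port through the SOCKS5 proxy."""
    sock = socket.create_connection(proxy)
    sock.sendall(socks5_greeting())
    if recv_exact(sock, 2) != b"\x05\x00":
        raise ConnectionError("proxy refused the no-authentication method")
    sock.sendall(socks5_connect_request(host, port))
    reply = recv_exact(sock, 4)  # version, reply code, reserved, address type
    if reply[1] != 0:
        raise ConnectionError("SOCKS5 error code %d" % reply[1])
    # Skip the bound address and port that follow the reply header.
    if reply[3] == 1:
        recv_exact(sock, 4 + 2)
    elif reply[3] == 4:
        recv_exact(sock, 16 + 2)
    elif reply[3] == 3:
        recv_exact(sock, recv_exact(sock, 1)[0] + 2)
    else:
        raise ConnectionError("unexpected address type in reply")
    return sock
```

The returned socket behaves like a direct TCP connection, so TLS (for instance via `ssl.SSLContext.wrap_socket`) or a line-based chat can be layered on top of it.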

3 Users

Distributed systems, particularly when used for social activities, require users, so that there would be somebody to send messages to in the case of an IM. That's quite a problem, since even by sticking to federated networks it is easy to lose or decrease contact with most people one knows, even if those are tech-savvy; apparently it's even easier when moving to distributed and less common ones.

In my experience so far, for personal communication it works like this:

  • Centralised systems: dozens of contacts, occasionally finding new ones.
  • Decentralised/federated systems: a few contacts and decreasing.
  • Distributed systems: you alone.

Apart from not being backed by a huge company for marketing and not having a ton of users using them already, probably the primary issue for non-centralised systems is that there's usually little focus on finding new contacts and establishing new relationships.

Though XMPP MUCs and that microblogging fediverse thing (see OStatus above) may be suitable for it; not distributed, but at least closer to it. A nicer approach would be to use distributed search, but I'm not aware of generic/suitable systems of that kind (see 1.5): they tend to either be an easy target for spammers, or require existing social connections in a system (although some may be suitable still).

4 Search (1.5), FOAF, and the rest of RDF

Some kind of distributed search/directory may connect small peer-to-peer islands into a usable network. While it is hard to decide on an algorithm, lists of known and/or somewhat trusted nodes are common to both structured and unstructured networks, as well as to the use of social graphs: if those are provided by peers, a client may decide by itself which algorithm to apply. This reduces the task to just including known nodes in local directory entries, which can be shipped over any other protocol (e.g., HTTP over Tor).
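
As an illustration of a client choosing its own algorithm over peer-provided node lists, here is a sketch of one possible choice: a breadth-first walk limited to a few hops. `fetch_entry` is a hypothetical callable that returns the nodes listed in a peer's directory entry (fetched, say, over HTTP over Tor); nothing here is tied to an existing system.

```python
from collections import deque


def discover(fetch_entry, seeds, max_hops=2):
    """Walk peer-provided node lists breadth-first, up to max_hops from a seed.

    fetch_entry(node) -> iterable of nodes which `node` lists as known;
    it may raise LookupError if the entry is missing or unreachable.
    Returns {node: hop distance from the nearest seed}.
    """
    distance = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if distance[node] >= max_hops:
            continue  # far enough: record the node, but don't expand it
        try:
            known = fetch_entry(node)
        except LookupError:
            continue  # no entry available: skip this node
        for peer in known:
            if peer not in distance:
                distance[peer] = distance[node] + 1
                queue.append(peer)
    return distance
```

The hop distance can serve as a crude trust measure; another client may just as well rank, filter, or search the same entries differently.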

Knowledge representation, which is needed for a generic directory structure, is tricky, but there is RDF (Resource Description Framework) already. And there is even FOAF (the Friend of a Friend ontology), specifically for describing persons, their relationships (including linking the persons they know), and other social things. There's a FOAF search engine, too: it's centralised and searches mostly in large centralised places, but such an engine can relatively easily be set up locally, searching for users and information they provide in a distributed, peer-to-peer network. See also: Semantic Web.
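
A directory entry of that kind can be quite small. The sketch below writes a FOAF description as N-Triples using only string formatting; the person IRIs in the usage example are placeholders, and a real setup would more likely use an RDF library and a friendlier syntax such as Turtle.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return "<" + value + ">"


def literal(text: str) -> str:
    """Quote a plain string literal, escaping the characters N-Triples reserves."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return '"' + escaped + '"'


def foaf_person(person: str, name: str, knows=()) -> str:
    """Describe a person, their name, and the persons they know."""
    subject = iri(person)
    lines = [
        "%s %s %s ." % (subject, iri(RDF_TYPE), iri(FOAF + "Person")),
        "%s %s %s ." % (subject, iri(FOAF + "name"), literal(name)),
    ]
    for other in knows:
        lines.append("%s %s %s ." % (subject, iri(FOAF + "knows"), iri(other)))
    return "\n".join(lines)
```

For example, `foaf_person("http://alice.example/#me", "Alice", ["http://bob.example/#me"])` yields three triples, and the `foaf:knows` links are exactly the known-node lists a crawler can follow.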