
Distributed systems

Distributed systems are useful for various purposes, but the common/achievable niceties are:

1 Existing systems

There are quite a few of them; I am going to write mostly about those that work over the Internet, though mesh networks built on lower layers are interesting, too. Grid computing systems, like BOINC, are nice as well, but this note is not about them. I have not tried many of them, and there are more such systems around, but here is a little overview.

There's also a related Wikipedia category.

1.1 Search

YaCy and a few other distributed search engines (some of which are dead by now) exist. I have only tried YaCy, and it works, though it is not easy to find its technical documentation.

1.2 Generic networks

Tor and I2P: both support "hidden services", on top of which regular protocols could be used, but they are more about privacy (and a bit about routing) than about decentralization: they provide NAT traversal and static addresses, but that's it. Tor documentation is relatively nice, and there are plenty of I2P docs. Tor provides a nice C client, while I2P uses Java, which makes Tor much easier to install, at least.

1.3 Mesh networks

Some mesh networks, like Telehash, provide routing as well, though their advantages for decentralization seem to be similar to those of Tor and I2P; they are just better in that they extend it beyond the Internet. Telehash documentation is also pretty nice and full of references.

1.4 IM and other social services

  • Tox implements its own network (DHT, onion routing, NAT traversal, etc), and has some documentation. It works, though it is not particularly easy to build, and toxic (apparently the primary implementation) ceases to work after a few days here, requiring a restart.
  • Rival Messenger and Bleep are based on Telehash and BitTorrent, respectively. Have not tried those.
  • RetroShare provides a bunch of things, but with a web-based UI, and I gave up on building it.
  • Matrix seems to be getting relatively popular, but uses HTTP APIs, the specification is not available without JS, mentions IoT, there are SDKs (I wonder whether it's ever a useful thing to provide an SDK instead of a single documented library; usually it's just additional pain to work with), web-based clients, etc – seems to be pretty unpleasant overall, following poor practices.
  • Ricochet reuses the Tor network, its protocol is documented and doesn't seem to be bloated. Unfortunately, it's bundled with a GUI, apparently there is no separate library, and it's in C++ anyway, which would make bindings harder if there was one. Probably it wouldn't be that hard to reimplement (or to extract the non-GUI code bits and make C bindings, to get a reusable library).
  • Other IMs: there is a nice comparison of privacy-oriented IMs, file sharing services, and social networks on the secushare website.
  • Other social networking tools: there is a wiki comparison of those.

1.5 File sharing

BitTorrent, of course, with Mainline DHT.

IPFS seems to be getting, well, maybe not popular, but mentioned here and there. There are papers and it is documented, but the implementations are currently in Go (reference), JS (incomplete), and Python (started). So, trying it would involve setting up the whole Go toolchain.

Freenet is a distributed data store, apparently not very interactive. Or maybe it is; it's in Java, and I didn't try it myself.

1.6 Cryptocurrencies

Plenty of those popped up recently. Looks like quite a waste of electricity and hardware to me, yet the idea itself is interesting.

1.7 Decentralized/federated systems

Some common systems, like XMPP, and even email, are decentralized, though the latter has a few huge centers nowadays, because of spammers, big companies, and lazy users. It is not exactly what this note is about, but perhaps worth mentioning.

1.8 GNUnet

Not sure how to classify it, but here are some links: gnunet.org, wiki://GNUnet, A Secure and Resilient Communication Infrastructure for Decentralized Networking Applications. Seems promising, but tricky to build, to figure out how it all works, and to do anything with it now (a lack of documentation seems to be the primary issue, though probably there's more).

Taler and secushare (using PSYC) are getting built on top of it, but it's not clear how it's going, how abandoned or alive it is, etc. Their documentation also seems to be outdated and incomplete.

1.9 Generic protocols

There are more or less generic network protocols that may be used on top of e.g. Tor, to get usable distributed services.

1.9.1 Plan 9's 9P

9P seems to be very nice: it's simple, documented, and generic. Security in Plan 9 is also nice. It's somewhat future-proof, though it has certain limitations (and it's a rabbit hole to try to make something distant-future-proof). Alas, there's not much software that supports it, and hacking authentication usable for distributed services into it might be tricky.
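To give a feel for how simple the wire format is, here is a minimal sketch of encoding the first message of a 9P2000 session, Tversion. Every 9P message starts with a little-endian size[4] type[1] tag[2] header, and strings are prefixed with a two-byte length. The function names are made up for illustration, and this only builds bytes; it does not talk to a server.

```python
import struct

TVERSION = 100   # message type for Tversion in 9P2000
NOTAG = 0xFFFF   # special tag used for version negotiation


def encode_tversion(msize, version="9P2000"):
    """Encode Tversion: size[4] type[1] tag[2] msize[4] version[s]."""
    ver = version.encode("utf-8")
    body = struct.pack("<BHI", TVERSION, NOTAG, msize)
    # A 9P string is a two-byte length followed by the bytes themselves.
    body += struct.pack("<H", len(ver)) + ver
    # The leading size field counts itself (4 bytes) plus everything after it.
    return struct.pack("<I", 4 + len(body)) + body


def decode_header(message):
    """Return (size, type, tag) from the first 7 bytes of a 9P message."""
    return struct.unpack("<IBH", message[:7])
```

Calling encode_tversion(8192) gives the bytes a client would send to propose a maximum message size of 8192; the remaining message types follow the same header layout.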

1.9.2 SSH

SSH is quite nice and layered. But apparently its authentication is not designed for distributed systems (such as distributed IMs or file sharing), its connection layer looks rather bloated, and generally it's not particularly simple. Those are small bits of a large protocol, but they seem to make it not quite usable for peer-to-peer communication.

2 Ad hoc messaging

Pretty much every distributed IM tries to reinvent everything, and virtually none are satisfactory, but at least some of the problems are already solved separately, and there are:

  • iptables and plain TCP or Tor (which supports transparent proxying, does encryption, and helps to preserve privacy) for routing.
  • TLS/SSH/OpenPGP/SASL/GSSAPI/OTR/Noise/etc for encryption and authentication.
  • netcat, socat, plumber, pipes, etc for sorting/composing/testing those; also rlwrap to make CLI programs such as netcat usable for chatting.
  • IRC, PSYC, XMPP, SMTP, 9P for authentication/messaging/data transfer.
  • Whole multi-user and distributed operating systems for all kinds of interaction. And QEMU to isolate those a bit (though VM escaping is not unheard of, so that's just for friend-to-friend activities).

So, maybe just setting up Tor, XMPP/PSYC/IRC/9P, and then using OTR is the way to go. Or SSH with tweaked authentication, and a bunch of isolated programs. Or a very simple custom protocol with those. There are a lot of ways to set it up for oneself, but it's still a problem to communicate with or send files to arbitrary users without centralized (at best, controlled by you) systems.
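As an illustration of how little glue is needed to run a plain TCP protocol over Tor, here is a rough sketch of a SOCKS5 (RFC 1928) client handshake. It assumes a local Tor daemon listening on 127.0.0.1:9050 (the usual default SocksPort, but check your torrc), and "example.onion" below is a placeholder rather than a real service. The function names are made up for this note.

```python
import socket
import struct


def socks5_connect_request(host, port):
    """Build a SOCKS5 CONNECT request addressed by domain name."""
    name = host.encode("ascii")
    if len(name) > 255:
        raise ValueError("host name too long for SOCKS5")
    # VER=5, CMD=1 (CONNECT), RSV=0, ATYP=3 (domain name), then
    # a one-byte length, the name, and the port in network byte order.
    return b"\x05\x01\x00\x03" + bytes([len(name)]) + name + struct.pack(">H", port)


def recv_exact(sock, count):
    """Read exactly `count` bytes or raise if the peer closes early."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("proxy closed the connection")
        data += chunk
    return data


def open_via_tor(host, port, proxy=("127.0.0.1", 9050)):
    """Return a connected socket to host:port, relayed through the proxy."""
    sock = socket.create_connection(proxy)
    try:
        sock.sendall(b"\x05\x01\x00")  # version 5, one method: no authentication
        if recv_exact(sock, 2) != b"\x05\x00":
            raise ConnectionError("proxy refused the 'no authentication' method")
        sock.sendall(socks5_connect_request(host, port))
        version, reply, _, atyp = recv_exact(sock, 4)
        if version != 5 or reply != 0:
            raise ConnectionError("SOCKS5 connect failed with code %d" % reply)
        # Skip the bound address, whose length depends on its type.
        if atyp == 1:
            recv_exact(sock, 4 + 2)
        elif atyp == 4:
            recv_exact(sock, 16 + 2)
        else:
            recv_exact(sock, recv_exact(sock, 1)[0] + 2)
        return sock
    except Exception:
        sock.close()
        raise
```

After open_via_tor("example.onion", 6667) returns, the socket behaves like any other TCP connection, so an IRC or XMPP client could use it unchanged; passing the name instead of an IP address lets Tor do the resolution.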

3 Yet another protocol (?)

There are a few reasons to not design yet another protocol:

  • As with standards in general, it doesn't look like a good idea.
  • There is the "don't roll your own crypto" rule/argument, which makes sense, and which restricts what one can do even with software that merely uses existing cryptographic libraries, or reimplements existing algorithms or protocols.
  • When there are similar projects, it might be better to contribute to those instead of starting from scratch.

Though it's not hard to make counter-arguments to those:

  • The existing ones are not satisfactory.
  • There are nice and flexible existing libraries, such as the Noise protocol framework and OTR. Besides, the argument itself may sound less compelling since we write other kinds of critical software anyway: applying the same reasoning to everything is rather impractical, but otherwise it's inconsistent.
  • The existing projects tend to be underdocumented and/or bloated to the point where it's easier to start from scratch.

So, in a hypothetical situation of actually doing that, it seems that a usable system can be created by:

  • Documenting everything.
  • Implementing a C library to make the bindings viable.
  • Keeping it very simple to make reimplementations and understanding easy.
  • Implementing CLI tools for debugging and scripting.

Keeping the layers well-separated and reusing existing protocols whenever possible should help to achieve that. Perhaps the first step should be a protocol to use on top of e.g. Tor, that would simply do mutual authentication: that alone would be quite helpful, and could be used to create unix domain sockets named using fingerprints; from that point, it'd be rather easy to reuse that layer in pretty much any scenario.
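The "unix domain sockets named using fingerprints" part could look roughly like the sketch below. It assumes the mutual authentication has already happened elsewhere and yielded the peer's public key as bytes; the fingerprint here is simply a truncated SHA-256 hex digest, which is an arbitrary choice for illustration, and the function names are hypothetical.

```python
import hashlib
import os
import socket


def fingerprint(public_key):
    """Hex fingerprint of a peer's public key (truncated SHA-256).

    Truncated to 32 hex characters so that the socket path stays well
    under the platform's limit on unix socket path length.
    """
    return hashlib.sha256(public_key).hexdigest()[:32]


def listen_for_peer(directory, public_key):
    """Create a listening unix socket named after the peer's fingerprint."""
    path = os.path.join(directory, fingerprint(public_key))
    if os.path.exists(path):
        os.unlink(path)  # remove a stale socket left by a previous run
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen(1)
    return server
```

Any local program (netcat, socat, a chat client) can then connect to the socket for a given fingerprint without knowing anything about Tor or the authentication protocol underneath; file permissions on the directory decide which local users get access.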

4 Users

Distributed systems, particularly when used for social activities, require users, so that there would be somebody to send messages to in case of an IM. That's quite a problem, since even by sticking to federated networks it is easy to lose or decrease contact with most people one knows, even if those are tech-savvy; apparently it's even easier when moving to distributed and less common ones. Even programmers are quite terrible at using computers, unfortunately.