Voice conferences

Voice conferences never seemed handy to me for a number of reasons: they are strictly real-time (quite an issue if you don't maintain a sleep regimen or if any of the participants have anything else scheduled), "half-duplex" (in that only one person can talk at a time for the speech to be distinguishable), there's no reliable and easy way to get greppable logs (transcriptions), and unless they are combined with textual chat, there's no way to copy and paste texts, to share links or program output, etc.

They are probably good for multiplayer games, when you are busy controlling a character, but also need to coordinate actions in real-time. They may also be nice for those who are not used to reading and writing/typing, or even for a casual chat.

What puzzles me is that even programmers (who should be used to reading and writing, and who do share links and other texts quite often) often seem to prefer voice communication, and that many software projects aim to provide voice/video conferences even before they have usable textual messaging/conferences.

Nevertheless, sometimes voice conferences are hard to avoid, and here are my notes on more or less usable protocols.

Requirements and concerns

A particularly nasty thing about voice communication is speaker recognition: coupled with unencrypted/unknown protocols and surveillance and/or data breaches, it can make voice communication quite uncomfortable to use. Hence my initial requirements are end-to-end encryption, an open protocol, the existence of at least one open source (preferably libre) client for GNU/Linux, and preferably a distributed or federated protocol.

Apparently the requirements imposed by the majority of users, which should also be taken into account in order to actually use such a protocol, are that it should be extremely easy to set up and to use on various systems: no more than a few mouse clicks or touchscreen taps. Being well-known is perhaps another thing that matters to inexperienced users, since the lesser-known software they come across tends to be malware, even by the relaxed, non-RMS definition of malware.

And the obvious requirement for it is to work well: acceptable sound/video quality (no perceivable noise, pauses, or delays) even with not-so-good channels, perhaps NAT traversal, etc.


There's a comparison of VoIP software and a few more lists on Wikipedia, the YBTI map, and just widely known protocols. Apparently newbie users mostly think in terms of client software that implements those protocols, so the clients are even more widely known.

XMPP + Jingle
Though it might be nice, with XMPP it's tricky even to ensure textual, one-to-one message delivery: different clients and servers support different sets of XEPs, many users are not willing to configure anything, and clients hide most of the settings and logs from users, so even those who are willing to configure or debug things can't do it with reasonable effort.
SIP and H.323
Apparently those are very common for VoIP, and there's a lot of software (even an Android API for SIP), but they require setting up a server, and users don't hear about them that often, which makes them rather obscure. When something is not immediately available, "let's rather not deal with it" is a possible reaction.

WebRTC is a strange protocol to find in web browsers (how's that functionality even related to the web?), yet it may seem usable: it has NAT traversal (ICE, STUN, TURN), end-to-end encryption (DTLS), and voice and video conferences, and it has been supported by common web browsers for a while now, which makes it relatively easy to use: a single mouse click to get into a conference. It's not perfect, but it's open and standardised, and it reuses other standards.
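To give a sense of what the STUN part of that NAT traversal amounts to, here is a minimal Python sketch of a STUN Binding request and of decoding the XOR-MAPPED-ADDRESS attribute from the response, following RFC 5389. It is only an illustration, not what any browser actually runs: the function names are made up for this note, only IPv4 is handled, and there are no retransmissions or authentication.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442   # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id=None):
    """Return (packet, transaction_id) for a Binding request without attributes."""
    if transaction_id is None:
        transaction_id = os.urandom(12)
    # Header: message type, message length (0, no attributes), magic cookie.
    header = struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE)
    return header + transaction_id, transaction_id


def parse_xor_mapped_address(packet, transaction_id):
    """Extract (ip, port) from a Binding success response (IPv4 only).

    Returns None if the packet is not a matching success response or has
    no IPv4 XOR-MAPPED-ADDRESS attribute.
    """
    if len(packet) < 20:
        return None
    msg_type, msg_len, cookie = struct.unpack("!HHI", packet[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        return None
    if packet[8:20] != transaction_id:
        return None
    pos, end = 20, min(len(packet), 20 + msg_len)
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", packet[pos:pos + 4])
        value = packet[pos + 4:pos + 4 + attr_len]
        # value: reserved byte, family (0x01 = IPv4), X-Port, X-Address
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            addr = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        pos += 4 + attr_len + (-attr_len % 4)  # attributes are padded to 4 bytes
    return None


def query_reflexive_address(server, timeout=2.0):
    """Ask a STUN server, given as (host, port), how it sees this socket."""
    request, tid = build_binding_request()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(request, server)
        response, _ = sock.recvfrom(2048)
    return parse_xor_mapped_address(response, tid)
```

Calling query_reflexive_address with the address of a STUN server you have access to (the default STUN port is 3478) should return the public address and port that the server observed. If that differs from the local address of the socket, there is a NAT in between.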

When I tried to use it with a public server, there was just noise and no distinguishable words; tcpdump showed that it used a slow TURN server instead of STUN with hole punching, even though UDP hole punching worked fine between the networks in question when I tried it manually, using netcat. Getting help with issues like that is tricky: there's no webrtc channel on freenode, and webrtc.org contacts only list Twitter, Google+, and Google groups (which eat messages from smaller servers rather often and seemingly randomly).
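The manual check mentioned above was done with netcat; the following is a rough Python equivalent, given as a sketch rather than the exact procedure I used. It assumes both sides already know each other's public address and port (e.g. exchanged over some other channel) and run the same function at about the same time. The function name and its parameters are made up for illustration.

```python
import socket


def punch(local_port, peer_host, peer_port, attempts=10, interval=0.5):
    """Try to exchange UDP datagrams with a peer that runs the same code.

    Returns True once a datagram arrives, False if all attempts time out.
    The sender of the datagram is not verified here, since a NAT on the
    peer's side may rewrite the source port.
    """
    peer = (peer_host, peer_port)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("", local_port))
        sock.settimeout(interval)
        for _ in range(attempts):
            # Each outgoing datagram creates (or refreshes) the mapping on
            # our NAT, so that the peer's datagrams can get through.
            sock.sendto(b"punch", peer)
            try:
                sock.recvfrom(1024)
            except (socket.timeout, ConnectionError):
                continue
            # Reply once more before leaving, since the peer may have
            # missed our earlier datagrams.
            sock.sendto(b"punch", peer)
            return True
    return False
```

If this returns True on both sides, plain hole punching works between the two networks, and a fallback to a TURN relay should not be strictly necessary for them.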

test.webrtc.org was helpful for debugging; that's how I discovered that my ISP had started putting me behind NAT. I barely managed to pass the "Reflexive connectivity" test by acquiring a NAT-free IP address and redirecting all the UDP traffic from my router to my host (all of it because there's no fixed port; in principle one shouldn't have to set up port forwarding at all, if only it worked), by adding the following to /etc/config/firewall (that's on LibreCMC, it should be the same with OpenWrt, and /etc/init.d/firewall restart should be invoked to apply it):

config redirect
	option enabled '1'
	option src 'wan'
	option dest 'lan'
	option proto 'udp'
	option name 'all udp'
	option dest_ip ''

Though it didn't quite work for sound anyway, even while passing all the tests.

Experimental/research projects
There are various experimental (or just planned) projects, some of which attempt to build on distributed systems, yet they usually aren't usable even by advanced users, if they work at all. Salsify looks interesting, though at the time of writing it only supports video streams.


By the time this note was last modified, there was no nice option for voice conferences in sight, but voice conferences themselves don't look like a nice option for communication in most cases. Avoidance is perhaps a good strategy, at least for now.