Voice conferences

Recently, in an ongoing attempt to avoid Skype, I've read about protocols suitable for voice conferences, and have now decided to write it down.

1 Preface

Voice conferences never seemed handy to me for a number of reasons: they are strictly real-time and "half-duplex" (in that only one person can talk at a time for the speech to be distinguishable), there's no reliable and easy way to get greppable logs (transcriptions), and unless they are combined with textual chat, there's no way to copy and paste things, share links or program output, etc.

They are probably good for multiplayer games, when you are busy controlling a character but also need to coordinate actions in real time. They may also be nice for those who are not used to reading and writing/typing, or even for a casual chat.

What puzzles me is that even programmers (who should be used to reading and writing, and do share links and other textual things quite often) seem to prefer voice communication surprisingly often. So does the fact that projects such as Tox and secushare aim to provide nice (secure and private) voice/video conferences while there's still no usable protocol for textual conferences of that kind, which should be easier to implement; this suggests that their developers deem voice/video conferences pretty important.

2 Requirements and concerns

A particularly nasty thing about voice communication is speaker recognition and identification: you leave your biometrics along with what you are saying. If a protocol is centralized, doesn't provide end-to-end encryption, and/or is unknown, using it is roughly as good as making the conversations public and signed at once, and possibly stored until the end of civilization, which makes it considerably less comfortable to talk. Even apart from all the ethical and security issues, and considering just a casual chat: while some people write silly things in public and sign them with their names all the time, for others it would be embarrassing. Perhaps forgiving silliness and not judging yourself and others helps to care about that less, but that's out of the scope of this note.

So, my initial requirements are: end-to-end encryption, an open protocol, at least one open source (preferably libre) client in existence, and preferably a distributed protocol.

Apparently the requirements imposed by the majority of users, which should also be taken into account in order to actually use such a protocol, are that it should be extremely easy to set up and use on various systems: no more than a few mouse clicks or touchscreen taps. Perhaps being well known is another thing that is important to inexperienced users, since the lesser-known programs they come across tend to be malware, even by the relaxed, non-RMS definition of malware (while by the RMS definition, most of the widely used programs are malware).

And the obvious requirement is that it should work well: acceptable sound/video quality (no perceivable noise, pauses, or delays) even over not-so-good channels, NAT traversal, etc.

3 Protocols

There's a comparison of VoIP software and a few more lists on Wikipedia, the YBTI map, and simply the widely known protocols. By the way, it seems that newbie users mostly think in terms of the client software that implements those protocols, so the clients are even more widely known.

3.1 XMPP + Jingle

Though it might be nice, with XMPP it's pretty tricky even to ensure textual one-to-one message delivery: different clients and servers support different sets of XEPs, users are not willing to configure anything, and clients hide most of the settings and logs from users, so those who are willing to configure or debug things can't do so with reasonable effort. I don't like the protocol enough to try to improve the software, while most computer users wouldn't even bother configuring it if it didn't work out of the box. As a result, it seems to me that it's usable for one-to-one text communication with those few people who are willing to set it up, but not much else.

3.2 SIP and H.323

Apparently those are very common for VoIP, and there's a lot of software for them (even an Android API for SIP), but they require setting up a server, and users don't hear about them that often, which makes them rather obscure to most people. Since it's not immediately available, "let's rather not deal with it" is a possible reaction.

3.3 WebRTC

WebRTC is a weird thing to find in web browsers (how's that functionality even related to the web?), yet it seems to be pretty nice: good NAT traversal (ICE, STUN, TURN), end-to-end encryption (DTLS), voice and video conferences, and support in all the common web browsers for a while now, making it extremely easy to use: a single mouse click to get into a conference. It's not perfect, but it's open and standardized, and builds on other standards, too.
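For illustration, the STUN part boils down to something fairly simple: a client sends a Binding request over UDP, and the server replies with the address and port it saw the request coming from (the "server-reflexive" address, i.e. how the client looks from outside the NAT). Below is a minimal Python sketch based on the message format from RFC 5389; it only handles IPv4 and the XOR-MAPPED-ADDRESS attribute, does no retransmission or integrity checking, and the function names and the stun.example.org server are placeholders made up for this example.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442          # fixed value from RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id):
    """20-byte STUN header: type, body length (0: no attributes), cookie, 96-bit ID."""
    if len(transaction_id) != 12:
        raise ValueError("transaction ID must be 12 bytes")
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + transaction_id


def parse_binding_response(data, transaction_id):
    """Return (ip, port) from an IPv4 XOR-MAPPED-ADDRESS attribute, or None."""
    if len(data) < 20:
        return None
    msg_type, length, cookie = struct.unpack("!HHI", data[:8])
    if (msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE
            or data[8:20] != transaction_id):
        return None
    pos, end = 20, min(20 + length, len(data))
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", data[pos:pos + 4])
        value = data[pos + 4:pos + 4 + attr_len]
        # value layout: reserved byte, family (0x01 = IPv4), X-Port, X-Address
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            xport, xaddr = struct.unpack("!HI", value[2:8])
            port = xport ^ (MAGIC_COOKIE >> 16)
            addr = xaddr ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        # Attribute values are padded to a multiple of 4 bytes.
        pos += 4 + attr_len + (-attr_len % 4)
    return None


def query_reflexive_address(server=("stun.example.org", 3478), timeout=2.0):
    """Send one Binding request (no retries) and return our public (ip, port).

    The default server name is a placeholder; substitute a real STUN server.
    """
    tid = os.urandom(12)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_binding_request(tid), server)
        data, _ = sock.recvfrom(2048)
    return parse_binding_response(data, tid)
```

If the returned address differs from the machine's own address, there's a NAT in the way; WebRTC's ICE gathers such reflexive addresses as candidates and only falls back to relaying through TURN when direct paths fail.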

When I tried to use it with some public server, there was just noise and no distinguishable words; tcpdump showed that it used a slow TURN server instead of STUN with hole punching, even though UDP hole punching worked fine between the networks involved when I tried it manually, using netcat. Apparently further debugging will require reading plenty of docs, since there's nowhere to ask about it: there's no webrtc channel on freenode, and the webrtc.org contacts only list Twitter, Google+, and Google Groups (which ignore my letters most of the time for an unknown reason, even with a usually negative SpamAssassin score).
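The manual netcat test mentioned above amounts to something like the following Python sketch: each peer binds a fixed local UDP port and keeps sending probes to the other's public address until a datagram from it arrives, since the outgoing probes create the NAT mappings that let the peer's packets in. It assumes that both sides already know each other's public address and port (e.g. from a STUN query) and that the NATs keep the same external port for different destinations; with a symmetric NAT this typically fails, which is the case TURN relays are there for. The `punch` function and its parameters are made up for this example.

```python
import socket


def punch(local_port, peer, attempts=10, interval=0.5):
    """UDP hole punching sketch (hypothetical helper, not from any library).

    Binds `local_port`, then repeatedly sends a probe to `peer` (the other
    side's public (ip, port)) until a datagram from `peer` arrives.
    Returns the bound socket on success, or None after `attempts` tries.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", local_port))
    sock.settimeout(interval)
    for _ in range(attempts):
        # The outgoing probe creates (or refreshes) our NAT's mapping for peer.
        sock.sendto(b"punch", peer)
        try:
            _, addr = sock.recvfrom(1024)
        except (socket.timeout, ConnectionError):
            # Nothing yet (or an ICMP error reported on some systems): retry.
            continue
        if addr == peer:
            # One more probe, so that the peer also sees a packet and can stop.
            sock.sendto(b"punch", peer)
            return sock
    sock.close()
    return None


# Usage: run on both hosts at about the same time, swapping the addresses, e.g.
# sock = punch(40000, ("198.51.100.7", 40000))
```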

3.3.1 Update

Found test.webrtc.org and discovered that my ISP now puts me behind a NAT. I barely managed to pass the "Reflexive connectivity" test by acquiring a NAT-free IP address and redirecting all the UDP traffic to my host from my router (all of it, because there's no fixed port – so one wouldn't have to set up port forwarding at all, if only it worked), by adding the following to /etc/config/firewall (that's on LibreCMC; it should be the same on OpenWRT):

config redirect
        option enabled '1'
        option src 'wan'
        option dest 'lan'
        option proto 'udp'
        option name 'all udp'
        # dest_ip: LAN address of the host that should receive the traffic
        option dest_ip ''

(Then run /etc/init.d/firewall restart to apply the changes.)

I don't know why it didn't work without that though.

3.3.2 Update #2

I still haven't tested it further: the idea of using voice conferences was ditched.

3.4 Tox and secushare

Though Tox seems nice and almost works, it's not that stable, and it doesn't have really usable software even for those who are willing to build and configure it. I haven't tried its voice conferences, though. Either way, it's not likely to work for regular users at the moment, even if voice conferences do work well.

Secushare is similar, except that its bad parts are currently worse (harder to set up, fewer things seem to be ready), while its good parts may be better.

4 Conclusion

It's 2016, and there's still no satisfactory way to communicate over the internet: neither in text nor in voice. Not so much because of incompatible views on what's better (there is that "convenience versus security" thing, but it doesn't have to be that bad), as because it simply hasn't been done.