Voice conferences

Voice conferences never seemed handy to me, for a number of reasons: they are strictly real-time (quite an issue if you don't maintain a sleep regimen or if any of the participants have anything else scheduled), effectively half-duplex (only one person can talk at a time for the speech to be intelligible), there's no reliable and easy way to get greppable logs (transcriptions), and unless they are combined with textual chat, there's no way to copy and paste text, to share links or program output, etc.

They are probably good for multiplayer games, when you are busy controlling a character, but also need to coordinate actions in real-time. They may also be nice for those who are not used to reading and writing/typing, or even for a casual chat.

Nevertheless, sometimes voice conferences are hard to avoid, and here are my notes on relatively usable protocols.

Requirements and concerns

A particularly nasty thing about voice communication is speaker recognition: coupled with unencrypted/unknown protocols and surveillance and/or data breaches, it can make voice communication quite uncomfortable to use. Hence my initial requirements are end-to-end encryption, an open protocol, the existence of at least one open source (preferably libre) client for GNU/Linux, and preferably a distributed or federated protocol.

Apparently the requirements imposed by the majority of users, which should also be taken into account in order to actually use such a protocol, are that it should be extremely easy to set up and to use on various systems: no more than a few mouse clicks or touchscreen taps. Being well-known is perhaps also important to inexperienced users, since the lesser-known software they come across tends to be malware, even by the relaxed, non-RMS definition of malware.

And the obvious requirement is for it to work well: acceptable sound/video quality (no perceivable noise, pauses, or delays) even over a poor connection, perhaps NAT traversal, etc.

Protocols

There's a comparison of VoIP software and a few more lists in Wikipedia, and in the YBTI map. Apparently newbie users mostly think in terms of client software that implements those protocols, so the clients are even more widely known.

XMPP + Jingle (with (S)RTP)
Like WebRTC, it uses ICE and RTP, with only the negotiation done over XMPP/Jingle. It is not commonly implemented in XMPP clients, and as of 2021, even Jingle file transfers keep failing in pretty good conditions (major clients, a dedicated IP address on at least one of the clients, the server providing a SOCKS5 proxy and not minding in-band bytestreams); actually, even textual message transfer doesn't always work, particularly if iOS devices or poor connections are involved. Audio (and video) calls can work fine if clients and servers are set up properly, and I use them regularly (after setting everything up myself), but there's usually no conferencing support anyway.
SIP and H.323 (also using (S)RTP)
Those are common for VoIP, and there's a lot of software (there's even an Android API for SIP; Linphone seems to be a popular client, and Asterisk a fairly common server), but they require setting up a server, and users don't hear about them that often, which makes them rather obscure. As long as it's not immediately available, "let's rather not deal with it" is a likely reaction.
WebRTC (using (S)RTP, as others)

WebRTC is a strange protocol to find in web browsers (how is that functionality even related to the Web?), yet it may seem usable: it provides NAT traversal (ICE, STUN, TURN), end-to-end encryption (DTLS), and voice and video conferences, and it has been supported by common web browsers for a while now, which makes it relatively easy to use: a single mouse click gets you into a conference. It's not perfect, but it is open and standardised, and it reuses other standards.

When I tried to use it with a public server, there was just noise and no distinguishable words; tcpdump showed that it used a slow TURN server instead of STUN with hole punching, even though UDP hole punching worked fine between the networks in question when I tried it manually, using netcat. Getting help with issues like that is tricky: there's no WebRTC IRC channel or XMPP conference in sight, and the webrtc.org contacts only list Twitter, Google+, and Google groups (which eat messages from smaller servers rather often and seemingly randomly).
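The manual hole punching check mentioned above can also be sketched in Python instead of netcat. This is only a minimal sketch under a few assumptions: both sides run it at roughly the same time, each already knows the other's public IP address and UDP port, and the NATs in between keep the source port unchanged. The function name `punch` and its parameters are made up for illustration.

```python
import socket


def punch(local_port, peer, attempts=10, interval=0.5):
    """Probe peer=(ip, port) from local_port until a datagram arrives from it.

    The peer must be given as a numeric IP address, since it is compared
    with the source address of incoming datagrams.
    Returns True once the peer is heard, False after all attempts fail.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("", local_port))
        sock.settimeout(interval)
        for _ in range(attempts):
            # An outgoing datagram opens a mapping in our own NAT.
            sock.sendto(b"punch", peer)
            try:
                _, addr = sock.recvfrom(1024)
            except socket.timeout:
                continue
            if addr == peer:
                # Send once more, so that the peer hears us as well.
                sock.sendto(b"punch", peer)
                return True
    return False
```

If `punch` returns True on both ends, direct UDP between the two networks is possible, and a call that still goes through a TURN relay is probably failing somewhere in ICE negotiation rather than in the networks themselves.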

test.webrtc.org was helpful for debugging: that's how I discovered that my ISP had started putting me behind NAT. I barely managed to pass the "Reflexive connectivity" test by acquiring a NAT-free IP address and redirecting all UDP traffic from my router to my host (all of it because there's no fixed port, so one wouldn't have to set up port forwarding at all, if only it worked). I did that by adding the following to /etc/config/firewall (that's on LibreCMC, it should be the same with OpenWRT, and /etc/init.d/firewall restart should be invoked to apply it):

config redirect
          option enabled '1'
          option src 'wan'
          option dest 'lan'
          option proto 'udp'
          option name 'all udp'
          option dest_ip '192.168.1.8'

It didn't quite work for sound anyway, even while passing all the tests. Possibly the issue was on the other end.
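A check for being behind NAT can also be done by hand with a plain STUN Binding request, comparing the address the server reports with the local one. The following is a minimal sketch of that, handling IPv4 only and ignoring retransmission and message integrity; the function names are made up for illustration, and the STUN server address has to be supplied by the caller.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id=None):
    """Return (packet, transaction_id) for a Binding request without attributes."""
    if transaction_id is None:
        transaction_id = os.urandom(12)
    header = struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE)
    return header + transaction_id, transaction_id


def parse_binding_response(packet, transaction_id):
    """Return (ip, port) from XOR-MAPPED-ADDRESS (IPv4 only), or None."""
    if len(packet) < 20:
        return None
    msg_type, length, cookie = struct.unpack("!HHI", packet[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        return None
    if packet[8:20] != transaction_id:
        return None
    pos, end = 20, min(20 + length, len(packet))
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", packet[pos:pos + 4])
        value = packet[pos + 4:pos + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            xport, xaddr = struct.unpack("!HI", value[2:8])
            port = xport ^ (MAGIC_COOKIE >> 16)
            addr = struct.pack("!I", xaddr ^ MAGIC_COOKIE)
            return socket.inet_ntoa(addr), port
        pos += 4 + attr_len + (-attr_len % 4)  # attributes are padded to 4 bytes
    return None


def reflexive_address(server, timeout=3.0):
    """Ask the STUN server at (host, port) how our address looks from outside."""
    request, tid = build_binding_request()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(request, server)
        data, _ = sock.recvfrom(2048)
    return parse_binding_response(data, tid)
```

Calling `reflexive_address(("stun.example.org", 3478))` (a placeholder host name) should return the public address and port; if the address differs from the one configured on the local interface, there is a NAT in between.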

Jitsi Meet also uses WebRTC, though (with Jitsi Videobridge bridging it to Jitsi's regular SIP), and that works fine.

Mumble
I haven't tried or read much about this one, but it seems to be used occasionally, it's open, and it apparently works, so it may be worth looking into (TBD).
Experimental/research projects
There are various experimental (or just planned) projects, some of which attempt to build on distributed systems, yet they usually aren't usable even by advanced users, if they work at all. Salsify looks interesting, though at the time of writing it only supports video streams.

Basically, the primary option is (S)RTP, using one of the negotiation protocols, and various audio/video codecs (RTP payload formats).
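Since all of these options end up carrying media in RTP packets, it may help to see what such a packet starts with when looking at a capture. Below is a minimal sketch of parsing the fixed 12-byte RTP header as laid out in RFC 3550; the names `RtpHeader` and `parse_rtp_header` are made up for illustration, and CSRC lists and header extensions are not handled.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the fixed part of an RTP header (the first 12 bytes)."""
    if len(packet) < 12:
        raise ValueError("an RTP header needs at least 12 bytes")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,            # 2 for current RTP
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=b0 & 0x0F,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,     # identifies the codec (payload format)
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
    )
```

With SRTP, the payload is encrypted, but this header is still sent in the clear, so the payload type and sequence numbers remain visible in a packet capture.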

Conferencing setup

In addition to the protocols (and related software) covered here, one should set up a microphone (see the computer hardware notes), and noise and/or echo cancellation (see, for instance, the CentOS 7 workstation notes).

Conclusion

As of 2021, perhaps the only fine option I've observed being used successfully is Jitsi Meet, which uses WebRTC; I assume its server proxies everything, without awkward attempts to establish peer-to-peer connections over the annoying networks that complicate them. If it weren't for the added awkwardness of NATs and software infrastructures, all this would be much easier, since working protocols and related technologies have been available for a while now.