=================
 No tech support
=================

A few days ago I wished to play with computer networks. Now I almost wish I did not have to (though I am not exactly getting to play with them anyway, just having issues with others' networks).

Yesterday I noticed an issue with calls, and tried to debug it today: oddly, an XMPP client kept failing to set up a call, and reconnecting afterwards. Then I noticed that some messages were getting stuck after that, too. Then I noticed that no messages reach the server when it happens, though they do come back. It is the same with different clients and devices: everything gets stuck upon sending a large packet (with all the transport candidates). Then I discovered that packets (either TCP or ICMP) of 1487 bytes or more don't make it to the Hetzner networks (neither this server nor hetzner.com) from me (connected via Rostelecom), though the expected MTU is 1500. Apparently it requires rooting to set a lower MTU on an Android phone, and that would be an awkward workaround anyway. It also works fine from other ISPs (work servers, mobile network operator).

Without really hoping to get help, I decided at least to try to contact Rostelecom's tech support, thinking that it wouldn't harm, and that it is perhaps the right thing to do in this situation. I sent all the ping and traceroute data at once, described the issue, mentioned that it is the same with TCP, bypassed the chat bot, and reached a human. The human refused to help if the computer from which I checked it is connected via a home router (hence the title). When I asked whether they could at least check whether large packets get to Hetzner from the Rostelecom network, they said that they can't ping "on my behalf". I then tried it without a router, even though it is silly (perhaps I should have just lied), and attempted again to get it investigated or fixed, though I was even less hopeful than before contacting the support.
They asked whether I have access to resources despite the issues with packet sending; I said that I have trouble accessing services there because of the lost packets. And they finally mentioned that there were issues with other servers today, but that everything is fine on their end, and that I should contact the resource owners. It is like that almost every time I try to contact the support, Rostelecom's or other local ones.

Ugh, I can't save this file now, since I tried to edit it in Emacs, remotely, and it is more than 1500 bytes. Will try to paste it via SSH instead. Tried now, failed to; will have to paste slowly. Though the SSH connection I had is stuck now, and I am failing to open a new one. Did ``sudo ip link set enp6s0 mtu 1486`` now, but it is awkward and wrong, and still does not fix calls on the phone.

Maybe I will finally try to switch ISPs yet again once I run out of money on the balance with this one. Though they all seem to be awful like that, as are mobile network operators. And right now I am not quite certain whether the issue is with Rostelecom or farther along the path (dataix.eu, Hetzner itself, somewhere between those; though Rostelecom's AS12389 peers with Hetzner's AS24940, and traceroute from me goes through both of those, so there are not many organizations in the chain).

Update: the same thing happens with wikipedia.org (91.198.174.192). Image loading from Wikipedia seemed quite laggy in the past week or so; possibly that's related. I only hope it is due to technical issues, and not to censorship and intentional blocking (of which there is a lot these days). And maybe in the worst case (if things don't get fixed), and before changing the ISP, I could try setting the MTU on the router, to set it everywhere via DHCP (if Android reads that).

Update 2: ended up setting "26,1486" in the DHCP server options on the router for now, so that the phone can use it (and apparently the phone does indeed use the MTU advertised via DHCP).
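
For reference, here is how a lowered MTU of 1486 translates into the sizes that tools and protocols actually work with. This is only a rough sketch in C, assuming IPv4 with no IP or TCP header options; the function and constant names are made up for illustration.

```c
/* Header sizes for plain IPv4, assuming no IP or TCP options are present. */
enum {
    IPV4_HEADER_LEN = 20, /* minimal IPv4 header */
    ICMP_HEADER_LEN = 8,  /* ICMP echo request header */
    TCP_HEADER_LEN = 20   /* minimal TCP header */
};

/* Largest ICMP echo payload (the value given to "ping -s") that fits
 * into a single unfragmented IPv4 packet of mtu bytes. */
int icmp_payload_for_mtu(int mtu)
{
    return mtu - IPV4_HEADER_LEN - ICMP_HEADER_LEN;
}

/* Largest TCP segment payload (MSS) that fits into a single
 * unfragmented IPv4 packet of mtu bytes. */
int tcp_mss_for_mtu(int mtu)
{
    return mtu - IPV4_HEADER_LEN - TCP_HEADER_LEN;
}
```

So with a 1486-byte MTU the largest ping that should get through unfragmented is ``ping -s 1458``, and the TCP MSS comes out as 1446, compared to 1472 and 1460 for the usual 1500.
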

Update 3: added an optional path MTU discovery setting (the IP_MTU_DISCOVER socket option) to rexmpp, for TCP sockets. Explicitly enabling it won't actually help in this case, and it can be configured system-wide (/proc/sys/net/ipv4/ip_no_pmtu_disc, see the ip(7) man page) as well, but it wouldn't harm to have that configurable, and I noticed that IP_PMTUDISC_DONT actually does help, since it disables the DF bit (i.e., allows fragmentation), and then the packets get through. So apparently a router on the path doesn't simply drop the packets, but would need to fragment them, and fails to communicate that back.

Update 4: tried to poke Hetzner's support (support@hetzner.com) as well, but they replied in 3 days, asking me to authenticate on their website and submit a support request via a form, "for security and privacy reasons". Trying that, too, though it begins as it did with Rostelecom: they ask to jump through some silly hoops, which isn't very promising. They replied after checking it, in a week, at which point it was already clear that the issue is at Rostelecom; I thanked them and asked to close the ticket.

Update 5: no reply from Hetzner by 2023-04-22, but I found that the last hop I see with ``traceroute -F 157.90.29.18 1487`` is the last Rostelecom router on the way, which suggests that the issue is between it and the router at DATAIX (which later turned out to be Rostelecom's as well, see below). Maybe it is time to try writing to the dataix.eu NOC directly (actually, in the past I had a better experience with writing to a NOC directly than with getting issues fixed via an ISP's tech support, though I don't remember whether it was Rostelecom or a different one). Or to that of Rostelecom, if I manage to find its address (probably that's cuss-ip@rt.ru; I later looked it up in my mail archive: that's the correct address, and they were helpful 7 years ago, when there was an issue between Rostelecom and another network, though the packets weren't getting through at all back then).
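
The socket option from update 3 can be sketched in C roughly as follows. This is a minimal illustration, assuming Linux with glibc (where ``<netinet/in.h>`` provides IP_MTU_DISCOVER and IP_PMTUDISC_DONT); the function name ``allow_fragmentation`` is made up here, and this is not the actual rexmpp code.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Turn path MTU discovery off for an IPv4 socket (Linux-specific).
 * With IP_PMTUDISC_DONT the kernel does not set the DF bit on outgoing
 * packets, so routers on the path are allowed to fragment them.
 * Returns 0 on success, or -1 with errno set, like setsockopt(2). */
int allow_fragmentation(int fd)
{
    int mode = IP_PMTUDISC_DONT;
    return setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &mode, sizeof(mode));
}
```

It would be called on a TCP socket before ``connect()``. It is a workaround and not a fix: fragmentation has its own costs, and the underlying problem on the path remains.
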

Update 6: tried that traceroute command a few more times, and another dataix.eu (actually Rostelecom's, as it turned out later) router, 178.18.225.153, replied with "!F-1486" (an ICMP "packet too big" message). So perhaps the issue is at their router through which it usually goes, 178.18.227.8.

Update 7: actually even ping by itself eventually discovers the correct MTU, receiving a reply about it from 178.18.225.153, though it takes a few minutes (hundreds of ping packets) until an ICMP PTB message arrives. The 178.18.225.153 DATAIX address also shows up instead of 178.18.227.8 when running traceroute the other way around (from Hetzner to Rostelecom).

Update 8: 2023-04-23, wrote to the DATAIX NOC. They replied quickly that things seem to be fine on their end, mentioned that 178.18.225.153 is Rostelecom's router, and attached ping output showing that it doesn't reply to larger ping packets, while it does reply to smaller ones (it doesn't reply to any pings from me or from a Hetzner server). Wrote into the Rostelecom tech support's awkward chat again. Had no reply in an hour, then discovered that apparently the message wasn't sent at all, and there were some JavaScript errors on further attempts to send messages from the same web browser tab. Sent it as an attachment (a text file) from another browser tab.
Then I spent another hour (actually a bit more) convincing some clueless person that there is an issue and that the routers in question seem to be theirs (Rostelecom's, at least according to the DATAIX NOC; the support person did claim at some point that "it is not our zone of responsibility", as did another one previously), sending them screenshots of traceroute output because they can't read plaintext attachments, reloading the buggy chat, etc. Then they finally filed a ticket further (though without a ticket monitoring method: I can't see what its status is, or what was filed by that clueless person; I suspect that critical details could easily have gone missing, and that something like "hetzner.com is not available" was reported).

Update 9: the next day Rostelecom called, but to advertise some junk services and software (they have been trying to push Kaspersky software for years, and something else on top now), not about the ticket; I wonder whether they use tickets as an excuse to spam, or whether it's just a coincidence. Meanwhile, the issue is still there. Blocked and reported that phone number (to whatever service Google uses by default for spam tracking). Later in the day I missed a call, which apparently was from the technical support.

Update 10: 2023-04-25, the issue is finally (and hopefully permanently) fixed: the route is the same, but larger IP packets (up to 1500 bytes) do get through now, to both Hetzner servers and Wikipedia. Tried to confirm via the chat that it is fixed, in case they need feedback from the client, but the support told me that the ticket is open, and that they will just call again today (they only called on 2023-04-28, asking whether the issue is still there).

Update 11: received a message from the DATAIX NOC mentioning that they were unable to obtain more information from Rostelecom, but that they have also noticed that the 178.18.225.153 router now responds to larger ICMP echo requests.
Thanked them and confirmed that it appears to be fixed (and that I did not receive any useful information from Rostelecom either); it is nice of them to follow through.

Not sure what the lessons learned here are: I knew that NOCs and engineers tend to be helpful, and that first-line tech support tends to be awful (pretty much everywhere; even worse with government services, where I just give up sometimes), though perhaps the memory had faded a bit. Maybe I will try to keep less money on the balance, as I do with Beeline/VimpelCom (which likes to drain the balance with automatically and silently enabled services), so that it would be easier to switch if next time it proves too challenging to get the issues actually looked at and fixed.

----

:Date: 2023-04-14