Telnet Debugging
A connection is closed! And nobody knows why! There are many reasons this may be the case. Obtaining a packet trace (with tcpdump or equivalent), ideally on both sides of the connection, is a good first step. Reasons for connection failures include:
- A firewall or router marks the idle telnet session as "inactive" and drops the associated state for the connection. Fixing this may require sending a "keepalive" packet or similar activity now and then.
- A network issue causes keepalive packets to be dropped, thus convincing telnet that the connection is bad and should be dropped. Fixing this may require sending fewer or no "keepalive" packets. Wi-Fi and busy networks drop packets, a cable might be loose, etc.
- The telnet implementation might be old and bad. Or new and bad, though there are probably not a lot of new implementations of telnet out there?
- A cheap network router has some security feature that closes the connection. This may be more notable for wacky UDP protocols (NFS in particular), but going through the configuration for any routers in the path might be good.
- A router might have some wacky Quality-of-Service profile set, especially if telnet has been moved to some random high port.
- A naughty ISP injects RST packets to close the connection. ISP is vague here: it could be as small as a coffee shop router trying to increase customer turnover, a large national chain perhaps lacking in competition, or an entire country that is screwing around with the connections.
- Some bit of hardware is doing bad things, maybe because the hardware is old, being overloaded with traffic, or both. Load balancers are also known to mess around with connections and forge traffic, which is why a network trace on both sides is important.
- Anything else I've forgotten or don't know about.
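The "keepalive" fix from the first item can be sketched roughly as below, assuming the keepalives in question are TCP-level ones and that you control the socket (a telnet client may instead have its own option for this). The function name enable_keepalive and the timer values are made up for illustration; the socket options are real, but which ones exist varies by platform, hence the guards.

```python
import socket

def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive and tune the timers where the platform allows.

    idle, interval and count are illustrative values, not recommendations.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Linux names the idle timer TCP_KEEPIDLE; macOS calls it TCP_KEEPALIVE.
    # Options missing on this platform are skipped.
    wanted = (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPALIVE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", count),
    )
    for name, value in wanted:
        opt = getattr(socket, name, None)
        if opt is None:
            continue
        try:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
        except OSError:
            # The constant exists but the OS refused it; keep the defaults.
            pass
    return sock
```

An idle time shorter than the firewall's state timeout is the goal for the first item. For the second item (lossy network) the opposite applies: a larger count or interval, or no keepalive at all.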
There can be false errors here: for example, some packet traces will report invalid checksums when checksumming has been offloaded to the network device (and hopefully the device is doing that right). Other indicators may be symptoms of the problem, but not the cause. Other debugging problems include people picking some reason (it's the other group's firewall, because that's what it was last time) and sticking with it, especially if there's bad blood between groups, or some group uses a different and thus suspect configuration, or they're busy or don't wanna debug and hope the other group can solve the issue meanwhile. Or maybe they've had a really bad day.
With the above laundry list, hopefully you can study, vary, or isolate things: is it only telnet connections that have the problem? Only for a particular host? At certain times of day (when the network is busy and that old switch turns into a hub which causes…)? Etc.
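One way to start isolating things is to record, with timestamps, how connection attempts to different hosts and ports turn out. The following is a minimal sketch, not a real monitoring tool; the probe function and its result labels are invented for illustration. It only tests whether a TCP connection can be opened, so it will not catch a session that dies after sitting idle.

```python
import socket
import time

def probe(host, port, timeout=3.0):
    """Try one TCP connection and classify what happened."""
    start = time.time()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            result = "connected"
    except socket.timeout:
        # Nothing came back: packets silently dropped somewhere?
        result = "timeout"
    except ConnectionRefusedError:
        # Something answered with a RST: nothing is listening, or a
        # device in the path rejected the connection.
        result = "refused"
    except ConnectionResetError:
        result = "reset"
    except OSError as exc:
        # DNS failures, unreachable networks, and so on.
        result = "error: %s" % exc
    return {
        "when": time.strftime("%Y-%m-%d %H:%M:%S"),
        "host": host,
        "port": port,
        "result": result,
        "seconds": round(time.time() - start, 3),
    }
```

Calling something like probe("host-a.example", 23) and probe("host-a.example", 22) every few minutes from a scheduler (the hostname here is a placeholder) and saving the output gives a rough record of whether failures follow the service, the host, or the time of day.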