We will now focus on the TCP port exhaustion problem and how to deal with it.
This part covers modifying the sysctl settings to get past the port exhaustion limits.
SYSCTL Local Port Range
Port exhaustion is a problem that causes TCP communication with other machines over the network to fail. Most of the time a single process is responsible, and restarting it fixes the issue temporarily. It will, however, come back to bite you in a few hours or days, depending on the system load. Port exhaustion simply means that the system has no ephemeral ports left to communicate with other machines / servers.
If these connections are inbound for HAProxy, then they are outbound for the client machines where they originated. Any communication from a client requires it to initiate an outbound connection to the server.
Note that across multiple outbound connections to the SAME backend server, two things always remain the same: the destination IP and the destination port. Assuming we are only taking a single client machine into account, the client IP also remains the same. This means that the number of outbound connections depends on the number of client ports that can be used for establishing the connection. While establishing an outbound connection, the source port is randomly selected from the ephemeral port range, and this port is freed once the connection is destroyed. That is why such ports are called ephemeral ports. By default, about 28,000 local ephemeral ports are available (the usual Linux range is 32768 to 60999).
You might be thinking that 28k is a pretty large number, so what could possibly cause 28k connections to be in use at a single point in time? To understand this, we have to understand the TCP connection lifecycle.
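To see how many ephemeral ports a machine actually has, the range can be read from /proc and the count worked out. This is a small sketch; `port_count` is a helper name made up for illustration, and the /proc file is only present on Linux.

```shell
# port_count LOW HIGH: number of ports in the inclusive range LOW..HIGH.
# (Hypothetical helper, not part of any standard tool.)
port_count() {
    echo $(( $2 - $1 + 1 ))
}

# On Linux the current range is exposed as two numbers in this file.
range_file=/proc/sys/net/ipv4/ip_local_port_range
if [ -r "$range_file" ]; then
    read -r low high < "$range_file"
    echo "ephemeral range: $low-$high ($(port_count "$low" "$high") ports)"
fi
```

With the common default of 32768 to 60999, `port_count 32768 60999` gives 28232, which is where the "around 28k" figure comes from.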
During the TCP handshake, the connection state goes from SYN_SENT → ESTABLISHED on the side that opens the connection, and from LISTEN → SYN_RECV → ESTABLISHED on the side that accepts it. Once the connection is in the ESTABLISHED state, the TCP connection is active. However, once the connection is terminated, the local port that was being used does not become available immediately.
On the side that closes first, the connection enters a state known as TIME_WAIT and stays there for a fixed period before it is finally removed. On Linux this period is 60 seconds; it is a constant compiled into the kernel rather than a tunable setting. It exists so that delayed or out-of-order packets from the old connection are discarded instead of being mistaken for part of a new connection that reuses the same addresses and ports.
If you do the math, it takes only about 470 new connections per second to the same backend (roughly 28,000 ports divided by 60 seconds) before the supposedly large limit of 28,000 ephemeral ports is reached. This limit is very easy to reach on proxies like HAProxy or NGINX because all the traffic is routed through them to the backend servers.
A socket in TIME_WAIT is effectively orphaned: it is no longer held by any socket descriptor in a process, but the kernel still holds it, and its local port, for the full TIME_WAIT period.
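The per-second limit mentioned above is simply the number of ephemeral ports divided by how long each port stays held after its connection closes. A rough sketch of that arithmetic follows; `max_new_conn_rate` is a hypothetical helper, and the hold time is an argument so the same calculation works for whichever TIME_WAIT length you assume.

```shell
# max_new_conn_rate PORTS HOLD_SECONDS: sustained new connections per second
# (rounded down) to one destination before every port is held in TIME_WAIT.
# (Hypothetical helper for illustration only.)
max_new_conn_rate() {
    echo $(( $1 / $2 ))
}

max_new_conn_rate 28000 120   # prints 233
max_new_conn_rate 28000 60    # prints 466
max_new_conn_rate 64512 60    # prints 1075 (range widened to 1024-65535)
```

This is a simplified model: it assumes every connection goes to the same destination IP and port and is closed from the local side, which is the worst case for a proxy talking to a single backend.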
The socket statistics command, `ss`, is a replacement for the well-known netstat command and is much faster at rendering information because it fetches connection data directly from kernel space. The `ss -s` command shows a summary of TCP sockets on the machine, including how many are established and how many are in TIME_WAIT. If you see the total reach the 28,000 mark, it is very likely that the ephemeral ports have been exhausted on that machine. BEWARE: this number can be higher than 28k if multiple services are running on the same machine on different ports.
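For a per-state breakdown rather than a single summary line, the output of `ss -tan` can be grouped by its first column. The sketch below assumes the usual `ss` layout (one header line, then one socket per line with the state first); `count_tcp_states` is a name chosen here for illustration.

```shell
# count_tcp_states: read `ss -tan` output on stdin and print "STATE COUNT"
# lines sorted by state name. The first input line is the ss header.
count_tcp_states() {
    awk 'NR > 1 { n[$1]++ } END { for (s in n) print s, n[s] }' | sort
}

# Live usage; needs the ss tool from iproute2.
if command -v ss >/dev/null 2>&1; then
    ss -tan | count_tcp_states || true
fi
```

A large and growing TIME-WAIT count next to a modest ESTAB count is the typical signature of the problem described here.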
One of the most practical ways to solve this problem, and one that you most likely should end up using, is to increase the local ephemeral port range to the maximum possible value. As mentioned before, the default range is quite small.
echo 1024 65535 > /proc/sys/net/ipv4/ip_local_port_range
This widens the local port range. We cannot go beyond this because port numbers only go up to 65535, and the ports below 1024 are reserved for well-known services. Note that a value written to /proc this way does not survive a reboot. Another simple solution is to enable a Linux TCP option called tcp_tw_reuse. This option lets the Linux kernel reclaim a connection slot from a connection in TIME_WAIT state and reallocate it to a new outgoing connection.
--> vi /etc/sysctl.conf
--> Add the following lines at the end:
# Allow reuse of sockets in TIME_WAIT state for new connections
# only when it is safe from the network stack's perspective.
net.ipv4.tcp_tw_reuse = 1
--> Reload sysctl settings:
sysctl -p
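After changing either setting, it is worth confirming what the kernel is actually using. One possible way is to read the live values back from /proc, as sketched below; `show_setting` is a hypothetical helper, and its optional second argument exists only so it can be pointed at a different directory.

```shell
# show_setting KEY [ROOT]: print one setting as "net.ipv4.KEY = VALUE".
# ROOT defaults to /proc/sys/net/ipv4 (Linux only).
show_setting() {
    key=$1
    root=${2:-/proc/sys/net/ipv4}
    if [ -r "$root/$key" ]; then
        # The port range file separates its two numbers with a tab.
        printf 'net.ipv4.%s = %s\n' "$key" "$(tr '\t' ' ' < "$root/$key")"
    else
        printf 'net.ipv4.%s is not available here\n' "$key"
        return 1
    fi
}

show_setting ip_local_port_range || true
show_setting tcp_tw_reuse || true
```

If the port range should also survive a reboot, the same sysctl.conf approach applies: a line of the form `net.ipv4.ip_local_port_range = 1024 65535` followed by `sysctl -p`.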