Load Testing HAProxy (Part 1)

We needed some definitive answers to the following questions:

1. What is the impact of shifting our traffic from non-SSL to SSL? CPU should definitely take a hit, because an SSL handshake is not the normal TCP 3-way handshake; it involves additional round trips, and once it completes, all further communication is encrypted with the secret key negotiated during the handshake, which is bound to consume CPU.

2. What other hardware/software limits might we hit in production as a result of SSL termination at the HAProxy level? We could also go with HAProxy's SSL passthrough option, in which the encrypted connection is passed through untouched and terminated/decrypted at the backend servers. However, SSL termination at the HAProxy level gives better performance, so that is what we intend to test (see the configuration sketch after this list).

3. What hardware do we need in production to support the kind of load we see today? Will the existing hardware scale, or do we need bigger machines?
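
For context, here is a minimal sketch of what the two options look like in an HAProxy configuration. The certificate path, backend addresses and ports are placeholders, not values from our actual setup.

```
# SSL termination at HAProxy: the "ssl crt" bind option makes HAProxy do the
# handshake and decryption, so the backend receives plain HTTP.
frontend fe_https_terminated
    bind *:443 ssl crt /etc/haproxy/certs/site.pem   # placeholder cert path
    mode http
    default_backend be_app_http

backend be_app_http
    mode http
    server app1 10.0.0.10:8080 check                 # placeholder backend

# SSL passthrough: HAProxy only forwards the encrypted TCP bytes and the
# backend terminates SSL itself (shown on another port to keep one file valid).
frontend fe_https_passthrough
    bind *:8443
    mode tcp
    default_backend be_app_tls

backend be_app_tls
    mode tcp
    server app1 10.0.0.10:8443 check                 # placeholder backend
```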

In this part, I will be discussing an important aspect of any load testing exercise that most of us tend to ignore.
If you have ever done any kind of load testing, or hosted a server handling a lot of concurrent requests, you have almost certainly run into the "Too many open files" issue.
An important part of any stress testing exercise is the ability of the load testing client to establish a large number of concurrent connections to your backend server, or to a proxy such as HAProxy sitting in between.

A lot of the time the bottleneck ends up being the client itself: it simply cannot generate the amount of load we expect it to. The reason is usually not that the client software is performing poorly, but a limit imposed at the system level.
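
As a concrete illustration, here is the kind of run where the client, not the server, becomes the limit. The target address and connection count are made up for the example, and wrk is just one of many load testing tools you could use.

```
$ ulimit -n
1024
$ wrk -t4 -c20000 -d60s http://10.0.0.5/
# Asking for 20000 concurrent connections with only 1024 file descriptors
# available means most connection attempts fail with EMFILE
# ("Too many open files") long before the server itself is stressed.
```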

ulimit is used to restrict the resources available to a user's processes. For the practical purposes of a load testing environment, the value we care about is the number of file descriptors that a single process can have open. On most machines, if you check the limit on open file descriptors, it comes out to be 1024. Opening a new TCP connection/socket also counts as an open file, i.e. a file descriptor, hence the limitation.
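
A minimal sketch for checking the limits of your current shell (the hard limit varies by distribution):

```
$ ulimit -n     # soft limit on open file descriptors, 1024 by default on most machines
1024
$ ulimit -Hn    # hard limit, the ceiling a non-root user can raise the soft limit to
```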

What this means is that a single client process can open at most 1024 connections to the backend servers and no more, so you need to raise this limit to a much higher number in your load testing environment before proceeding further. There are also a million things that can modify the limits of a process after (or before) you initialize your shell, so rather than trusting the shell's value, use the 'ps' command (or whatever you prefer) to get the ID of the process in question and inspect /proc/{process_id}/limits directly.
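
For example, assuming the load testing client is a process called wrk (substitute whatever you are actually running, and the PID ps gives you):

```
$ ps aux | grep wrk                      # find the PID of the load testing process
$ cat /proc/12345/limits | grep 'open files'
Max open files            1024                 4096                 files
# columns: limit name, soft limit, hard limit, units; the values shown are illustrative
```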

Raising the Limit

There are two ways of changing the ulimit setting on a machine.

  1. Run ulimit -n <some_value>. This changes the ulimit setting only for the current shell session; as soon as you open another shell session, you are back to square one, i.e. 1024 file descriptors. So this is probably not what you want.
  2. Raise the limits permanently. Add the line fs.file-max = 500000 to the end of the file /etc/sysctl.conf, and add the following lines to the file /etc/security/limits.conf (the sketch after this list shows one way to append them):
     *    soft nofile 50000
     *    hard nofile 50000
     root soft nofile 50000
     root hard nofile 50000
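
A quick sketch of how those entries could be appended from a root shell; the values are the example ones from above, so adjust them to whatever ceiling your tests need:

```
# system-wide cap on open file handles
cat >> /etc/sysctl.conf <<'EOF'
fs.file-max = 500000
EOF

# per-user caps on open file descriptors
cat >> /etc/security/limits.conf <<'EOF'
*    soft nofile 50000
*    hard nofile 50000
root soft nofile 50000
root hard nofile 50000
EOF
```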

The * means that these values apply to all users except root. The second field is the limit type, either soft or hard. The third field is the item whose limit we want to change, nofile in this case, i.e. the number of open files. And finally comes the value we want to set, which in this example is 50000. Because the * does not cover the root user, the last two lines set the same limits explicitly for root.

After doing this, you need to reboot the system. Unfortunately, yes 🙁. After the reboot, the changes should be reflected in the output of ulimit -n.
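
Once the machine is back up, a quick check confirms both changes took effect:

```
$ ulimit -n                   # per-process limit from /etc/security/limits.conf
50000
$ cat /proc/sys/fs/file-max   # system-wide limit from /etc/sysctl.conf
500000
```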

Changing this will not necessarily affect every user process running on the system. It is quite possible that even after changing the system-wide ulimit, you will find that /proc/<pid>/limits still reports a smaller number than you expect.

In this case, you almost certainly have a process manager, or something similar, that is overriding your limits. Keep in mind that processes inherit the limits of their parent process. So if something like Supervisor is managing your processes, they inherit the limits of the Supervisor daemon, and that overrides any changes you make to the system-level limits.
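
Supervisor is a good example: it exposes a minfds setting in the [supervisord] section of its configuration and tries to raise its own descriptor limit to at least that value, which the programs it manages then inherit. A hedged sketch, with the path and value as placeholders (check the Supervisor documentation for your version):

```
; /etc/supervisord.conf (location varies by distribution)
[supervisord]
minfds=50000      ; supervisord raises its own fd limit to at least this value,
                  ; and the child processes it starts inherit that limit
```

Note that the Supervisor daemon itself has to be restarted, not just the managed programs, for the new limit to propagate to its children.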