- 1 Introduction
- 2 Load Generation for Load Testing
- 3 Operating System Tuning
- 4 Network Tuning
- 5 JVM Tuning
- 6 Jetty Tuning
Introduction
Configuring Jetty for high load, whether for load testing or for production, requires that the operating system, the JVM, Jetty, the application, the network and the load generation all be tuned.
Load Generation for Load Testing
- The load generation machines must have their OS, JVM, etc. tuned just as much as the server machines.
- The load should not be generated over the loopback interface on the server machine itself, as this has unrealistic performance and latency, as well as different packet sizes and transport characteristics.
- The load generator should generate a realistic load:
- A common mistake is for load generators to open relatively few connections and keep them totally busy, sending as many requests as possible over each one. This causes the measured throughput to be limited by request latency (see Lies Damned Lies and Benchmarks for an analysis of such an issue).
- Another common mistake is to use a TCP/IP connection for a single request and to open very many short-lived connections. This will often result in full accept queues and in limitations due to file descriptor and/or port starvation.
- A load generator should closely model the traffic profile of the server's normal clients. For browsers, this is mostly between 2 and 6 connections that are mostly idle and are used in sporadic bursts, with reading time in between. These are mostly long-held HTTP/1.1 connections.
- Load generators should be written in an asynchronous programming style, so that a limited number of threads does not limit the maximum number of users that can be simulated. If the generator is not asynchronous, a thread pool of 2000 may only be able to simulate 500 or fewer users. The Jetty HttpClient is an ideal basis for building a load generator, as it is asynchronous and can simulate many thousands of connections (see the CometD Load Tester for a good example of a realistic load generator).
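The traffic profile described above (a few mostly idle connections per user, sporadic bursts, reading time in between, and an asynchronous style so that threads do not bound the number of users) can be sketched in plain JDK Java. This is a hypothetical simulation, not the Jetty HttpClient API: each "request" is modeled as a fixed delay on a ScheduledExecutorService, and the names AsyncLoadSketch, user and burst, as well as the burst sizes and timings, are assumptions for illustration. A real generator would replace delay() with an asynchronous HTTP call.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: simulated users whose "requests" are timed delays, not real HTTP calls.
final class AsyncLoadSketch {
    private final ScheduledExecutorService scheduler;
    private final AtomicInteger completed = new AtomicInteger();

    AsyncLoadSketch(int threads) {
        scheduler = Executors.newScheduledThreadPool(threads);
    }

    // Completes after ms milliseconds without blocking any thread while waiting.
    private CompletableFuture<Void> delay(long ms) {
        CompletableFuture<Void> f = new CompletableFuture<>();
        scheduler.schedule(() -> f.complete(null), ms, TimeUnit.MILLISECONDS);
        return f;
    }

    // One burst: `connections` requests in parallel, like a browser using 2-6 connections.
    private CompletableFuture<Void> burst(int connections, long latencyMs) {
        CompletableFuture<?>[] requests = new CompletableFuture<?>[connections];
        for (int i = 0; i < connections; i++) {
            requests[i] = delay(latencyMs).thenRun(completed::incrementAndGet);
        }
        return CompletableFuture.allOf(requests);
    }

    // One user: several bursts separated by idle "reading time".
    CompletableFuture<Void> user(int bursts, int connections, long latencyMs, long readTimeMs) {
        CompletableFuture<Void> session = CompletableFuture.completedFuture(null);
        for (int b = 0; b < bursts; b++) {
            long pause = (b == 0) ? 0 : readTimeMs;
            session = session.thenCompose(v -> delay(pause))
                             .thenCompose(v -> burst(connections, latencyMs));
        }
        return session;
    }

    // Runs `users` simulated users on `threads` scheduler threads; returns completed requests.
    static int run(int users, int threads) throws Exception {
        AsyncLoadSketch generator = new AsyncLoadSketch(threads);
        try {
            CompletableFuture<?>[] sessions = new CompletableFuture<?>[users];
            for (int u = 0; u < users; u++) {
                long readTime = ThreadLocalRandom.current().nextLong(5, 20);
                sessions[u] = generator.user(3, 4, 10, readTime);
            }
            CompletableFuture.allOf(sessions).get(30, TimeUnit.SECONDS);
            return generator.completed.get();
        } finally {
            generator.scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // 2000 users x 3 bursts x 4 connections, driven by only 2 threads.
        System.out.println(run(2000, 2) + " requests completed");
    }
}
```

Because no thread ever blocks while a user is idle or a request is in flight, the number of simulated users is bounded by memory rather than by the size of the thread pool.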
Operating System Tuning
Both the server machine and any load generating machines need to be tuned to support many TCP/IP connections and high throughput.
Linux does a reasonable job of self-configuring TCP/IP, but there are a few limits and defaults that are best increased. These can mostly be configured in /etc/security/limits.conf or via sysctl.
TCP Buffer Sizes
These should be increased to at least 16 MB for 10G paths, and the TCP autotuning limits raised to match (although buffer bloat now needs to be considered):
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 16384 16777216"
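The 16 MB figure can be motivated with the bandwidth-delay product: to keep a path full, a TCP window must hold bandwidth x round-trip time bytes. A minimal calculation, assuming a 10 ms RTT (an illustrative value, not from the text); the class name BandwidthDelay is hypothetical.

```java
// Bandwidth-delay product: bytes that must be "in flight" to keep a path full.
final class BandwidthDelay {
    // bitsPerSecond: link rate, e.g. 10_000_000_000L for 10 Gbit/s
    // rttMillis: round-trip time in milliseconds (an assumed value)
    static long bdpBytes(long bitsPerSecond, long rttMillis) {
        // bits/s / 8 = bytes/s; multiply by RTT in ms, then divide by 1000 for seconds
        return bitsPerSecond / 8 * rttMillis / 1000;
    }

    public static void main(String[] args) {
        long bdp = bdpBytes(10_000_000_000L, 10);  // 10 Gbit/s path, 10 ms RTT
        System.out.println(bdp + " bytes in flight vs. 16 MB max buffer = " + 16_777_216);
    }
}
```

At 10 ms the product is 12,500,000 bytes, just under 16 MB; at 20 ms it is 25,000,000 bytes, which already exceeds it. That is why 16 MB is a floor rather than a ceiling for 10G paths.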
net.core.somaxconn controls the size of the connection listening queue. The default value is 128; if you are running a high-volume server and connections are being refused at the TCP level, you will want to increase it. This setting needs careful tuning: set it too high and you can run into resource problems, because the server may be notified of a large number of connections while many remain pending; set it too low and connections will be refused:
sysctl -w net.core.somaxconn=4096
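On the application side, the listen backlog is requested when the socket is bound; on Linux, listen(2) silently truncates a backlog larger than net.core.somaxconn, so raising one without the other has no effect. Servers such as Jetty typically expose this as an accept queue size setting on the connector. A minimal JDK sketch of requesting a 4096 backlog (the class name BacklogSketch is hypothetical, and binding to loopback on a random port is only for illustration):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;

final class BacklogSketch {
    // Binds a listening socket with an explicit accept backlog. On Linux the kernel
    // caps the effective backlog at net.core.somaxconn.
    static ServerSocketChannel listen(int port, int backlog) throws IOException {
        ServerSocketChannel channel = ServerSocketChannel.open();
        // Loopback for illustration only; a real server binds its external interface.
        channel.bind(new InetSocketAddress("127.0.0.1", port), backlog);
        return channel;
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocketChannel channel = listen(0, 4096)) {  // port 0 = any free port
            System.out.println("listening on " + channel.getLocalAddress());
        }
    }
}
```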
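net.core.netdev_max_backlog controls the maximum number of incoming packets queued when an interface receives packets faster than the kernel can process them. The default (typically 1000) may be increased, together with the SYN backlog (net.ipv4.tcp_max_syn_backlog, the number of half-open connection requests the kernel remembers), and SYN cookies enabled (net.ipv4.tcp_syncookies, which lets the kernel keep accepting connections when the SYN backlog overflows), with: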
sysctl -w net.core.netdev_max_backlog=16384
sysctl -w net.ipv4.tcp_max_syn_backlog=8192
sysctl -w net.ipv4.tcp_syncookies=1
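Before and after applying the sysctl settings in this section, it can help to check the values the kernel is actually using. sysctl keys map directly onto files under /proc/sys, with dots becoming path separators. A minimal Linux-only reader follows; the class name SysctlReader is hypothetical, and keys whose components themselves contain dots (such as VLAN interface names) are not handled.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Reads the live value of a sysctl key from /proc/sys (Linux only).
final class SysctlReader {
    // "net.core.somaxconn" -> /proc/sys/net/core/somaxconn
    static Path pathFor(String key) {
        return Paths.get("/proc/sys", key.split("\\."));
    }

    static Optional<String> read(String key) {
        try {
            byte[] raw = Files.readAllBytes(pathFor(key));
            return Optional.of(new String(raw, StandardCharsets.US_ASCII).trim());
        } catch (IOException e) {
            return Optional.empty();  // not Linux, or the key does not exist in this kernel
        }
    }

    public static void main(String[] args) {
        String[] keys = {"net.core.somaxconn", "net.core.netdev_max_backlog",
                         "net.ipv4.tcp_max_syn_backlog", "net.ipv4.tcp_syncookies"};
        for (String key : keys) {
            System.out.println(key + " = " + read(key).orElse("(unavailable)"));
        }
    }
}
```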
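If many outgoing connections are made (e.g. on load generators), the operating system may run low on ports. It is therefore best to widen the local port range and to allow sockets in TIME_WAIT to be reused for new outgoing connections. (The older net.ipv4.tcp_tw_recycle setting is not a substitute: it breaks clients behind NAT and was removed in Linux 4.12.)
sysctl -w net.ipv4.ip_local_port_range="1024 65535"
sysctl -w net.ipv4.tcp_tw_reuse=1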
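A rough port budget shows why this matters. Assuming the load generator closes its connections (so its local ports enter TIME_WAIT, which lasts 60 seconds on Linux) and all connections go to a single destination address and port, the sustainable rate of new connections is the number of local ports divided by the TIME_WAIT duration. The class name PortBudget is hypothetical.

```java
// Rough ceiling on new outgoing connections per second to ONE destination ip:port,
// assuming the client side performs the active close and so holds TIME_WAIT.
final class PortBudget {
    static int ports(int low, int high) {
        return high - low + 1;
    }

    static int maxNewConnectionsPerSecond(int low, int high, int timeWaitSeconds) {
        // Each closed connection parks its local port for timeWaitSeconds.
        return ports(low, high) / timeWaitSeconds;
    }

    public static void main(String[] args) {
        System.out.println("typical default 32768-60999: "
                + maxNewConnectionsPerSecond(32768, 60999, 60) + " conn/s");
        System.out.println("widened 1024-65535: "
                + maxNewConnectionsPerSecond(1024, 65535, 60) + " conn/s");
    }
}
```

Widening the range roughly doubles the budget (from about 470 to about 1075 new connections per second per destination); allowing TIME_WAIT reuse removes most of the remaining pressure.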
Busy servers and load generators may run out of file descriptors as the system defaults are normally low. These can be increased for a specific user in /etc/security/limits.conf:
theusername hard nofile 40000
theusername soft nofile 40000
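To confirm that a new limit has actually reached the JVM (limits.conf applies to new login sessions, and services started by systemd take their limit from LimitNOFILE instead), the JDK's com.sun.management extension reports the open and maximum descriptor counts on Unix-like platforms. A minimal sketch; the class name FdCheck is hypothetical, and 40000 is simply the example value used above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

final class FdCheck {
    // Returns {open, max} file descriptors for this JVM, or null where unsupported (e.g. Windows).
    static long[] fileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            return new long[] {unix.getOpenFileDescriptorCount(), unix.getMaxFileDescriptorCount()};
        }
        return null;
    }

    public static void main(String[] args) {
        long[] fd = fileDescriptors();
        if (fd == null) {
            System.out.println("File descriptor counts not available on this platform");
            return;
        }
        System.out.println("open=" + fd[0] + " max=" + fd[1]);
        if (fd[1] < 40000) {  // example limit from the limits.conf lines
            System.out.println("Warning: descriptor limit is below 40000");
        }
    }
}
```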
Linux supports pluggable congestion control algorithms. To get a list of the congestion control algorithms available in your kernel, run:
sysctl net.ipv4.tcp_available_congestion_control
If cubic and/or htcp are not listed, you will need to research the congestion control algorithms available for your kernel. You can try setting the algorithm to cubic with:
sysctl -w net.ipv4.tcp_congestion_control=cubic
Network Tuning
- Intermediaries such as nginx may use non-persistent HTTP/1.0 connections to the upstream server. Make sure that persistent HTTP/1.1 connections are used.
JVM Tuning
- Tune garbage collection
- Allocate sufficient memory
- Use the -server option
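After changing JVM options it is worth checking that they actually took effect in the running process. The standard java.lang.management API exposes the VM name (which typically contains "Server VM" when the server VM is in use), the maximum heap, the command-line arguments and per-collector statistics. A minimal sketch; the class name JvmReport is hypothetical, and it recommends no particular GC settings.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class JvmReport {
    static String report() {
        StringBuilder sb = new StringBuilder();
        sb.append("VM: ").append(System.getProperty("java.vm.name")).append('\n');
        sb.append("Max heap: ").append(Runtime.getRuntime().maxMemory() / (1024 * 1024)).append(" MB\n");
        // The flags the JVM was actually started with (e.g. -Xmx, GC options).
        sb.append("JVM args: ").append(ManagementFactory.getRuntimeMXBean().getInputArguments()).append('\n');
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append("GC ").append(gc.getName()).append(": ")
              .append(gc.getCollectionCount()).append(" collections, ")
              .append(gc.getCollectionTime()).append(" ms\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

Logging this once at startup, and sampling the GC counters during a load test, shows whether collection time grows with load.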
Jetty Tuning
Acceptors
The number of acceptors should be at least 1 and no greater than the number of CPUs.
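The acceptor rule of thumb (at least 1, at most the number of CPUs) can be expressed as a small clamp over the processor count reported by the JVM. This is a hypothetical helper; how the resulting value is passed to a connector depends on the Jetty version and is not shown here.

```java
final class AcceptorCount {
    // Clamps a requested acceptor count to the rule of thumb 1 <= acceptors <= #CPUs.
    static int clamp(int requested, int cpus) {
        return Math.max(1, Math.min(requested, cpus));
    }

    static int forThisMachine(int requested) {
        return clamp(requested, Runtime.getRuntime().availableProcessors());
    }

    public static void main(String[] args) {
        System.out.println("acceptors = " + forThisMachine(4));
    }
}
```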
Low Resource Limits
The low resource limits must not be configured for fewer connections than are expected.
Thread Pool
It is very important to limit the task queue of Jetty. By default, the queue is unbounded! As a result, if the load exceeds the processing capacity of the webapp, Jetty will accumulate a large number of requests on the queue. Even after the load has stopped, Jetty will appear to have stopped responding to new requests, because it still has many queued requests to handle.
A high-reliability system should reject excess requests immediately (fail fast) by using a queue with a bounded capacity. The capacity (maximum queue length) should be calculated from the tolerable "no-response" time. For example, if the webapp can handle 100 requests per second and you can allow it one minute to recover from excessively high load, you can set the queue capacity to 60 × 100 = 6000. If it is set too low, requests will be rejected too soon and normal load spikes cannot be absorbed.
Below is a sample configuration:
<Configure id="Server" class="org.eclipse.jetty.server.Server">
  <Set name="ThreadPool">
    <New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
      <!-- specify a bounded queue -->
      <Arg>
        <New class="java.util.concurrent.ArrayBlockingQueue">
          <Arg type="int">6000</Arg>
        </New>
      </Arg>
      <Set name="minThreads">10</Set>
      <Set name="maxThreads">200</Set>
      <Set name="detailedDump">false</Set>
    </New>
  </Set>
</Configure>
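Jetty's QueuedThreadPool is not a java.util.concurrent.ThreadPoolExecutor, but the fail-fast behaviour of a bounded queue can be illustrated with the JDK executor as an analogy. The sketch computes the capacity with the 60 × 100 rule from the text and shows that, once all workers are busy and the queue is full, further submissions are rejected immediately instead of waiting. The class and method names are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class BoundedQueueSketch {
    // Queue capacity = sustainable rate (req/s) x tolerable no-response time (s).
    static int queueCapacity(int requestsPerSecond, int recoverySeconds) {
        return requestsPerSecond * recoverySeconds;
    }

    // Submits `tasks` jobs that block until released; returns how many were rejected.
    static int rejectedCount(int threads, int capacity, int tasks) throws InterruptedException {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(threads, threads, 60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(capacity), new ThreadPoolExecutor.AbortPolicy());
        int rejected = 0;
        for (int i = 0; i < tasks; i++) {
            try {
                pool.execute(() -> {
                    try {
                        release.await();  // simulate a slow request holding a worker
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (RejectedExecutionException e) {
                rejected++;  // fail fast: a server could answer with e.g. 503 right away
            }
        }
        release.countDown();
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return rejected;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("capacity = " + queueCapacity(100, 60));
        System.out.println("rejected = " + rejectedCount(2, 3, 10));
    }
}
```

With 2 busy workers and room for 3 queued jobs, the remaining 5 of 10 submissions are rejected at once, which is the behaviour the bounded ArrayBlockingQueue in the XML aims for.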
Configure the number of threads according to the needs of the webapp, that is, the number of threads it requires to achieve the best performance, while keeping memory usage within the maximum available. Typically this is more than 50 and fewer than 500.
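One way to estimate how many threads the webapp needs is Little's law (a standard queueing result swapped in here, not something stated above): busy threads ≈ request rate × time each request holds a thread. The memory side can be bounded by threads × stack size; the sketch assumes a 1 MB stack, the usual HotSpot default on 64-bit Linux (adjustable with -Xss), and stacks are typically reserved up front but backed by physical memory only as they are used. The class ThreadSizing and its inputs are hypothetical.

```java
final class ThreadSizing {
    // Little's law: busy threads ~= arrival rate (req/s) x seconds each request holds a thread.
    static int threadsNeeded(double requestsPerSecond, double secondsPerRequest) {
        return (int) Math.ceil(requestsPerSecond * secondsPerRequest);
    }

    // Upper bound on thread-stack memory in MB, assuming every thread reserves stackKb.
    static long stackMegabytes(int threads, int stackKb) {
        return (long) threads * stackKb / 1024;
    }

    public static void main(String[] args) {
        // e.g. 100 req/s, each holding a thread for 0.5 s (including blocking I/O)
        System.out.println("threads needed: " + threadsNeeded(100, 0.5));
        System.out.println("stack for maxThreads=200: " + stackMegabytes(200, 1024) + " MB");
    }
}
```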