Jetty/Feature/Stress Testing CometD
These instructions describe how to stress test CometD from Jetty7 running on Unix. The same basic steps apply to Windows or Mac; please feel free to add details and terminology specific to these platforms to this wiki.
The basic steps are:
Configuring/tuning the operating system of the test client and server machines
The operating system must be able to support the number of connections (file descriptors) for the test on both the server machine and the required test client machines.
For a Linux system, change the file descriptor limit in the /etc/security/limits.conf file. Add the following two lines (or change any existing nofile lines):
* hard nofile 40000
* soft nofile 40000
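After logging in again, it can help to confirm that the new limit is actually in effect for the process that will run the test. The following is a minimal sketch using Python's standard `resource` module on Unix; the `fits` helper and the figure of 25000 required descriptors are illustrative assumptions, not part of Jetty or CometD.

```python
import resource

def fits(required, soft_limit):
    """Return True if `required` descriptors fit under the soft limit."""
    # RLIM_INFINITY means the limit is unlimited.
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= required

# RLIMIT_NOFILE is the per-process limit on open file descriptors.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

required = 25000  # hypothetical connection count for the planned test
print(f"soft={soft} hard={hard} enough={fits(required, soft)}")
```

The soft limit is the one enforced on the running process, so it is the value to compare against the planned number of connections.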
You can tune many other values in the server stack; the Zeus ZXTM documentation provides a good overview.
Installing, configuring and running CometD
The CometD client and server are now in the CometD Project at The Dojo Foundation, including downloads and documentation.
Installing, configuring and running the Jetty server
Editing the Jetty configuration for CometD testing
For the purposes of CometD testing, you need to edit the standard configuration of Jetty (etc/jetty.xml) to change the connector configuration as follows:
- Increase the max idle time.
- Increase the low resources connections.
The relevant section to update is:
<Call name="addConnector">
  <Arg>
    <New class="org.eclipse.jetty.nio.SelectChannelConnector">
      <Set name="host"><SystemProperty name="jetty.host" /></Set>
      <Set name="port"><SystemProperty name="jetty.port" default="8080"/></Set>
      <Set name="maxIdleTime">300000</Set>
      <Set name="Acceptors">2</Set>
      <Set name="statsOn">false</Set>
      <Set name="confidentialPort">8443</Set>
      <Set name="lowResourcesConnections">25000</Set>
      <Set name="lowResourcesMaxIdleTime">5000</Set>
    </New>
  </Arg>
</Call>
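As a quick sanity check after editing, the connector values can be read back from the XML. The sketch below uses Python's standard `xml.etree.ElementTree`; the `connector_settings` helper is hypothetical, and `SAMPLE` is a trimmed copy of the section above rather than a full jetty.xml.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the connector section, used only for illustration.
SAMPLE = """
<Call name="addConnector">
  <Arg>
    <New class="org.eclipse.jetty.nio.SelectChannelConnector">
      <Set name="maxIdleTime">300000</Set>
      <Set name="lowResourcesConnections">25000</Set>
      <Set name="lowResourcesMaxIdleTime">5000</Set>
    </New>
  </Arg>
</Call>
"""

def connector_settings(xml_text):
    """Collect the <Set> values of every SelectChannelConnector found."""
    root = ET.fromstring(xml_text)
    settings = {}
    for new in root.iter("New"):
        if new.get("class", "").endswith("SelectChannelConnector"):
            # Only direct <Set> children belong to this connector.
            for item in new.findall("Set"):
                settings[item.get("name")] = (item.text or "").strip()
    return settings

for name, value in connector_settings(SAMPLE).items():
    print(name, "=", value)
```

If a configuration defines several such connectors, this sketch merges their values into one dictionary, so it is only suitable for the single-connector case shown here.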
To run the server with the additional memory needed for the test, use:
java -Xmx2048m -jar start.jar etc/jetty.xml
You should now be able to point a browser at the server at either:
Specifically try out the CometD chat room with your browser to confirm that it is working.
Running the Jetty Bayeux test client
The Jetty CometD Bayeux test client generates load simulating users in a chat room. To run the client:
cd $JETTY_HOME/contrib/cometd/client
bin/run.sh
The client has a basic text UI that operates in two phases: 1) global configuration 2) test runs. An example global configuration phase looks like:
# bin/run.sh
2008-04-06 13:43:57.545::INFO: Logging to STDERR via org.eclipse.log.StdErrLog
server[localhost]: 184.108.40.206
port:
context[/cometd]:
base[/chat/demo]:
rooms : 10
rooms per client :
max Latency :
Use the Enter key to accept the default value, or enter a new value and then press Enter. The parameters and their meaning are:
- server–Host name or IP address of the server running Jetty with CometD
- port–Port of the server (8080 unless you have changed it in jetty.xml).
- context–Context of the web application running CometD (/cometd in the test server).
- base–Base Bayeux channel name used for chat room. Normally you would not change this.
- rooms–Number of chat rooms to create. This value combines with the number of users to determine the users per room. If you have 100 rooms and 1000 users, then you will have 10 users per room and every message sent is delivered 10 times. For runs with >10k users, 1000 rooms is a reasonable value.
- rooms per client–Allows a simulated user to subscribe to multiple rooms. However, as these are randomly selected, values greater than 1 mean that the client is unable to accurately predict the number of messages that will be delivered. Leave this at 1 unless you are testing something specific.
- max Latency–Instructs Jetty to abort the test if the latency for delivering a message is greater than this value (in ms).
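The relationship between rooms, users, and deliveries described for the rooms parameter can be sketched as simple arithmetic. The function name below is illustrative, and the sketch assumes one room per client with users spread evenly across rooms.

```python
def users_per_room(users, rooms):
    """Average number of simulated users in each chat room.

    Assumes "rooms per client" is 1 and an even spread of users.
    """
    if rooms < 1:
        raise ValueError("rooms must be at least 1")
    return users / rooms

# The example from the text: 100 rooms and 1000 users.
fan_out = users_per_room(1000, 100)
print(f"each published message is delivered about {fan_out:g} times")
```

Because each message is delivered once per user in the room, this value is also the fan-out the server must sustain per published message.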
After the global configuration, the test client loops through individual test cycles. Again, press Enter to accept the default value. Two example iterations of the test cycle follow:
clients : 100
clients = 0010
clients = 0020
clients = 0030
clients = 0040
clients = 0050
clients = 0060
clients = 0070
clients = 0080
clients = 0090
clients = 0100
Clients: 100 subscribed:100
publish :
publish size :
pause :
batch :
0011111111221111111111111111100000000000000000000000000000000000000000000000000000000000000000000000
Got:10000 of 10000
Got 10000 at 901/s, latency min/ave/max =2/41/922ms
--
clients :
Clients: 100 subscribed:100
publish :
publish size :
pause :
batch :
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Got:10000 of 10000
Got 10000 at 972/s, latency min/ave/max =3/26/172ms
--
The parameters that you can set follow:
- clients–Number of clients to simulate. The clients are kept from one test iteration to the next, so if you change this number, only the difference in clients is created or destroyed; take that into account when choosing a value. (Currently, reducing the number of clients produces a noisy exception as the connection is retried. You can ignore this exception.)
- publish–Number of chat messages to publish for this test. The number of messages received is this number multiplied by the users per chat room (which is the number of clients divided by the global number of rooms).
- publish size–Size in bytes of the chat message to publish.
- pause–A period (in ms) to pause between batches of published messages.
- batch–Size of the batch of published messages to send in a burst.
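To make the interaction of these parameters concrete, the following sketch estimates the number of messages a run should receive and a lower bound on how long publishing takes. The function names are illustrative. The publish count of 1000 is inferred from the sample run (10000 received with 100 clients in 10 rooms), not shown in the output, and the batch and pause values are hypothetical. The timing assumes one pause between consecutive batches and ignores the time spent sending.

```python
import math

def expected_received(clients, rooms, publish):
    """Messages received = publish count x users per room."""
    return publish * clients // rooms

def minimum_publish_seconds(publish, batch, pause_ms):
    """Lower bound on publish time: pauses only, send time ignored."""
    batches = math.ceil(publish / batch)
    # Assumption: a pause occurs between batches, not after the last one.
    return (batches - 1) * pause_ms / 1000

print(expected_received(clients=100, rooms=10, publish=1000))
print(minimum_publish_seconds(publish=1000, batch=10, pause_ms=100))
```

Reducing the pause or increasing the batch lowers this bound, which is why those two settings are the levers for raising the message rate.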
While the test is executing, the client outputs a series of digits to show progress. The digits represent the current average latency in units of 100 ms. For example, 0 represents a latency of < 100 ms from the time the client published the message to when it was received, and 1 represents a latency >= 100 ms and < 200 ms. At the end of the test cycle, a summary is printed showing the total messages received, the message rate, and the min/ave/max latency.
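The digit encoding described above amounts to integer division by 100. The sketch below illustrates it; the function name is hypothetical, and capping the result at 9 for latencies of one second or more is an assumption, since the text does not say what the client prints in that case.

```python
def latency_digit(average_latency_ms):
    """Map an average latency in ms to the progress digit described above."""
    digit = int(average_latency_ms // 100)
    # Assumption: values of 1000 ms or more are shown as 9.
    return min(9, digit)

for latency in (41, 150, 230):
    print(latency, "ms ->", latency_digit(latency))
```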
Interpreting the results
Before producing numbers for interpretation, it is important to run a number of trials, which allows the system to "warm up." During the initial runs, the Java JIT compiler optimizes the code and object pools are populated with reusable objects. Thus the first runs for a given number of clients are often slower. This can be seen in the test cycle shown above, where the average latency initially grew to over 200 ms before it fell back to < 100 ms. The average and maximum latency for the second run were far better than for the first run.
It is also important to use long runs for producing results, for the following reasons:
- To reduce any statistical effect of the ramp-up and ramp-down periods.
- To ensure that any resources (for example, queues, memory, file descriptors) that are being used in a non-sustainable way have a chance to max out and cause errors, garbage collections or other adverse effects.
- To include in the results any occasional system hiccups caused by other system events.
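One way to apply the warm-up advice is to parse the summary lines from a captured log and discard the first runs before comparing results. The sketch below assumes the summary format shown in the sample output; the helper names and the choice of one warm-up run are illustrative.

```python
import re

# Matches lines such as:
# Got 10000 at 901/s, latency min/ave/max =2/41/922ms
SUMMARY = re.compile(
    r"Got (\d+) at (\d+)/s, latency min/ave/max =(\d+)/(\d+)/(\d+)ms"
)

def parse_summary(line):
    """Return the summary fields as integers, or None if no match."""
    match = SUMMARY.search(line)
    if match is None:
        return None
    received, rate, lat_min, lat_ave, lat_max = map(int, match.groups())
    return {"received": received, "rate": rate,
            "min": lat_min, "ave": lat_ave, "max": lat_max}

def steady_state(lines, warmup_runs=1):
    """Parse all summary lines and drop the first `warmup_runs` runs."""
    runs = [run for run in map(parse_summary, lines) if run is not None]
    return runs[warmup_runs:]

log = [
    "Got:10000 of 10000",
    "Got 10000 at 901/s, latency min/ave/max =2/41/922ms",
    "Got:10000 of 10000",
    "Got 10000 at 972/s, latency min/ave/max =3/26/172ms",
]
for run in steady_state(log):
    print(run)
```

With the two sample runs, only the second summary remains after the warm-up run is dropped, which matches the observation that the first run is not representative.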
Typically it is best to start with short, low-volume test cycles, and to gradually reduce the pause or increase the batch to determine approximate maximum message rates. Then you can extend the test duration by increasing the number of messages published or the number of clients (which also increases the message rate as there are more users per room).
A normal run should report no exceptions or timeouts. For a single server and single test client with one room per simulated client, the number of messages received should always equal the number expected. If the server is running clustered, the number of messages received is reduced by a factor equal to the number of servers. Similarly, if you are using multiple test clients, each test client sees messages published from the other test clients, so the number of messages received will exceed the number sent.
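The clustered case can be sketched as a simple adjustment of the single-server expectation. The function name is illustrative, and the sketch takes the statement above at face value: received messages shrink by a factor equal to the number of servers. It does not attempt to model the multiple-client case, for which the text gives no exact factor.

```python
def expected_received_clustered(single_server_expected, servers):
    """Adjust the single-server expectation for a cluster of `servers` nodes."""
    if servers < 1:
        raise ValueError("servers must be at least 1")
    return single_server_expected / servers

# A run that expects 10000 messages on one server, repeated on two nodes.
print(expected_received_clustered(10000, 2))
```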
Testing load balancers
When testing a load balancer, be aware of the following:
- Start with a cluster of one so that you can verify that no messages are being lost. Then increase the cluster size.
- You will not have exact message counts, and must adjust according to the number of nodes.
- It is very important that there is affinity, as the Bayeux client ID must be known on the worker node used, and both connections from the same simulated node must arrive at the same worker node. However, the test does not use HTTP sessions, so the balancer must set any cookies used for affinity (the test client handles set cookies).