Tuesday, July 6, 2010

Performance Testing - TCP Connection Failures

I came across this article from WebPerformanceInc, which explains how TCP connections are established and the different reasons they can fail. I found it interesting.

Load Tester is a web site load testing tool, and as such we deal primarily with the most popular Internet communications protocol: the Hypertext Transfer Protocol, or HTTP, which controls the request and transmission of web pages between browser clients and web servers. HTTP runs on top of a lower-level protocol known as the Transmission Control Protocol, or TCP. For the most part, TCP works in the background, but its proper function is critical to your website, and problems at the TCP level can show up in many different ways during a load test. Some of these errors are difficult to troubleshoot and require a packet sniffer such as Wireshark or tcpdump to analyze, while others are simpler.

TCP uses the concept of “ports” to identify and organize connections. Every TCP connection has two ports: the source port and the destination port. For our purposes, the most important ports are port 80 and port 443, which are the two most common ports used by web servers: 80 for normal HTTP traffic, and 443 for SSL-encrypted (HTTPS) traffic. A typical TCP connection from a client to a web server will involve a source port chosen by the client's operating system, such as 44567, and a destination port of 80 on the server. Each web server can accept many hundreds of connections on port 80, but every concurrent connection from a given client to that server port must use a different source port.
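
To make the source and destination port idea concrete, here is a minimal Python sketch. It assumes a throwaway listener on the loopback address rather than a real web server, and the function name `open_two_connections` is only illustrative. It opens two connections from the same client and reports the ports on each side.

```python
import socket


def open_two_connections():
    """Connect twice to a local listener; return (server_port, source_ports, dest_ports)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 asks the OS for any free port
    listener.listen(5)
    server_port = listener.getsockname()[1]

    clients = []
    try:
        for _ in range(2):
            clients.append(
                socket.create_connection(("127.0.0.1", server_port), timeout=5)
            )
        # getsockname() is the client's own (source) side of the connection,
        # getpeername() is the server's (destination) side.
        source_ports = [c.getsockname()[1] for c in clients]
        dest_ports = [c.getpeername()[1] for c in clients]
    finally:
        for c in clients:
            c.close()
        listener.close()
    return server_port, source_ports, dest_ports
```

Both connections should report the same destination port, while the operating system assigns each one its own source port.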

To create these connections between ports, TCP relies on a three-way handshake.  The requesting client first sends a packet with the TCP SYN flag set, indicating that it wants to open a connection.  If the server has a process listening on the destination port, it will respond with a packet that has both the SYN flag set and the ACK flag set, which acknowledges the client’s SYN and indicates that a connection can be created on that port.  The client then sends a packet with the ACK flag set back to the server, and the connection is established.  The current connections can be viewed using the netstat tool on both Windows and Linux.
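
Application code never sends the SYN and ACK packets itself; the operating system performs the handshake inside the `connect()` call. The sketch below is an illustration under that assumption, using a local listener and a hypothetical helper name `handshake_demo`. It shows that once a blocking `connect()` returns, the server side already has an established connection waiting to be accepted.

```python
import socket


def handshake_demo():
    """Return True if the server sees the same client address the client reports."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.settimeout(5)
    server.bind(("127.0.0.1", 0))
    server.listen(1)

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(5)
    try:
        # A blocking connect() returns only after SYN, SYN/ACK and ACK
        # have been exchanged by the two TCP stacks.
        client.connect(server.getsockname())
        # The established connection is already queued on the listener.
        conn, peer = server.accept()
        conn.close()
        return peer == client.getsockname()
    finally:
        client.close()
        server.close()
```

While such a connection is open, it should also appear in the output of netstat on either end.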

What does it look like when a TCP connection attempt fails?  The TCP packet with the SYN flag is sent from the client, which in our case is a load engine.  If the server sees such a packet, but does not have a process listening on the target port, it will typically respond with a TCP packet that has the ACK and RST flags set – a TCP reset.  This tells the client that connections are not available on this port.
Load Tester showing a connection refused (ACK RST)
This screenshot shows the result of a load engine failing to connect to the server.  In this case, you can see that I attempted to connect to TCP port 442, which doesn’t have a web server running on it (or any other service, for that matter).  Note that the response was received quickly, in about 1 second, indicating that the remote server saw the ill-fated packet and responded.  The most important thing to know about this error is that it is one of the most reliable errors that you’ll see – either the Load Tester controller or the load engine really is having trouble connecting to the site.  The most common reason for this is that either the site is down, or there is a firewall that is blocking the load engine but not the controller.
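
The same refusal can be reproduced outside of Load Tester. The following is a rough sketch, not Load Tester's own logic: it assumes a loopback port that was just released and therefore has no listener, and the helper names `find_closed_port` and `timed_connect` are made up for illustration.

```python
import socket
import time


def find_closed_port():
    """Bind to a free port, then release it so that nothing is listening on it."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    probe.bind(("127.0.0.1", 0))
    port = probe.getsockname()[1]
    probe.close()
    return port


def timed_connect(host, port, timeout=5.0):
    """Attempt a connection; return (outcome, seconds elapsed)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            outcome = "connected"
    except ConnectionRefusedError:
        # The remote TCP stack answered the SYN with a reset.
        outcome = "refused"
    return outcome, time.monotonic() - start
```

Calling `timed_connect("127.0.0.1", find_closed_port())` should normally report a refusal well before the timeout expires, because the reset is an actual reply from the target machine.
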
So … what happens when the remote server does not respond?
Load Tester showing a connection timeout (dropped packet)
This screenshot shows the same attempted connection, only this time, no response was received from the target server – not even the TCP reset that indicates connections are not available on the target port.  Note how long it takes for Load Tester to report an error – 21 seconds, in this case.  I induced this error by configuring the Linux iptables firewall to drop all incoming packets on TCP port 442, so the server’s TCP stack never saw the incoming SYN packet and thus did not respond to it – from the server’s perspective, the packet never arrived.  A similar error will occur if the server cannot be reached for some reason; for example if you attempt to connect to the wrong hostname, the server is offline, or your traffic is being misrouted between the client or load engine and the server.  If you see these kinds of errors, then the first thing you should do is make sure that the server is up, and that any HTTP proxy servers necessary to reach the server are configured correctly.
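
When scripting a quick check before a load test, it can help to tell these failure modes apart explicitly. The sketch below is one possible way to do that in Python; the function name `classify_connect` and the 3-second default timeout are assumptions chosen for illustration, not values taken from Load Tester.

```python
import socket


def classify_connect(host, port, timeout=3.0):
    """Classify a TCP connection attempt as established, refused, timeout, or other."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "established"
    except (socket.timeout, TimeoutError):
        # No reply at all: the SYN was dropped or the host is unreachable.
        return "timeout"
    except ConnectionRefusedError:
        # A reset came back: the host is reachable but nothing is listening.
        return "refused"
    except OSError as exc:
        # Anything else, such as a name resolution or routing error.
        return "other: %s" % exc
```

A "timeout" result points toward dropped packets, an offline server, or misrouted traffic, while "refused" means the target host itself answered.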

Of course, TCP connections can also fail after a connection has been established.  Here’s an example:
Load Tester showing a server connection termination
This error message is much less clear.  Did the server close the connection on purpose?  If so, why?  If not, what happened? Did the process handling the server connection crash or return bad data?  In this case, it’s useful to know what Load Tester considers to be a successful connection.  Load Tester expects there to be HTTP headers, followed by data.  In this case, we did not finish receiving the HTTP headers, and so Load Tester considers the connection incomplete.  Load Tester failed to receive the headers in this case because I induced this error by attempting to elicit an HTTP response from the Secure Shell (ssh) service listening on TCP port 22, which terminated the connection after receiving what it saw as invalid data – Load Tester’s HTTP request.
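
The idea of an incomplete response can be sketched in a few lines of Python. This is not how Load Tester is implemented; it is a simplified illustration in which a hypothetical local server (`banner_server`) sends a made-up SSH-style banner and hangs up, and the client treats the response as complete only if it starts like HTTP and contains the blank line that ends the headers.

```python
import socket
import threading


def banner_server(listener):
    """Stand-in for a non-HTTP service: send a banner, read the request, close."""
    conn, _ = listener.accept()
    with conn:
        conn.sendall(b"SSH-2.0-example\r\n")  # made-up banner, not real sshd output
        conn.recv(1024)                       # read the HTTP request, then hang up


def fetch_headers(host, port, timeout=5.0):
    """Send an HTTP request; return (headers_complete, raw_bytes_received)."""
    request = ("GET / HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n" % host).encode()
    data = b""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        try:
            while b"\r\n\r\n" not in data:
                chunk = sock.recv(4096)
                if not chunk:        # the server closed the connection
                    break
                data += chunk
        except ConnectionResetError:
            pass                     # the server aborted the connection
    complete = data.startswith(b"HTTP/") and b"\r\n\r\n" in data
    return complete, data


def run_demo():
    """Run the fake server on a free local port and query it once."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.settimeout(5)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    worker = threading.Thread(target=banner_server, args=(listener,), daemon=True)
    worker.start()
    try:
        return fetch_headers("127.0.0.1", listener.getsockname()[1])
    finally:
        worker.join(timeout=5)
        listener.close()
```

Here `run_demo()` should report that the headers were not complete, along with the raw bytes that did arrive, which is often the most useful clue about what actually answered on that port.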

In a real test, many things can cause this error, from server process crashes or errors, to overly aggressive firewalls, to reverse proxy failures, to misdirected traffic on a load balancer. In such a case, a traffic analyzer such as Wireshark or tcpdump can be very helpful in determining what is happening. Note that you may need to observe traffic in more locations than just in front of the load engine or the controller, though, as traffic can be altered by firewalls and load balancers.

