I'm writing a TCP server on Windows Server 2k8. This servers receives data, parses it and then sinks it on a database. I'm doing now some tests but the results surprises me.
The application is written in C++ and uses, directly, Winsocks and the Windows API. It creates a thread for each client connection. For each client it reads and parses the data, then insert it into the database.
Clients always connect in simultaneous chunks - ie. every once in a while abo开发者_JS百科ut 5 (I control this parameter) clients will connect simultaneously and feed the server with data.
I've been timing both the reading-and-parsing and the database-related stages of each thread.
The first stage (reading-and-parsing) has a curious behavoir. The amount of time each thread takes is roughly equal to each thread but is also proportional to the number of threads connecting. The server is not CPU starved: it has 8 cores and always less than 8 threads are connected to it.
For example, with 3 simultaneous threads 100k rows (for each thread) will be read-and-parsed in about 4,5s. But with 5 threads it will take 9,1s on average!
A friend of mine suggested this scaling behavoir might be related to the fact I'm using blocking sockets. Is this right? If not, what might be the reason for this behavoir?
If it is, I'd be glad if someone can point me out good resources for understanding non blocking sockets on Windows.
Edit:
Each client thread reads a line (ie., all chars untils a '\n') from the socket, then parses it, then read again, until the parse fails or a terminator character is found. My readline routine is based on this:
http://www.cis.temple.edu/~ingargio/cis307/readings/snaderlib/readline.c
With static variables being declared as __declspec(thread).
The parsing, assuming from the non networking version, is efficient (about 2s for 100k rows). I assume therefore the problem is in the multhreaded/network version.
If your lines are ~120–150 characters long, you are actually saturating the network!
There's no issue with sockets. Simply transfering 3 times 100k lines, 150 bytes each, over a 100 Mbps line (1 take 10 bytes/byte to account for headers) will take... 4.5 s! There is no problem with sockets, blocking or otherwise. You've simply hit the limit of how much data you can feed it.
Non-blocking sockets are only useful if you want one thread to service multiple connections. Using non-blocking sockets and a poll/select loop means that your thread does not sit idle while waiting for new connections.
In your case this is not an issue since there is only one connection per thread so there is no issue if your thread is waiting for input.
Which leads to your original questions of why things slow down when you have more connections. Without further information, the most likely culprit is that you are network limited: ie your network cannot feed your server data fast enough.
If you are interested in non-blocking sockets on Windows do a search on MSDN for OVERLAPPED APIs
You could be running into other threading related issues, like deadlocks/race conditions/false sharing, which could be destroying the performance.
One thing to keep in mind is that although you have one thread per client, Windows will not automatically ensure they all run on different cores. If some of them are running on the same core, it is possible (although unlikely) to have your server in a sort of CPU-starved state, with some cores at 100% load and others idle. There are simply no guarantees as to how the OS spreads the load (in the default case).
To explicitly assign threads to particular cores, you can use SetThreadAffinityMask. It may or may not be worth having a bit of a play around with this to see if it helps.
On the other hand, this may not have anything to do with it at all. YMMV.
精彩评论