I want to create a C++ server/client that maximizes throughput over TCP socket communication on localhost. As preparation, I used iperf to find out the maximum bandwidth on my i7 MacBook Pro.
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 256 KByte (default)
------------------------------------------------------------
[ 4] local 127.0.0.1 port 5001 connected with 127.0.0.1 port 51583
[ 4] 0.0-120.0 sec 329 GBytes 23.6 Gbits/sec
Without any tweaking, iperf showed me that I can reach at least 23.2 GBit/s. Then I did my own C++ server/client implementation; you can find the full code here: https://gist.github.com/1116635
In that code I basically transfer a 1024-byte int array with each read/write operation, so my send loop on the server looks like this:
int n;
int x[256];
// fill the int array
for (int i = 0; i < 256; i++)
{
    x[i] = i;
}
for (int i = 0; i < (4*1024*1024); i++)
{
    n = write(sock, x, sizeof(x));
    if (n < 0) error("ERROR writing to socket");
}
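Note that `write()` on a socket is allowed to transfer fewer bytes than requested. A minimal sketch of a helper that loops until the whole buffer is sent (the name `write_all` is mine, not from the gist):

```cpp
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cerrno>
#include <cstddef>

// Loop until all len bytes are written, retrying on short writes
// and on EINTR. Returns len on success, -1 on a genuine error.
ssize_t write_all(int fd, const void *buf, size_t len)
{
    const char *p = static_cast<const char *>(buf);
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted, retry
            return -1;                     // genuine error
        }
        p += n;
        left -= n;
    }
    return static_cast<ssize_t>(len);
}
```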
My receive loop on the client looks like this:
int n;
int x[256];
for (int i = 0; i < (4*1024*1024); i++)
{
    n = read(sockfd, x, sizeof(int) * 256);
    if (n < 0) error("ERROR reading from socket");
}
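The same caveat applies on the receive side: `read()` on a stream socket may return fewer bytes than asked for, so a correct client has to loop per chunk. A sketch (the helper name `read_all` is my own):

```cpp
#include <unistd.h>
#include <cerrno>
#include <cstddef>

// Loop until len bytes have arrived, or the peer closes the
// connection. Returns the number of bytes actually read, -1 on error.
ssize_t read_all(int fd, void *buf, size_t len)
{
    char *p = static_cast<char *>(buf);
    size_t left = len;
    while (left > 0) {
        ssize_t n = read(fd, p, left);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted, retry
            return -1;                     // genuine error
        }
        if (n == 0) break;  // peer closed the connection
        p += n;
        left -= n;
    }
    return static_cast<ssize_t>(len - left);
}
```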
As mentioned in the headline, running this (compiled with -O3) results in the following execution time, which is about 3 GBit/s:
./client 127.0.0.1 1234
Elapsed time for Reading 4GigaBytes of data over socket on localhost: 9578ms
Where do I lose the bandwidth, what am I doing wrong? Again, the full code can be seen here: https://gist.github.com/1116635
Any help is appreciated!
- Use larger buffers (i.e. make less library/system calls)
- Use asynchronous APIs
- Read the documentation (the return value of read/write is not simply an error condition; it is the number of bytes actually read/written, which may be less than requested)
My previous answer was mistaken. I have tested your programs and here are the results.
- If I run the original client, I get
0m7.763s
- If I use a buffer 4 times as large, I get
0m5.209s
- With a buffer 8 times as large as the original I get
0m3.780s
I only changed the client. I suspect more performance can be squeezed if you also change the server.
The fact that I got radically different results than you did (0m7.763s vs 9578ms) also suggests this is caused by the number of system calls performed (as we have different processors). To squeeze even more performance:
- Use scatter-gather I/O (readv and writev)
- Use zero-copy mechanisms: splice(2), sendfile(2)
You can use strace -f iperf -s localhost to find out what iperf is doing differently. It seems that it's using significantly larger buffers (131072 bytes with 2.0.5) than you.
Also, iperf uses multiple threads. If you have 4 CPU cores, using two threads on client and server will result in approximately doubled performance.
If you really want to get max performance use mmap + splice/sendfile, and for localhost communication use Unix domain stream sockets (AF_LOCAL).
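A small sketch of the Unix-domain-socket suggestion: for same-host transfers these skip the TCP/IP stack entirely, and socketpair() hands you two already-connected AF_LOCAL endpoints without any bind/listen/accept ceremony (the function name is mine, for illustration):

```cpp
#include <sys/socket.h>
#include <unistd.h>
#include <cstring>

// Create a connected AF_LOCAL stream socket pair and round-trip a
// small message through it. Returns true if the payload arrives intact.
bool local_roundtrip()
{
    int sv[2];
    if (socketpair(AF_LOCAL, SOCK_STREAM, 0, sv) < 0)
        return false;
    const char msg[] = "ping";
    bool ok = write(sv[0], msg, sizeof(msg)) == (ssize_t)sizeof(msg);
    char buf[sizeof(msg)] = {0};
    ok = ok && read(sv[1], buf, sizeof(buf)) == (ssize_t)sizeof(msg);
    ok = ok && std::memcmp(buf, msg, sizeof(msg)) == 0;
    close(sv[0]);
    close(sv[1]);
    return ok;
}
```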