Re: Libpq is very slow on windows but fast on linux. - Mailing list pgsql-general

From John R Pierce
Subject Re: Libpq is very slow on windows but fast on linux.
Date
Msg-id 4CD61EBD.8070100@hogranch.com
In response to Re: Libpq is very slow on windows but fast on linux.  (Rob Brown-Bayliss <r.brown.bayliss@gmail.com>)
Responses Re: Libpq is very slow on windows but fast on linux.  (Rob Brown-Bayliss <r.brown.bayliss@gmail.com>)
List pgsql-general
On 11/06/10 6:13 PM, Rob Brown-Bayliss wrote:
> Ok, So I did that, in the windows capture file are many many lines of
> Red text on a black background, I assume thats a bad thing.  It's
> every second or third line vs just a handful on the linux capture.
>
> Most of these lines are like:
>
> postgresql [ACK]  Seq=5429 Ack=##### Win=65700 Len=0
>
> Where the ACK number is always different.
>
> As I said before I really don't know what I am looking at.
>


While I come from a purely software background, over the years I've had
to debug exactly these sorts of situations, and determined that I needed
a better understanding of TCP/IP.   I found the book "Internetworking
with TCP/IP" by Douglas Comer et al extremely useful, particularly
volume II.    Understanding how the underlying network protocols
function has been handy time after time.   I don't pretend to be a
network engineer, but I can communicate with them in their own language
and that's been extremely helpful too.


In Wireshark, the color coding itself is pretty arbitrary, and sometimes
excessively lurid :)


Every IP (Internet Protocol) packet has a source IP address, a
destination IP address, and a protocol.    TCP is protocol #6, UDP is
17, etc.
Every TCP packet has an IP packet header (above) plus a source port, a
destination port, a sequence number, an ack value, an offset, some flags,
a checksum, etc.
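
To make those header fields concrete, here is a minimal sketch in Python
that unpacks them from a raw packet with the standard struct module.  The
sample packet is made up for illustration (documentation-range addresses,
port 5432 for PostgreSQL, zeroed checksums), and the function names are
hypothetical; this is not anything Wireshark or libpq does internally.

```python
import struct

TCP_PROTOCOL = 6  # IP protocol number for TCP (UDP is 17)


def parse_ipv4_tcp(packet: bytes) -> dict:
    """Pull the header fields described above out of a raw IPv4 + TCP packet."""
    # Fixed 20-byte IPv4 header, network byte order.
    (ver_ihl, _tos, _total_len, _ident, _frag, _ttl,
     protocol, _ip_csum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    ip_header_len = (ver_ihl & 0x0F) * 4  # IHL is counted in 32-bit words
    if protocol != TCP_PROTOCOL:
        raise ValueError(f"not a TCP packet (protocol {protocol})")

    # The fixed 20-byte TCP header follows the IP header.
    tcp = packet[ip_header_len:ip_header_len + 20]
    (src_port, dst_port, seq, ack,
     offset_flags, window, _tcp_csum, _urgent) = struct.unpack("!HHIIHHHH", tcp)
    return {
        "src_ip": ".".join(str(b) for b in src),
        "dst_ip": ".".join(str(b) for b in dst),
        "protocol": protocol,
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "data_offset": (offset_flags >> 12) * 4,  # TCP header length in bytes
        "ack_flag": bool(offset_flags & 0x010),   # ACK bit in the flags field
        "window": window,                         # raw 16-bit window field
    }


def build_sample_packet() -> bytes:
    """Hypothetical pure-ACK packet (Len=0); checksums are left at zero."""
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, 64, TCP_PROTOCOL, 0,
                     bytes([192, 0, 2, 1]), bytes([198, 51, 100, 2]))
    tcp = struct.pack("!HHIIHHHH", 5432, 50000, 5429, 123456,
                      (5 << 12) | 0x010, 16425, 0, 0)
    return ip + tcp


if __name__ == "__main__":
    for name, value in parse_ipv4_tcp(build_sample_packet()).items():
        print(f"{name:12} {value}")
```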

Once a TCP socket session has been fully established, the general
sequence is: one side sends data packets, and the other side acknowledges
them[1].   Data packets should be around 1500 bytes each unless a
smaller MTU applies along the path.  If one 1500 byte packet was sent and
the server then had to wait for an acknowledgment before sending another,
you'd only be able to send one data packet every (PING TIME)
milliseconds, which in your case here would be *excruciatingly* slow.  So
instead, TCP keeps sending data up to RWIN (TCP Receive Window Size)
bytes before it requires an ACK be sent back.  That ##### value on the
ACK says which data it's acknowledging, and undoubtedly increases with
each ACK.    Each end advertises its own RWIN to the other in the TCP
header, and Windows is notorious for using values too small for a wide
area, long ping time, fast network like this.

If you're seeing a lot of ACKs it suggests that the RWIN value may be
too low.    Of particular importance is the timing of the packets coming
from the remote server: do you get several fast, then a delay, or do
they come in back to back at wire speed?     It's actually quite OK and
common for a client to send ACKs back 'early', as soon as it gets a
packet, to keep the data flowing.  The key is this: if the sender is
stalling while it waits for those ACKs, then RWIN is too small.

You might have 1Mbyte/sec download bandwidth between this server and
your client (arbitrary value I picked for sake of discussion).

Your round trip packet latency is 350 ms, I believe you said.   That's
about 1/3rd of a second, which means about 300 Kbytes can be "in the
pipe" at that data rate.   If your RWIN is, say, 64K (typical value
used on a local network), the sender will only be able to send 64K
before having to wait 350 ms for an ACK.   This will slow your pipe down
about 5:1.   If instead the RWIN was above 300K, then the client's ACKs
will arrive before the sender runs out of window, so the sender never
has to wait.    Generally RWIN should be about twice as high as the data
rate * the round trip latency.
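
The arithmetic in that paragraph can be sketched in a few lines of
Python.  The 1 Mbyte/sec and 350 ms figures are the ones used above for
the sake of discussion, not measurements, and the function names are my
own.  The exact product comes out nearer 350 Kbytes than the rounded
"about 300" quoted above, but the conclusion is the same.

```python
def bdp_bytes(bandwidth_bytes_per_sec: float, rtt_sec: float) -> float:
    """Bandwidth-delay product: how many bytes fit "in the pipe"."""
    return bandwidth_bytes_per_sec * rtt_sec


def window_limited_throughput(bandwidth_bytes_per_sec: float,
                              rtt_sec: float,
                              rwin_bytes: int) -> float:
    """Best-case throughput when at most one RWIN can be sent per round trip."""
    return min(bandwidth_bytes_per_sec, rwin_bytes / rtt_sec)


if __name__ == "__main__":
    bandwidth = 1_000_000   # 1 Mbyte/sec, the arbitrary figure from the text
    rtt = 0.350             # 350 ms round trip
    rwin = 64 * 1024        # 64K receive window

    pipe = bdp_bytes(bandwidth, rtt)
    actual = window_limited_throughput(bandwidth, rtt, rwin)

    print(f"bytes in the pipe:      {pipe:,.0f}")
    print(f"throughput at 64K RWIN: {actual:,.0f} bytes/sec")
    print(f"slowdown:               about {bandwidth / actual:.1f}:1")
    print(f"rule-of-thumb RWIN:     {2 * pipe:,.0f} bytes (twice the pipe)")
```

This ignores slow start, packet loss, and per-packet overhead, so treat
the result as an upper bound on what a given RWIN can deliver.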

I haven't kept up with the details, but I definitely remember having to
manually crank RWIN up to the 500 Kbyte range as Windows wouldn't
automatically go over 64K (in fact, going above 64K requires a TCP
extension called "Window Scaling" which wasn't supported by earlier OSs
at all).
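
The 64K ceiling comes from the window field in the TCP header being only
16 bits wide.  With Window Scaling, each side announces a shift count
during the handshake and the advertised field is shifted left by that
many bits; the maximum shift is 14.  A small Python sketch of that
arithmetic (function names are hypothetical, and the 500,000 byte target
is just the "500 Kbyte range" mentioned above):

```python
MAX_WINDOW_FIELD = 0xFFFF   # largest value the 16-bit window field can hold
MAX_SHIFT = 14              # largest shift count the window scale option allows


def effective_window(window_field: int, shift: int) -> int:
    """Receive window in bytes for a raw window field and a scale shift."""
    if not 0 <= window_field <= MAX_WINDOW_FIELD:
        raise ValueError("window field must fit in 16 bits")
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("shift count must be between 0 and 14")
    return window_field << shift


def min_shift_for(target_bytes: int) -> int:
    """Smallest shift count whose maximum window covers target_bytes."""
    for shift in range(MAX_SHIFT + 1):
        if effective_window(MAX_WINDOW_FIELD, shift) >= target_bytes:
            return shift
    raise ValueError("target exceeds the largest scalable window")


if __name__ == "__main__":
    print("max window without scaling:", effective_window(MAX_WINDOW_FIELD, 0))
    print("shift needed for 500,000 bytes:", min_shift_for(500_000))
```

Without scaling the largest window is 65,535 bytes; a shift count of 3
is the smallest that can cover 500,000 bytes.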


here's a typical sequence...

server   ->   client
 > data 0 len 1500   {1500 bytes of data}
 > data 1500 len 1500   {1500 bytes of data}
 > data 3000 len 1500   {1500 bytes of data}
 > data 4500 len 1500   {1500 bytes of data}
< ACK=3000
 > data 6000 len 1500   {1500 bytes of data}
 > data 7500 len 1500   {1500 bytes of data}
< ACK=6000


ACKing 3000 implicitly acknowledges everything before that.
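
A minimal Python sketch of that cumulative behaviour, assuming segments
never partially overlap and sequence numbers start at 0 as in the
simplified trace above (real TCP uses random initial sequence numbers
that wrap at 2^32, and the function name here is my own):

```python
def cumulative_ack(segments, initial_seq=0):
    """Return the ACK value a receiver would send for the given segments.

    segments is an iterable of (seq, length) pairs in arrival order.
    The ACK is the next byte the receiver expects, so it only advances
    across data that has arrived without a gap.
    """
    pending = {seq: length for seq, length in segments}
    next_expected = initial_seq
    # Walk forward while the segment starting at next_expected has arrived.
    while next_expected in pending:
        next_expected += pending.pop(next_expected)
    return next_expected


if __name__ == "__main__":
    # First two segments from the trace above have arrived.
    print(cumulative_ack([(0, 1500), (1500, 1500)]))
    # Segment 1500 is missing: the ACK stays put until the gap is filled.
    print(cumulative_ack([(0, 1500), (3000, 1500), (4500, 1500)]))
    # Arrival order doesn't matter once everything is there.
    print(cumulative_ack([(3000, 1500), (0, 1500), (1500, 1500)]))
```

The second case is why one lost packet holds back the ACK number even
though later data has already arrived.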

Anyway, this article http://en.wikipedia.org/wiki/TCP_Tuning and
various linked articles may help you here.

And this, http://technet.microsoft.com/en-us/magazine/2007.01.cableguy.aspx
explains how to manually crank up a higher RWIN in Windows XP in
particular, which isn't very smart about it.





[1] simplification.  sockets are bidirectional, and both ends can send
data to the other end, and the acknowledgments for received data can be
sent with outbound data packets.   Torrent makes extensive use of this.


