Thread: Libpq is very slow on windows but fast on linux.
Hi I have a problem with libpq on windows. Connecting to a db and running a "select * from some_table;" is very slow. The table has only 1800 rows, 7 columns. No blobs etc. The query is taking around 3500ms, in linux it takes around 800ms. (About 500ms is network time, the server is on the opposite side of the world from my location.) The hardware is identical (dual boot) Why does this so long in windows? I tested in pqsl, and pgadmin to rule out errors in the app code. Any advice or clues? I need the app to work on both platforms. Thanks -- -- Rob
On 11/06/2010 04:54 PM, Rob Brown-Bayliss wrote: > Hi > > I have a problem with libpq on windows. Connecting to a db and running > a "select * from some_table;" is very slow. > > The table has only 1800 rows, 7 columns. No blobs etc. > > The query is taking around 3500ms, in linux it takes around 800ms. > (About 500ms is network time, the server is on the opposite side of > the world from my location.) > > The hardware is identical (dual boot) > > Why does this so long in windows? I tested in pqsl, and pgadmin to > rule out errors in the app code. > > Any advice or clues? I need the app to work on both platforms. Just a guess, but maybe DNS lookup time? Try using IP address instead of FQDN. HTH, Joe -- Joe Conway credativ LLC: http://www.credativ.us Linux, PostgreSQL, and general Open Source Training, Service, Consulting, & 24x7 Support
Attachment
On 11/06/10 4:54 PM, Rob Brown-Bayliss wrote: > Hi > > I have a problem with libpq on windows. Connecting to a db and running > a "select * from some_table;" is very slow. > > The table has only 1800 rows, 7 columns. No blobs etc. > > The query is taking around 3500ms, in linux it takes around 800ms. > (About 500ms is network time, the server is on the opposite side of > the world from my location.) > > The hardware is identical (dual boot) > > Why does this so long in windows? I tested in pqsl, and pgadmin to > rule out errors in the app code. > > Any advice or clues? I need the app to work on both platforms. to be clear, the database SERVER is the same in both cases? only the client varies? when you say 500mS, thats the round trip ping time? how about if you do something like, SELECT * FROM SOME_TABLE INTO SOME_OTHER_TABLE; which doesn't involve returning data? I think I'd run a packet sniffer like Wireshark on both client platforms, sniffing 5432/tcp and see if there's something unusual in the timing of the packets, incorrect MTU or something, lost packets and extra retries, or whatever. Perhaps Windows isn't using a sufficiently large TCP Recieve Window (RWIN) value, or something.
Thanks, but no, it's using ip, and after the connection has been established etc. the 3500ms it the total query time on windows vs 800ms on linux. Same server, so I think it is related to libpq implementation on windows. I also get the same results running XP in a virtualbox on a linux machine, while the linux host machine is getting 800ms. On Sun, Nov 7, 2010 at 12:59 PM, Joe Conway <mail@joeconway.com> wrote: > On 11/06/2010 04:54 PM, Rob Brown-Bayliss wrote: >> Hi >> >> I have a problem with libpq on windows. Connecting to a db and running >> a "select * from some_table;" is very slow. >> >> The table has only 1800 rows, 7 columns. No blobs etc. >> >> The query is taking around 3500ms, in linux it takes around 800ms. >> (About 500ms is network time, the server is on the opposite side of >> the world from my location.) >> >> The hardware is identical (dual boot) >> >> Why does this so long in windows? I tested in pqsl, and pgadmin to >> rule out errors in the app code. >> >> Any advice or clues? I need the app to work on both platforms. > > Just a guess, but maybe DNS lookup time? Try using IP address instead of > FQDN. > > HTH, > > Joe > > -- > Joe Conway > credativ LLC: http://www.credativ.us > Linux, PostgreSQL, and general Open Source > Training, Service, Consulting, & 24x7 Support > > -- -- Rob
On Sun, Nov 7, 2010 at 1:06 PM, John R Pierce <pierce@hogranch.com> wrote: > when you say 500mS, thats the round trip ping time? It's a bit less, for example SELECT max(id) on the same table takes about 350ms. Yes, I am in New Zealand, the server is in Canada. pings take about 275ms average. > I think I'd run a packet sniffer like Wireshark on both client platforms, > sniffing 5432/tcp and see if there's something unusual in the timing of the > packets, incorrect MTU or something, lost packets and extra retries, or > whatever. Perhaps Windows isn't using a sufficiently large TCP Recieve > Window (RWIN) value, or something. I am a bit out of my depth there, but for all other network use the machine seems the same in either os. That is no noticiable difference in download/upload speed. -- Rob
On 11/06/10 5:12 PM, Rob Brown-Bayliss wrote: > On Sun, Nov 7, 2010 at 1:06 PM, John R Pierce<pierce@hogranch.com> wrote: >> I think I'd run a packet sniffer like Wireshark on both client platforms, >> sniffing 5432/tcp and see if there's something unusual in the timing of the >> packets, incorrect MTU or something, lost packets and extra retries, or >> whatever. Perhaps Windows isn't using a sufficiently large TCP Recieve >> Window (RWIN) value, or something. > I am a bit out of my depth there, but for all other network use the > machine seems the same in either os. That is no noticiable difference > in download/upload speed. download/uploads from/to this same server in Canada ? Wireshark is pretty easy to use. tell it you only want to capture TCP traffic on port 5432 (to or from 5432), then start its capture and run your query. it shows the network packets with a ton of info including a millisecond accurate timestamp on each packet. do this on each OS and compare the timings on these captures (you can save the capture from one system and load it on wireshark on the other OS so you can do side by side compares) studying the details of a TCP connection via Wireshark should lead to a good deal of understanding of the underpinnings of TCP/IP networking.
Ok, So I did that, in the windows capture file are many many lines of Red text on a black background, I assume thats a bad thing. It's every second or third line vs just a handful on the linux capture. Most of these lines are like: postgresql [ACK] Seq=5429 Ack=##### Win=65700 Len=0 Where the ACK number is always different. As I said before I really don't know what I am looking at. On Sun, Nov 7, 2010 at 1:19 PM, John R Pierce <pierce@hogranch.com> wrote: > On 11/06/10 5:12 PM, Rob Brown-Bayliss wrote: >> >> On Sun, Nov 7, 2010 at 1:06 PM, John R Pierce<pierce@hogranch.com> wrote: >>> >>> I think I'd run a packet sniffer like Wireshark on both client platforms, >>> sniffing 5432/tcp and see if there's something unusual in the timing of >>> the >>> packets, incorrect MTU or something, lost packets and extra retries, or >>> whatever. Perhaps Windows isn't using a sufficiently large TCP Recieve >>> Window (RWIN) value, or something. >> >> I am a bit out of my depth there, but for all other network use the >> machine seems the same in either os. That is no noticiable difference >> in download/upload speed. > > download/uploads from/to this same server in Canada ? > > Wireshark is pretty easy to use. tell it you only want to capture TCP > traffic on port 5432 (to or from 5432), then start its capture and run your > query. it shows the network packets with a ton of info including a > millisecond accurate timestamp on each packet. do this on each OS and > compare the timings on these captures (you can save the capture from one > system and load it on wireshark on the other OS so you can do side by side > compares) > > studying the details of a TCP connection via Wireshark should lead to a good > deal of understanding of the underpinnings of TCP/IP networking. > > > > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general > -- -- Rob
On Sun, Nov 7, 2010 at 1:06 PM, John R Pierce <pierce@hogranch.com> wrote: > how about if you do something like, SELECT * FROM SOME_TABLE INTO > SOME_OTHER_TABLE; which doesn't involve returning data? In this case the times are as close to equal as to make no difference, within a couple of ms of each other. About 300ms each. -- Rob
On 11/06/10 6:13 PM, Rob Brown-Bayliss wrote: > Ok, So I did that, in the windows capture file are many many lines of > Red text on a black background, I assume thats a bad thing. It's > every second or third line vs just a handful on the linux capture. > > Most of these lines are like: > > postgresql [ACK] Seq=5429 Ack=##### Win=65700 Len=0 > > Where the ACK number is always different. > > As I said before I really don't know what I am looking at. > While I come from a purely software background, over the years I've had to debug exactly these sorts of situations, and determined that I needed a better understanding of TCP/IP. I found the book "Internetworking with TCP/IP" by Douglas Comer et al extremely useful, particularlly volume II. Understanding how the underlying network protocols function has been handy time after time. I don't pretend to be a network engineer, but I can communicate with them in their own language and thats been extremely helpful too. In Wireshark, the color coding itself is pretty arbitrary, and sometimes excessively lurid :) every IP (internet protocol) packet has a source IP address, a destination IP address, and a protocol. TCP is protocol #6 UDP is 17, etc. every TCP packet has an IP packet header (above) plus a source port, a destination port, a sequence number, a ack value, a offset, some flags, a checksum, etc. once a TCP socket session has been fully established, the general sequence is, one side sends data packets, and the other side acknowleges them[1]. Data packets should be around 1500 bytes each unless a smaller MTU has been negotiated. If one 1500 byte packet was sent and the server then had to wait for an acknowlege before sending another, you'd only be able to send one data packet every (PING TIME) milliseconds, which in yoru case here would be *excruciatingly* slow, so instead, TCP keeps sending data up to RWIN (TCP Recieve Window Size) bytes before it requires an ACK be sent back. That #### value on the ACK is what data its acknowledging, and undoubtably increases with each ACK. The RWIN value is negotiated between the client and the server, and Windows is notorious for using values too small for a wide area long pingtime fast network like this. If you're seeing a lot of ACKs it suggests that the RWIN value may be too low. Of particular importance is the timing of the packets coming from the remote server, do you get several fast, then a delay, or do they come in back to back at wire speed? Its actually quite OK and common for a client to send ACKs back 'early', as soon as it gets a packet to keep the data flowing, the key is if the sender is waiting for those ACKs, then RWIN is too small. You might have 1Mbyte/sec download bandwidth between this server and your client (arbitrary value I picked for sake of discussion). your round trip packet latency is 350mS, I believe you said. thats about 1/3rd of a second, which means about 300Kbytes can be "in the pipe". If your RWIN is, say, 64K (typical value used on a local network), the sender will only be able to send 64k before having to wait 350mS for a ACK. This will slow your pipe down about 5:1. If instead, the RWIN was above 300k, then the client will be able to keep up with the ACKs without the sender having to wait. Generally RWIN should be about twice as high as the data rate * the round trip latency. I haven't kept up with the details, but I definately remember having to manually crank RWIN up to the 500kbyte range as Windows wouldn't automatically go over 64k (in fact, going above 64k requires a TCP extension called "Window Scaling" which wasn't supported by earlier OSs at all) here's a typical sequence... server -> client > data 0 len 1500 {1500 bytes of data} > data 1500 len 1500 {1500 bytes of data} > data 3000 len 1500 {1500 bytes of data} > data 4500 len 1500 {1500 bytes of data} < ACK=3000 >data 6000 len 1500 {1500 bytes of data) >data 7500 len 1500 {1500 bytes of data) < ACK=6000 acking 3000 implicitly acknowleges everything before that. anyways, this article http://en.wikipedia.org/wiki/TCP_Tuning and various linked articles may help you here. and this, http://technet.microsoft.com/en-us/magazine/2007.01.cableguy.aspx explains how to manually crank up a higher RWIN in Windows XP in particular which isn't very smart about it. [1] simplification. sockets are bidirectional, and both ends can send data to the other end, and the acknowledgments for received data can be sent with outbound data packets. Torrent makes extensive use of this.
On 11/06/10 6:27 PM, Rob Brown-Bayliss wrote: > On Sun, Nov 7, 2010 at 1:06 PM, John R Pierce<pierce@hogranch.com> wrote: > >> how about if you do something like, SELECT * FROM SOME_TABLE INTO >> SOME_OTHER_TABLE; which doesn't involve returning data? > In this case the times are as close to equal as to make no difference, > within a couple of ms of each other. About 300ms each. k, so its definitely the data transmission thats hurting you. you said downloads were fine. does that include downloads from this specific server ?
Thanks, I will a look at that latter, but I am not too hopeful as I get the same results on Vista as XP. One thought that has occurred to me is could it be libpq.dll versions? I am not able to test until later on today, but the server is postgresql 8.3 and the XP machine is using libpq frim the 8.4 and the vista is using 9.0 Any way, I will try that this afternoon. On Sun, Nov 7, 2010 at 4:36 PM, John R Pierce <pierce@hogranch.com> wrote: > On 11/06/10 6:13 PM, Rob Brown-Bayliss wrote: >> >> Ok, So I did that, in the windows capture file are many many lines of >> Red text on a black background, I assume thats a bad thing. It's >> every second or third line vs just a handful on the linux capture. >> >> Most of these lines are like: >> >> postgresql [ACK] Seq=5429 Ack=##### Win=65700 Len=0 >> >> Where the ACK number is always different. >> >> As I said before I really don't know what I am looking at. >> > > > While I come from a purely software background, over the years I've had to > debug exactly these sorts of situations, and determined that I needed a > better understanding of TCP/IP. I found the book "Internetworking with > TCP/IP" by Douglas Comer et al extremely useful, particularlly volume II. > Understanding how the underlying network protocols function has been handy > time after time. I don't pretend to be a network engineer, but I can > communicate with them in their own language and thats been extremely helpful > too. > > > In Wireshark, the color coding itself is pretty arbitrary, and sometimes > excessively lurid :) > > > every IP (internet protocol) packet has a source IP address, a destination > IP address, and a protocol. TCP is protocol #6 UDP is 17, etc. > every TCP packet has an IP packet header (above) plus a source port, a > destination port, a sequence number, a ack value, a offset, some flags, a > checksum, etc. > > once a TCP socket session has been fully established, the general sequence > is, one side sends data packets, and the other side acknowleges them[1]. > Data packets should be around 1500 bytes each unless a smaller MTU has been > negotiated. If one 1500 byte packet was sent and the server then had to > wait for an acknowlege before sending another, you'd only be able to send > one data packet every (PING TIME) milliseconds, which in yoru case here > would be *excruciatingly* slow, so instead, TCP keeps sending data up to > RWIN (TCP Recieve Window Size) bytes before it requires an ACK be sent back. > That #### value on the ACK is what data its acknowledging, and undoubtably > increases with each ACK. The RWIN value is negotiated between the client > and the server, and Windows is notorious for using values too small for a > wide area long pingtime fast network like this. > > If you're seeing a lot of ACKs it suggests that the RWIN value may be too > low. Of particular importance is the timing of the packets coming from > the remote server, do you get several fast, then a delay, or do they come in > back to back at wire speed? Its actually quite OK and common for a > client to send ACKs back 'early', as soon as it gets a packet to keep the > data flowing, the key is if the sender is waiting for those ACKs, then RWIN > is too small. > > You might have 1Mbyte/sec download bandwidth between this server and your > client (arbitrary value I picked for sake of discussion). > > your round trip packet latency is 350mS, I believe you said. thats about > 1/3rd of a second, which means about 300Kbytes can be "in the pipe". If > your RWIN is, say, 64K (typical value used on a local network), the sender > will only be able to send 64k before having to wait 350mS for a ACK. This > will slow your pipe down about 5:1. If instead, the RWIN was above 300k, > then the client will be able to keep up with the ACKs without the sender > having to wait. Generally RWIN should be about twice as high as the data > rate * the round trip latency. > > I haven't kept up with the details, but I definately remember having to > manually crank RWIN up to the 500kbyte range as Windows wouldn't > automatically go over 64k (in fact, going above 64k requires a TCP extension > called "Window Scaling" which wasn't supported by earlier OSs at all) > > > here's a typical sequence... > > server -> client >> data 0 len 1500 {1500 bytes of data} >> data 1500 len 1500 {1500 bytes of data} >> data 3000 len 1500 {1500 bytes of data} >> data 4500 len 1500 {1500 bytes of data} > < ACK=3000 >>data 6000 len 1500 {1500 bytes of data) >>data 7500 len 1500 {1500 bytes of data) > < ACK=6000 > > > acking 3000 implicitly acknowleges everything before that. > > anyways, this article http://en.wikipedia.org/wiki/TCP_Tuning and various > linked articles may help you here. > > and this, http://technet.microsoft.com/en-us/magazine/2007.01.cableguy.aspx > explains how to manually crank up a higher RWIN in Windows XP in particular > which isn't very smart about it. > > > > > > [1] simplification. sockets are bidirectional, and both ends can send data > to the other end, and the acknowledgments for received data can be sent with > outbound data packets. Torrent makes extensive use of this. > > > -- -- Rob
On 07/11/10 09:13, Rob Brown-Bayliss wrote: > Ok, So I did that, in the windows capture file are many many lines of > Red text on a black background, I assume thats a bad thing. If you examine the packet it'll say "invalid checksum". This will be because your network card is doing TCP checksum offloading, and you can almost certainly ignore the warning or turn it off. See: http://wiki.wireshark.org/TCP_Checksum_Verification http://www.wireshark.org/docs/wsug_html_chunked/ChAdvChecksums.html > As I said before I really don't know what I am looking at. Use the wireshark throughput graph for a starter. If you don't understand TCP/IP, wireshark probably won't help you much - but then, I learned about TCP/IP with wireshark. Consider doing the same - dive in, read about it, and have a play. I won't be too surprised if the issue turns out to be related to TCP window sizes and window scaling differences between the two machines. -- Craig Ringer Tech-related writing: http://soapyfrogs.blogspot.com/
Hi No Joy. The tcp window stuff didn't make any difference. I tried the unscientific speedtest.net to a server in Canada and all 3 machines (several times each)(. Linux, XP and Vista. The results are all too close to call. Yet psql or pgadmin are 4 or 5 times slower on the windows machines. Dropping back to 8.3 made no difference. The XP machine is a fresh install, it only has MS office and postgresql 8.3 installed, just to test this. I have come across many similar posts on the net, but there were several years ago. Is any one else seeing similar results? On Mon, Nov 8, 2010 at 3:19 PM, Craig Ringer <craig@postnewspapers.com.au> wrote: > On 07/11/10 09:13, Rob Brown-Bayliss wrote: >> Ok, So I did that, in the windows capture file are many many lines of >> Red text on a black background, I assume thats a bad thing. > > If you examine the packet it'll say "invalid checksum". This will be > because your network card is doing TCP checksum offloading, and you can > almost certainly ignore the warning or turn it off. See: > > http://wiki.wireshark.org/TCP_Checksum_Verification > http://www.wireshark.org/docs/wsug_html_chunked/ChAdvChecksums.html > >> As I said before I really don't know what I am looking at. > > Use the wireshark throughput graph for a starter. If you don't > understand TCP/IP, wireshark probably won't help you much - but then, I > learned about TCP/IP with wireshark. Consider doing the same - dive in, > read about it, and have a play. > > I won't be too surprised if the issue turns out to be related to TCP > window sizes and window scaling differences between the two machines. > > -- > Craig Ringer > > Tech-related writing: http://soapyfrogs.blogspot.com/ > -- -- Rob
Further testing shows it is windows networking causing the issue. Copying files to and from the server is 5 to 6 times slower on a Windows client compared to the Linux client. The issue is not specific to libpq. -- Rob
> -----Original Message----- > From: pgsql-general-owner@postgresql.org [mailto:pgsql-general- > owner@postgresql.org] On Behalf Of Rob Brown-Bayliss > Sent: Tuesday, November 09, 2010 8:20 PM > To: pgsql-general@postgresql.org > Subject: Re: [GENERAL] Libpq is very slow on windows but fast on linux. > > Further testing shows it is windows networking causing the issue. > Copying files to and from the server is 5 to 6 times slower on a > Windows client compared to the Linux client. > > The issue is not specific to libpq. Check your Nagle setting. http://support.microsoft.com/kb/138831 See also: http://apocryph.org/2006/07/08/achieving_maximum_tcp_performance_on_windows_xp_2k3/ P.S. Iperf is really useful to see how well things are going as a diagnostic tool: http://sourceforge.net/projects/iperf/files/
On 11/09/10 9:53 PM, Dann Corbit wrote: > Check your Nagle setting. > http://support.microsoft.com/kb/138831 that link works for SNA Server, which is some real obscure stuff, but not the general case try this, http://www.speedguide.net/articles/windows-2kxp-more-tweaks-158 where it says _Gaming Tweak - Disable Nagle's algorithm_ This page is talking about games, but this is the exact same thing. and, I still bet the OP need to increase his tcp recieve window to like 320k or something. with the kind of latency mentioned. * *
On 10 Nov 2010, at 5:19, Rob Brown-Bayliss wrote: > Further testing shows it is windows networking causing the issue. > Copying files to and from the server is 5 to 6 times slower on a > Windows client compared to the Linux client. > > The issue is not specific to libpq. Do both machines have similar hardware? If the Linux machine has a proper NIC (intel, for example) while the Windows machinehas a poor NIC (Realtek!), then of course you would get differences in performance. I'm not saying Microsofts network stack is particularly good, mind you, just that it isn't necessarily the main cause. Oneof the problems Windows is facing is that, while supporting a lot of hardware is a good thing in general, it also supportsall the crap hardware, crap drivers and crap ACPI implementations. Alban Hertroys -- Screwing up is an excellent way to attach something to the ceiling. !DSPAM:737,4cda484310265265216060!
Op 10-11-10 08:22, Alban Hertroys schreef: > On 10 Nov 2010, at 5:19, Rob Brown-Bayliss wrote: > >> Further testing shows it is windows networking causing the issue. >> Copying files to and from the server is 5 to 6 times slower on a >> Windows client compared to the Linux client. >> >> The issue is not specific to libpq. > > Do both machines have similar hardware? If the Linux machine has a proper NIC (intel, for example) while the Windows machine > has a poor NIC (Realtek!), then of course you would get differences in performance. > Not that much, in the past I have seen differences in performance between Intel and Realtek, 20-30% yes, but never by a factor of 5 or 6. Antonio > I'm not saying Microsofts network stack is particularly good, mind you, just that it isn't necessarily the main cause. > One of the problems Windows is facing is that, while supporting a lot of hardware is a good thing in general, it also > supports all the crap hardware, crap drivers and crap ACPI implementations. > > Alban Hertroys > > -- > Screwing up is an excellent way to attach something to the ceiling. > > > !DSPAM:737,4cda484310265265216060! > > >
On 11/09/10 11:22 PM, Alban Hertroys wrote: > Do both machines have similar hardware? If the Linux machine has a > proper NIC (intel, for example) while the Windows machine has a poor > NIC (Realtek!), then of course you would get differences in performance. > I'm not saying Microsofts network stack is particularly good, mind you, just that it isn't necessarily the main cause.One of the problems Windows is facing is that, while supporting a lot of hardware is a good thing in general, it alsosupports all the crap hardware, crap drivers and crap ACPI implementations. He has high speed connections with 350ms ping times to the server thats half way around the world. Windows XP, at least, did not default to an RWIN over 64k. He needs about 300K for that link, give or take, depending on how fast the wires are. He can go into the registry (of the XP client) and bump is RWIN to something larger than pingtime (in seconds) * wirespeed (in byte/sec) ... example: 0.350 ping * 700K byte/sec = ~250K, so use 300k or 400k for TCP Recieve Window Size)