Thread: Slow query execution over a WAN
Hi,

I experience really slow query execution over remote networks, especially for a query which returns a lot of columns (~255) caused by joins.

Executing the same query over different networks gives the following times:
Local: 20ms
DSL: 400ms
UMTS: 1500ms

The query returns 12 rows and 255 columns - most of them with only very little data. A wireshark snapshot of a request is only ~30kB, so I guess there isn't a lot of data transported.

I've uploaded a netbeans-profiler screenshot as well as a wireshark profile to:
http://93.190.88.182/jdbc_traffic.wireshark
http://93.190.88.182/jdbc_profile.png

Any idea what's causing the high latency? Are there multiple round-trips going on here? I already tried adjusting the fetch size, however that didn't change anything.

Thank you in advance, Clemens

PS: I've already asked the question on pgsql-performance, unfortunately there wasn't a lot of feedback. Sorry for cross-posting.
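For context, a minimal sketch of the kind of fetch-size adjustment mentioned above (the connection details and query are placeholders, not taken from the thread). Note that the PostgreSQL JDBC driver only honours a non-zero fetch size when autocommit is disabled; with only 12 rows the whole result set fits in one batch either way:

    import java.sql.*;

    public class FetchSizeExample {
        public static void main(String[] args) throws SQLException {
            // Placeholder connection details, not from the thread.
            try (Connection con = DriverManager.getConnection(
                    "jdbc:postgresql://dbhost/dbname", "user", "password")) {
                // The PostgreSQL driver only uses the fetch size (via a
                // server-side cursor) when autocommit is off.
                con.setAutoCommit(false);
                try (Statement st = con.createStatement()) {
                    st.setFetchSize(50); // rows fetched per round trip
                    try (ResultSet rs = st.executeQuery("SELECT * FROM some_table")) {
                        while (rs.next()) {
                            // process the row
                        }
                    }
                }
                con.commit();
            }
        }
    }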
Clemens Eisserer <linuxhippy@gmail.com> Thursday 24 February 2011 15:56:10
> Hi,
>
> I experience really slow query execution over remote networks,
> especially for a query which returns a lot of columns (~255) caused by joins.
>
> Executing the same query over different networks gives the following times:
> Local: 20ms
> DSL: 400ms
> UMTS: 1500ms
>
> The query returns 12 rows and 255 columns - most of them with only
> very little data. A wireshark snapshot of a request is only ~30kB,
> so I guess there isn't a lot of data transported.
>
> I've uploaded a netbeans-profiler screenshot as well as a wireshark profile to:
> http://93.190.88.182/jdbc_traffic.wireshark
> http://93.190.88.182/jdbc_profile.png
>
> Any idea what's causing the high latency? Are there multiple
> round-trips going on here?
> I already tried adjusting the fetch size, however that didn't change anything.
>
> Thank you in advance, Clemens
>
> PS: I've already asked the question on pgsql-performance,
> unfortunately there wasn't a lot of feedback. Sorry for cross-posting.

Hi,

You truncated the times in the profiler window. May I ask what time was measured for socket.read(), what kind of connection you used for this (DSL, UMTS), and what the connection speed and min/max/avg ping time to the database server were?

What other operations are performed inside benchmarkQuery() - commit/rollback, connection open?

Was the fetch size, when you were making this test, enough to get all records at once?

It is highly probable that something is wrong.

Regards,
Radek
On Thu, Feb 24, 2011 at 9:56 AM, Clemens Eisserer <linuxhippy@gmail.com> wrote:
> Hi,
>
> I experience really slow query execution over remote networks,
> especially for a query which returns a lot of columns (~255) caused by joins.
>
> Executing the same query over different networks gives the following times:
> Local: 20ms
> DSL: 400ms
> UMTS: 1500ms
>
> The query returns 12 rows and 255 columns - most of them with only
> very little data. A wireshark snapshot of a request is only ~30kB,
> so I guess there isn't a lot of data transported.
>
> I've uploaded a netbeans-profiler screenshot as well as a wireshark profile to:
> http://93.190.88.182/jdbc_traffic.wireshark
> http://93.190.88.182/jdbc_profile.png
>
> Any idea what's causing the high latency? Are there multiple
> round-trips going on here?
> I already tried adjusting the fetch size, however that didn't change anything.
>
> Thank you in advance, Clemens
>
> PS: I've already asked the question on pgsql-performance,
> unfortunately there wasn't a lot of feedback. Sorry for cross-posting.

The problem is the data coming back; I'm sure the WAN is slower than your DSL.

Dave
Hi,

I tried to analyze it further, this time with the query executed over UMTS - where it took 2000ms (ping to server 130ms, about 2 Mbit/s down):
http://93.190.88.182/wireshark_umts.tcpdump

0-490ms: some TCP magic going on (TCP segment of a reassembled PDU??)
491ms: Postgres query is sent
680ms - 1472ms: TCP ACKs
1642ms: Postgres - parse/bind completion
1873ms: Postgres - row description
1893ms: first data row
row / ACK
2053ms: last data row

What makes me wonder is:
- Why does it take 500ms until the query is sent, and what is that TCP stuff happening before?
- It seems until 1472ms the server is sending ACKs for the query request, so does it take ~1000ms for the server to receive the query? So is the query upload the problem?
- Actual row-data transmission only takes 150ms.

If it's indeed the query-data upload, are there any options to compress the stream between the JDBC driver and the Postgres server?

Thanks, Clemens

> You truncated the times in the profiler window. May I ask what time was
> measured for socket.read(), what kind of connection you used for this
> (DSL, UMTS), and what the connection speed and min/max/avg ping time to
> the database server were?

Sorry, I've deleted the profile - but about 70% were spent in socket.read(). Ping to the db server was about ~30ms; it was an 8 Mbit ADSL line with 1 Mbit upstream.
Looking at the profile it seems reading back the data (~25kB/request) is the problem.

> What other operations are performed inside benchmarkQuery() -
> commit/rollback, connection open?

Nothing, it's just a:
> ResultSet rs = st.executeQuery(query);
> while (rs.next());

> Was the fetch size, when you were making this test, enough to get all
> records at once?

Yes, I tried setting it a bit higher but nothing changed. However the query only returned 12 rows - so I guess it should be covered by the default fetch size.

Thanks, Clemens
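A minimal sketch of the benchmark loop described above, with timing added (the method name benchmarkQuery and the connection/query are placeholders mirroring the thread, not actual code from it):

    import java.sql.*;

    public class QueryBenchmark {
        // Hypothetical stand-in for the benchmarkQuery() mentioned in the thread.
        static void benchmarkQuery(Connection con, String query) throws SQLException {
            long start = System.nanoTime();
            try (Statement st = con.createStatement();
                 ResultSet rs = st.executeQuery(query)) {
                while (rs.next()) {
                    // intentionally empty - we only measure time to fetch all rows
                }
            }
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("query took " + elapsedMs + " ms");
        }
    }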
btw, a simple "select count(*) from table" takes 140ms over the same UMTS connection.

- Clemens
Clemens Eisserer <linuxhippy@gmail.com> Thursday 24 February 2011 22:06:13
> Hi,
>
> I tried to analyze it further, this time with the query executed over UMTS
> - where it took 2000ms (ping to server 130ms, about 2 Mbit/s down):
> http://93.190.88.182/wireshark_umts.tcpdump
>
> 0-490ms: some TCP magic going on (TCP segment of a reassembled PDU??)
> 491ms: Postgres query is sent
> 680ms - 1472ms: TCP ACKs
> 1642ms: Postgres - parse/bind completion
> 1873ms: Postgres - row description
> 1893ms: first data row
> row / ACK
> 2053ms: last data row
>
> What makes me wonder is:
> - Why does it take 500ms until the query is sent, and what is that TCP
>   stuff happening before?
> - It seems until 1472ms the server is sending ACKs for the query request,
>   so does it take ~1000ms for the server to receive the query? So is the
>   query upload the problem?
> - Actual row-data transmission only takes 150ms.
>
> If it's indeed the query-data upload, are there any options to compress
> the stream between the JDBC driver and the Postgres server?
>
> Thanks, Clemens
>
> > You truncated the times in the profiler window. May I ask what time was
> > measured for socket.read(), what kind of connection you used for this
> > (DSL, UMTS), and what the connection speed and min/max/avg ping time to
> > the database server were?
>
> Sorry, I've deleted the profile - but about 70% were spent in
> socket.read(). Ping to the db server was about ~30ms; it was an 8 Mbit
> ADSL line with 1 Mbit upstream.
> Looking at the profile it seems reading back the data (~25kB/request)
> is the problem.
>
> > What other operations are performed inside benchmarkQuery() -
> > commit/rollback, connection open?
>
> Nothing, it's just a:
> > ResultSet rs = st.executeQuery(query);
> > while (rs.next());
>
> > Was the fetch size, when you were making this test, enough to get all
> > records at once?
>
> Yes, I tried setting it a bit higher but nothing changed.
> However the query only returned 12 rows - so I guess it should be
> covered by the default fetch size.
>
> Thanks, Clemens

I thought about the ACKs and all of this TCP stuff. Query execution should take no more than ping*4 + processing time. One second to send 8kB and receive 30kB is too big a value. Executing such a simple query, with base types, works the same way as an HTTP request: request and response, no intermediate talking. Tomorrow I'll check the TCP dump in detail.

What version of PG do you use?

Regards,
Radek
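To make that rule of thumb concrete with the numbers from this thread (130ms UMTS ping, ~20ms local execution time), the expected time would be roughly half a second, which is why the observed ~2000ms looks suspicious; a trivial sketch of the estimate:

    public class RoundTripEstimate {
        public static void main(String[] args) {
            // Numbers taken from the thread.
            int pingMs = 130;       // reported UMTS round-trip time
            int processingMs = 20;  // local execution time of the query
            // Rule of thumb from the discussion: roughly ping*4 + processing time.
            int expectedMs = pingMs * 4 + processingMs;
            System.out.println("expected ~" + expectedMs + " ms, observed ~2000 ms");
        }
    }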
Radosław Smogura <rsmogura@softperience.eu> wrote:

> I thought about the ACKs and all of this TCP stuff.

Hopefully we're disabling Nagle's algorithm by setting TCP_NODELAY. I have seen it cause this sort of problem with other DBMS products.

-Kevin
Kevin Grittner wrote:
> Radosław Smogura <rsmogura@softperience.eu> wrote:
>
>> I thought about the ACKs and all of this TCP stuff.
>
> Hopefully we're disabling Nagle's algorithm by setting TCP_NODELAY.
> I have seen it cause this sort of problem with other DBMS products.

The driver does disable Nagle.

Oliver
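For reference, disabling Nagle's algorithm on a plain Java socket is a single call; a minimal sketch (host and port are placeholders - the JDBC driver does this internally on its own socket):

    import java.io.IOException;
    import java.net.Socket;

    public class NoDelayExample {
        public static void main(String[] args) throws IOException {
            // Placeholder host/port.
            try (Socket socket = new Socket("dbhost", 5432)) {
                // Disable Nagle's algorithm so small protocol messages are
                // sent immediately rather than being coalesced.
                socket.setTcpNoDelay(true);
                System.out.println("TCP_NODELAY set: " + socket.getTcpNoDelay());
            }
        }
    }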
Clemens Eisserer wrote:
> Hi,
>
> I experience really slow query execution over remote networks,
> especially for a query which returns a lot of columns (~255) caused by joins.
>
> Executing the same query over different networks gives the following times:
> Local: 20ms
> DSL: 400ms
> UMTS: 1500ms
>
> The query returns 12 rows and 255 columns - most of them with only
> very little data. A wireshark snapshot of a request is only ~30kB,
> so I guess there isn't a lot of data transported.
>
> I've uploaded a netbeans-profiler screenshot as well as a wireshark profile to:
> http://93.190.88.182/jdbc_traffic.wireshark
> http://93.190.88.182/jdbc_profile.png
>
> Any idea what's causing the high latency? Are there multiple
> round-trips going on here?
> I already tried adjusting the fetch size, however that didn't change anything.

I only see the query taking 400ms or so in the wireshark capture you attached, so I assume this is the DSL case?

There's only one round trip happening at the application level (the client sends Parse/Bind/Describe/Execute/Sync in one go; the server sends ParseComplete/BindComplete/RowDescription/DataRows/CommandComplete/ReadyForQuery in response), so there's not much else the driver can do here - you're at the mercy of how fast TCP can push that data around.

What are the sustained data rates and latency you see for your DSL and UMTS connections? You have about 10kB to send to the server and about 23kB to receive from the server. I suspect much of the query time is explained just by your connection speed. 33kB is not a "lot of data" for a LAN, but you'd be surprised how long it takes to push even small amounts of data across a WAN, especially something asymmetric like DSL/UMTS.

In the DSL case it takes around ~230ms just to get the query to the server (this is about right for an upstream data rate of ~400-500Kbit); then the query itself takes ~80ms to execute (seems high, but it's hard to break that latency down into query execution vs. network latency); then ~105ms to get the results back to the client, which is a data rate of around 2.2Mbit. TCP is probably still in slow-start too, which won't be helping.

I would guess that the UMTS case is so much worse because of a combination of higher base latency + slower data rates.

You might want to look at a wireshark capture on the server side, too - with only one side, it's hard to tell the difference between inherent latency in the connection, time the server spends processing things, and latency due to pushing data around. Also you can miss dropped packets / retransmissions unless you look at both sides.

Oliver
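As a rough back-of-the-envelope check of the figures above, transfer time is just payload size divided by the effective data rate; a small sketch using the approximate numbers from this thread:

    public class TransferEstimate {
        public static void main(String[] args) {
            // Approximate figures from the discussion above.
            double uploadBytes   = 10 * 1024;       // query sent to the server
            double downloadBytes = 23 * 1024;       // result rows coming back
            double upstreamBps   = 450_000 / 8.0;   // ~450 Kbit/s ADSL upstream
            double downstreamBps = 2_200_000 / 8.0; // ~2.2 Mbit/s observed downstream

            double uploadMs   = uploadBytes / upstreamBps * 1000;     // ~180 ms
            double downloadMs = downloadBytes / downstreamBps * 1000; // ~85 ms

            System.out.printf("upload ~%.0f ms, download ~%.0f ms%n", uploadMs, downloadMs);
        }
    }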
Hi,

> I only see the query taking 400ms or so in the wireshark capture you
> attached, so I assume this is the DSL case?

Right, this was the DSL case.

> In the DSL case it takes around ~230ms just to get the query to the server
> (this is about right for an upstream data rate of ~400-500Kbit); then the
> query itself takes ~80ms to execute (seems high, but it's hard to break that
> latency down into query execution vs. network latency); then ~105ms to get
> the results back to the client, which is a data rate of around 2.2Mbit. TCP
> is probably still in slow-start too, which won't be helping.
>
> I would guess that the UMTS case is so much worse because of a combination
> of higher base latency + slower data rates.

You are right - it was really a bandwidth issue, especially with UMTS where upload is extremely slow. Port-forwarding over a compressed ssh tunnel brought the time down to about 1/8 - instead of 2000ms the query executes in ~220ms - and ssh statistics report the data was compressed down to ~10-15%.

This is great news - without any code changes I can push response latencies back to an acceptable value =)

Thanks a lot, Clemens
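For anyone wanting to reproduce the workaround, a minimal sketch of a compressed ssh tunnel plus a JDBC connection through it (host names, user, database and local port are placeholders; -C enables compression and -L sets up local port forwarding):

    import java.sql.*;

    public class TunneledConnection {
        public static void main(String[] args) throws SQLException {
            // First, on the client, forward a local port to the remote PostgreSQL
            // server through a compressed ssh tunnel, e.g.:
            //   ssh -C -L 5433:localhost:5432 user@dbhost
            // (user, dbhost and local port 5433 are placeholders)

            // Then point the JDBC URL at the local end of the tunnel.
            String url = "jdbc:postgresql://localhost:5433/dbname";
            try (Connection con = DriverManager.getConnection(url, "user", "password");
                 Statement st = con.createStatement();
                 ResultSet rs = st.executeQuery("select count(*) from some_table")) {
                while (rs.next()) {
                    System.out.println("count = " + rs.getLong(1));
                }
            }
        }
    }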