Thread: Slow query execution over WAN network

Slow query execution over WAN network

From
Clemens Eisserer
Date:
Hi,

I experience really slow query execution over remote networks,
especially for a query which returns a lot of columns (~255) caused by
joins.

Executing the same query over different networks gives the following times:
Local:      20ms
DSL:      400ms
UMTS: 1500ms

The query returns 12 rows, and 255 column - most of them with only
very little data. A wireshark-snapshot of a request is only ~30kb
large, so I guess there isn't a lot of data transported.

I've uploaded a netbeans-profiler screenshot as well as a wireshark profile to:
http://93.190.88.182/jdbc_traffic.wireshark
http://93.190.88.182/jdbc_profile.png

Any idea whats causing the high latency? Are there multiple
round-trips going on here?
I already tried adjusting fetch size, however didn't change anything.

Thank you in advance, Clemens

PS:  I've asked the question already on pgsql-performance,
unfourtunatly there wasn't a lot of feedback. Sorry for cross-posting.

Re: Slow query execution over WAN network

From
Radosław Smogura
Date:
Clemens Eisserer <linuxhippy@gmail.com> Thursday 24 February 2011 15:56:10
> Hi,
>
> I experience really slow query execution over remote networks,
> especially for a query which returns a lot of columns (~255) caused by
> joins.
>
> Executing the same query over different networks gives the following times:
> Local:      20ms
> DSL:      400ms
> UMTS: 1500ms
>
> The query returns 12 rows, and 255 column - most of them with only
> very little data. A wireshark-snapshot of a request is only ~30kb
> large, so I guess there isn't a lot of data transported.
>
> I've uploaded a netbeans-profiler screenshot as well as a wireshark profile
> to: http://93.190.88.182/jdbc_traffic.wireshark
> http://93.190.88.182/jdbc_profile.png
>
> Any idea whats causing the high latency? Are there multiple
> round-trips going on here?
> I already tried adjusting fetch size, however didn't change anything.
>
> Thank you in advance, Clemens
>
> PS:  I've asked the question already on pgsql-performance,
> unfourtunatly there wasn't a lot of feedback. Sorry for cross-posting.
Hi,

You truncated times in profiler window. May I ask what time was measured for
socket.read(), and what kind of connection did you used for this (dsl, umts),
connection speed and min/max/avg ping time to database server?

What other operations are performed inside benchamrkQuery(), commit/rollback,
connection open?
Did fetch size wa,s when you was making this test, enaught to get all records
at once?

It is highly probable that something is wrong.

Regards,
Radek

Re: Slow query execution over WAN network

From
Dave Cramer
Date:
On Thu, Feb 24, 2011 at 9:56 AM, Clemens Eisserer <linuxhippy@gmail.com> wrote:
> Hi,
>
> I experience really slow query execution over remote networks,
> especially for a query which returns a lot of columns (~255) caused by
> joins.
>
> Executing the same query over different networks gives the following times:
> Local:      20ms
> DSL:      400ms
> UMTS: 1500ms
>
> The query returns 12 rows, and 255 column - most of them with only
> very little data. A wireshark-snapshot of a request is only ~30kb
> large, so I guess there isn't a lot of data transported.
>
> I've uploaded a netbeans-profiler screenshot as well as a wireshark profile to:
> http://93.190.88.182/jdbc_traffic.wireshark
> http://93.190.88.182/jdbc_profile.png
>
> Any idea whats causing the high latency? Are there multiple
> round-trips going on here?
> I already tried adjusting fetch size, however didn't change anything.
>
> Thank you in advance, Clemens
>
> PS:  I've asked the question already on pgsql-performance,
> unfourtunatly there wasn't a lot of feedback. Sorry for cross-posting.

The problem is the data coming back I'm sure the WAN is slower than your DSL.

Dave

Re: Slow query execution over WAN network

From
Clemens Eisserer
Date:
Hi,

I tried to analyze it further, this time the query executed over UMTS
- where it took 2000ms. (ping tp server 130ms, about 2mbit/s down):
http://93.190.88.182/wireshark_umts.tcpdump

0-490ms: Some TCP magic going on (tcp segment of a reassembled PDU??)
491 ms - Postgres query is sent
680ms - 1472ms - TCP ACKs
1642ms - Postgres - parse/bind completition
1873ms - Postgres - Row description
1893ms - First data Row
Row / ACK
2053ms - Last data Row

What makes me wonder is:
- Why does it take 500ms until the query is sent, what is that tcp
stuff happening before
- It seems until 1472ms the server is sending ACKs for the query
request, so does it take ~1000ms for the server to receive the query?
So is the query-upload the problem?
- Actual Row-Data transmission only takes 150ms.

If its indeed the query-data upload, are there any options to compress
the stream between the jdbc-driver and postgres-server?

Thanks, Clemens


> You truncated times in profiler window. May I ask what time was measured for
> socket.read(), and what kind of connection did you used for this (dsl, umts),
> connection speed and min/max/avg ping time to database server?

Sorry, I've deleted the profile - but about 70% were spent in socket.read().
Ping to the db-server was about ~30ms, it was a 8mbit ADSL line with
1M upstream.
Looking at the profile it seems reading back the data (~25kb/request)
is the problem.

> What other operations are performed inside benchamrkQuery(), commit/rollback,
> connection open?

Nothing, its just a:
> ResultSet rs = st.executeQuery(query);
> while (rs.next());

> Did fetch size wa,s when you was making this test, enaught to get all records
> at once?
Yes, I tried setting it higher a bit but nothing changed.
However the query only returned 12 rows - so I guess it should be
covered by default fetch size.

Thanks, Clemens

Re: Slow query execution over WAN network

From
Clemens Eisserer
Date:
btw, a simple "select count(*) from table" takes 140ms over the same
umts connection.

- Clemens

Re: Slow query execution over WAN network

From
Radosław Smogura
Date:
Clemens Eisserer <linuxhippy@gmail.com> Thursday 24 February 2011 22:06:13
> Hi,
>
> I tried to analyze it further, this time the query executed over UMTS
> - where it took 2000ms. (ping tp server 130ms, about 2mbit/s down):
> http://93.190.88.182/wireshark_umts.tcpdump
>
> 0-490ms: Some TCP magic going on (tcp segment of a reassembled PDU??)
> 491 ms - Postgres query is sent
> 680ms - 1472ms - TCP ACKs
> 1642ms - Postgres - parse/bind completition
> 1873ms - Postgres - Row description
> 1893ms - First data Row
> Row / ACK
> 2053ms - Last data Row
>
> What makes me wonder is:
> - Why does it take 500ms until the query is sent, what is that tcp
> stuff happening before
> - It seems until 1472ms the server is sending ACKs for the query
> request, so does it take ~1000ms for the server to receive the query?
> So is the query-upload the problem?
> - Actual Row-Data transmission only takes 150ms.
>
> If its indeed the query-data upload, are there any options to compress
> the stream between the jdbc-driver and postgres-server?
>
> Thanks, Clemens
>
> > You truncated times in profiler window. May I ask what time was measured
> > for socket.read(), and what kind of connection did you used for this
> > (dsl, umts), connection speed and min/max/avg ping time to database
> > server?
>
> Sorry, I've deleted the profile - but about 70% were spent in
> socket.read(). Ping to the db-server was about ~30ms, it was a 8mbit ADSL
> line with 1M upstream.
> Looking at the profile it seems reading back the data (~25kb/request)
> is the problem.
>
> > What other operations are performed inside benchamrkQuery(),
> > commit/rollback, connection open?
>
> Nothing, its just a:
> > ResultSet rs = st.executeQuery(query);
> > while (rs.next());
> >
> > Did fetch size wa,s when you was making this test, enaught to get all
> > records at once?
>
> Yes, I tried setting it higher a bit but nothing changed.
> However the query only returned 12 rows - so I guess it should be
> covered by default fetch size.
>
> Thanks, Clemens

I thought about ACK, and all of this TCP stuff. Query execution should take no
more then ping*4 + processing time. 1sek for send 8k and receive 30k is to big
value. Execution of such simple query, with base types, works in same way like
for HTTP request, request and response. No itermediate talking.

Tomorrow I check exactly TCP dump.

What version of PG do you use?

Regards,
Radek

Re: Slow query execution over WAN network

From
"Kevin Grittner"
Date:
Rados*aw Smogura<rsmogura@softperience.eu> wrote:

> I thought about ACK, and all of this TCP stuff.

Hopefully we're disabling Nagle's algorithm by setting TCP_NODELAY.
I have seen it cause this sort of problem with other DBMS products.

-Kevin

Re: Slow query execution over WAN network

From
Oliver Jowett
Date:
Kevin Grittner wrote:
> Rados*aw Smogura<rsmogura@softperience.eu> wrote:
>
>> I thought about ACK, and all of this TCP stuff.
>
> Hopefully we're disabling Nagle's algorithm by setting TCP_NODELAY.
> I have seen it cause this sort of problem with other DBMS products.

The driver does disable Nagle.

Oliver

Re: Slow query execution over WAN network

From
Oliver Jowett
Date:
Clemens Eisserer wrote:
> Hi,
>
> I experience really slow query execution over remote networks,
> especially for a query which returns a lot of columns (~255) caused by
> joins.
>
> Executing the same query over different networks gives the following times:
> Local:      20ms
> DSL:      400ms
> UMTS: 1500ms
>
> The query returns 12 rows, and 255 column - most of them with only
> very little data. A wireshark-snapshot of a request is only ~30kb
> large, so I guess there isn't a lot of data transported.
>
> I've uploaded a netbeans-profiler screenshot as well as a wireshark profile to:
> http://93.190.88.182/jdbc_traffic.wireshark
> http://93.190.88.182/jdbc_profile.png
>
> Any idea whats causing the high latency? Are there multiple
> round-trips going on here?
> I already tried adjusting fetch size, however didn't change anything.

I only see the query taking 400ms or so in the wireshark capture you
attached, so I assume this the DSL case?

There's only one round trip happening at the application level (client
sends Parse/Bind/Describe/Execute/Sync in one go; server sends
ParseComplete/BindComplete/RowDescription/DataRows/CommandComplete/ReadyForQuery
in response) so there's not much else the driver can do here, you're at
the mercy of how fast TCP can push that data around.

What are the sustained data rates and latency you see for your DSL and
UMTS connections? You have about 10kB to send to the server; and about
23kB to receive from the server. I suspect much of the query time is
explained just by your connection speed. 33kB is not a "lot of data" for
a LAN, but you'd be surprised how long it takes to push even small
amounts of data across a WAN, especially something asymmetric like DSL/UMTS.

In the DSL case it takes around ~230ms just to get the query to the
server (this is about right for an upstream data rate of ~400-500Kbit);
then the query itself takes ~80ms to execute (seems high, but it's hard
to break that latency down into query execution vs. network latency);
then ~105ms to get the results back to the client, which is a data rate
of around 2.2Mbit. TCP is probably still in slow-start too, which won't
be helping.

I would guess that the UMTS case is so much worse because of a
combination of higher base latency + slower data rates.

You might want to look at a wireshark capture on the server side, too -
with only one side, it's hard to tell the difference between inherent
latency in the connection, time the server spends processing things, and
latency due to pushing data around. Also you can miss dropped packets /
retransmissions unless you look at both sides.

Oliver


Re: Slow query execution over WAN network

From
Clemens Eisserer
Date:
Hi,

> I only see the query taking 400ms or so in the wireshark capture you
> attached, so I assume this the DSL case?

Right, this was the DSL case.


> In the DSL case it takes around ~230ms just to get the query to the server
> (this is about right for an upstream data rate of ~400-500Kbit); then the
> query itself takes ~80ms to execute (seems high, but it's hard to break that
> latency down into query execution vs. network latency); then ~105ms to get
> the results back to the client, which is a data rate of around 2.2Mbit. TCP
> is probably still in slow-start too, which won't be helping.
>
> I would guess that the UMTS case is so much worse because of a combination
> of higher base latency + slower data rates.

You are right - it was really a bandwith issue, especially with UMTS
were upload is extremly slow.
Port-Forewarded over a compressed ssh-tunnel brought the time down to
about 1/8 - instead of 2000ms the query executes in ~220ms - and ssh
statistics report data was compressed down to ~10-15%.

This is great news, without any code-changes I can push response
latencies back to an acceptable value =)

Thanks a lot, Clemens