Thread: Query performance

Query performance

From
"Lou O'Quin"
Date:
As a test, I ran a query in the pgAdmin query tool, which returns about 15K records from a PostgreSQL v8.01 table on my Win2K server.

I ran the same query from the local server, from another PC on the same 100 mbit local network, and from a PC on a different network, over the internet.

The times for the query to run and the data to return for each of the three
locations are shown here:
  • Local Server : 571+521 ms
  • Local network: 1187+1266 ms
  • Internet:14579+4016 ms
My question is this: Why does the execution time for the query to run increase so much? Since the query should be running on the server, it's time should be somewhat independent of the network transport delay. (unlike the data transport time) However, it appears to actually be hypersensitive to the transport delay. The ratios of time for the data transport (assuming 1 for the local server) are:
1 : 2.43 : 7.71

whereas the query execution time ratios are:
1 : 2.08 : 25.5  (!!!)

Obviously, the transport times will be greater.  But why does the execution time bloat so?

Re: Query performance

From
Tom Lane
Date:
"Lou O'Quin" <loquin@talleyds.com> writes:
> it appears to actually be hypersensitive to the transport delay. The =
> ratios of time for the data transport (assuming 1 for the local server) =
> are:
> 1 : 2.43 : 7.71

> whereas the query execution time ratios are:
> 1 : 2.08 : 25.5  (!!!)

How do you know that's what the data transport time is --- ie, how can
you measure that separately from the total query time?

            regards, tom lane

Re: Query performance

From
"Lou O'Quin"
Date:
Hi Tom.  I referenced the status line of pgAdmin.  Per the pgAdmin help file:
 
"The status line will show how long the last query took to complete. If a dataset was returned, not only the elapsed time for server execution is displayed, but also the time to retrieve the data from the server to the Data Output page."
 
Lou

>>> Tom Lane <tgl@sss.pgh.pa.us> 3/11/2005 12:10 PM >>>
"Lou O'Quin" <loquin@talleyds.com> writes:
> it appears to actually be hypersensitive to the transport delay. The =
> ratios of time for the data transport (assuming 1 for the local server) =
> are:
> 1 : 2.43 : 7.71

> whereas the query execution time ratios are:
> 1 : 2.08 : 25.5  (!!!)

How do you know that's what the data transport time is --- ie, how can
you measure that separately from the total query time?

            regards, tom lane

Re: Query performance

From
Tom Lane
Date:
"Lou O'Quin" <loquin@talleyds.com> writes:
> Hi Tom.  I referenced the status line of pgAdmin.  Per the pgAdmin help
> file:
>
> "The status line will show how long the last query took to complete. If a
> dataset was returned, not only the elapsed time for server execution is
> displayed, but also the time to retrieve the data from the server to the
> Data Output page."

Well, you should probably ask the pgadmin boys exactly what they are
measuring.  In any case, the Postgres server overlaps query execution
with result sending, so I don't think it's possible to get a pure
measurement of just one of those costs --- certainly not by looking at
it only from the client end.

BTW, one factor to consider is that if the test client machines weren't
all the same speed, that would have some impact on their ability to
absorb 15K records ...

            regards, tom lane

Re: Query performance

From
"Lou O'Quin"
Date:
I'll post there concerning how they determine the query execution time vs. data retrieval time.
 
I did think about the processor/memory when choosing the machines - all three of the processors are similar.  All are Pentium P4s with 512 MB memory.
the server is Win2K, P4, 2.3 gHz
the local network client  is a WinXP Pro, P4, 2.2 gHz
the remote network client is WinXP Pro, P4, 1.9 gHz
 
Lou

>>> Tom Lane <tgl@sss.pgh.pa.us> 3/11/2005 1:21 PM >>>
"Lou O'Quin" <loquin@talleyds.com> writes:
> Hi Tom.  I referenced the status line of pgAdmin.  Per the pgAdmin help
> file:
>
> "The status line will show how long the last query took to complete. If a
> dataset was returned, not only the elapsed time for server execution is
> displayed, but also the time to retrieve the data from the server to the
> Data Output page."

Well, you should probably ask the pgadmin boys exactly what they are
measuring.  In any case, the Postgres server overlaps query execution
with result sending, so I don't think it's possible to get a pure
measurement of just one of those costs --- certainly not by looking at
it only from the client end.

BTW, one factor to consider is that if the test client machines weren't
all the same speed, that would have some impact on their ability to
absorb 15K records ...

            regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
      joining column's datatypes do not match

Re: Query performance

From
"Steinar H. Gunderson"
Date:
antonio.batovanja@humanomed.co.at wrote:
> The above query takes 5 seconds to execute!
>
> [...]
>
>  Total runtime: 96109.571 ms

It sure doesn't look like it...

>  Total runtime: 461.907 ms
>
> [...]
>
> Suddenly the query takes only 0.29 seconds!

How are you timing this, really?

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: Query performance

From
Antonio Batovanja
Date:
Steinar H. Gunderson wrote:

> antonio.batovanja@humanomed.co.at wrote:
>> The above query takes 5 seconds to execute!
>>
>> [...]
>>
>>  Total runtime: 96109.571 ms
>
> It sure doesn't look like it...
>
>>  Total runtime: 461.907 ms
>>
>> [...]
>>
>> Suddenly the query takes only 0.29 seconds!
>
> How are you timing this, really?
>
> /* Steinar */

I'm executing the queries from phpPgAdmin.
The above are for explain analyse. I was referring to the pure query
execution time.
Does anyone have an idea why the OR-query takes so long?
Any server-side tuning possibilities? I wouldn't like to change the code of
ldap's back-sql...

Toni

Re: Query performance

From
Erwin Brandstetter
Date:
Antonio Batovanja wrote:
(...)

> 1) the slooooow query:
> EXPLAIN ANALYZE SELECT DISTINCT ldap_entries.id, organization.id,
> text('organization') AS objectClass, ldap_entries.dn AS dn FROM
> ldap_entries, organization, ldap_entry_objclasses WHERE
> organization.id=ldap_entries.keyval AND ldap_entries.oc_map_id=1 AND
> upper(ldap_entries.dn) LIKE '%DC=HUMANOMED,DC=AT' AND 1=1 OR
> (ldap_entries.id=ldap_entry_objclasses.entry_id AND
> ldap_entry_objclasses.oc_name='organization');


First, presenting your query in any readable form might be helpful if
you want the community to help you. (Hint! Hint!)

SELECT DISTINCT ldap_entries.id, organization.id,
    text('organization') AS objectClass, ldap_entries.dn AS dn
   FROM ldap_entries, organization, ldap_entry_objclasses
  WHERE organization.id=ldap_entries.keyval
    AND ldap_entries.oc_map_id=1
    AND upper(ldap_entries.dn) LIKE '%DC=HUMANOMED,DC=AT'
    AND 1=1
    OR (ldap_entries.id=ldap_entry_objclasses.entry_id
    AND ldap_entry_objclasses.oc_name='organization');

Next, you might want to use aliases to make it more readable.

SELECT DISTINCT e.id, o.id, text('organization') AS objectClass, e.dn AS dn
   FROM ldap_entries AS e, organization AS o, ldap_entry_objclasses AS eo
  WHERE o.id=e.keyval
    AND e.oc_map_id=1
    AND upper(e.dn) LIKE '%DC=HUMANOMED,DC=AT'
    AND 1=1
    OR (e.id=eo.entry_id
    AND eo.oc_name='organization');

There are a couple redundant (nonsensical) items, syntax-wise. Let's
strip these:

SELECT DISTINCT e.id, o.id, text('organization') AS objectClass, e.dn
   FROM ldap_entries AS e, organization AS o, ldap_entry_objclasses AS eo
  WHERE o.id=e.keyval
    AND e.oc_map_id=1
    AND e.dn ILIKE '%DC=HUMANOMED,DC=AT'
     OR e.id=eo.entry_id
    AND eo.oc_name='organization';

And finally, I suspect the lexical precedence of AND and OR might be the
issue here.
http://www.postgresql.org/docs/8.1/static/sql-syntax.html#SQL-PRECEDENCE
Maybe that is what you really want (just guessing):

SELECT DISTINCT e.id, o.id, text('organization') AS objectClass, e.dn
   FROM ldap_entries e
   JOIN organization o ON o.id=e.keyval
   LEFT JOIN ldap_entry_objclasses eo ON eo.entry_id=e.id
  WHERE e.oc_map_id=1
    AND e.dn ILIKE '%DC=HUMANOMED,DC=AT'
     OR eo.oc_name='organization)';

I didn't take the time to read the rest. My appologies if I guessed wrong.


Regards, Erwin

Re: Query performance

From
Erwin Brandstetter
Date:
Antonio Batovanja wrote:
> Laurenz Albe wrote:
>
>> Antonio Batovanja wrote:
>>> I'm having trouble understanding, why a specific query on a small
>>> database is taking so long...
>>>
>> Before I try to understand the execution plans:
>>
>> Have you run ANALYZE on the tables involved before you ran the query?
>
> Hi,
>
> Just to be on the safe side, I've run ANALYZE now.
> Here are the query plans for the two queries:

I suspect a misunderstanding here. What Laurenz probably meant is to run
  analyze on the involved _tables_ so the statistics data is refreshed.
If the query planner runs with outdated statistics, queries may perform
very poorly. Try

    vacuum full analyze yourdatabase

To fully vacuum your database and analyze all tables.
(vacuum full is extra, but can't hurt.)

http://www.postgresql.org/docs/8.1/static/sql-vacuum.html
http://www.postgresql.org/docs/8.1/static/sql-analyze.html

Regards, Erwin

Re: Query performance

From
Christopher Kings-Lynne
Date:
> I'm executing the queries from phpPgAdmin.
> The above are for explain analyse. I was referring to the pure query
> execution time.
> Does anyone have an idea why the OR-query takes so long?
> Any server-side tuning possibilities? I wouldn't like to change the code of
> ldap's back-sql...

If you're using phpPgAdmin's timings, they could be more off than the
real explain analyze timings.  Make sure you're using the figure given
by explain analyze itself.

Chris