Thread: How to interpret this explain analyse?
Hi all,

A question on how to read and interpret the explain analyse statement (and what to do).

I have a query:

SELECT A.ordernummer, B.klantnummer FROM orders A LEFT OUTER JOIN klt_alg B ON A.Klantnummer=B.Klantnummer ORDER BY A.klantnummer;

Both tables have a btree index on klantnummer (int4, the column the join is on). I have vacuumed and analyzed both tables. The explain analyse is:

QUERY PLAN
Sort  (cost=220539.32..223291.41 rows=1100836 width=12) (actual time=51834.128..56065.126 rows=1104380 loops=1)
  Sort Key: a.klantnummer
  ->  Hash Left Join  (cost=41557.43..110069.51 rows=1100836 width=12) (actual time=21263.858..42845.158 rows=1104380 loops=1)
        Hash Cond: ("outer".klantnummer = "inner".klantnummer)
        ->  Seq Scan on orders a  (cost=0.00..46495.36 rows=1100836 width=8) (actual time=5.986..7378.488 rows=1104380 loops=1)
        ->  Hash  (cost=40635.14..40635.14 rows=368914 width=4) (actual time=21256.683..21256.683 rows=0 loops=1)
              ->  Seq Scan on klt_alg b  (cost=0.00..40635.14 rows=368914 width=4) (actual time=8.880..18910.120 rows=368914 loops=1)
Total runtime: 61478.077 ms

Questions, about this line in particular:

->  Hash Left Join  (cost=41557.43..110069.51 rows=1100836 width=12) (actual time=21263.858..42845.158 rows=1104380 loops=1)

0. What exactly are the numbers in "cost=41557.43..110069.51"? (I assume for the other questions that 41557.43 is the estimated ms the query will take; what are the others?)

1. I assume that (cost=41557.43..110069.51 rows=1100836 width=12) is the estimated cost and (actual time=21263.858..42845.158 rows=1104380 loops=1) the actual cost. Is the difference acceptable?

2. If not, what can I do about it?

Regards,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: J.Kraaijeveld@Askesis.nl
web: www.askesis.nl
Joost Kraaijeveld wrote:
> Hi all,
>
> A question on how to read and interpret the explain analyse statement
> (and what to do)
>
> I have a query "SELECT A.ordernummer, B.klantnummer FROM orders A
> LEFT OUTER JOIN klt_alg B ON A.Klantnummer=B.Klantnummer ORDER BY
> A.klantnummer;"
>
> Both tables have a btree index on klantnummer (int4, the column the
> join is on). I have vacuumed and analyzed both tables. The explain
> analyse is:

Indexes are not necessarily useful here, since you're fetching all rows in A and presumably much of B.

Sort
  Hash Left Join
    Seq Scan on orders a
    Hash
      Seq Scan on klt_alg b

I've trimmed the above from your explain output. It's sequentially scanning "b" and using a hash to join to "a" before sorting the results.

> Questions, about this line in particular:
>
> ->  Hash Left Join  (cost=41557.43..110069.51 rows=1100836 width=12) (actual time=21263.858..42845.158 rows=1104380 loops=1)
>
> 0. What exactly are the numbers in "cost=41557.43..110069.51"? (I
> assume for the other questions that 41557.43 is the estimated ms the
> query will take; what are the others?)

The cost numbers represent "effort" rather than time. They're only really useful in that you can compare one part of the query to another. There are two numbers because the first shows the startup cost and the second the total cost. So the "outer" parts of the query will have increasing startup values, since the "inner" parts will have to do their work first.

The "actual time" is measured in ms, but remember to multiply it by the "loops" value. Oh, and actually measuring the time slows the query down too.

> 1. I assume that (cost=41557.43..110069.51 rows=1100836 width=12) is
> the estimated cost and (actual time=21263.858..42845.158 rows=1104380
> loops=1) the actual cost. Is the difference acceptable?
>
> 2. If not, what can I do about it?

The key thing to look for here is the number of rows. If PG expects say 100 rows but there are instead 10,000 then it may choose the wrong plan. In this case the estimate is 1,100,836 and the actual is 1,104,380 - very close.

--
Richard Huxton
Archonet Ltd
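PS: to put concrete numbers on that, here is my reading of your join line, written as comments against your query (the annotations are just what each field means, not extra output):

  EXPLAIN ANALYSE
  SELECT A.ordernummer, B.klantnummer
  FROM orders A LEFT OUTER JOIN klt_alg B ON A.Klantnummer = B.Klantnummer
  ORDER BY A.klantnummer;

  -- ->  Hash Left Join  (cost=41557.43..110069.51 rows=1100836 width=12)
  --                     (actual time=21263.858..42845.158 rows=1104380 loops=1)
  -- 41557.43    estimated startup cost: effort spent before the first row comes out
  -- 110069.51   estimated total cost for all rows (arbitrary cost units, not ms)
  -- rows=1100836, width=12   estimated row count and average row width in bytes
  -- 21263.858..42845.158     actual ms to the first row / to the last row, per loop
  -- rows=1104380 loops=1     actual rows returned and number of times the node ran
  --                          (multiply the actual time by loops)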
Richard Huxton <dev@archonet.com> writes:
> Joost Kraaijeveld wrote:
>> 2. If not, what can I do about it?

> The key thing to look for here is the number of rows. If PG expects say
> 100 rows but there are instead 10,000 then it may choose the wrong plan.
> In this case the estimate is 1,100,836 and the actual is 1,104,380 -
> very close.

On the surface this looks like a reasonable plan choice. If you like you can try the other two basic types of join plan by turning off enable_hashjoin, which will likely drive the planner to use a merge join, and then also turn off enable_mergejoin to get a nested loop (or if it thinks nested loop is second best, turn off enable_nestloop to see the behavior with a merge join).

What's important in comparing different plan alternatives is the ratios of estimated costs to actual elapsed times. If the planner is doing its job well, those ratios should be similar across all the alternatives (which implies of course that the cheapest-estimate plan is also the cheapest in reality). If not, it may be appropriate to fool with the planner's cost estimate parameters to try to line up estimates and reality a bit better. See
http://www.postgresql.org/docs/8.0/static/performance-tips.html
for more detail.

			regards, tom lane
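PS: a minimal sketch of that experiment, using the query from the original post (enable_hashjoin, enable_mergejoin and enable_nestloop are the standard planner switches; remember to reset them afterwards):

  SET enable_hashjoin = off;   -- planner will most likely switch to a merge join
  EXPLAIN ANALYSE
    SELECT A.ordernummer, B.klantnummer
    FROM orders A LEFT OUTER JOIN klt_alg B ON A.Klantnummer = B.Klantnummer
    ORDER BY A.klantnummer;

  SET enable_mergejoin = off;  -- now only a nested loop is left
  EXPLAIN ANALYSE
    SELECT A.ordernummer, B.klantnummer
    FROM orders A LEFT OUTER JOIN klt_alg B ON A.Klantnummer = B.Klantnummer
    ORDER BY A.klantnummer;

  RESET enable_hashjoin;
  RESET enable_mergejoin;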
Hi Tom,

Tom Lane wrote:
> On the surface this looks like a reasonable plan choice. If you like
> you can try the other two basic types of join plan by turning off
> enable_hashjoin, which will likely drive the planner to use a merge
> join, and then also turn off enable_mergejoin to get a nested loop
> (or if it thinks nested loop is second best, turn off enable_nestloop
> to see the behavior with a merge join).

The problem is that the query logically requests all records (as in "select * from a join") from the database, but in practice the application displays only the first 1,000 records 97% of the time, and at most the first 50,000 records 99.99999999999999% of the time, by scrolling (using "page down") in the GUI and an occasional "jump to record xxxx" through something called a locator (both percentages tested!).

If I do the same query with a "limit 60000", or if I do a "set enable_seqscan = off", the query returns in 0.3 secs. Otherwise it lasts for 20 secs (which is too much for the user to wait for, given the circumstances).

I cannot change the query (it is generated by a tool called Clarion), but it is something like this (from the psqlodbc_xxx.log):

"...
declare SQL_CUR01 cursor for
SELECT A.ordernummer, B.klantnummer FROM "orders" A LEFT OUTER JOIN "klt_alg" B ON A.Klantnummer=B.Klantnummer ORDER BY A.klantnummer;
fetch 100 in SQL_CUR01;
..."

PostgreSQL plans (and then executes accordingly) the query itself, not the "fetch 100". Changing the query with a "limit whatever" prevents scrolling beyond the size of the result set. If PostgreSQL delayed the planning of the actual query until the fetch, it could choose the quick solution. Another solution would be to "advise" PostgreSQL which index etc. (whatever etc. means ;-)) to use (as in the mail from Silke Trissl on the performance list on 09-02-05).

> What's important in comparing different plan alternatives is the ratios
> of estimated costs to actual elapsed times. If the planner is doing its
> job well, those ratios should be similar across all the alternatives
> (which implies of course that the cheapest-estimate plan is also the
> cheapest in reality). If not, it may be appropriate to fool with the
> planner's cost estimate parameters to try to line up estimates and
> reality a bit better.

If I really do a "select *" and display the whole result, the planner is right (tested with "set enable_seqscan = off" and "set enable_seqscan = on").

Regards,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: J.Kraaijeveld@Askesis.nl
web: www.askesis.nl
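PS: for reference, the pattern from the ODBC log together with the workaround I tested looks roughly like this (the cursor name is whatever the driver generates, and the DECLARE has to run inside a transaction):

  BEGIN;
  SET enable_seqscan = off;    -- the workaround that makes it return in 0.3 secs
  DECLARE SQL_CUR01 CURSOR FOR
    SELECT A.ordernummer, B.klantnummer
    FROM "orders" A LEFT OUTER JOIN "klt_alg" B ON A.Klantnummer = B.Klantnummer
    ORDER BY A.klantnummer;
  FETCH 100 IN SQL_CUR01;      -- the GUI mostly only ever looks at the first screens
  CLOSE SQL_CUR01;
  RESET enable_seqscan;
  COMMIT;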
"Joost Kraaijeveld" <J.Kraaijeveld@Askesis.nl> writes: > I cannot change the query (it is geneated by a tool called Clarion) but it something like (from the psqlodbc_xxx.log): > "... > declare SQL_CUR01 cursor for > SELECT A.ordernummer, B.klantnummer FROM "orders" A LEFT OUTER JOIN "klt_alg" B ON A.Klantnummer=B.Klantnummer ORDER BYA.klantnummer; > fetch 100 in SQL_CUR01; > ..." Well, the planner does put some emphasis on startup time when dealing with a DECLARE CURSOR plan; the problem you face is just that that correction isn't large enough. (From memory, I think it optimizes on the assumption that 10% of the estimated rows will actually be fetched; you evidently want a setting of 1% or even less.) We once talked about setting up a GUC variable to control the percentage of a cursor that is estimated to be fetched: http://archives.postgresql.org/pgsql-hackers/2000-10/msg01108.php It never got done but that seems like the most reasonable solution to me. regards, tom lane
Hi Tom,

Tom Lane wrote:
> Well, the planner does put some emphasis on startup time when dealing
> with a DECLARE CURSOR plan; the problem you face is just that that
> correction isn't large enough. (From memory, I think it optimizes on
> the assumption that 10% of the estimated rows will actually
> be fetched; you evidently want a setting of 1% or even less.)

I wish I had your memory ;-). The tables contain 1,100,000 records, by the way (fetching 100 of those is nowhere near 10%; my math isn't that good).

> We once talked about setting up a GUC variable to control the
> percentage of a cursor that is estimated to be fetched:
> http://archives.postgresql.org/pgsql-hackers/2000-10/msg01108.php
> It never got done but that seems like the most reasonable solution to
> me.

If the proposal means that the cursor is not limited by a limit in the query but only by the fetch, then I support the proposal. A bit late, I presume.

Regards,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: J.Kraaijeveld@Askesis.nl
web: www.askesis.nl
Tom Lane wrote:
> "Joost Kraaijeveld" <J.Kraaijeveld@Askesis.nl> writes:
> > I cannot change the query (it is generated by a tool called Clarion), but it is something like this (from the psqlodbc_xxx.log):
> > "...
> > declare SQL_CUR01 cursor for
> > SELECT A.ordernummer, B.klantnummer FROM "orders" A LEFT OUTER JOIN "klt_alg" B ON A.Klantnummer=B.Klantnummer ORDER BY A.klantnummer;
> > fetch 100 in SQL_CUR01;
> > ..."
>
> Well, the planner does put some emphasis on startup time when dealing
> with a DECLARE CURSOR plan; the problem you face is just that that
> correction isn't large enough. (From memory, I think it optimizes on
> the assumption that 10% of the estimated rows will actually be fetched;
> you evidently want a setting of 1% or even less.)

Ouch. Is this really a reasonable assumption? I figured the primary use of a cursor was to fetch small amounts of data at a time from a large table, so 10% seems extremely high as an average fetch size. Or is the optimization based on the number of rows that will be fetched by the cursor during the cursor's lifetime (as opposed to in a single fetch)?

Also, one has to ask what the consequences are of assuming a value too low versus too high. Which ends up being worse?

> We once talked about setting up a GUC variable to control the percentage
> of a cursor that is estimated to be fetched:
> http://archives.postgresql.org/pgsql-hackers/2000-10/msg01108.php
> It never got done but that seems like the most reasonable solution to
> me.

Or keep some statistics on cursor usage, and adjust the value dynamically based on actual cursor queries (this might be easier said than done, of course).

--
Kevin Brown
kevin@sysexperts.com
Kevin Brown <kevin@sysexperts.com> writes:
> Ouch. Is this really a reasonable assumption? I figured the primary
> use of a cursor was to fetch small amounts of data at a time from a
> large table, so 10% seems extremely high as an average fetch size. Or
> is the optimization based on the number of rows that will be fetched
> by the cursor during the cursor's lifetime (as opposed to in a single
> fetch)?
>
> Also, one has to ask what the consequences are of assuming a value too
> low versus too high. Which ends up being worse?

This is one of the things the planner really cannot know. Ultimately it's the kind of thing for which hints really are necessary. Oracle distinguishes between "minimize total time" and "minimize startup time" with its /*+ ALL_ROWS */ and /*+ FIRST_ROWS */ hints, for example.

I would also find it reasonable to have hints to specify a selectivity for expressions the optimizer has no hope of being able to estimate. Things like "WHERE myfunction(col1,col2,?) /*+ 10% */"

--
greg
Greg Stark wrote:
> Kevin Brown <kevin@sysexperts.com> writes:
>
>> Ouch. Is this really a reasonable assumption? I figured the primary
>> use of a cursor was to fetch small amounts of data at a time from a
>> large table, so 10% seems extremely high as an average fetch size. Or
>> is the optimization based on the number of rows that will be fetched
>> by the cursor during the cursor's lifetime (as opposed to in a single
>> fetch)?
>>
>> Also, one has to ask what the consequences are of assuming a value too
>> low versus too high. Which ends up being worse?
>
> This is one of the things the planner really cannot know. Ultimately it's the
> kind of thing for which hints really are necessary. Oracle distinguishes
> between "minimize total time" and "minimize startup time" with its
> /*+ ALL_ROWS */ and /*+ FIRST_ROWS */ hints, for example.
>
> I would also find it reasonable to have hints to specify a selectivity for
> expressions the optimizer has no hope of being able to estimate.
> Things like "WHERE myfunction(col1,col2,?) /*+ 10% */"

Not to mention that hints would be helpful if you want to specify a particular index for a specific query (case in point: testing the plans and response times of various indexes without having to drop and create other ones). This is a bit of functionality that I'd like to see.
Greg Stark wrote:
> Kevin Brown <kevin@sysexperts.com> writes:
> > Also, one has to ask what the consequences are of assuming a value too
> > low versus too high. Which ends up being worse?
>
> This is one of the things the planner really cannot know. Ultimately it's the
> kind of thing for which hints really are necessary. Oracle distinguishes
> between "minimize total time" and "minimize startup time" with its
> /*+ ALL_ROWS */ and /*+ FIRST_ROWS */ hints, for example.

Well, the planner *can* know the actual value to use in this case, or at least a close approximation, but the system would have to gather some information about cursors during fetches. At the very least, it will know how many rows were actually fetched by the cursor in question, and it will also hopefully know how many rows were returned by the query being executed. Store the ratio of the two in a list, then store the list itself into a table (or something) at backend exit time.

--
Kevin Brown
kevin@sysexperts.com