Re: [HACKERS] Optimizer confusion? - Mailing list pgsql-general

From Philip Warner
Subject Re: [HACKERS] Optimizer confusion?
Date
Msg-id 3.0.5.32.20000812174653.01f6adb0@mail.rhyme.com.au
Whole thread Raw
In response to Re: [HACKERS] Optimizer confusion?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] Optimizer confusion?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
At 02:15 12/08/00 -0400, Tom Lane wrote:
>Philip Warner <pjw@rhyme.com.au> writes:
>> Again. this is OK, although I am a little surprised at the continuing
>> non-linearity of the estimates.
>
>Indexscan estimates are supposed to be nonlinear, actually, to account
>for the effects of caching.  I doubt the shapes of the curves are right
>in detail, but I haven't had time to do any research about it.

Of course - I was thinking it was a seqscan, which is even sillier.

As to the non-linearity, I would have thought the majority of I/Os by far
would be reading rows, and with retrieval by index, you may not get much
buffering benefit on table pages. For a large table, ISTM linear estimates
would be a good estimate.


>> I would be interested to know (a) if there is any way I can influence the
>> optimizer choice when it considers using the index in question,
>
>You could push random_page_cost and effective_cache_size around to try
>to match your platform better.  Let me know if that helps...

Setting it to 0.1 works (it was 4). But this (I think) just highlights the
fact that the index is sorted by date, and the rows were added in date
order. As a result (for this table, in this query), the index scan get's a
much better cache-hit rate, so the actual IO cost is low.

Does that sound reasonable? Does the optimizer know if I have used
clustering? If so, maybe I should just use the clustering command. If not,
then probably it's best to go with an index scan always for this query. Is
there any way I can code the query with "{enable_seqscan=off}" to apply
only to the current query? Or, perhaps more usefully, ask it (politely) to
use a given index for part of the predicate?

ISTM setting random_page_cost to 0.1 would be a bad idea in general...


>> (b) if the fixed seqscan cost estimate is a bug.
>
>I don't think so.

I think you're right; equating cost to row-IO means seqscan cost is fixed.


>  So if you want to try to compare
>the planner's cost estimates to real elapsed time, you might want to
>measure the results for "select count(*) from ..." instead of "select *
>from ..."  so that output costs are held fixed.)

This actually makes the indexscan even more desirable - the time interval
has to be more than 7 months before indexscan is slower, but at this point
it gets hard to tell how much benefit is being made from buffering etc, so
the 'elapsed time' comparison is probably pretty dodgy.

I don't suppose I can get the backend to tell me how many logical IOs and
how much CPU it used?


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|
                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/

pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: [HACKERS] Optimizer confusion?
Next
From: Louis-David Mitterrand
Date:
Subject: Re: problem with float8 input format