Re: merge>hash>loop - Mailing list pgsql-performance

From: Tom Lane
Subject: Re: merge>hash>loop
Date:
Msg-id: 11035.1145033929@sss.pgh.pa.us
In response to: Re: merge>hash>loop (Ian Westmacott <ianw@intellivid.com>)
Responses: Re: merge>hash>loop (Markus Schaber <schabi@logix-tt.com>)
List: pgsql-performance

Ian Westmacott <ianw@intellivid.com> writes:
> That's what I feared, thanks.  But then if I simplify things a bit,
> such that the row counts are quite good, and PG still chooses a
> worse plan, can I conclude anything about my configuration settings
> (like random_page_cost)?

Well, the other thing that's going on here is that we know we are
overestimating the cost of nestloop-with-inner-indexscan plans.
The current estimation for that is basically "outer scan cost plus N
times inner scan cost" where N is the estimated number of outer tuples;
in other words, the repeated indexscan probes are each assumed to happen
from a cold start.  In reality, caching of the upper levels of the index
means that the later index probes are much cheaper than this model
thinks.  We've known about this for some time but no one's yet proposed
a more reasonable cost model.
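
To make that concrete, here is a rough sketch of the shape of that
estimate.  This is not the actual code in
src/backend/optimizer/path/costsize.c; the names and structure are
illustrative only:

typedef struct PathCosts
{
    double      startup_cost;   /* cost before first tuple is returned */
    double      total_cost;     /* cost to fetch all tuples */
} PathCosts;

/*
 * Current model: every inner indexscan probe is charged as though it
 * starts from a cold cache, so N outer tuples cost N full inner scans.
 */
static double
naive_nestloop_cost(PathCosts outer, PathCosts inner, double outer_rows)
{
    return outer.total_cost + outer_rows * inner.total_cost;
}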

In my mind this is tied into another issue, which is that the planner
always costs on the basis of each query starting from zero.  In a real
environment it's much cheaper to use heavily-used indexes than this cost
model suggests, because they'll already be swapped in due to use by
previous queries.  But we haven't got any infrastructure to keep track
of what's been heavily used, let alone a cost model that could make use
of the info.
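
Purely as an illustration of the sort of model we don't have yet
(reusing the PathCosts struct from the sketch above), one could imagine
charging the first probe at full cold-cache cost and later probes at
some fraction of it; the warm_fraction knob here is invented out of
thin air:

/*
 * Hypothetical alternative: assume the upper index levels stay cached
 * across probes, so only the first probe pays the cold-start price.
 * Nothing like this exists in the planner today.
 */
static double
cached_nestloop_cost(PathCosts outer, PathCosts inner,
                     double outer_rows, double warm_fraction)
{
    if (outer_rows <= 1.0)
        return outer.total_cost + outer_rows * inner.total_cost;
    return outer.total_cost + inner.total_cost +
        (outer_rows - 1.0) * warm_fraction * inner.total_cost;
}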

I think part of the reason that people commonly reduce random_page_cost
to values much lower than physical reality would suggest is that it
provides a crude way of partially compensating for this basic problem.
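
You can see the effect with a back-of-the-envelope calculation.  If an
index probe is charged roughly random_page_cost * pages_touched plus
some CPU cost (a deliberate oversimplification of what costsize.c
really does, with made-up numbers), then lowering random_page_cost
shrinks every assumed-cold probe across the board:

#include <stdio.h>

int
main(void)
{
    double      pages_touched = 4.0;    /* index levels plus heap page, say */
    double      cpu_cost = 0.25;
    double      rpc[] = {4.0, 2.0, 1.1};    /* candidate random_page_cost values */

    for (int i = 0; i < 3; i++)
        printf("random_page_cost=%.1f -> probe cost %.2f\n",
               rpc[i], rpc[i] * pages_touched + cpu_cost);
    return 0;
}

which is why dropping random_page_cost from the default 4.0 toward 1-2
often flips the planner into the nestloop plans that really are cheaper
on a warm cache.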

            regards, tom lane
