Re: merge>hash>loop - Mailing list pgsql-performance

From Markus Schaber
Subject Re: merge>hash>loop
Msg-id 4444C4CF.2000106@logix-tt.com
In response to Re: merge>hash>loop  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: merge>hash>loop  ("Jim C. Nasby" <jnasby@pervasive.com>)
Re: merge>hash>loop  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: merge>hash>loop  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Hi, Tom,

Tom Lane wrote:

> Well, the other thing that's going on here is that we know we are
> overestimating the cost of nestloop-with-inner-indexscan plans.
> The current estimation for that is basically "outer scan cost plus N
> times inner scan cost" where N is the estimated number of outer tuples;
> in other words the repeated indexscan probes are each assumed to happen
> from a cold start.  In reality, caching of the upper levels of the index
> means that the later index probes are much cheaper than this model
> thinks.  We've known about this for some time but no one's yet proposed
> a more reasonable cost model.

My spontaneous guess would be to use log(N)*inner instead of N*inner. I
don't have anything to back that up; it's just what my intuition
suggests as a first shot.
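
For illustration, here is a toy comparison of the two estimates. The
function names and numbers are mine, just to make the shapes concrete;
this is not planner code, and the log base in my guess is arbitrary
(natural log here):

#include <math.h>
#include <stdio.h>

/* Current model: every inner index probe is costed from a cold start. */
static double
cost_current(double outer_cost, double inner_cost, double n_outer)
{
    return outer_cost + n_outer * inner_cost;
}

/* My guess: later probes hit cached upper index levels, so total probe
 * cost grows only logarithmically with the number of outer rows. */
static double
cost_log_guess(double outer_cost, double inner_cost, double n_outer)
{
    return outer_cost + log(n_outer) * inner_cost;
}

int
main(void)
{
    double outer = 100.0, inner = 4.0, n = 10000.0;

    printf("current model: %.1f\n", cost_current(outer, inner, n));
    printf("log guess:     %.1f\n", cost_log_guess(outer, inner, n));
    return 0;
}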

> In my mind this is tied into another issue, which is that the planner
> always costs on the basis of each query starting from zero.  In a real
> environment it's much cheaper to use heavily-used indexes than this cost
> model suggests, because they'll already be swapped in due to use by
> previous queries.  But we haven't got any infrastructure to keep track
> of what's been heavily used, let alone a cost model that could make use
> of the info.

An easy first approach would be to add a user-tunable cache probability
value between 0 and 1 to each index (and possibly each table), and then
simply multiply random_page_cost by (1 - that value) for each scan.

Later, this value could be automatically tuned by stats analysis or
other means.
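
A minimal sketch of how such a knob could enter the cost arithmetic
(IndexCacheHint and cache_probability are hypothetical names of mine;
nothing like this exists in the planner today):

/* Hypothetical per-index knob: 0.0 = never cached, 1.0 = always cached. */
typedef struct IndexCacheHint
{
    double      cache_probability;  /* user-tunable, later auto-tuned */
} IndexCacheHint;

/* Scale the random-page cost by the chance of actually missing the cache. */
static double
effective_random_page_cost(double random_page_cost, const IndexCacheHint *hint)
{
    return random_page_cost * (1.0 - hint->cache_probability);
}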

> I think part of the reason that people commonly reduce random_page_cost
> to values much lower than physical reality would suggest is that it
> provides a crude way of partially compensating for this basic problem.

I totally agree with this; it's just what we did here from time to time. :-)

Hmm, how does effective_cache_size relate to this? Shouldn't a high
effective_cache_size have a similar effect?
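
To make the question concrete, here is the kind of damping I would
intuitively expect from it. The blend formula is purely my illustration,
not what the planner actually computes from effective_cache_size:

#include <math.h>

/* Illustration only: if the index fits well within effective_cache_size,
 * treat more of its probes as cheap cached reads.  The linear blend is
 * an assumption, not the real cost model. */
static double
cached_probe_cost(double random_page_cost, double seq_page_cost,
                  double index_pages, double effective_cache_pages)
{
    double cached_fraction = fmin(1.0, effective_cache_pages / index_pages);

    /* cached pages cost like sequential reads, the rest like random reads */
    return seq_page_cost * cached_fraction
         + random_page_cost * (1.0 - cached_fraction);
}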

Thanks,
Markus
--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
