Re: Forcing the use of particular execution plans - Mailing list pgsql-performance

From Jim C. Nasby
Subject Re: Forcing the use of particular execution plans
Date
Msg-id 20061004005545.GA81937@nasby.net
Whole thread Raw
In response to Re: Forcing the use of particular execution plans  (Ron Mayer <rm_pg@cheapcomplexdevices.com>)
List pgsql-performance
Adding -performance back in.

On Tue, Oct 03, 2006 at 05:10:04PM -0700, Ron Mayer wrote:
> Jim C. Nasby wrote:
> >
> > Index scans are also pretty picky about correlation. If you have really
> > low correlation you don't want to index scan,
>
> I'm still don't think "correlation" is the right metric
> at all for making this decision.
>
> If you have a list of addresses clustered by "zip"
> the "correlation" of State, City, County, etc will all be zero (since
> the zip codes don't match the alphabetical order of state or city names)
> but index scans are still big wins because the data for any given
> state or city will be packed on the same few pages - and in fact
> the pages could be read mostly sequentially.

That's a good point that I don't think has been considered before. I
think correlation is still somewhat important, but what's probably far
more important is data localization.

One possible way to calculate this would be to note the location of
every tuple with a given value in the heap. Calculate the geometric mean
of those locations (I think you could essentially average all the
ctids), and divide that by the average distance of each tuple from that
mean (or maybe the reciprocal of that would be more logical).

Obviously we don't want to scan the whole table to do that, but there
should be some way to do it via sampling as well.

Or perhaps someone knows of a research paper with real data on how to do
this instead of hand-waving. :)

> > but I think our current
> > estimates make it too eager to switch to a seqscan.
--
Jim C. Nasby, Database Architect                   jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net

pgsql-performance by date:

Previous
From: Ron Mayer
Date:
Subject: Re: Forcing the use of particular execution plans
Next
From: "Tomeh, Husam"
Date:
Subject: Re: PostgreSQL Caching