Re: Cross-column statistics revisited - Mailing list pgsql-hackers

From	Ron Mayer
Subject	Re: Cross-column statistics revisited
Date	October 16, 2008 17:35:35
Msg-id	48F7A58D.2090303@cheapcomplexdevices.com
In response to	Re: Cross-column statistics revisited ("Robert Haas" <robertmhaas@gmail.com>)
List	pgsql-hackers

Robert Haas wrote:
>> I think the real question is: what other kinds of correlation might
>> people be interested in representing?
> 
> Yes, or to phrase that another way: What kinds of queries are being
> poorly optimized now and why?

The ones that affect our largest tables are those where we
have addresses (or other geo-data) clustered by zip, but
queried on other columns (city, county, state, school zone,
police beat, etc.).

Postgres considers those columns unclustered (correlation 0 in the stats),
even though all rows for a given value reside on the same few pages.
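To illustrate, here is a toy sketch in Python. It is not PostgreSQL's ANALYZE code. It approximates the `correlation` column of `pg_stats` as the Pearson correlation between each row's physical position and its position in value-sorted order. The table layout, the 50-rows-per-page figure, and the city assignment are all invented for illustration.

```python
# Toy model, not PostgreSQL source: 100 zips x 10 rows each, stored in zip order.
ROWS_PER_ZIP = 10
ZIPS = 100
ROWS_PER_PAGE = 50  # hypothetical page capacity
# Hypothetical: each run of 10 consecutive zips belongs to one city, but the
# cities' alphabetical order is unrelated to zip order.
CITY_RANK_BY_BLOCK = [3, 7, 0, 9, 5, 1, 8, 2, 6, 4]


def build_table():
    """Return (zip, city) rows in physical order, i.e. clustered by zip."""
    rows = []
    for zip_code in range(ZIPS):
        city = "city_%d" % CITY_RANK_BY_BLOCK[zip_code // 10]
        rows.extend([(zip_code, city)] * ROWS_PER_ZIP)
    return rows


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def order_correlation(values):
    """Correlate physical position with position in value-sorted order
    (ties broken by physical position), in the spirit of pg_stats.correlation."""
    order = sorted(range(len(values)), key=lambda i: (values[i], i))
    sorted_pos = [0] * len(values)
    for pos, i in enumerate(order):
        sorted_pos[i] = pos
    return pearson(list(range(len(values))), sorted_pos)


def pages_spanned(values):
    """Number of distinct pages holding each value."""
    pages = {}
    for i, v in enumerate(values):
        pages.setdefault(v, set()).add(i // ROWS_PER_PAGE)
    return {v: len(p) for v, p in pages.items()}


rows = build_table()
zips = [z for z, _ in rows]
cities = [c for _, c in rows]
print(round(order_correlation(zips), 2))    # 1.0: heap is sorted by zip
print(round(order_correlation(cities), 2))  # 0.04: city looks "unclustered"
print(max(pages_spanned(cities).values()), "of", len(rows) // ROWS_PER_PAGE, "pages")  # 2 of 20 pages
```

The city column scores near zero even though every city's rows sit on just 2 of the 20 pages. A correlation against sort order cannot see that kind of locality.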

I could imagine this being handled either by some cross-column
correlation (each zip has only 1-2 cities) or by an enhanced
single-column statistic (even though cities aren't sorted alphabetically,
all rows on a page tend to refer to the same city).
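The two alternatives can be sketched on a small synthetic table. Both functions below are hypothetical illustrations, not existing PostgreSQL statistics. `max_cities_per_zip` takes the cross-column view and counts distinct cities per zip. `avg_distinct_per_page` takes the single-column view and counts distinct values per page, ignoring sort order entirely. The data layout and page size are assumptions.

```python
from collections import defaultdict

# Hypothetical toy data: 100 zips x 10 rows, stored in zip order; each run of
# 10 zips maps to one city whose alphabetical rank is scrambled.
ROWS_PER_PAGE = 50  # hypothetical page capacity
CITY_RANK_BY_BLOCK = [3, 7, 0, 9, 5, 1, 8, 2, 6, 4]
rows = [(z, "city_%d" % CITY_RANK_BY_BLOCK[z // 10])
        for z in range(100) for _ in range(10)]


def max_cities_per_zip(rows):
    """Cross-column view: the most distinct cities seen for any one zip."""
    cities_by_zip = defaultdict(set)
    for zip_code, city in rows:
        cities_by_zip[zip_code].add(city)
    return max(len(cities) for cities in cities_by_zip.values())


def avg_distinct_per_page(values, rows_per_page=ROWS_PER_PAGE):
    """Single-column view: mean number of distinct values per page.
    1.0 means every page holds one value, whatever the sort order."""
    per_page = defaultdict(set)
    for i, value in enumerate(values):
        per_page[i // rows_per_page].add(value)
    return sum(len(s) for s in per_page.values()) / len(per_page)


cities = [city for _, city in rows]
scattered = ["city_%d" % (i % 10) for i in range(len(rows))]  # same cities, no locality
print(max_cities_per_zip(rows))          # 1
print(avg_distinct_per_page(cities))     # 1.0
print(avg_distinct_per_page(scattered))  # 10.0
```

In this toy data each zip maps to exactly one city, a tighter case than the 1-2 cities per zip described above. The per-page measure separates the zip-clustered layout (1.0) from a genuinely scattered one (10.0).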
