Re: Cross-column statistics revisited - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Cross-column statistics revisited
Msg-id 603c8f070810161034o8333bf3ka08a3230578022f6@mail.gmail.com
In response to Re: Cross-column statistics revisited  (Martijn van Oosterhout <kleptog@svana.org>)
List pgsql-hackers
> I think the real question is: what other kinds of correlation might
> people be interested in representing?

Yes, or to phrase that another way: What kinds of queries are being
poorly optimized now and why?

I suspect that a lot of the correlations people care about are
extreme.  For example, it's fairly common for me to have a table where
column B is only used at all for certain values of column A.  Say
atm_machine_id is usually or always NULL unless transaction_type is
'ATM'.  Then a clause of the form transaction_type = 'ATM' AND
atm_machine_id < 10000 looks more selective than it really is, because
the two selectivities get multiplied as if they were independent when
the first half is actually redundant.
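
To make the ATM example concrete, here is a minimal sketch in Python, not
planner code.  The table contents, the 10% share of ATM rows, and the
cutoff of 500 are all made up for illustration; the only thing it shows
is what happens when two per-column selectivities are multiplied even
though one condition implies the other.

```python
from fractions import Fraction


def build_rows(n=1000):
    """Hypothetical table: every 10th row is an ATM transaction with a
    non-NULL atm_machine_id; all other rows have atm_machine_id = None."""
    rows = []
    for i in range(n):
        if i % 10 == 0:
            rows.append(("ATM", i))      # machine ids 0, 10, ..., 990
        else:
            rows.append(("POS", None))   # column B unused for non-ATM rows
    return rows


def is_atm(row):
    return row[0] == "ATM"


def low_machine_id(row):
    # NULL never satisfies "<", mirroring SQL comparison semantics.
    return row[1] is not None and row[1] < 500


def selectivity(rows, predicate):
    """Fraction of rows that satisfy the predicate."""
    return Fraction(sum(1 for r in rows if predicate(r)), len(rows))


rows = build_rows()
sel_type = selectivity(rows, is_atm)             # 1/10
sel_id = selectivity(rows, low_machine_id)       # 1/20

# Estimate under the independence assumption: multiply the two.
independent_estimate = sel_type * sel_id         # 1/200

# True selectivity of the combined clause.
actual = selectivity(rows, lambda r: is_atm(r) and low_machine_id(r))  # 1/20

underestimate_factor = actual / independent_estimate  # 10
```

With these made-up numbers the combined clause really matches 50 of
1000 rows, but the independent estimate is 5 rows, so the estimate is
off by exactly the selectivity of the redundant half.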

The other half of this is that bad selectivity estimates only matter
if they're bad enough to change the plan, and I'm not sure whether
cases like this are actually a problem in practice.
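
A toy illustration of that point, again in Python.  The cost model here
is entirely hypothetical (two made-up per-row constants, nothing like
the real cost functions); it only shows that an estimate can be off by
a large factor without flipping the choice, and that the error matters
once it straddles the crossover point.

```python
def choose_plan(estimated_rows, table_rows,
                seq_cost_per_row=1.0, index_cost_per_row=4.0):
    """Pick the cheaper of two plans under a made-up linear cost model.

    A sequential scan pays for every row in the table; an index scan
    pays a higher per-row price, but only for the rows it expects to
    fetch.  Both constants are assumptions for illustration.
    """
    seq_cost = table_rows * seq_cost_per_row
    index_cost = estimated_rows * index_cost_per_row
    return "index scan" if index_cost < seq_cost else "seq scan"


TABLE_ROWS = 1000

# A 10x underestimate (5 rows estimated, 50 actual) that does not
# matter: both numbers fall on the same side of the crossover.
harmless_estimated = choose_plan(5, TABLE_ROWS)
harmless_actual = choose_plan(50, TABLE_ROWS)

# A 4x underestimate (100 estimated, 400 actual) that does matter:
# the true row count would have picked a different plan.
harmful_estimated = choose_plan(100, TABLE_ROWS)
harmful_actual = choose_plan(400, TABLE_ROWS)
```

Under these assumed constants the crossover sits at 250 rows, so the
first misestimate leaves the plan unchanged and the second one picks an
index scan where a sequential scan would have been cheaper.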

...Robert

