Re: Cross-column statistics revisited - Mailing list pgsql-hackers

From Greg Stark
Subject Re: Cross-column statistics revisited
Date
Msg-id B71B9E9E-3F8D-48B2-9D99-A342AB043322@enterprisedb.com
In response to Re: Cross-column statistics revisited  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
[sorry for top-posting - damn phone]

It's pretty straightforward to do a chi-squared test on all the pairs.
But that only tells you that the product is more likely to be wrong. It
doesn't tell you whether it's going to be too high or too low...
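
A minimal sketch of that point, assuming a 2x2 contingency table of row counts for two clauses (A true/false by B true/false). The function names and the counts are made up for illustration and are not planner code. Both tables give the same chi-squared statistic, yet the product estimate is too low for one and too high for the other:

```python
def chi_squared_2x2(table):
    """Pearson chi-squared statistic for a 2x2 contingency table.

    table[i][j] is the number of rows where clause A is (true, false)[i]
    and clause B is (true, false)[j].
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if A and B were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    return stat


def product_estimate_error(table):
    """Independence estimate of rows matching both clauses, minus the actual count."""
    total = sum(sum(row) for row in table)
    rows_matching_a = sum(table[0])
    rows_matching_b = table[0][0] + table[1][0]
    estimated_both = rows_matching_a * rows_matching_b / total
    return estimated_both - table[0][0]


# Hypothetical counts: 1000 rows, each clause alone matches 500 of them.
positively_correlated = [[400, 100], [100, 400]]
negatively_correlated = [[100, 400], [400, 100]]

for name, table in [("positive", positively_correlated),
                    ("negative", negatively_correlated)]:
    print(name, chi_squared_2x2(table), product_estimate_error(table))
```

The statistic is built from squared deviations, so the sign of the error is lost; recovering it means looking at observed minus expected in the cell of interest.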

greg

On 16 Oct 2008, at 07:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Martijn van Oosterhout <kleptog@svana.org> writes:
>> I think you need to go a step back: how are you going to use this  
>> data?
>
> The fundamental issue as the planner sees it is not having to assume
> independence of WHERE clauses.  For instance, given
>
>    WHERE a < 5 AND b > 10
>
> our current approach is to estimate the fraction of rows with a < 5
> (using stats for a), likewise estimate the fraction with b > 10
> (using stats for b), and then multiply these fractions together.
> This is correct if a and b are independent, but can be very bad if
> they aren't.  So if we had joint statistics on a and b, we'd want to
> somehow match that up to clauses for a and b and properly derive
> the joint probability.
>
> (I'm not certain of how to do that efficiently, even if we had the
> right stats :-()
>
>            regards, tom lane
>
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
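
As a rough illustration of the estimate described in the quoted message, the sketch below multiplies the per-column fractions for a < 5 and b > 10 and compares the result with the true joint fraction. The data sets and helper names are hypothetical, and real planner statistics are histograms and most-common-value lists rather than full scans:

```python
def fraction(rows, predicate):
    """Fraction of rows satisfying the predicate."""
    return sum(1 for row in rows if predicate(row)) / len(rows)


def independence_estimate(rows):
    """Selectivity of (a < 5 AND b > 10) assuming a and b are independent."""
    sel_a = fraction(rows, lambda r: r[0] < 5)
    sel_b = fraction(rows, lambda r: r[1] > 10)
    return sel_a * sel_b


def actual_selectivity(rows):
    """True fraction of rows with a < 5 AND b > 10."""
    return fraction(rows, lambda r: r[0] < 5 and r[1] > 10)


# Hypothetical data: every (a, b) combination in 0..19 exactly once,
# so the two columns are independent.
independent_rows = [(i % 20, i // 20) for i in range(400)]

# Hypothetical data: b always equals a, so the columns are fully dependent
# and no row can have a < 5 and b > 10 at the same time.
dependent_rows = [(i % 20, i % 20) for i in range(400)]

for name, rows in [("independent", independent_rows),
                   ("dependent", dependent_rows)]:
    print(name, independence_estimate(rows), actual_selectivity(rows))
```

On the independent data the product matches the true fraction; on the dependent data the product is unchanged while the true fraction is zero.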

