
Re: Cross-column statistics revisited - Mailing list pgsql-hackers

From	Greg Stark
Subject	Re: Cross-column statistics revisited
Date	October 16, 2008 14:32:44
Msg-id	B71B9E9E-3F8D-48B2-9D99-A342AB043322@enterprisedb.com
In response to	Re: Cross-column statistics revisited (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

[sorry for top posting - damn phone]

It's pretty straightforward to do a chi-squared test on all the pairs.
But that only tells you that the product is more likely to be wrong. It
doesn't tell you whether it's going to be too high or too low...

greg
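To illustrate the point about the chi-squared test, here is a minimal sketch, assuming the two WHERE clauses have already been reduced to boolean flags per row. The functions `chi_squared`, `make_rows` and `product_vs_actual` and the data are hypothetical, not anything in PostgreSQL. Two made-up tables with the same marginals get the same chi-squared statistic, yet for the clause pair "a < 5 AND b > 10" the independence product underestimates in one table and overestimates in the other.

```python
from collections import Counter


def chi_squared(pairs):
    """Pearson chi-squared statistic for independence of two categorical columns."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    stat = 0.0
    for x in left:
        for y in right:
            expected = left[x] * right[y] / n  # cell count if independent
            observed = joint[(x, y)]
            stat += (observed - expected) ** 2 / expected
    return stat


def make_rows(same, diff):
    # hypothetical rows of (a < 5, b > 10) flags
    return ([(True, True)] * same + [(False, False)] * same
            + [(True, False)] * diff + [(False, True)] * diff)


def product_vs_actual(pairs):
    """Independence estimate vs. true fraction of rows matching both clauses."""
    n = len(pairs)
    p_a = sum(1 for a, _ in pairs if a) / n
    p_b = sum(1 for _, b in pairs if b) / n
    actual = sum(1 for a, b in pairs if a and b) / n
    return p_a * p_b, actual


pos = make_rows(40, 10)  # clauses tend to hold together
neg = make_rows(10, 40)  # clauses tend to exclude each other
for name, rows in [("positive", pos), ("negative", neg)]:
    est, actual = product_vs_actual(rows)
    print(name, chi_squared(rows), est, actual)
# positive 36.0 0.25 0.4
# negative 36.0 0.25 0.1
```

Both tables score 36.0, so the test flags both pairs as dependent, but only the joint counts reveal that the product (0.25) is too low in the first case and too high in the second.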

On 16 Oct 2008, at 07:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Martijn van Oosterhout <kleptog@svana.org> writes:
>> I think you need to go a step back: how are you going to use this  
>> data?
>
> The fundamental issue as the planner sees it is not having to assume
> independence of WHERE clauses.  For instance, given
>
>    WHERE a < 5 AND b > 10
>
> our current approach is to estimate the fraction of rows with a < 5
> (using stats for a), likewise estimate the fraction with b > 10
> (using stats for b), and then multiply these fractions together.
> This is correct if a and b are independent, but can be very bad if
> they aren't.  So if we had joint statistics on a and b, we'd want to
> somehow match that up to clauses for a and b and properly derive
> the joint probability.
>
> (I'm not certain of how to do that efficiently, even if we had the
> right stats :-()
>
>            regards, tom lane
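The estimation problem Tom describes can be sketched as follows. This is a toy comparison under stated assumptions, not how the PostgreSQL planner works. The "joint statistics" here are a hypothetical 2-D equi-width histogram (`build_grid`), and the selectivity of `a < x AND b > y` is summed cell by cell, assuming values are spread uniformly within each cell. All function names, bin edges and data are made up for illustration. On perfectly correlated data (b = 2 * a) the independence product gives 0.2, while no row actually satisfies both clauses.

```python
from bisect import bisect_right


def bucket(value, edges):
    """Index of the half-open bin [edges[i], edges[i+1]) containing value."""
    i = bisect_right(edges, value) - 1
    if i < 0 or i >= len(edges) - 1:
        raise ValueError(f"{value} outside histogram range")
    return i


def build_grid(rows, a_edges, b_edges):
    """Hypothetical joint statistics: row counts per (a-bin, b-bin) cell."""
    grid = [[0] * (len(b_edges) - 1) for _ in range(len(a_edges) - 1)]
    for a, b in rows:
        grid[bucket(a, a_edges)][bucket(b, b_edges)] += 1
    return grid


def frac_below(x, lo, hi):
    # share of bin [lo, hi) with value < x, assuming uniform spread in the bin
    return min(max((x - lo) / (hi - lo), 0.0), 1.0)


def frac_above(y, lo, hi):
    # share of bin [lo, hi) with value > y, same uniformity assumption
    return min(max((hi - y) / (hi - lo), 0.0), 1.0)


def joint_estimate(grid, a_edges, b_edges, x, y, n):
    """Estimated selectivity of a < x AND b > y from the 2-D grid."""
    total = 0.0
    for i, row in enumerate(grid):
        fa = frac_below(x, a_edges[i], a_edges[i + 1])
        for j, count in enumerate(row):
            fb = frac_above(y, b_edges[j], b_edges[j + 1])
            total += count * fa * fb
    return total / n


def independence_estimate(rows, x, y):
    """Current approach: per-column selectivities multiplied together."""
    n = len(rows)
    sel_a = sum(1 for a, _ in rows if a < x) / n
    sel_b = sum(1 for _, b in rows if b > y) / n
    return sel_a * sel_b


# hypothetical, perfectly correlated data: b = 2 * a, ten rows per value of a
rows = [(a, 2 * a) for a in range(10) for _ in range(10)]
a_edges, b_edges = [0, 5, 10], [0, 10, 20]
grid = build_grid(rows, a_edges, b_edges)

actual = sum(1 for a, b in rows if a < 5 and b > 10) / len(rows)
print(independence_estimate(rows, 5, 10))  # 0.2
print(joint_estimate(grid, a_edges, b_edges, 5, 10, len(rows)))  # 0.0
print(actual)  # 0.0
```

The bin edges were chosen to line up with the constants 5 and 10, so the grid happens to be exact here. With other constants a partially covered cell is interpolated, which still multiplies fractions within that cell; the independence assumption is pushed down to individual cells rather than removed.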
