
Re: Cross-column statistics revisited - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: Cross-column statistics revisited
Date	October 16, 2008 17:20:36
Msg-id	17221.1224177630@sss.pgh.pa.us
In response to	Re: Cross-column statistics revisited (Martijn van Oosterhout <kleptog@svana.org>)
Responses	Re: Cross-column statistics revisited Re: Cross-column statistics revisited
List	pgsql-hackers

Martijn van Oosterhout <kleptog@svana.org> writes:
> I think you need to go a step back: how are you going to use this data?

The fundamental issue, as the planner sees it, is not having to assume
independence of WHERE clauses.  For instance, given
WHERE a < 5 AND b > 10

our current approach is to estimate the fraction of rows with a < 5
(using stats for a), likewise estimate the fraction with b > 10
(using stats for b), and then multiply these fractions together.
This is correct if a and b are independent, but can be very bad if
they aren't.  So if we had joint statistics on a and b, we'd want to
somehow match that up to clauses for a and b and properly derive
the joint probability.
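A minimal Python sketch of the estimate described above, assuming a hypothetical toy table where b = 2 * a (so the columns are perfectly correlated). It is not PostgreSQL code: the function names are illustrative, and exact selectivities are computed by scanning the rows, standing in for the per-column and joint statistics the planner would actually consult. In general P(a < 5 AND b > 10) = P(a < 5) * P(b > 10 | a < 5), which reduces to the plain product only when a and b are independent.

```python
from typing import Callable, Sequence, Tuple

Row = Tuple[int, int]  # (a, b)


def selectivity(rows: Sequence[Row], pred: Callable[[Row], bool]) -> float:
    """Fraction of rows satisfying pred (stand-in for a stats-based estimate)."""
    return sum(1 for r in rows if pred(r)) / len(rows)


def a_lt_5(r: Row) -> bool:
    return r[0] < 5          # clause: a < 5


def b_gt_10(r: Row) -> bool:
    return r[1] > 10         # clause: b > 10


def independence_estimate(rows: Sequence[Row]) -> float:
    # Per-column stats only: estimate each clause separately, then multiply.
    return selectivity(rows, a_lt_5) * selectivity(rows, b_gt_10)


def joint_fraction(rows: Sequence[Row]) -> float:
    # What joint statistics on (a, b) would let us approximate:
    # the fraction of rows satisfying both clauses at once.
    return selectivity(rows, lambda r: a_lt_5(r) and b_gt_10(r))


if __name__ == "__main__":
    # Hypothetical, perfectly correlated data: b = 2 * a for a = 0..9.
    rows = [(a, 2 * a) for a in range(10)]
    print(independence_estimate(rows))  # 0.5 * 0.4 -> 0.2
    print(joint_fraction(rows))         # no row has a < 5 and b > 10 -> 0.0
```

Here the independence assumption predicts that 20% of the rows match, while in fact none do, because every row with a < 5 has b < 10.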

(I'm not certain how to do that efficiently, even if we had the
right stats :-()
        regards, tom lane
