Home > mailing lists

Re: WIP: multivariate statistics / proof of concept - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: WIP: multivariate statistics / proof of concept
Date	October 13, 2014 19:47:32
Msg-id	543C2C4C.6040800@fuzzy.cz Whole thread Raw
In response to	Re: WIP: multivariate statistics / proof of concept (Albe Laurenz <laurenz.albe@wien.gv.at>)
List	pgsql-hackers

Tree view

Hi!

On 13.10.2014 09:36, Albe Laurenz wrote:
> Tomas Vondra wrote:
>> attached is a WIP patch implementing multivariate statistics.
> 
> I think that is pretty useful.
> Oracle has an identical feature called "extended statistics".
> 
> That's probably an entirely different thing, but it would be very 
> nice to have statistics to estimate the correlation between columns 
> of different tables, to improve the estimate for the number of rows 
> in a join.

I don't have a clear idea of how that should work, but from the quick
look at how join selectivity estimation is implemented, I believe two
things might be possible:
(a) using conditional probabilities
    Say we have a join "ta JOIN tb ON (ta.x = tb.y)"
    Currently, the selectivity is derived from stats on the two keys.    Essentially probabilities P(x), P(y),
representedby the MCV lists.    But if there are additional WHERE conditions on the tables, and we    have suitable
multivariatestats, it's possible to use conditional    probabilities.
 
    E.g. if the query actually uses
        ... ta JOIN tb ON (ta.x = tb.y) WHERE ta.z = 10
    and we have stats on (ta.x, ta.z), we can use P(x|z=10) instead.    If the two columns are correlated, this might
bemuch different.
 
(b) using this for multi-column conditions
    If the join condition involves multiple columns, e.g.
        ON (ta.x = tb.y AND ta.p = tb.q)
    and we happen to have stats on (ta.x,ta.p) and (tb.y,tb.q), we may    use this to compute the cardinality (pretty
muchas we do today).
 

But I haven't really worked on this so far, I suspect there are various
subtle issues and I certainly don't plan to address this in the first
phase of the patch.

Tomas

pgsql-hackers by date:

From: Bruce Momjian
Date: 13 October 2014, 19:43:26
Subject: Re: On partitioning

From: Bruce Momjian
Date: 13 October 2014, 19:50:03
Subject: Re: psql \watch versus \timing

Re: WIP: multivariate statistics / proof of concept - Mailing list pgsql-hackers

Previous

Next