Re: proposal : cross-column stats - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: proposal : cross-column stats
Msg-id 4D057ADD.3040305@fuzzy.cz
In response to Re: proposal : cross-column stats  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: proposal : cross-column stats  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 13.12.2010 01:05, Robert Haas wrote:
> This is a good idea, but I guess the question is what you do next.  If
> you know that the "applicability" is 100%, you can disregard the
> restriction clause on the implied column.  And if it has no
> implicatory power, then you just do what we do now.  But what if it
> has some intermediate degree of implicability?

Well, I think you've missed the e-mail from Florian Pflug - he actually
pointed out that the 'implicativeness' Heikki mentioned is called
conditional probability. And conditional probability can be used to
express the "AND" probability we are looking for (the selectivity).

For two columns, this is actually pretty straightforward - as Florian
wrote, the equation is
  P(A and B) = P(A|B) * P(B) = P(B|A) * P(A)

where P(B) may be estimated from the current histogram, and P(A|B) may
be estimated from the contingency table (see the previous mails). And
"P(A and B)" is actually the value we're looking for.
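
To make the two-column case concrete, here is a minimal sketch in Python.
The contingency table, its values and all function names are made up for
illustration; nothing here is an existing PostgreSQL structure. It just
evaluates P(A|B) * P(B) from the table and compares the result with the
estimate you get when the columns are assumed independent.

```python
# Hypothetical contingency table: counts[a][b] is the number of rows
# with city = a and country = b. The numbers are invented.
counts = {
    "Prague": {"CZ": 40, "DE": 0},
    "Brno":   {"CZ": 20, "DE": 0},
    "Berlin": {"CZ": 0,  "DE": 40},
}

total = sum(sum(row.values()) for row in counts.values())


def p_a(a):
    """Marginal P(A = a), i.e. what a single-column statistic would give."""
    return sum(counts[a].values()) / total


def p_b(b):
    """Marginal P(B = b)."""
    return sum(row[b] for row in counts.values()) / total


def p_a_given_b(a, b):
    """Conditional P(A = a | B = b), read off the contingency table."""
    rows_with_b = sum(row[b] for row in counts.values())
    return counts[a][b] / rows_with_b if rows_with_b else 0.0


def p_a_and_b(a, b):
    """P(A and B) = P(A|B) * P(B)."""
    return p_a_given_b(a, b) * p_b(b)


joint = p_a_and_b("Prague", "CZ")
independent = p_a("Prague") * p_b("CZ")
print(joint, independent)
```

With this table the conditional estimate matches the true fraction of
rows (40 out of 100), while the independence assumption underestimates it.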

Anyway, there really is no "intermediate" degree of applicability; the
conditional probability just gives you the right estimate.

And AFAIR this is easily extensible to more than two columns, as
  P(A and B and C) = P(A and (B and C)) = P(A|(B and C)) * P(B and C)

so it's basically a recursion.
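
The recursion can be sketched like this, again in Python. This is only an
illustration under stated assumptions: the sample rows are invented, the
function names are hypothetical, and counting over raw rows stands in for
whatever multi-column statistics would actually be kept. The point is just
that peeling off one condition at a time gives the same value as the
direct joint frequency.

```python
def cond_prob(rows, first, rest):
    """Estimate P(first | all predicates in rest) by counting rows."""
    matching_rest = [r for r in rows if all(p(r) for p in rest)]
    if not matching_rest:
        return 0.0
    return sum(1 for r in matching_rest if first(r)) / len(matching_rest)


def joint_prob(rows, preds):
    """P(p1 and ... and pn) = P(p1 | p2 and ... and pn) * P(p2 and ... and pn)."""
    if not preds:
        return 1.0  # empty conjunction matches every row
    first, rest = preds[0], preds[1:]
    return cond_prob(rows, first, rest) * joint_prob(rows, rest)


# Invented sample data: each row is (a, b, c).
rows = [
    (1, 1, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1),
    (0, 0, 0), (1, 1, 1), (0, 1, 0), (1, 1, 1),
]

preds = [
    lambda r: r[0] == 1,  # condition A
    lambda r: r[1] == 1,  # condition B
    lambda r: r[2] == 1,  # condition C
]

chain = joint_prob(rows, preds)
direct = sum(1 for r in rows if all(p(r) for p in preds)) / len(rows)
print(chain, direct)
```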

Well, I hope my statements are really correct - it's been a few years
since I gained my degree in statistics ;-)

regards
Tomas

