Re: proposal : cross-column stats - Mailing list pgsql-hackers

From	Yeb Havinga
Subject	Re: proposal : cross-column stats
Date	December 13, 2010 05:51:37
Msg-id	4D05EDCA.9070402@gmail.com
In response to	Re: proposal : cross-column stats (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: proposal : cross-column stats
List	pgsql-hackers

On 2010-12-13 03:28, Robert Haas wrote:
> Well, I'm not real familiar with contingency tables, but it seems like
> you could end up needing to store a huge amount of data to get any
> benefit out of it, in some cases.  For example, in the United States,
> there are over 40,000 postal codes, and some even larger number of
> city names, and doesn't the number of entries go as O(m*n)?  Now maybe
> this is useful enough anyway that we should Just Do It, but it'd be a
> lot cooler if we could find a way to give the planner a meaningful
> clue out of some more compact representation.
A sparse matrix that holds only 'implicative' (P(A|B) <> P(A*B)?) 
combinations? Also, some information might be deduced from others. For 
Heikki's city/region example, for each city it would be known that it is 
100% in one region. In that case it suffices to store only that 
information, since 0% in all other regions can be deduced. I wouldn't be 
surprised if storing implicatures like this would reduce the size to O(n).
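
To illustrate the size argument, here is a minimal Python sketch. It is not from the original message and is not how PostgreSQL stores statistics; the function names and the sample city/region rows are made up for the example. It keeps only the (city, region) pairs that actually occur and treats every absent pair as 0%, so when each city lies in exactly one region the table has one entry per city instead of one per (city, region) combination.

```python
from collections import Counter

def build_sparse_table(rows):
    """Count only the (city, region) pairs that actually occur (hypothetical helper)."""
    return Counter(rows)

def conditional_probability(table, city, region):
    """Estimate P(region | city); a pair absent from the table is deduced to be 0."""
    city_total = sum(n for (c, _), n in table.items() if c == city)
    if city_total == 0:
        return None  # city never seen, nothing can be deduced
    return table.get((city, region), 0) / city_total

# Made-up sample data: every city belongs to exactly one region.
rows = [
    ("Amsterdam", "Noord-Holland"),
    ("Amsterdam", "Noord-Holland"),
    ("Haarlem", "Noord-Holland"),
    ("Utrecht", "Utrecht"),
    ("Rotterdam", "Zuid-Holland"),
]

table = build_sparse_table(rows)
cities = {c for c, _ in rows}
regions = {r for _, r in rows}

dense_cells = len(cities) * len(regions)  # full contingency table: m * n cells
sparse_cells = len(table)                 # stored entries: one per city here

print(dense_cells, sparse_cells)
print(conditional_probability(table, "Amsterdam", "Noord-Holland"))
print(conditional_probability(table, "Amsterdam", "Utrecht"))
```

With a strict city-to-region dependency the stored entry count equals the number of distinct cities, which is the O(n) behaviour suggested above; data with cities spanning several regions would need more entries.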

regards,
Yeb Havinga
