Re: proposal : cross-column stats - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: proposal : cross-column stats
Date
Msg-id 1292790103.1193.9461.camel@ebony
Whole thread Raw
In response to Re: proposal : cross-column stats  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: proposal : cross-column stats  (Tomas Vondra <tv@fuzzy.cz>)
List pgsql-hackers
On Mon, 2010-12-13 at 10:38 -0500, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Sun, Dec 12, 2010 at 9:16 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
> >> The proposed solution is based on contingency tables, built for selected
> >> groups of columns (not for each possible group). And the contingency
> >> table gives you the ability to estimate the probabilities needed to
> >> compute the selectivity. Or am I missing something?
> 
> > Well, I'm not real familiar with contingency tables, but it seems like
> > you could end up needing to store a huge amount of data to get any
> > benefit out of it, in some cases.
> 
> The reason that this wasn't done years ago is precisely that nobody's
> figured out how to do it with a tolerable amount of stats data and a
> tolerable amount of processing time (both at ANALYZE time and during
> query planning).  It's not hard to see what we'd ideally like to do;
> it's getting from there to something useful in production that's hard.

I think we have to face up to the fact that attempting to derive
meaningful cross-column stats will require larger sample sizes.

If we collect everything we're going to have ~10^9 stats slots with
default stats_target 100 and a 100 column table.

We should disconnect sample size from histogram size, and we need to
make the initial column pairings vastly fewer than all combinations.
Manual specification seems like it will be required for the cases where
we decide not to include it automatically, so it seems we'll need manual
specification anyway. In that case, we should do manual specification
first.

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services



pgsql-hackers by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: pg_ctl and port number detection
Next
From: 3dmashup
Date:
Subject: Re: Amazon now supporting GPU focused EC2 instances