On Fri, 2008-12-12 at 13:43 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > On Fri, 2008-12-12 at 13:18 -0500, Tom Lane wrote:
> >> Could we skip the hyperbole please?
>
> > Some of the ndistinct values are very badly off, and in the common cases
> > I cited previously, consistently so.
>
> > Once I'm certain the rescue helicopter has seen me, I'll stop waving my
> > arms. (But yes, OK).
>
> Well, AFAICT we have agreed in this thread to kick up the default and
> maximum stats targets by a factor of 10 for 8.4. If there's anything
> to your thesis that a bigger sample size will help, that should already
> make a noticeable difference.

That only makes the sample size 10x larger. Since we're using such a low
sample size already, it won't make much difference to ndistinct. It will
be great for histograms and MCVs though.
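
To put rough numbers on the sample-size point, here is a minimal sketch. It assumes ANALYZE samples 300 rows per unit of statistics target, and that the default target goes from 10 to 100 as agreed above. The 100-million-row table is a hypothetical figure chosen only for illustration.

```python
# Rough arithmetic only, not PostgreSQL code.
# Assumption: ANALYZE samples 300 * statistics_target rows.
ROWS_PER_TARGET = 300


def sample_rows(stats_target: int) -> int:
    """Number of rows sampled for a given statistics target."""
    return ROWS_PER_TARGET * stats_target


def sample_fraction(stats_target: int, table_rows: int) -> float:
    """Fraction of the table that ends up in the sample (capped at 1.0)."""
    return min(1.0, sample_rows(stats_target) / table_rows)


if __name__ == "__main__":
    table_rows = 100_000_000  # hypothetical large table
    for target in (10, 100):
        rows = sample_rows(target)
        frac = sample_fraction(target, table_rows)
        print(f"target={target}: {rows} rows sampled, {frac:.4%} of the table")
```

Even at the higher target, the sample remains a tiny fraction of a table this size.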
Please review my detailed test results mentioned here:
http://archives.postgresql.org/pgsql-hackers/2006-01/msg00153.php

If you reproduce those results you'll see that the ndistinct machinery
is fundamentally broken for clustered data on large tables. In many
cases those are join keys and so joins are badly handled on the very
tables where good optimisation is most important.
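
As a hedged illustration of why a small sample is fragile here, the sketch below evaluates the Haas-Stokes Duj1 estimator, n*d / (n - f1 + f1*n/N), which is my understanding of the formula ANALYZE uses for ndistinct. The function name and all sample counts are hypothetical, picked by hand to show the sensitivity. They are not measurements and not the results from the linked post.

```python
# Illustrative sketch only, not PostgreSQL code.
# Assumption: ndistinct is estimated with the Haas-Stokes Duj1 formula.


def duj1(n: int, N: int, d: int, f1: int) -> float:
    """Duj1 estimate of the number of distinct values in the table.

    n  = rows in the sample
    N  = rows in the table
    d  = distinct values seen in the sample
    f1 = values seen exactly once in the sample
    """
    return n * d / (n - f1 + f1 * n / N)


if __name__ == "__main__":
    N = 10_000_000  # hypothetical table size
    n = 3_000       # sample size at a statistics target of 10

    # Hypothetical sample A: 2992 singletons plus 4 values seen twice.
    print(round(duj1(n, N, d=2996, f1=2992)))

    # Hypothetical sample B: 2800 singletons plus 100 values seen twice,
    # e.g. if sampled rows that sit close together tend to share a value.
    print(round(duj1(n, N, d=2900, f1=2800)))
```

With these made-up counts the estimate drops from roughly a million to a few tens of thousands, so a modest number of extra duplicates in the sample moves the result by more than an order of magnitude.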
--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support