On Fri, 2008-12-12 at 13:18 -0500, Tom Lane wrote:
> I seem to recall Greg suggesting that there were ways to estimate
> ndistinct without sorting, but short of a fundamental algorithm change
> there's not going to be a win here.
Hash table? Haas & Stokes suggest a Bloom filter.
Why not keep the random algorithm we have now, but scan the block into a
separate hash table for ndistinct estimation? That way we keep the
correct random rows for other purposes.
> > Right now we may as well use a random number generator.
>
> Could we skip the hyperbole please?
Some of the ndistinct values are very badly off, and in the common cases
I cited previously, consistently so.
Once I'm certain the rescue helicopter has seen me, I'll stop waving my
arms. (But yes, OK).
-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support