On Fri, 2008-12-12 at 11:16 -0500, Tom Lane wrote:
> Perhaps a better plan is to try to de-emphasize use of ndistinct,
> though I concede I have no idea how to do that.
We don't actually care much about the accuracy of ndistinct itself, just
the accuracy of our answer to the question "given work_mem = X, is it
better to use a hash plan?"
So we just need to scan the table until we can answer that question
accurately enough, i.e. use a variable-sized sample.
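
A rough sketch of what "scan until we can answer" might look like, in
Python purely for illustration. Everything here is an assumption and not
existing code: the function name hash_fits, the flat bytes_per_entry cost
per hash entry, the batch size, and the stopping rule, which just projects
the most recent rate of newly seen values over the unread rows with a
safety margin. It is a heuristic, not a statistically rigorous bound.

```python
import random

def hash_fits(rows, work_mem_bytes, bytes_per_entry=64,
              batch=1000, margin=2.0, seed=0):
    """Sample rows in batches until we can say whether a hash table of the
    distinct values would fit in work_mem. Returns (fits, rows_sampled).
    All parameters are hypothetical, for illustration only."""
    rng = random.Random(seed)
    order = list(range(len(rows)))
    rng.shuffle(order)                 # every prefix is then a random sample
    max_groups = work_mem_bytes // bytes_per_entry
    seen = set()
    sampled = 0
    while sampled < len(order):
        before = len(seen)
        chunk = order[sampled:sampled + batch]
        for i in chunk:
            seen.add(rows[i])
        sampled += len(chunk)
        if len(seen) > max_groups:
            return False, sampled      # already too many groups: certain "no"
        # Rate at which the latest batch turned up previously unseen values.
        new_rate = (len(seen) - before) / len(chunk)
        remaining = len(order) - sampled
        # Heuristic projection: assume new values keep arriving at the
        # latest rate, inflated by a safety margin.
        projected = len(seen) + margin * new_rate * remaining
        if projected <= max_groups:
            return True, sampled       # confident enough to stop early
    return len(seen) <= max_groups, sampled   # read everything: exact answer
```

The point is that the sample size falls out of the question being asked:
a column with a handful of values stops after a couple of batches, and so
does one that is obviously far too distinct, while borderline cases keep
reading.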
Perhaps we could store a probability distribution for various values of
work_mem, rather than a single ndistinct value.
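
One possible shape for such a stored distribution, again only a sketch in
Python under stated assumptions: suppose we have several ndistinct
estimates (say, from independent samples), and we assume a flat
bytes_per_entry cost per hash entry. For each work_mem value on a grid we
store the fraction of estimates whose hash table would fit. The names
fit_probabilities and prob_hash_fits are invented for this example.

```python
from bisect import bisect_right

def fit_probabilities(ndistinct_estimates, work_mem_grid, bytes_per_entry=64):
    """Map each work_mem value (bytes) to the fraction of ndistinct
    estimates whose hash table would fit. Needs at least one estimate."""
    sizes = sorted(e * bytes_per_entry for e in ndistinct_estimates)
    return {wm: bisect_right(sizes, wm) / len(sizes)
            for wm in sorted(work_mem_grid)}

def prob_hash_fits(table, work_mem):
    """Look up the stored probability for the largest grid point that does
    not exceed work_mem. The probability of fitting can only grow with
    work_mem, so rounding down is the conservative choice."""
    usable = [wm for wm in table if wm <= work_mem]
    return table[max(usable)] if usable else 0.0

# Hypothetical estimates from four independent samples.
table = fit_probabilities([100, 200, 400, 800], [8192, 16384, 65536])
print(table)
print(prob_hash_fits(table, 20000))
```

The planner would then compare that probability against some threshold
instead of trusting a single point estimate.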
Anyway, definitely handwaving now to stimulate ideas.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support