Simon,
> It's also worth mentioning that for datatypes that only have an "="
> operator the performance of compute_minimal_stats is O(N^2) when values
> are unique, so increasing sample size is a very bad idea in that case.
> It may be possible to re-sample the sample, so that we get only one row
> per block as with the current row sampling method. Another idea might be
> just to abort the analysis when it looks fairly unique, rather than
> churn through the whole sample.
I'd tend to do the latter. If we haven't had a value repeat in 25 blocks,
how likely is one to appear later?
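
To illustrate, here is a rough sketch (not the actual compute_minimal_stats code) of why tracking distinct values with only "=" goes quadratic on a unique column, plus a hypothetical early-abort threshold along the lines suggested above:

```python
def track_distinct(sample, abort_after=1000):
    """Count value frequencies using only equality comparisons.

    Each new value is checked against every value tracked so far, so a
    sample of N unique values costs roughly N^2/2 comparisons.  As a
    hypothetical early abort: if the first `abort_after` rows are all
    distinct, give up -- the column is almost certainly unique.
    """
    tracked = []  # [value, count] pairs; no hashing or sorting, "=" only
    for i, v in enumerate(sample):
        for entry in tracked:
            if entry[0] == v:          # the only operator available
                entry[1] += 1
                break
        else:
            tracked.append([v, 1])
        if i + 1 == abort_after and len(tracked) == i + 1:
            return None  # looks unique; skip the rest of the sample
    return tracked

# An all-unique sample aborts after 1000 rows instead of doing the
# full quadratic scan over 100000:
print(track_distinct(range(100000), abort_after=1000))  # -> None
```
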
Hmmm ... does ANALYZE check for UNIQUE constraints? Most unique columns
are going to have such a constraint, in which case we don't need to sample
them at all for n-distinct.
--
--Josh
Josh Berkus
Aglio Database Solutions
San Francisco