Re: Improving N-Distinct estimation by ANALYZE - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Improving N-Distinct estimation by ANALYZE
Date
Msg-id 1137441946.3180.187.camel@localhost.localdomain
Whole thread Raw
In response to Re: Improving N-Distinct estimation by ANALYZE  ("Jim C. Nasby" <jnasby@pervasive.com>)
Responses Re: Improving N-Distinct estimation by ANALYZE  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Mon, 2006-01-16 at 12:26 -0600, Jim C. Nasby wrote:
> On Fri, Jan 13, 2006 at 11:37:38PM -0500, Tom Lane wrote:
> > Josh Berkus <josh@agliodbs.com> writes:
> > >> It's also worth mentioning that for datatypes that only have an "="
> > >> operator the performance of compute_minimal_stats is O(N^2) when values
> > >> are unique, so increasing sample size is a very bad idea in that case.
> > 
> > > Hmmm ... does ANALYZE check for UNIQUE constraints?
> > 
> > Our only implementation of UNIQUE constraints is btree indexes, which
> > require more than an "=" operator, so this seems irrelevant.
> 
> IIRC, the point was that if we know a field has to be unique, there's no
> sense in doing that part of the analysis on it; you'd only care about
> correlation.

An interesting point: if we know the cardinality by definition there is
no need to ANALYZE for that column. An argument in favour of the
definitional rather than the discovery approach. As a designer I very
often already know the cardinality by definition, so I would appreciate
the opportunity to specify that to the database.

Tom has not spoken against checking for UNIQUE constraints: he is just
pointing out that there never could be a constraint in the case I was
identifying. For other datatypes we could check for them rather than
analyzing the sample, but the effect would be the same either way.

My original point was that behaviour is O(N^2) when unique; it is still
O(N^2) when nearly unique, albeit O(N^2) - O(N). Reducing stats_target
does not help there - the effect varies according to number of rows in
the sample, not num slots in MCV list.

Best Regards, Simon Riggs





pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Large Scale Aggregation (HashAgg Enhancement)
Next
From: Manfred Koizar
Date:
Subject: Re: Improving N-Distinct estimation by ANALYZE