Home > mailing lists

Re: Improving N-Distinct estimation by ANALYZE - Mailing list pgsql-hackers

From	Simon Riggs
Subject	Re: Improving N-Distinct estimation by ANALYZE
Date	January 16, 2006 16:05:53
Msg-id	1137441946.3180.187.camel@localhost.localdomain Whole thread
In response to	Re: Improving N-Distinct estimation by ANALYZE ("Jim C. Nasby" <jnasby@pervasive.com>)
Responses	Re: Improving N-Distinct estimation by ANALYZE
List	pgsql-hackers

Tree view

On Mon, 2006-01-16 at 12:26 -0600, Jim C. Nasby wrote:
> On Fri, Jan 13, 2006 at 11:37:38PM -0500, Tom Lane wrote:
> > Josh Berkus <josh@agliodbs.com> writes:
> > >> It's also worth mentioning that for datatypes that only have an "="
> > >> operator the performance of compute_minimal_stats is O(N^2) when values
> > >> are unique, so increasing sample size is a very bad idea in that case.
> > 
> > > Hmmm ... does ANALYZE check for UNIQUE constraints?
> > 
> > Our only implementation of UNIQUE constraints is btree indexes, which
> > require more than an "=" operator, so this seems irrelevant.
> 
> IIRC, the point was that if we know a field has to be unique, there's no
> sense in doing that part of the analysis on it; you'd only care about
> correlation.

An interesting point: if we know the cardinality by definition there is
no need to ANALYZE for that column. An argument in favour of the
definitional rather than the discovery approach. As a designer I very
often already know the cardinality by definition, so I would appreciate
the opportunity to specify that to the database.

Tom has not spoken against checking for UNIQUE constraints: he is just
pointing out that there never could be a constraint in the case I was
identifying. For other datatypes we could check for them rather than
analyzing the sample, but the effect would be the same either way.

My original point was that behaviour is O(N^2) when unique; it is still
O(N^2) when nearly unique, albeit O(N^2) - O(N). Reducing stats_target
does not help there - the effect varies according to number of rows in
the sample, not num slots in MCV list.

Best Regards, Simon Riggs

pgsql-hackers by date:

From: Tom Lane
Date: 16 January 2006, 15:43:33
Subject: Re: Large Scale Aggregation (HashAgg Enhancement)

From: Manfred Koizar
Date: 16 January 2006, 16:25:02
Subject: Re: Improving N-Distinct estimation by ANALYZE

Re: Improving N-Distinct estimation by ANALYZE - Mailing list pgsql-hackers

Previous

Next