Re: Improving N-Distinct estimation by ANALYZE - Mailing list pgsql-hackers

From	Josh Berkus
Subject	Re: Improving N-Distinct estimation by ANALYZE
Date	January 13, 2006 21:13:22
Msg-id	200601131719.05197.josh@agliodbs.com
In response to	Re: Improving N-Distinct estimation by ANALYZE (Simon Riggs <simon@2ndquadrant.com>)
Responses	Re: Improving N-Distinct estimation by ANALYZE
List	pgsql-hackers

Simon,

> It's also worth mentioning that for datatypes that only have an "="
> operator the performance of compute_minimal_stats is O(N^2) when values
> are unique, so increasing sample size is a very bad idea in that case.
> It may be possible to re-sample the sample, so that we get only one row
> per block as with the current row sampling method. Another idea might be
> just to abort the analysis when it looks fairly unique, rather than
> churn through the whole sample.

I'd tend to do the latter. If we haven't had a value repeat in 25 blocks,
how likely is one to appear later?
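
To illustrate the cost being discussed, here is a minimal sketch (not the actual compute_minimal_stats code) of counting distinct values when only "=" is available, with an optional early abort. The function name, the abort_after parameter, and the choice to count consecutive non-repeating values rather than blocks are assumptions made for illustration.

```python
def count_distinct_eq_only(values, abort_after=None):
    """Count distinct values using only equality comparisons.

    Returns (n_distinct, comparisons, aborted).
    abort_after is a hypothetical threshold: stop once this many
    consecutive values have each turned out to be new.
    """
    distinct = []        # no sorting or hashing assumed, only "="
    comparisons = 0
    since_repeat = 0     # consecutive values that matched nothing seen so far

    for v in values:
        found = False
        for d in distinct:
            comparisons += 1
            if d == v:
                found = True
                break
        if found:
            since_repeat = 0
        else:
            distinct.append(v)
            since_repeat += 1
        if abort_after is not None and since_repeat >= abort_after:
            # Looks fairly unique so far; give up rather than churn on.
            return len(distinct), comparisons, True

    return len(distinct), comparisons, False


if __name__ == "__main__":
    sample = list(range(1000))   # all-unique sample, the worst case
    print(count_distinct_eq_only(sample))
    print(count_distinct_eq_only(sample, abort_after=100))
```

On an all-unique sample of N values the full scan performs N(N-1)/2 comparisons, which is where the O(N^2) comes from; the early abort caps the work at roughly the square of the threshold.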

Hmmm ... does ANALYZE check for UNIQUE constraints? Most unique columns
are going to have a constraint, in which case we don't need to sample them
at all for N-distinct.
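
A rough sketch of that shortcut, assuming the caller already knows whether the column carries a single-column UNIQUE constraint and what fraction of its rows are NULL. The function and parameter names are hypothetical; the negative return value follows the pg_statistic.stadistinct convention, where a negative number is minus a multiplier on the row count.

```python
def n_distinct_from_constraint(has_unique_constraint, null_frac):
    """Return a stadistinct-style estimate, or None if sampling is still needed.

    has_unique_constraint and null_frac are assumed inputs; how ANALYZE
    would obtain them is outside this sketch.
    """
    if not has_unique_constraint:
        return None  # fall back to the sampled estimate
    # Every non-NULL row holds a different value, so the distinct count
    # is (1 - null_frac) times the row count, expressed as a negative ratio.
    return -(1.0 - null_frac)


if __name__ == "__main__":
    print(n_distinct_from_constraint(True, 0.0))
    print(n_distinct_from_constraint(False, 0.0))
```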

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
