
Re: estimating # of distinct values - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Re: estimating # of distinct values
Date	January 20, 2011 07:10:39
Msg-id	4D37EDEE.9070906@enterprisedb.com
In response to	Re: estimating # of distinct values (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: estimating # of distinct values (Tomas Vondra <tv@fuzzy.cz>)
List	pgsql-hackers


On 20.01.2011 04:36, Robert Haas wrote:
> ... Even better, the
> code changes would be confined to ANALYZE rather than spread out all
> over the system, which has positive implications for robustness and
> likelihood of commit.

Keep in mind that the administrator can already override the ndistinct 
estimate with ALTER TABLE. If he needs to manually run a special ANALYZE 
command to make it scan the whole table, he might as well just use ALTER 
TABLE to tell the system what the real (or good enough) value is. A DBA
should have a pretty good feel for the distribution of his data.

And how good does the estimate need to be? For a single column, it's
usually not that critical, because if the column has only a few distinct
values then we'll already estimate that pretty well, and OTOH if
ndistinct is large, it doesn't usually affect the plans much whether it's
10% of the number of rows or 90%.
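
To put rough numbers on that last point, here is a minimal sketch in Python. It assumes the textbook simplification that an equality predicate on a column matches about rows / ndistinct rows; the planner's real estimate also accounts for NULLs and most-common values, so treat this as an illustration only. The function name and the one-million-row table are made up for the example.

```python
def estimated_matching_rows(total_rows: int, ndistinct: float) -> float:
    """Rough row estimate for `col = constant`, assuming uniformly
    distributed values (a simplification, not the planner's exact formula)."""
    if ndistinct <= 0:
        raise ValueError("ndistinct must be positive")
    return total_rows / ndistinct


total_rows = 1_000_000  # hypothetical table size

# ndistinct at 10% vs. 90% of the row count
low = estimated_matching_rows(total_rows, 0.10 * total_rows)
high = estimated_matching_rows(total_rows, 0.90 * total_rows)

print(f"ndistinct = 10% of rows: ~{low:.1f} matching rows")
print(f"ndistinct = 90% of rows: ~{high:.1f} matching rows")
```

Under this simplification both cases come out to a handful of rows out of a million, so the same kind of plan would likely look attractive either way.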

It seems that the suggested multi-column selectivity estimator would be 
more sensitive to ndistinct of the individual columns. Is that correct? 
How is it biased? If we routinely under-estimate ndistinct of individual
columns, for example, does the bias accumulate or cancel out in the
multi-column estimate?

I'd like to see some testing of the suggested selectivity estimator with 
the ndistinct estimates we have. Who knows, maybe it works fine in practice.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com
