
Re: estimating # of distinct values - Mailing list pgsql-hackers

From	Florian Pflug
Subject	Re: estimating # of distinct values
Date	January 19, 2011 22:56:28
Msg-id	0ED6A735-4377-47DC-AEF4-C55F54BD06C4@phlo.org
In response to	Re: estimating # of distinct values (Nathan Boley <npboley@gmail.com>)
Responses	Re: estimating # of distinct values (Nathan Boley <npboley@gmail.com>)
List	pgsql-hackers


On Jan19, 2011, at 23:44 , Nathan Boley wrote:
> If you think about it, it's a bit ridiculous to look at the whole table
> *just* to "estimate" ndistinct - if we go that far why dont we just
> store the full distribution and be done with it?

The crucial point that you're missing here is that ndistinct provides an
estimate even if you *don't* have a specific value to search for at hand.
This is far more common than you may think; it happens, e.g., every time
you PREPARE a statement with parameters. Even knowing the full distribution
has no advantage in this case - the best you could do is to average the
individual probabilities, which gives ... well, 1/ndistinct.
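
To illustrate the averaging argument, here is a minimal sketch. It is not
PostgreSQL planner code; the function name average_selectivity and the
sample column are made up for this example. The per-value frequencies sum
to 1, so their unweighted average over the distinct values comes out as
1/ndistinct no matter how skewed the distribution is.

```python
from collections import Counter
from fractions import Fraction


def average_selectivity(values):
    """Average the per-value frequencies over all distinct values.

    Hypothetical helper for illustration only: it models the case where
    the parameter value is unknown and each distinct value is treated as
    equally likely to be the one searched for.
    """
    counts = Counter(values)
    total = len(values)
    # Exact frequency of each distinct value (its "=" selectivity).
    freqs = [Fraction(c, total) for c in counts.values()]
    # The frequencies sum to 1, so this is 1 / len(freqs).
    return sum(freqs) / len(freqs)


# Made-up, heavily skewed sample column with three distinct values.
column = ["a"] * 6 + ["b"] * 3 + ["c"]
ndistinct = len(set(column))

print(average_selectivity(column) == Fraction(1, ndistinct))
```

With exact fractions the comparison holds for any non-empty column, which
is why the full distribution adds nothing once the searched-for value is
unknown.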

best regards,
Florian Pflug

