Home > mailing lists

Re: Top N queries and disbursion - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: Top N queries and disbursion
Date	October 7, 1999 19:17:45
Msg-id	1078.939338218@sss.pgh.pa.us Whole thread Raw
In response to	Top N queries and disbursion (Roberto Cornacchia <rcorna@tin.it>)
Responses	Re: [HACKERS] Re: Top N queries and disbursion
List	pgsql-hackers

Tree view

Roberto Cornacchia <rcorna@tin.it> writes:
> ... since we are working on snapshots of the 6.6
> release (now we are using snapshot dated 9/13/99) we are afraid of
> instability problems to affect the results. Could you give us any
> suggestion about this? We are quite close to the degree day, so we have
> to optimize time usage... 

If you don't want to spend time tracking development changes then you
probably ought to stick with the snapshot you have.  I don't see any
reason that you should try to track changes right now...

> We need to estimate the number of distinct values of an attribute. We
> thought 1/disbursion was the right solution, but the results were quite
> wrong:

No, it's certainly not the right thing.  To my understanding, disbursion
is a measure of the frequency of the most common value of an attribute;
but that tells you very little about how many other values there are.
1/disbursion is a lower bound on the number of values, but it wouldn't
be a good estimate unless you had reason to think that the values were
pretty evenly distributed.  There could be a *lot* of very-infrequent
values.

> with 100 distinct values of an attribute uniformly distribuited in a
> relation of 10000 tuples, disbursion was estimated as 0.002275, giving
> us 440 distinct values.

This is an illustration of the fact that Postgres' disbursion-estimator
is pretty bad :-(.  It usually underestimates the frequency of the most
common value, unless the most common value is really frequent
(probability > 0.2 or so).  I've been trying to think of a more accurate
way of figuring the statistic that wouldn't be unreasonably slow.
Or, perhaps, we should forget all about disbursion and adopt some other
statistic(s).
        regards, tom lane

pgsql-hackers by date:

From: Bruce Momjian
Date: 07 October 1999, 13:52:42
Subject: Re: [HACKERS] Re: psql and comments

From: Tom Lane
Date: 07 October 1999, 19:29:54
Subject: Re: [HACKERS] Re: psql and comments

Re: Top N queries and disbursion - Mailing list pgsql-hackers

Previous

Next