Thread: Re: [PERFORM] Bad n_distinct estimation; hacks suggested?

Re: [PERFORM] Bad n_distinct estimation; hacks suggested?

From
Simon Riggs
Date:
On Tue, 2005-04-26 at 15:00 -0700, Gurmeet Manku wrote:

>  2. In a single scan, it is possible to estimate n_distinct by using
>     a very simple algorithm:
>
>  "Distinct sampling for highly-accurate answers to distinct value
>   queries and event reports" by Gibbons, VLDB 2001.
>
>  http://www.aladdin.cs.cmu.edu/papers/pdfs/y2001/dist_sampl.pdf

That looks like the one...

...though it looks like some more complex changes to the current
algorithm to use it, and we want the other stats as well...

>  3. In fact, Gibbon's basic idea has been extended to "sliding windows"
>     (this extension is useful in streaming systems like Aurora / Stream):
>
>  "Distributed streams algorithms for sliding windows"
>  by Gibbons and Tirthapura, SPAA 2002.
>
>  http://home.eng.iastate.edu/~snt/research/tocs.pdf
>

...and this offers the possibility of calculating statistics at load
time, as part of the COPY command

Best Regards, Simon Riggs