Re: [PERFORM] Bad n_distinct estimation; hacks suggested? - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: [PERFORM] Bad n_distinct estimation; hacks suggested?
Date
Msg-id 1114587910.21529.394.camel@localhost.localdomain
Whole thread Raw
List pgsql-hackers
On Tue, 2005-04-26 at 15:00 -0700, Gurmeet Manku wrote:

>  2. In a single scan, it is possible to estimate n_distinct by using
>     a very simple algorithm:
>
>  "Distinct sampling for highly-accurate answers to distinct value
>   queries and event reports" by Gibbons, VLDB 2001.
>
>  http://www.aladdin.cs.cmu.edu/papers/pdfs/y2001/dist_sampl.pdf

That looks like the one...

...though it looks like some more complex changes to the current
algorithm to use it, and we want the other stats as well...

>  3. In fact, Gibbon's basic idea has been extended to "sliding windows"
>     (this extension is useful in streaming systems like Aurora / Stream):
>
>  "Distributed streams algorithms for sliding windows"
>  by Gibbons and Tirthapura, SPAA 2002.
>
>  http://home.eng.iastate.edu/~snt/research/tocs.pdf
>

...and this offers the possibility of calculating statistics at load
time, as part of the COPY command

Best Regards, Simon Riggs


pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: bitmapscan test, no success, bs is not faster
Next
From: "Greg Sabino Mullane"
Date:
Subject: Re: [PATCHES] Continue transactions after errors in psql