On Tue, 2005-04-26 at 15:00 -0700, Gurmeet Manku wrote:
> 2. In a single scan, it is possible to estimate n_distinct by using
> a very simple algorithm:
>
> "Distinct sampling for highly-accurate answers to distinct value
> queries and event reports" by Gibbons, VLDB 2001.
>
> http://www.aladdin.cs.cmu.edu/papers/pdfs/y2001/dist_sampl.pdf
That looks like the one...
...though it looks like some more complex changes to the current
algorithm to use it, and we want the other stats as well...
> 3. In fact, Gibbon's basic idea has been extended to "sliding windows"
> (this extension is useful in streaming systems like Aurora / Stream):
>
> "Distributed streams algorithms for sliding windows"
> by Gibbons and Tirthapura, SPAA 2002.
>
> http://home.eng.iastate.edu/~snt/research/tocs.pdf
>
...and this offers the possibility of calculating statistics at load
time, as part of the COPY command
Best Regards, Simon Riggs