Citation for "Bad n_distinct estimation; hacks suggested?" - Mailing list pgsql-hackers

From Gurmeet Manku
Subject Citation for "Bad n_distinct estimation; hacks suggested?"
Date
Msg-id Pine.LNX.4.44.0505020858580.28631-100000@xenon.Stanford.EDU
Whole thread Raw
In response to Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-hackers
 Actually, the earliest paper that solves the distinct_n estimation
 problem in 1 pass is the following:

    "Estimating simple functions on the union of data streams"
    by Gibbons and Tirthapura, SPAA 2001.
    http://home.eng.iastate.edu/~snt/research/streaming.pdf

 The above paper addresses a more difficult problem (1 pass
 _and_ a distributed setting).


 Gibbon's followup paper in VLDB 2001 limits the problem to a
 single machine and contains primarily experimental results (for
 a database audience). The algorithmic breakthrough had already been
 accomplished in the SPAA paper.

 Gurmeet

--
 ----------------------------------------------------
 Gurmeet Singh Manku                      Google Inc.
 http://www.cs.stanford.edu/~manku    (650) 967 1890
 ----------------------------------------------------


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: pg_locks needs a facelift
Next
From: "Joshua D. Drake"
Date:
Subject: Re: [pgsql-advocacy] Increased company involvement