Actually, the earliest paper that solves the distinct_n estimation
problem in 1 pass is the following:
"Estimating simple functions on the union of data streams"
by Gibbons and Tirthapura, SPAA 2001.
http://home.eng.iastate.edu/~snt/research/streaming.pdf
The above paper addresses a more difficult problem (1 pass
_and_ a distributed setting).
Gibbon's followup paper in VLDB 2001 limits the problem to a
single machine and contains primarily experimental results (for
a database audience). The algorithmic breakthrough had already been
accomplished in the SPAA paper.
Gurmeet
--
----------------------------------------------------
Gurmeet Singh Manku Google Inc.
http://www.cs.stanford.edu/~manku (650) 967 1890
----------------------------------------------------