Re: Distinct-Sampling (Gibbons paper) for Postgres - Mailing list pgsql-hackers

From a3a18850@telus.net
Subject Re: Distinct-Sampling (Gibbons paper) for Postgres
Date
Msg-id 1114751418.4271c1ba12544@webmail.telus.net
In response to Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Andrew Dunstan <andrew@dunslane.net>)
List pgsql-hackers
Well, this guy has it nailed. He cites Flajolet and Martin, which was (I
thought) as good as you could get with only a reasonable amount of memory per
statistic. Unfortunately, their hash table is a one-shot deal; there's no way
to maintain it once the table changes. His incremental update doesn't degrade
as the table changes. If there isn't the same patent wrangle as with the
ARC algorithm, and if the existing stats collector process can stand the extra
traffic, then this one is a winner.
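
For anyone who hasn't read the paper yet, here is a rough sketch in Python of
the level-based sampling idea as I understand it. It is a simplification, not
the paper's exact algorithm: the class name, the max_values bound, and the use
of SHA-256 as the hash are all my own assumptions for illustration. Each value
hashes to a "level" (level >= k with probability 2**-k), the sample keeps only
values at or above the current level, and the level is bumped whenever the
sample outgrows its bound. Per-value row counts are kept so that deletes can
be applied as well as inserts.

```python
import hashlib


class DistinctSample:
    """Simplified level-based distinct sample (illustrative sketch only)."""

    def __init__(self, max_values=64):
        self.max_values = max_values  # assumed bound on sampled distinct values
        self.level = 0                # current sampling level
        self.counts = {}              # sampled value -> number of live rows

    @staticmethod
    def _level(value):
        # Hash the value and count trailing zero bits, so that
        # P(level >= k) = 2**-k for a well-mixed hash.
        digest = hashlib.sha256(repr(value).encode()).digest()
        h = int.from_bytes(digest[:8], "big")
        if h == 0:
            return 64
        return (h & -h).bit_length() - 1

    def insert(self, value):
        # Values below the current level are not part of the sample.
        if self._level(value) < self.level:
            return
        self.counts[value] = self.counts.get(value, 0) + 1
        # If the sample is too big, raise the level and evict values below it.
        while len(self.counts) > self.max_values:
            self.level += 1
            self.counts = {
                v: c for v, c in self.counts.items()
                if self._level(v) >= self.level
            }

    def delete(self, value):
        # Only sampled values are tracked; drop a value when its last row goes.
        if value in self.counts:
            self.counts[value] -= 1
            if self.counts[value] == 0:
                del self.counts[value]

    def estimate(self):
        # Each sampled value stands in for 2**level distinct values.
        return len(self.counts) * 2 ** self.level
```

While the level is still 0 the estimate is exact; after that it is only an
estimate whose accuracy depends on how large max_values is. Note that in this
sketch the level never drops again, so a table that shrinks a lot would leave
a smaller sample than the bound allows.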

Many thanks to the person who posted this reference in the first place; so
sorry I canned your posting and can't recall your name.

Now, if we can come up with something better than the ARC algorithm ...

