Re: PATCH: Extending the HyperLogLog API a bit - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: PATCH: Extending the HyperLogLog API a bit
Date
Msg-id CAM3SWZSCfELXb1jocSPF-ZxXic1fSUtZj5oCgeKR6dnnhahrFg@mail.gmail.com
Whole thread Raw
In response to Re: PATCH: Extending the HyperLogLog API a bit  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: PATCH: Extending the HyperLogLog API a bit
List pgsql-hackers
On Tue, Jan 19, 2016 at 2:03 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> FWIW I've been considering adding APPROX_COUNT_DISTINCT() aggregate,
> similarly to what other databases (e.g. Vertica) have built-in. Now, that
> would not require the merge too, but we're currently baking support for
> 'combine' functions, and that's exactly what merge does.
>
> So why not just fix the bug?

You can go from the sparse representation to the dense representation,
so in principle you can merge two of our HLL states, if you are then
satisfied with having a new representation for the merged state. I
don't have time right now to do a full analysis of whether or not it's
possible to just fix the bug without doing all that, but I think it
might not be.

I think we benefit from the simplicity of the sparse representation.
So, in the absence of a clear justification for retaining
mergeHyperLogLog(), ripping it out seems best.

I also think that an expanded set of functionality would be required
for your APPROX_COUNT_DISTINCT() patch anyway, including support for
multiple representations (perhaps this isn't documented in your
APPROX_COUNT_DISTINCT(), but complete implementations seem to switch
from sparse to full at a certain point). So, ripping out
mergeHyperLogLog() doesn't really make that effort any more difficult.

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Brar Piening
Date:
Subject: Infer INOUT Parameters from Frontend/Backend Protocol
Next
From: Tomasz Rybak
Date:
Subject: Re: pglogical_output - a general purpose logical decoding output plugin