Re: PATCH: Extending the HyperLogLog API a bit - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: PATCH: Extending the HyperLogLog API a bit
Msg-id 569EB2A8.7080700@2ndquadrant.com
In response to Re: PATCH: Extending the HyperLogLog API a bit  (Peter Geoghegan <pg@heroku.com>)
Responses Re: PATCH: Extending the HyperLogLog API a bit  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers

On 01/19/2016 10:54 PM, Peter Geoghegan wrote:
> On Tue, Jan 19, 2016 at 9:37 AM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
>> Our transcript seems to predate that bugfix commit, so I assume we need
>> to apply this to our copy too.  Sadly, Hideaki-san's commit message isn't
>> very descriptive.
>
> Fortunately, the function mergeHyperLogLog() in our hyperloglog.c
> currently has no callers.
>
>> I don't really know how HyperLogLog works, so maybe we can't or
>> shouldn't apply the patch because of how the hash stuff is used.
>
> I think that Hideaki's confusion comes from whether or not this HLL
> state is a sparse or dense/full representation. The distinction is
> explained in the README for postgresql-hll:
>
> https://github.com/aggregateknowledge/postgresql-hll
>
> postgresql-hll has no support for merging HLLs that are sparse:
>
> https://github.com/aggregateknowledge/postgresql-hll/blob/master/hll.c#L1888
>
> Can't we just tear mergeHyperLogLog() out?

FWIW I've been considering adding an APPROX_COUNT_DISTINCT() aggregate,
similar to what other databases (e.g. Vertica) have built in. Now,
that would not require the merge either, but we're currently baking support
for 'combine' functions, and that's exactly what merge does.

So why not just fix the bug?
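
To illustrate why merge maps so directly onto a 'combine' function, here is a
minimal sketch of merging two dense HLL states by taking the per-register
maximum. It is not the code in hyperloglog.c and not the fix under discussion:
the DenseHll struct, the dense_hll_* function names and the SKETCH_MAX_WIDTH
cap are all hypothetical, and it assumes 32-bit hashes and dense
representations with equal register width on both sides.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_WIDTH 10     /* hypothetical cap, keeps the sketch small */

/* Hypothetical dense HLL state; NOT the hyperLogLogState from hyperloglog.c. */
typedef struct DenseHll
{
    uint8_t width;                              /* hash bits used as register index */
    uint8_t registers[1 << SKETCH_MAX_WIDTH];   /* only the first 1 << width are used */
} DenseHll;

static void
dense_hll_init(DenseHll *h, uint8_t width)
{
    assert(width >= 4 && width <= SKETCH_MAX_WIDTH);
    h->width = width;
    memset(h->registers, 0, sizeof(h->registers));
}

/* 1-based position of the leftmost 1-bit, capped at maxbits + 1. */
static uint8_t
leftmost_one(uint32_t x, uint8_t maxbits)
{
    uint8_t pos = 1;

    while (pos <= maxbits && (x & 0x80000000u) == 0)
    {
        x <<= 1;
        pos++;
    }
    return pos;
}

/* Add one already-hashed value: top bits pick the register, the rest give the rank. */
static void
dense_hll_add_hash(DenseHll *h, uint32_t hash)
{
    uint32_t idx = hash >> (32 - h->width);
    uint8_t  rank = leftmost_one(hash << h->width, (uint8_t) (32 - h->width));

    if (rank > h->registers[idx])
        h->registers[idx] = rank;
}

/*
 * Merge src into dst by keeping the larger rank in every register.
 * Returns 0 on success, -1 if the two states use different widths.
 */
static int
dense_hll_merge(DenseHll *dst, const DenseHll *src)
{
    uint32_t n;
    uint32_t i;

    if (dst->width != src->width)
        return -1;

    n = 1u << dst->width;
    for (i = 0; i < n; i++)
    {
        if (src->registers[i] > dst->registers[i])
            dst->registers[i] = src->registers[i];
    }
    return 0;
}
```

Under these assumptions, merging two partial states gives the same registers
as feeding all the hashes into a single state, which is the property a
combine function needs when two partial aggregate states have to be joined.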

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
