On 01/19/2016 10:54 PM, Peter Geoghegan wrote:
> On Tue, Jan 19, 2016 at 9:37 AM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
>> Our transcript seems to predate that bugfix commit, so I assume we need
>> to apply this to our copy too. Sadly, Hideaki-san's commit message isn't
>> very descriptive.
>
> Fortunately, the function mergeHyperLogLog() in our hyperloglog.c
> currently has no callers.
>
>> I don't really know how HyperLogLog works, so maybe we can't or
>> shouldn't apply the patch because of how the hash stuff is used.
>
> I think that Hideaki's confusion comes from whether or not this HLL
> state is a sparse or dense/full representation. The distinction is
> explained in the README for postgresql-hll:
>
> https://github.com/aggregateknowledge/postgresql-hll
>
> postgresql-hll has no support for merging HLLs that are sparse:
>
> https://github.com/aggregateknowledge/postgresql-hll/blob/master/hll.c#L1888
>
> Can't we just tear mergeHyperLogLog() out?

FWIW I've been considering adding an APPROX_COUNT_DISTINCT() aggregate,
similar to what other databases (e.g. Vertica) have built in. Now,
that would not require the merge either, but we're currently baking
support for 'combine' functions, and that's exactly what merge does.

So why not just fix the bug?
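
To make the combine point a bit more concrete, here is a rough sketch of
what merging two dense HLL states amounts to: a register-wise maximum,
valid only when both states use the same number of registers. This is
not the code in our hyperloglog.c; the struct and the names
(dense_sketch, dense_sketch_merge, SKETCH_MAX_BITS) are made up for
illustration, and sparse representations are not handled at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dense HLL state, for illustration only. */
#define SKETCH_MAX_BITS 4		/* at most 2^4 = 16 registers */

typedef struct
{
	uint8_t		bits;			/* log2 of the number of registers in use */
	uint8_t		regs[1 << SKETCH_MAX_BITS];	/* highest rank seen per register */
} dense_sketch;

/*
 * Merge src into dst by taking the maximum of each pair of registers.
 *
 * Returns false, leaving dst untouched, when the two states were built
 * with different register counts, since their registers do not line up.
 */
bool
dense_sketch_merge(dense_sketch *dst, const dense_sketch *src)
{
	size_t		nregs;

	assert(dst != NULL && src != NULL);

	if (dst->bits != src->bits)
		return false;

	nregs = (size_t) 1 << dst->bits;
	for (size_t i = 0; i < nregs; i++)
	{
		if (src->regs[i] > dst->regs[i])
			dst->regs[i] = src->regs[i];
	}
	return true;
}
```

A combine function for such an aggregate would presumably do little more
than call something like this on the two partial states.
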
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services