Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
Date
Msg-id 12511.1517008946@sss.pgh.pa.us
In response to Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-bugs
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> I suspect you're right the hash is biased to lohalf bits, as you wrote
> in the 19/12 message.

I don't see any bias in what it's doing, which is basically xoring the
two halves and hashing the result.  It's possible though that Todd's
data set contains values in which corresponding bits of the high and
low halves are correlated somehow, in which case the xor would produce
a lot of cancellation and a relatively small number of distinct outputs.
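
To make the cancellation concrete, here is a minimal sketch of the folding step only. It is an illustration, not the server's code: the name fold_int8 is made up, the final 32-bit hash is left out, and complementing the high half for negative values is an assumption about a detail this message does not spell out.

```python
MASK32 = 0xFFFFFFFF


def fold_int8(val: int) -> int:
    """Fold a signed 64-bit integer to 32 bits by xoring its two halves."""
    lo = val & MASK32
    hi = (val >> 32) & MASK32
    # Assumption: for negative values the high half is complemented first,
    # so a value that fits in 32 bits folds to its own low half.
    if val < 0:
        hi = ~hi & MASK32
    return lo ^ hi


# Inputs whose high half equals their low half: the xor cancels completely.
correlated = [(x << 32) | x for x in range(1000)]
# Small values with an all-zero high half fold to themselves.
uncorrelated = list(range(1000))

print(len({fold_int8(v) for v in correlated}))
print(len({fold_int8(v) for v in uncorrelated}))
```

Whatever 32-bit hash is applied afterwards cannot separate inputs that have already folded to the same value, so the number of distinct hash codes is bounded by the number of distinct folded values.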

If we weren't bound by backwards compatibility, we could consider changing
to logic more like "if the value is within the int4 range, apply int4hash,
otherwise hash all 8 bytes normally".  But I don't see how we can change
that now that hash indexes are first-class citizens.
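
For illustration, the alternative logic could be sketched as below. The names hash_int4 and hash_int8_alt are hypothetical, and zlib.crc32 is only a stand-in for a real 32-bit hash function; the point is the range check, not the hash itself.

```python
import struct
import zlib

INT4_MIN = -(2 ** 31)
INT4_MAX = 2 ** 31 - 1


def hash_int4(val: int) -> int:
    """Stand-in 32-bit hash of a 4-byte signed integer."""
    return zlib.crc32(struct.pack("<i", val))


def hash_int8_alt(val: int) -> int:
    """Hash an 8-byte signed integer without folding its halves together."""
    if INT4_MIN <= val <= INT4_MAX:
        # Values in the int4 range hash exactly as the int4 type would.
        return hash_int4(val)
    # Everything else is hashed over all 8 bytes.
    return zlib.crc32(struct.pack("<q", val))
```

Under this scheme, inputs whose high and low halves are equal no longer collapse to one value, while int4-range values still produce the same hash code for both integer widths.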

In any case, we still need a fix for the behavior that the hash table size
is blown out by lots of collisions, because that can happen no matter what
the hash function is.  Andres seems to have dropped the ball on doing
something about that.
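
A toy model of that failure mode, under loudly stated assumptions: this is not the executor's hash table code, and the growth rule used here (double the table whenever an insertion would probe more than max_probe slots) is an assumption chosen to show why growing cannot help when the colliding keys share the same hash value.

```python
def _place(slots, key, h, max_probe):
    """Linear probing with a probe limit; returns False if the limit is hit."""
    mask = len(slots) - 1
    for dist in range(max_probe):
        i = (h + dist) & mask
        if slots[i] is None:
            slots[i] = key
            return True
    return False


def table_size_needed(entries, max_probe=4, max_size=1 << 16):
    """Return the table size at which all (key, hash) pairs fit.

    The table starts at 8 slots and doubles whenever some key cannot be
    placed within max_probe slots of its home position.
    """
    size = 8
    while size <= max_size:
        slots = [None] * size
        if all(_place(slots, k, h, max_probe) for k, h in entries):
            return size
        size *= 2
    raise MemoryError("table grew past max_size without resolving collisions")
```

With five keys that have distinct hash values the initial 8 slots suffice. With five keys that all share one hash value, only max_probe slots are ever reachable, so every doubling is wasted and the loop stops only at the cap; that is why a limit on growth is needed independently of the hash function.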

            regards, tom lane

