On Sun, Nov 10, 2019 at 04:34:09PM -0500, Tom Lane wrote:
>Thomas Munro <thomas.munro@gmail.com> writes:
>> I think I see what's happening: we're running out of hash bits.
>
>Ugh.
>
>> Besides switching to 64 bit hashes so that we don't run out of
>> information (clearly a good idea), what other options do we have?
>
>If the input column is less than 64 bits wide, this is an illusory
>solution, because there can't really be that many independent hash bits.
>I think we need to have some constraint that keeps us from widening
>the batch+bucket counts to more hash bits than we have.
>
That's not quite correct - it's a solution/improvement as long as the
values have more than 32 bits of information. I.e. it works even for
values that are less than 64 bits wide, as long as there's more than 32
bits of information (assuming the hash does not lost the extra bits).
In the case discussed in this text we're dealing with a citext column,
with average length ~8 chars. So it probably has less than 64 bits of
information, because it's likely just alphanumeric, or something like
that. But likely more than 32 bits.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services