Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash - Mailing list pgsql-bugs

From Thomas Munro
Subject Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash
Date
Msg-id CA+hUKG+ZpNdjWxTXfdKz-cAZG3iuK9HV_woNn5HTUoiXErvQrw@mail.gmail.com
Whole thread Raw
In response to Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash  (Andres Freund <andres@anarazel.de>)
Responses Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-bugs
On Thu, Nov 14, 2019 at 7:48 AM Andres Freund <andres@anarazel.de> wrote:
> I don't really see what the problem is with using just a differently
> mixed hash. There's not really a requirement for there to be a
> tremendous amount of independence between the bits used for the bucket,
> and the bits for the batch. We cannot just reuse exactly the same bits
> for the bucket that we used for the hash, as otherwise we'd just raise
> the number of bucket conflicts (as the whole hashtable will only contain
> tuples with the reduced number of bits). But I don't see why that
> implies a need for full bit indepence.  As long as the mixing guarantees
> that we use the bits for bucketing differently than the ones for
> batches, I think we're good.  All that need is that each batch is is
> roughly equally sized, and contains tuples with hashes that roughly
> equally distributed - you could bitreverse the hash for batch selection,
> and get that, no?

Right, that's what I said earlier:

>     batchno = reverse_bits(hashvalue) & (nbatch - 1);
>     bucketno = hashvalue & (nbuckets - 1);

My theory is that batchno = hash_combine(<constant>, hashvalue) &
(nbatch - 1) has about the same distribution properties.  That is, it
works nicely up until you have more than about 2^32 keys, and then it
begins to break down due to lack of information, but the break down
manifests as more conflicts in buckets, instead of the failure to
repartition once you start reading off the end of the hash value (the
cause of the present bug report).  The bit reverse approach might be a
bit less mysterious, but when I looked at ways to implement that they
looked potentially slower than the xor, shifts and adds used by
hash_combine.  I didn't attempt to test that, though.



pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: [BUG] Uninitializaed configOut.leafType used.
Next
From: Tom Lane
Date:
Subject: Re: Unexpected "cache lookup failed for collation 0" failure