Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash - Mailing list pgsql-bugs

From Andres Freund
Subject Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash
Date
Msg-id 20191113184759.gacxzbxk523vupfc@alap3.anarazel.de
Whole thread Raw
In response to Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-bugs
Hi,

On 2019-11-11 11:46:05 +0100, Tomas Vondra wrote:
> On Mon, Nov 11, 2019 at 01:33:41PM +1300, Thomas Munro wrote:
> Hmmm, iteresting. Using a hard-coded constant seems a bit weird,
> although I don't have any precise argument why it's wrong yet.
> 
> Essentially, the patch does this:
> 
>   batchno = hash_combine(0x9e3779b9, hashvalue) & (nbatch - 1);
>   bucketno = hashvalue & (nbuckets - 1);
> 
> but adding a constant does not 'propagate' any bits from the original
> value to the hash. So it seems pretty much equivalent to

I was over IM arguing that we ought to also use a different mixing
functions when computing the hash.


>   batchno = hashvalue & (nbatch - 1);
>   bucketno = hashvalue & (nbuckets - 1);

> so we still operate with the same number of bits. It might solve this
> particular case, but I doubt it's a good idea in general ...
> 
> I think we will have to actually compute two hashes, to get more than 32
> bits. But yes, that'll be more invasive, and unlikely backpatchable.

I think it might also just generally not be acceptable, from a
performance POV. Computing the hashes is already a major bottleneck for
HJ. Penalizing everybody for these extreme cases doesn't strike me as
great. It might be ok to compute a 64bit hash, but I don't see how
computing two hashes in parallel wouldn't have significant overhead.


I don't really see what the problem is with using just a differently
mixed hash. There's not really a requirement for there to be a
tremendous amount of independence between the bits used for the bucket,
and the bits for the batch. We cannot just reuse exactly the same bits
for the bucket that we used for the hash, as otherwise we'd just raise
the number of bucket conflicts (as the whole hashtable will only contain
tuples with the reduced number of bits). But I don't see why that
implies a need for full bit indepence.  As long as the mixing guarantees
that we use the bits for bucketing differently than the ones for
batches, I think we're good.  All that need is that each batch is is
roughly equally sized, and contains tuples with hashes that roughly
equally distributed - you could bitreverse the hash for batch selection,
and get that, no?


Greetings,

Andres Freund



pgsql-bugs by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: [BUG] Uninitializaed configOut.leafType used.
Next
From: Mukesh Chhatani
Date:
Subject: Re: BUG #16109: Postgres planning time is high across version - 10.6vs 10.10