Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash - Mailing list pgsql-bugs

From: Tomas Vondra
Subject: Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash
Date:
Msg-id: 20191111104605.c4w27as5sq7nm3x4@development
In response to: Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash (Thomas Munro <thomas.munro@gmail.com>)
Responses: Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash
Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash
List: pgsql-bugs
On Mon, Nov 11, 2019 at 01:33:41PM +1300, Thomas Munro wrote:
>On Mon, Nov 11, 2019 at 12:44 PM Tomas Vondra
><tomas.vondra@2ndquadrant.com> wrote:
>> On Sun, Nov 10, 2019 at 02:46:31PM -0800, Andres Freund wrote:
>> >On 2019-11-10 22:50:17 +0100, Tomas Vondra wrote:
>> >> On Sun, Nov 10, 2019 at 10:23:52PM +0100, Tomas Vondra wrote:
>> >> > On Mon, Nov 11, 2019 at 10:08:58AM +1300, Thomas Munro wrote:
>> >> > Can't we simply compute two hash values, using different seeds - one for
>> >> > bucket and the other for batch? Of course, that'll be more expensive.
>> >>
>> >> Meh, I realized that's pretty much just a different way to get 64-bit
>> >> hashes (which is what you mentioned).
>> >
>> >I'm not sure it's really the same, given practical realities in
>> >postgres. Right now the "extended" hash function supporting 64 bit hash
>> >functions is optional. So we couldn't unconditionally rely on it being
>> >present, even in master, unless we're prepared to declare it as
>> >required from now on.
>> >
>> >So computing two different hash values at the same time, by using a
>> >different IV and a different combine function, doesn't seem like an
>> >unreasonable approach.
>>
>> True. I was commenting on the theoretical fact that computing two 32-bit
>> hashes is close to computing a 64-bit hash, but you're right there are
>> implementation details that may make it more usable in our case.
>
>Here is a quick sketch of something like that, for discussion only.  I
>figured that simply mixing the hash value we have with some arbitrary
>bits afterwards would be just as good as having started with a
>different IV, which leads to a very simple change without refactoring.
>From quick experiments with unique keys (generate_series) I seem to
>get approximately even sized partitions, and correct answers, but I
>make no claim to strong hash-math-fu and haven't tested on very large
>inputs.  Thoughts?

Hmmm, interesting. Using a hard-coded constant seems a bit weird,
although I don't have a precise argument for why it's wrong yet.

Essentially, the patch does this:

   batchno = hash_combine(0x9e3779b9, hashvalue) & (nbatch - 1);
   bucketno = hashvalue & (nbuckets - 1);

but combining with a constant does not 'propagate' any new bits into
the hash - the low bits of the result are still a function of the low
bits of the original value alone. So it seems pretty much equivalent to

   batchno = hashvalue & (nbatch - 1);
   bucketno = hashvalue & (nbuckets - 1);

so we still operate with the same number of bits. It might solve this
particular case, but I doubt it's a good idea in general ...
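
Just to demonstrate the point, a quick standalone program (assuming
hash_combine() is defined the way I remember it from hashutils.h):

   #include <stdio.h>
   #include <stdint.h>

   /* hash_combine() as (I believe) defined in src/include/utils/hashutils.h */
   static inline uint32_t
   hash_combine(uint32_t a, uint32_t b)
   {
       a ^= b + 0x9e3779b9 + (a << 6) + (a >> 2);
       return a;
   }

   int
   main(void)
   {
       uint32_t nbatch = 256;       /* power of two, so mask keeps low 8 bits */
       uint32_t h1 = 0x12345678;
       uint32_t h2 = 0xabcd5678;    /* same low 16 bits as h1 */

       /*
        * With a constant first argument, hash_combine() reduces to
        * (hashvalue + C1) ^ C2, and the low k bits of that depend only
        * on the low k bits of hashvalue. So both values get the same
        * batchno - no bits from the upper half are mixed in.
        */
       printf("batchno(h1) = %u\n", hash_combine(0x9e3779b9, h1) & (nbatch - 1));
       printf("batchno(h2) = %u\n", hash_combine(0x9e3779b9, h2) & (nbatch - 1));
       return 0;
   }

Both printf calls produce the same batchno, despite the combine step.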

I think we will have to actually compute two hashes to get more than 32
bits. But yes, that'll be more invasive, and unlikely to be backpatchable.
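
To be explicit about what I mean by two hashes, a toy standalone sketch
(FNV-1a with two different seeds is just a stand-in here - in Postgres
this would really mean seeding the per-datatype hash functions, i.e.
the optional "extended" support Andres mentioned):

   #include <stdio.h>
   #include <stdint.h>
   #include <string.h>

   /* FNV-1a with a caller-supplied seed; a stand-in for a seeded
    * per-datatype hash function */
   static uint32_t
   seeded_hash(const void *key, size_t len, uint32_t seed)
   {
       const unsigned char *p = key;
       uint32_t h = seed;

       for (size_t i = 0; i < len; i++)
       {
           h ^= p[i];
           h *= 16777619;       /* FNV prime */
       }
       return h;
   }

   int
   main(void)
   {
       const char *key = "some join key";
       uint32_t nbuckets = 1024;    /* power of two */
       uint32_t nbatch = 256;       /* power of two */

       /* two separately seeded 32-bit hashes, instead of carving both
        * bucketno and batchno out of the same 32 bits */
       uint32_t hash_bucket = seeded_hash(key, strlen(key), 0x811c9dc5);
       uint32_t hash_batch = seeded_hash(key, strlen(key), 0x9e3779b9);

       printf("bucketno = %u, batchno = %u\n",
              hash_bucket & (nbuckets - 1),
              hash_batch & (nbatch - 1));
       return 0;
   }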

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


