Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash - Mailing list pgsql-bugs
From: Tomas Vondra
Subject: Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash
Msg-id: 20191111104605.c4w27as5sq7nm3x4@development
In response to: Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash (Thomas Munro <thomas.munro@gmail.com>)
Responses:
  Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash
  Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash
List: pgsql-bugs
On Mon, Nov 11, 2019 at 01:33:41PM +1300, Thomas Munro wrote:
>On Mon, Nov 11, 2019 at 12:44 PM Tomas Vondra
><tomas.vondra@2ndquadrant.com> wrote:
>> On Sun, Nov 10, 2019 at 02:46:31PM -0800, Andres Freund wrote:
>> >On 2019-11-10 22:50:17 +0100, Tomas Vondra wrote:
>> >> On Sun, Nov 10, 2019 at 10:23:52PM +0100, Tomas Vondra wrote:
>> >> > On Mon, Nov 11, 2019 at 10:08:58AM +1300, Thomas Munro wrote:
>> >> > Can't we simply compute two hash values, using different seeds - one for
>> >> > bucket and the other for batch? Of course, that'll be more expensive.
>> >>
>> >> Meh, I realized that's pretty much just a different way to get 64-bit
>> >> hashes (which is what you mentioned).
>> >
>> >I'm not sure it's really the same, given practical realities in
>> >postgres. Right now the "extended" hash function supporting 64 bit hash
>> >functions is optional. So we couldn't unconditionally rely on it being
>> >present, even in master, unless we're prepared to declare it as
>> >required from now on.
>> >
>> >So computing two different hash values at the same time, by using a
>> >different IV and a different combine function, doesn't seem like an
>> >unreasonable approach.
>>
>> True. I was commenting on the theoretical fact that computing two 32-bit
>> hashes is close to computing a 64-bit hash, but you're right there are
>> implementation details that may make it more usable in our case.
>
>Here is a quick sketch of something like that, for discussion only. I
>figured that simply mixing the hash value we have with some arbitrary
>bits afterwards would be just as good as having started with a
>different IV, which leads to a very simple change without refactoring.
>From quick experiments with unique keys (generate_series) I seem to
>get approximately even sized partitions, and correct answers, but I
>make no claim to strong hash-math-fu and haven't tested on very large
>inputs. Thoughts?

Hmmm, interesting. Using a hard-coded constant seems a bit weird,
although I don't have any precise argument why it's wrong yet.

Essentially, the patch does this:

    batchno = hash_combine(0x9e3779b9, hashvalue) & (nbatch - 1);
    bucketno = hashvalue & (nbuckets - 1);

but mixing in a constant does not 'propagate' any bits from the original
value into the hash, so it seems pretty much equivalent to

    batchno = hashvalue & (nbatch - 1);
    bucketno = hashvalue & (nbuckets - 1);

and we still operate with the same number of bits. It might solve this
particular case, but I doubt it's a good idea in general ...

I think we will have to actually compute two hashes to get more than
32 bits. But yes, that'll be more invasive, and unlikely to be
backpatchable.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
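
To make the bit-propagation argument above concrete, here is a minimal
standalone C sketch, not PostgreSQL code. hash_combine32() below mirrors
PostgreSQL's hash_combine(); seeded_hash(), the example keys, and the
nbatch/nbuckets values are hypothetical stand-ins chosen purely for
illustration.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Same mixing step as PostgreSQL's hash_combine()
 * (src/include/utils/hashutils.h at the time of this thread).
 */
static inline uint32_t
hash_combine32(uint32_t a, uint32_t b)
{
    a ^= b + 0x9e3779b9 + (a << 6) + (a >> 2);
    return a;
}

/*
 * Hypothetical stand-in for a per-type hash function that accepts a
 * seed; PostgreSQL's hash_any() takes none, which is part of why the
 * two-hash approach is invasive.  Seeded FNV-1a, for illustration only.
 */
static uint32_t
seeded_hash(const void *key, size_t len, uint32_t seed)
{
    const unsigned char *p = key;
    uint32_t h = seed;

    for (size_t i = 0; i < len; i++)
    {
        h ^= p[i];
        h *= 16777619u;         /* FNV prime */
    }
    return h;
}

int
main(void)
{
    uint32_t nbatch = 1024;     /* both powers of two, as in the executor */
    uint32_t nbuckets = 1u << 20;

    /* Pretend these two distinct keys collide on their 32-bit hashvalue. */
    const char *k1 = "key A";
    const char *k2 = "key B";
    uint32_t hashvalue = 0xdeadbeef;    /* shared by both keys */

    /*
     * Patched scheme: batchno is a pure function of the same 32 bits, so
     * both keys land in the same batch -- a hashvalue collision stays a
     * batch collision no matter what constant is mixed in.
     */
    uint32_t batchno = hash_combine32(0x9e3779b9, hashvalue) & (nbatch - 1);
    printf("mixed-constant batch for both keys: %u\n", (unsigned) batchno);

    /*
     * Two-hash scheme: re-hash the keys themselves with a different seed,
     * so batch selection draws on fresh bits of the key rather than on
     * the already-collided hashvalue.
     */
    uint32_t batch1 = seeded_hash(k1, strlen(k1), 0x9e3779b9) & (nbatch - 1);
    uint32_t batch2 = seeded_hash(k2, strlen(k2), 0x9e3779b9) & (nbatch - 1);
    printf("two-hash batches: %u vs %u (can now differ)\n",
           (unsigned) batch1, (unsigned) batch2);

    printf("bucket (both schemes): %u\n",
           (unsigned) (hashvalue & (nbuckets - 1)));
    return 0;
}

The sketch shows why mixing a constant into hashvalue cannot rescue a
32-bit collision, while a second, independently seeded hash of the key
can: the latter draws on bits of the key that the first hash discarded,
which is exactly the "more than 32 bits" the message above calls for.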