Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash - Mailing list pgsql-bugs

From Tomas Vondra
Subject Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash
Date
Msg-id 20191110215017.prjisnntwhhppnru@development
Whole thread Raw
In response to Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash
List pgsql-bugs
On Sun, Nov 10, 2019 at 10:23:52PM +0100, Tomas Vondra wrote:
>On Mon, Nov 11, 2019 at 10:08:58AM +1300, Thomas Munro wrote:
>>I think I see what's happening: we're running out of hash bits.
>>
>>>Buckets: 4194304 (originally 4194304)  Batches: 32768 (originally 4096)  Memory Usage: 344448kB
>>
>>Here it's using the lower 22 bits for the bucket number, and started
>>out using 12 bits for the batch (!), and increased that until it got
>>to 15 (!!).  After using 22 bits for the bucket, there are only 10
>>bits left, so all the tuples go into the lower 1024 batches.
>>
>
>Ouch!
>
>>I'm not sure how exactly this leads to wildly varying numbers of
>>repartioning cycles (the above-quoted example did it 3 times, the
>>version that crashed into MaxAllocSize did it ~10 times).
>>
>>Besides switching to 64 bit hashes so that we don't run out of
>>information (clearly a good idea), what other options do we have?  (1)
>>We could disable repartitioning (give up on work_mem) after we've run
>>out of bits; this will eat more memory than it should.  (2) You could
>>start stealing bucket bits; this will eat more CPU than it should,
>>because you'd effectively have fewer active buckets (tuples will
>>concentrated on the bits you didn't steal).
>
>Can't we simply compute two hash values, using different seeds - one for
>bucket and the other for batch? Of course, that'll be more expensive.

Meh, I realized that's pretty much just a different way to get 64-bit
hashes (which is what you mentioned).

An I think you're right we should detect cases when we use all the bits,
and stop making it worse. Stealing bucket bits seems reasonable - we
can't stop adding batches, because that would mean we stop enforcing
work_mem. The question is how far that gets us - we enforce nbuckets to
be at least 1024 (i.e. 10 bits), which leaves 32 bits for nbatch. And in
one of the explains we've seen nbatch=2097152 (i.e. 21 bits).

Of course, this is bound to be (extremely) slow. The hashjoin changes in
9.5 which reduced th hash table load factor from 10 to 1 resulted in
speedups close to 3x. And if you go from 4194304 to 1024 buckets, those
will be loooooooong chains in each bucket :-(

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-bugs by date:

Previous
From: James Coleman
Date:
Subject: Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash
Next
From: Tomas Vondra
Date:
Subject: Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash