Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash - Mailing list pgsql-bugs

From Tomas Vondra
Subject Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash
Date
Msg-id 20191210173224.y5vaam6ep7pjmddt@development
Whole thread Raw
In response to Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash
Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash
Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash
List pgsql-bugs
On Tue, Dec 03, 2019 at 02:07:04PM -0300, Alvaro Herrera wrote:
>Hello
>
>I can make no useful comment on the correctness of the new bit
>distribution.  But I can make two comments on this part:
>
>On 2019-Nov-21, Thomas Munro wrote:
>
>> So here's a draft steal-the-bucket-bits patch.  Yeah,
>> reverse_bits_u32() may be in the wrong header.
>
>I think pg_bitutils.h is the right place in master, but that file didn't
>exist earlier.  I think access/hash.h is the right place in older
>branches, which is where hash_any() used to be.
>
>> But is it too slow?
>> On my desktop I can call ExecGetBucketAndBatch() 353 million times per
>> second (~2.8ns), and unpatched gets me 656 million/sec (~1.5ns)
>> (though that includes a function call, and when Hash does it it's
>> inlined), but it's only slower when nbatch > 1 due to the condition.
>
>If the whole query takes hours (or even minutes) to run, then adding one
>second of runtime is not going to change things in any noticeable way.
>I'd rather have the extreme cases working and take additional 1.3ns per
>output row, than not work at all.
>
>> To put that number into persepective, I can hash 10 million single-int
>> tuples from a prewarmed seq scan in 2.5s without batching or
>> parallelism, so that's 250ns per tuple.  This'd be +0.4% of that, and
>> I do see it in a few more samples with my profiler, but it's still
>> basically nothing, and lost in the noise with other noisy partitioning
>> overheads like IO.  Thoughts?
>
>That summarizes it well for me: yes, it's a slowdown, yes it's barely
>measurable.
>

OK, I did a bit of testing with this patch today, and I can confirm it
does indeed help quite a bit. I only used a smaller data set with 4.5B
rows, but it was enough to trigger the issue - it allocated ~1TB of temp
space, and then crashed.

The annoying thing is that it's the workers that crash, and the leader
failed to notice that, so it was waiting in WaitForParallelWorkersToExit
forever. Not sure what the issue is.

Anyway, with the patch applied, the query completed in ~2.2 hours,
after allocating ~235GB of temp files. So that's good, the patch clearly
improves the situation quite a bit.

As for the performance impact, I did this:

   create table dim (id int, val text);
   insert into dim select i, md5(i::text) from generate_series(1,1000000) s(i);

   create table fact (id int, val text);
   insert into fact select mod(i,1000000)+1, md5(i::text) from generate_series(1,25000000) s(i);

   set max_parallel_workers_per_gather = 0;
   select count(*) from fact join dim using (id);

So a perfectly regular join between 1M and 25M table. On my machine,
this takes ~8851ms on master and 8979ms with the patch (average of about
20 runs with minimal variability). That's ~1.4% regression, so a bit
more than the 0.4% mentioned before. Not a huge difference though, and
some of it might be due to different binary layout etc.

It's probably still a good idea to do this, although I wonder if we
might start doing this only when actually running out of bits (and use
the current algorithm by default). But I have no idea if that would be
any cheaper, and it would add complexity.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #16158: Check constraints using SQL functions work incorrectly
Next
From: Alvaro Herrera
Date:
Subject: Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash