Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash - Mailing list pgsql-bugs
From: Tomas Vondra
Subject: Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash
Msg-id: 20191210173224.y5vaam6ep7pjmddt@development
In response to: Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses: Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash
List: pgsql-bugs
On Tue, Dec 03, 2019 at 02:07:04PM -0300, Alvaro Herrera wrote:
>Hello
>
>I can make no useful comment on the correctness of the new bit
>distribution. But I can make two comments on this part:
>
>On 2019-Nov-21, Thomas Munro wrote:
>
>> So here's a draft steal-the-bucket-bits patch. Yeah,
>> reverse_bits_u32() may be in the wrong header.
>
>I think pg_bitutils.h is the right place in master, but that file didn't
>exist earlier. I think access/hash.h is the right place in older
>branches, which is where hash_any() used to be.
>
>> But is it too slow?
>> On my desktop I can call ExecGetBucketAndBatch() 353 million times per
>> second (~2.8ns), and unpatched gets me 656 million/sec (~1.5ns)
>> (though that includes a function call, and when Hash does it it's
>> inlined), but it's only slower when nbatch > 1 due to the condition.
>
>If the whole query takes hours (or even minutes) to run, then adding one
>second of runtime is not going to change things in any noticeable way.
>I'd rather have the extreme cases working and take an additional 1.3ns per
>output row than not work at all.
>
>> To put that number into perspective, I can hash 10 million single-int
>> tuples from a prewarmed seq scan in 2.5s without batching or
>> parallelism, so that's 250ns per tuple. This'd be +0.4% of that, and
>> I do see it in a few more samples with my profiler, but it's still
>> basically nothing, and lost in the noise with other noisy partitioning
>> overheads like IO. Thoughts?
>
>That summarizes it well for me: yes, it's a slowdown, yes it's barely
>measurable.
>

OK, I did a bit of testing with this patch today, and I can confirm it does
indeed help quite a bit. I only used a smaller data set with 4.5B rows, but
it was enough to trigger the issue - it allocated ~1TB of temp space and
then crashed. The annoying thing is that it's the workers that crash, and
the leader failed to notice that, so it was waiting in
WaitForParallelWorkersToExit forever. Not sure what the issue is.
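For context, the steal-the-bucket-bits idea discussed above takes the batch
number from the opposite end of the hash by bit-reversing it, so the bucket
bits (low end of the hash) and the batch bits (low end of the reversed hash,
i.e. the high end of the original) cannot collide until nbuckets * nbatch
exceeds 2^32. A minimal sketch of that idea, assuming the reverse_bits_u32
name mentioned in the thread; get_bucket_and_batch is a hypothetical stand-in
for ExecGetBucketAndBatch() and may differ from the actual patch:

```c
#include <stdint.h>

/* Classic O(log n) 32-bit bit reversal, as a stand-in for reverse_bits_u32(). */
static uint32_t
reverse_bits_u32(uint32_t x)
{
    x = ((x & 0x55555555u) << 1) | ((x >> 1) & 0x55555555u);  /* swap bits  */
    x = ((x & 0x33333333u) << 2) | ((x >> 2) & 0x33333333u);  /* swap pairs */
    x = ((x & 0x0f0f0f0fu) << 4) | ((x >> 4) & 0x0f0f0f0fu);  /* swap nibbles */
    x = (x << 24) | ((x & 0xff00u) << 8) |
        ((x >> 8) & 0xff00u) | (x >> 24);                     /* swap bytes */
    return x;
}

/* Bucket from the low hash bits; batch from the low bits of the reversed
 * hash (the HIGH bits of the original).  nbuckets and nbatch are powers of
 * two, as in the hash join code.  The nbatch > 1 test mirrors the condition
 * the thread says makes the patched path slower only when batching. */
static void
get_bucket_and_batch(uint32_t hash, uint32_t nbuckets, uint32_t nbatch,
                     uint32_t *bucketno, uint32_t *batchno)
{
    *bucketno = hash & (nbuckets - 1);
    *batchno = (nbatch > 1) ? (reverse_bits_u32(hash) & (nbatch - 1)) : 0;
}
```

With nbuckets = 1024 and nbatch = 16, a hash of all ones lands in bucket 1023
and batch 15, and the two indexes draw from disjoint ends of the hash word.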
Anyway, with the patch applied, the query completed in ~2.2 hours, after
allocating ~235GB of temp files. So that's good, the patch clearly improves
the situation quite a bit.

As for the performance impact, I did this:

  create table dim (id int, val text);
  insert into dim select i, md5(i::text)
    from generate_series(1,1000000) s(i);

  create table fact (id int, val text);
  insert into fact select mod(i,1000000)+1, md5(i::text)
    from generate_series(1,25000000) s(i);

  set max_parallel_workers_per_gather = 0;

  select count(*) from fact join dim using (id);

So a perfectly regular join between a 1M-row and a 25M-row table. On my
machine, this takes ~8851ms on master and ~8979ms with the patch (average of
about 20 runs, with minimal variability). That's a ~1.4% regression, so a bit
more than the 0.4% mentioned before. Not a huge difference though, and some
of it might be due to different binary layout etc.

It's probably still a good idea to do this, although I wonder if we might
start doing this only when actually running out of bits (and use the current
algorithm by default). But I have no idea whether that would be any cheaper,
and it would add complexity.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
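The runaway temp-file usage reported above follows from how the unpatched
scheme runs out of bits: the batch number is taken from the hash bits directly
above the bucket bits, so once log2(nbuckets) + log2(nbatch) exceeds 32, the
high batch bits are always zero, only a small fraction of batches can ever
receive tuples, and doubling nbatch stops spreading the data. A minimal
demonstration of that effect (function names hypothetical, not PostgreSQL's
actual code):

```c
#include <stdbool.h>
#include <stdint.h>

#define LOG2_NBUCKETS 20                    /* 2^20 buckets */
#define LOG2_NBATCH   20                    /* 2^20 batches: 20 + 20 > 32 */
#define NBATCH        (1u << LOG2_NBATCH)

/* Unpatched-style computation: batch bits sit directly above the bucket
 * bits, leaving only 32 - 20 = 12 hash bits to pick one of 2^20 batches. */
static uint32_t
old_batchno(uint32_t hash)
{
    return (hash >> LOG2_NBUCKETS) & (NBATCH - 1);
}

/* Count how many distinct batch numbers a large sample of hash values can
 * actually reach under the old scheme. */
static uint32_t
count_reachable_batches(void)
{
    static bool seen[NBATCH];               /* one flag per batch */
    uint32_t distinct = 0;

    for (uint32_t i = 0; i < 10000000u; i++)
    {
        uint32_t h = i * 2654435761u;       /* Fibonacci-hash style mixing */
        uint32_t b = old_batchno(h);

        if (!seen[b])
        {
            seen[b] = true;
            distinct++;
        }
    }
    /* Mathematically capped at 2^12 = 4096 out of ~1M batches. */
    return distinct;
}
```

Those 4096 reachable batches each absorb ~256x their expected share of
tuples, which is consistent with the ~1TB of temp space before the crash;
the bit-reversal patch removes the overlap and the query finishes.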