Re: Fix overflow of nbatch - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: Fix overflow of nbatch
Msg-id: 244dc6c1-3b3d-4de2-b3de-b1511e6a6d10@vondra.me
In response to: Re: Fix overflow of nbatch (Nathan Bossart <nathandbossart@gmail.com>)
List: pgsql-hackers
Hi,

Here's a couple of draft patches fixing the bug:

- 0001 adds the missing size_t cast, to fix the overflow
- 0002 fixes the balancing, by adjusting the hash table size limit
- 0003 adds the missing overflow protection for nbuckets and the hash table limit
- 0004 rewords the comment explaining how the balancing works. Reading it after a couple of months, I found it overly verbose and not very clear. I'm sure it could be improved even further.

0001 and 0002 are pretty trivial. 0003 is a bit bigger, but most of it simply mirrors how we clamp nbuckets elsewhere (more or less).

At some point I started wondering whether it would have been simpler to use the algebraic solution posted by James Hunter back in February [1]. It would not need the loop, but it would still need all this new overflow protection etc.

I wanted to make sure the patches actually make it work correctly, so I created a table with 4B rows:

  create table t (a bigint, b text);
  insert into t select i, md5(i::text)
    from generate_series(1,4000000000) s(i);

and I added this log message at the end of ExecChooseHashTableSize:

  elog(WARNING, "wm %d nbatch %d nbucket %d hash %ld total %ld",
       work_mem, nbatch, nbuckets,
       (*space_allowed)/1024,
       (*space_allowed + 2 * nbatch * (Size) BLCKSZ)/1024);

and I ran an explain on a self-join:

  set enable_mergejoin = off;
  set max_parallel_workers_per_gather = 0;
  set work_mem = '...';

  explain select * from t t1 join t t2 on (t1.a = t2.a);

with work_mem set to values between 64kB and 1GB.
On 18.0 I got this:

  wm 64       nbatch 8     nbucket 2097152   hash 131072   total 131200
  wm 128      nbatch 16    nbucket 4194304   hash 262144   total 262400
  wm 256      nbatch 32    nbucket 8388608   hash 524288   total 524800
  wm 512      nbatch 64    nbucket 16777216  hash 1048576  total 1049600
  wm 1024     nbatch 128   nbucket 33554432  hash 2097152  total 2099200
  wm 2048     nbatch 256   nbucket 33554432  hash 2097152  total 2101248
  wm 4096     nbatch 512   nbucket 16777216  hash 1048576  total 1056768
  wm 8192     nbatch 1024  nbucket 8388608   hash 524288   total 540672
  wm 16384    nbatch 2048  nbucket 4194304   hash 262144   total 294912
  wm 32768    nbatch 4096  nbucket 2097152   hash 131072   total 196608
  wm 65536    nbatch 4096  nbucket 2097152   hash 131072   total 196608
  wm 131072   nbatch 2048  nbucket 4194304   hash 262144   total 294912
  wm 262144   nbatch 1024  nbucket 8388608   hash 524288   total 540672
  wm 524288   nbatch 512   nbucket 16777216  hash 1048576  total 1056768
  wm 1048576  nbatch 256   nbucket 33554432  hash 2097152  total 2101248

I wanted to know how serious the issue is, compared to what would happen without the balancing.
I disabled the balancing (by skipping the loop), and then I get this:

  wm 64       nbatch 8192    nbucket 2048      hash 128      total 131200
  wm 128      nbatch 16384   nbucket 4096      hash 256      total 262400
  wm 256      nbatch 32768   nbucket 8192      hash 512      total 524800
  wm 512      nbatch 65536   nbucket 16384     hash 1024     total 1049600
  wm 1024     nbatch 131072  nbucket 32768     hash 2048     total 2099200
  wm 2048     nbatch 131072  nbucket 65536     hash 4096     total 2101248
  wm 4096     nbatch 65536   nbucket 131072    hash 8192     total 1056768
  wm 8192     nbatch 32768   nbucket 262144    hash 16384    total 540672
  wm 16384    nbatch 16384   nbucket 524288    hash 32768    total 294912
  wm 32768    nbatch 8192    nbucket 1048576   hash 65536    total 196608
  wm 65536    nbatch 4096    nbucket 2097152   hash 131072   total 196608
  wm 131072   nbatch 2048    nbucket 4194304   hash 262144   total 294912
  wm 262144   nbatch 1024    nbucket 8388608   hash 524288   total 540672
  wm 524288   nbatch 512     nbucket 16777216  hash 1048576  total 1056768
  wm 1048576  nbatch 256     nbucket 33554432  hash 2097152  total 2101248

The interesting bit is that the expected total memory usage (the last number in the log line) is exactly the same as on 18.0, with and without balancing. IIUC this is because the "stop" condition uses the initial hash table size. It makes me a bit less worried about this triggering OOM crashes: the balancing does not improve the behavior, but it doesn't use more memory than before either. Still an embarrassing bug, though.
With the attached patches, this looks like this:

  wm 64       nbatch 256   nbucket 65536     hash 4096     total 8192
  wm 128      nbatch 512   nbucket 131072    hash 8192     total 16384
  wm 256      nbatch 1024  nbucket 262144    hash 16384    total 32768
  wm 512      nbatch 2048  nbucket 524288    hash 32768    total 65536
  wm 1024     nbatch 4096  nbucket 1048576   hash 65536    total 131072
  wm 2048     nbatch 4096  nbucket 2097152   hash 131072   total 196608
  wm 4096     nbatch 4096  nbucket 2097152   hash 131072   total 196608
  wm 8192     nbatch 4096  nbucket 2097152   hash 131072   total 196608
  wm 16384    nbatch 4096  nbucket 2097152   hash 131072   total 196608
  wm 32768    nbatch 4096  nbucket 2097152   hash 131072   total 196608
  wm 65536    nbatch 4096  nbucket 2097152   hash 131072   total 196608
  wm 131072   nbatch 2048  nbucket 4194304   hash 262144   total 294912
  wm 262144   nbatch 1024  nbucket 8388608   hash 524288   total 540672
  wm 524288   nbatch 512   nbucket 16777216  hash 1048576  total 1056768
  wm 1048576  nbatch 256   nbucket 33554432  hash 2097152  total 2101248

So this time it actually seems to work correctly, and it significantly reduces the memory usage.

There's one weird thing remaining: if you look at nbatch, it actually increases for the first couple of work_mem steps. That's odd, because after increasing work_mem we should need *fewer* batches. But this has nothing to do with the balancing; it happens even with the balancing disabled. The reason is that when calculating nbatch we do this:

  dbatch = Min(dbatch, max_pointers);

and max_pointers is calculated from work_mem (among other things). It's a bit funny that the logic worries about how many batch pointers we have and refuses to allow more, but at the same time ignores the BufFiles. AFAICS it's harmless: we may pick a low number of batches initially, but later we'll ramp it up (and the balancing will work too). And if you choose to run huge hash joins with tiny work_mem, I guess you're in for the suffering anyway. In any case, it's unrelated to the balancing.
regards

[1] https://www.postgresql.org/message-id/CAJVSvF6290rJF2MtgSx_SuT9Kn2amZ_%2BzecoZYMU%2Bdn3BVVaZg%40mail.gmail.com

--
Tomas Vondra