Thread: ERROR: could not resize shared memory segment...No space left on device
A customer's report query hit this error.
ERROR: could not resize shared memory segment "/PostgreSQL.2011322019" to 134217728 bytes: No space left on device

I found:
https://www.postgresql.org/message-id/flat/CAEepm%3D2D_JGb8X%3DLa-0PX9C8dBX9%3Dj9wY%2By1-zDWkcJu0%3DBQbA%40mail.gmail.com

work_mem                   | 128MB
dynamic_shared_memory_type | posix
version | PostgreSQL 12.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23), 64-bit
Running centos 6.9 / linux 2.6.32-696.23.1.el6.x86_64

$ free -m
             total       used       free     shared    buffers     cached
Mem:          7871       7223        648       1531          5       1988
-/+ buffers/cache:        5229       2642
Swap:         4095       2088       2007

$ mount | grep /dev/shm
tmpfs on /dev/shm type tmpfs (rw)

$ du -hs /dev/shm
0       /dev/shm

$ df /dev/shm
Filesystem     1K-blocks  Used Available Use% Mounted on
tmpfs            4030272    24   4030248   1% /dev/shm

Later, I see:
$ df -h /dev/shm
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           3.9G  3.3G  601M  85% /dev/shm

I can reproduce the error running a single instance of the query.

The query plan is 1300 lines long, and involves 482 "Scan" nodes on a table
which currently has 93 partitions, and whose current partitions are "daily".
I believe I repartitioned its history earlier this year to "monthly", probably
to avoid "OOM with many sorts", as reported here:
https://www.postgresql.org/message-id/20190708164401.GA22387%40telsasoft.com

$ grep Scan tmp/sql-`date +%F`.1.ex |sed 's/^ *//; s/ on .*//' |sort |uniq -c |sort -nr
    227 ->  Parallel Bitmap Heap Scan
    227 ->  Bitmap Index Scan
     14 ->  Parallel Seq Scan
      9 ->  Seq Scan
      2 ->  Subquery Scan
      2 ->  Index Scan using sites_pkey
      1 Subquery Scan

There are a total of 10 "Workers Planned":
$ grep -o 'Worker.*' tmp/sql-`date +%F`.1.ex
Workers Planned: 2
Workers Planned: 2
Workers Planned: 2
Workers Planned: 2
Workers Planned: 2

I plan to repartition again to monthly granularity unless someone wants to
collect further information or can suggest a better solution.
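
For reference, the partition count and the settings quoted above can be pulled
with a quick psql check; "report_table" below is only a placeholder for the
partitioned table in question.

-- Placeholder name "report_table"; pg_inherits counts only direct partitions.
SELECT count(*) AS n_partitions
FROM pg_inherits
WHERE inhparent = 'report_table'::regclass;

-- Settings relevant to this report.
SHOW work_mem;
SHOW dynamic_shared_memory_type;
SHOW max_parallel_workers_per_gather;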
(gdb) bt
#0  pg_re_throw () at elog.c:1717
#1  0x0000000000886194 in errfinish (dummy=<value optimized out>) at elog.c:464
#2  0x0000000000749453 in dsm_impl_posix (op=<value optimized out>, handle=<value optimized out>, request_size=<value optimized out>, impl_private=<value optimized out>, mapped_address=<value optimized out>, mapped_size=<value optimized out>, elevel=20) at dsm_impl.c:283
#3  dsm_impl_op (op=<value optimized out>, handle=<value optimized out>, request_size=<value optimized out>, impl_private=<value optimized out>, mapped_address=<value optimized out>, mapped_size=<value optimized out>, elevel=20) at dsm_impl.c:170
#4  0x000000000074a7c8 in dsm_create (size=100868096, flags=0) at dsm.c:459
#5  0x00000000008a94a6 in make_new_segment (area=0x1d70208, requested_pages=<value optimized out>) at dsa.c:2156
#6  0x00000000008aa47a in dsa_allocate_extended (area=0x1d70208, size=100663304, flags=5) at dsa.c:712
#7  0x0000000000670b3f in pagetable_allocate (pagetable=<value optimized out>, size=<value optimized out>) at tidbitmap.c:1511
#8  0x000000000067200c in pagetable_grow (tbm=0x7f82274da8e8, pageno=906296) at ../../../src/include/lib/simplehash.h:405
#9  pagetable_insert (tbm=0x7f82274da8e8, pageno=906296) at ../../../src/include/lib/simplehash.h:530
#10 tbm_get_pageentry (tbm=0x7f82274da8e8, pageno=906296) at tidbitmap.c:1225
#11 0x00000000006724a0 in tbm_add_tuples (tbm=0x7f82274da8e8, tids=<value optimized out>, ntids=1, recheck=false) at tidbitmap.c:405
#12 0x00000000004d7f1f in btgetbitmap (scan=0x1d7d948, tbm=0x7f82274da8e8) at nbtree.c:334
#13 0x00000000004d103a in index_getbitmap (scan=0x1d7d948, bitmap=<value optimized out>) at indexam.c:665
#14 0x00000000006323d8 in MultiExecBitmapIndexScan (node=0x1dcbdb8) at nodeBitmapIndexscan.c:105
#15 0x00000000006317f4 in BitmapHeapNext (node=0x1d8a030) at nodeBitmapHeapscan.c:141
#16 0x000000000062405c in ExecScanFetch (node=0x1d8a030, accessMtd=0x6316d0 <BitmapHeapNext>, recheckMtd=0x631440 <BitmapHeapRecheck>) at execScan.c:133
#17 ExecScan (node=0x1d8a030, accessMtd=0x6316d0 <BitmapHeapNext>, recheckMtd=0x631440 <BitmapHeapRecheck>) at execScan.c:200
#18 0x0000000000622900 in ExecProcNodeInstr (node=0x1d8a030) at execProcnode.c:461
#19 0x000000000062c66f in ExecProcNode (pstate=0x1d7ad70) at ../../../src/include/executor/executor.h:239
#20 ExecAppend (pstate=0x1d7ad70) at nodeAppend.c:292
#21 0x0000000000622900 in ExecProcNodeInstr (node=0x1d7ad70) at execProcnode.c:461
#22 0x0000000000637da2 in ExecProcNode (pstate=0x1d7a630) at ../../../src/include/executor/executor.h:239
#23 ExecHashJoinOuterGetTuple (pstate=0x1d7a630) at nodeHashjoin.c:833
#24 ExecHashJoinImpl (pstate=0x1d7a630) at nodeHashjoin.c:356
#25 ExecHashJoin (pstate=0x1d7a630) at nodeHashjoin.c:572
#26 0x0000000000622900 in ExecProcNodeInstr (node=0x1d7a630) at execProcnode.c:461
#27 0x0000000000637da2 in ExecProcNode (pstate=0x1d7bff0) at ../../../src/include/executor/executor.h:239
#28 ExecHashJoinOuterGetTuple (pstate=0x1d7bff0) at nodeHashjoin.c:833
#29 ExecHashJoinImpl (pstate=0x1d7bff0) at nodeHashjoin.c:356
#30 ExecHashJoin (pstate=0x1d7bff0) at nodeHashjoin.c:572
#31 0x0000000000622900 in ExecProcNodeInstr (node=0x1d7bff0) at execProcnode.c:461
#32 0x000000000061eac7 in ExecProcNode (queryDesc=0x7f8228b72198, direction=<value optimized out>, count=0, execute_once=240) at ../../../src/include/executor/executor.h:239
#33 ExecutePlan (queryDesc=0x7f8228b72198, direction=<value optimized out>, count=0, execute_once=240) at execMain.c:1646
#34 standard_ExecutorRun (queryDesc=0x7f8228b72198, direction=<value optimized out>, count=0, execute_once=240) at execMain.c:364
#35 0x00007f8229aa7878 in pgss_ExecutorRun (queryDesc=0x7f8228b72198, direction=ForwardScanDirection, count=0, execute_once=true) at pg_stat_statements.c:893
#36 0x00007f8228f8d9ad in explain_ExecutorRun (queryDesc=0x7f8228b72198, direction=ForwardScanDirection, count=0, execute_once=true) at auto_explain.c:320
#37 0x000000000061f0ce in ParallelQueryMain (seg=0x1c8d3b8, toc=0x7f82291f1000) at execParallel.c:1399
#38 0x00000000004f7daf in ParallelWorkerMain (main_arg=<value optimized out>) at parallel.c:1431
#39 0x00000000006eb2e0 in StartBackgroundWorker () at bgworker.c:834
#40 0x00000000006f52ac in do_start_bgworker () at postmaster.c:5770
#41 maybe_start_bgworkers () at postmaster.c:5996
#42 0x00000000006f867d in sigusr1_handler (postgres_signal_arg=<value optimized out>) at postmaster.c:5167
#43 <signal handler called>
#44 0x0000003049ae1603 in __select_nocancel () from /lib64/libc.so.6
#45 0x00000000006f9d43 in ServerLoop (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1668
#46 PostmasterMain (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1377
#47 0x000000000066a6b0 in main (argc=3, argv=0x1c5b950) at main.c:228

bt f:
#2  0x0000000000749453 in dsm_impl_posix (op=<value optimized out>, handle=<value optimized out>, request_size=<value optimized out>, impl_private=<value optimized out>, mapped_address=<value optimized out>, mapped_size=<value optimized out>, elevel=20) at dsm_impl.c:283
        save_errno = <value optimized out>
        st = {st_dev = 26, st_ino = 0, st_nlink = 33554432, st_mode = 4096, st_uid = 0, st_gid = 65536, __pad0 = 0, st_rdev = 8975118, st_size = 496105863, st_blksize = 8974012, st_blocks = 140199374652304, st_atim = {tv_sec = 7593264, tv_nsec = 20}, st_mtim = {tv_sec = 85899345920, tv_nsec = 140198902917632}, st_ctim = {tv_sec = 44417024, tv_nsec = 289}, __unused = {7593264, 0, 140724603453440}}
        flags = <value optimized out>
        fd = <value optimized out>
        name = "/PostgreSQL.1648263397\000\000\024\242m\371\000\000\000\000\000\061w3\202\177\000\000\000\000\000\001\000\000\000\000pe\363\001\000\000\000\000\360\271\305\001\000\000\000"
        address = <value optimized out>

Thanks,
Justin
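
The backtrace shows the failing allocation coming from the DSA area used by the
parallel bitmap scans (dsa_allocate_extended -> dsm_create). A session-level
experiment that follows from that, sketched below with the report query elided
and an arbitrary lower work_mem: if parallel workers are disabled, the bitmap
page tables should be built in backend-local memory rather than in /dev/shm, so
the error should disappear, at the cost of a slower plan.

BEGIN;
-- Disable parallel workers for this session only; the bitmap scans should then
-- stop allocating their page tables in dynamic shared memory.
SET LOCAL max_parallel_workers_per_gather = 0;
SET LOCAL work_mem = '64MB';    -- arbitrary lower value, just for the experiment
-- EXPLAIN (ANALYZE, BUFFERS) <the report query goes here>;
ROLLBACK;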
On Mon, Dec 16, 2019 at 12:49:06PM -0600, Justin Pryzby wrote:
>A customer's report query hit this error.
>ERROR: could not resize shared memory segment "/PostgreSQL.2011322019" to 134217728 bytes: No space left on device
>
>I found:
>https://www.postgresql.org/message-id/flat/CAEepm%3D2D_JGb8X%3DLa-0PX9C8dBX9%3Dj9wY%2By1-zDWkcJu0%3DBQbA%40mail.gmail.com
>
>work_mem                   | 128MB
>dynamic_shared_memory_type | posix
>version | PostgreSQL 12.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23), 64-bit
>Running centos 6.9 / linux 2.6.32-696.23.1.el6.x86_64
>
>$ free -m
>             total       used       free     shared    buffers     cached
>Mem:          7871       7223        648       1531          5       1988
>-/+ buffers/cache:        5229       2642
>Swap:         4095       2088       2007
>
>$ mount | grep /dev/shm
>tmpfs on /dev/shm type tmpfs (rw)
>
>$ du -hs /dev/shm
>0       /dev/shm
>
>$ df /dev/shm
>Filesystem     1K-blocks  Used Available Use% Mounted on
>tmpfs            4030272    24   4030248   1% /dev/shm
>
>Later, I see:
>$ df -h /dev/shm
>Filesystem      Size  Used Avail Use% Mounted on
>tmpfs           3.9G  3.3G  601M  85% /dev/shm
>
>I can reproduce the error running a single instance of the query.
>
>The query plan is 1300 lines long, and involves 482 "Scan" nodes on a table
>which currently has 93 partitions, and whose current partitions are "daily".
>I believe I repartitioned its history earlier this year to "monthly", probably
>to avoid "OOM with many sorts", as reported here:
>https://www.postgresql.org/message-id/20190708164401.GA22387%40telsasoft.com
>

Interestingly enough, I ran into the same ERROR (not sure if the same
root cause) while investigating bug #16104 [1], i.e. on a much simpler
query (single join).

This particular machine is a bit smaller (only 8GB of RAM and less
disk space) so I created a smaller table with "just" 1.5B rows:

create table test as select generate_series(1, 1500000000)::bigint i;
set work_mem = '150MB';
set max_parallel_workers_per_gather = 8;
analyze test;

explain select count(*) from test t1 join test t2 using (i);

                                                  QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=67527436.36..67527436.37 rows=1 width=8)
   ->  Gather  (cost=67527435.53..67527436.34 rows=8 width=8)
         Workers Planned: 8
         ->  Partial Aggregate  (cost=67526435.53..67526435.54 rows=1 width=8)
               ->  Parallel Hash Join  (cost=11586911.03..67057685.47 rows=187500024 width=0)
                     Hash Cond: (t1.i = t2.i)
                     ->  Parallel Seq Scan on test t1  (cost=0.00..8512169.24 rows=187500024 width=8)
                     ->  Parallel Hash  (cost=8512169.24..8512169.24 rows=187500024 width=8)
                           ->  Parallel Seq Scan on test t2  (cost=0.00..8512169.24 rows=187500024 width=8)
(9 rows)

explain analyze select count(*) from test t1 join test t2 using (i);

ERROR: could not resize shared memory segment "/PostgreSQL.1743102822" to 536870912 bytes: No space left on device

Now, work_mem = 150MB might be a bit too high considering the machine
only has 8GB of RAM (1GB of which is shared_buffers). But that's still
just 1.2GB of RAM and this is not an OOM. This actually fills the
/dev/shm mount, which is limited to 4GB on this box:

bench ~ # df | grep shm
shm      3994752      16  3994736   1% /dev/shm

So somewhere in the parallel hash join, we allocate 4GB of shared segments ...
The filesystem usage from the moment of the query execution to the
failure looks about like this:

Time      fs    1K-blocks     Used  Available Use% Mounted on
--------------------------------------------------------------
10:13:34  shm     3994752    34744    3960008   1% /dev/shm
10:13:35  shm     3994752    35768    3958984   1% /dev/shm
10:13:36  shm     3994752    37816    3956936   1% /dev/shm
10:13:39  shm     3994752    39864    3954888   1% /dev/shm
10:13:42  shm     3994752    41912    3952840   2% /dev/shm
10:13:46  shm     3994752    43960    3950792   2% /dev/shm
10:13:49  shm     3994752    48056    3946696   2% /dev/shm
10:13:56  shm     3994752    52152    3942600   2% /dev/shm
10:14:02  shm     3994752    56248    3938504   2% /dev/shm
10:14:09  shm     3994752    60344    3934408   2% /dev/shm
10:14:16  shm     3994752    68536    3926216   2% /dev/shm
10:14:30  shm     3994752    76728    3918024   2% /dev/shm
10:14:43  shm     3994752    84920    3909832   3% /dev/shm
10:14:43  shm     3994752    84920    3909832   3% /dev/shm
10:14:57  shm     3994752    93112    3901640   3% /dev/shm
10:15:11  shm     3994752   109496    3885256   3% /dev/shm
10:15:38  shm     3994752   125880    3868872   4% /dev/shm
10:16:06  shm     3994752   142264    3852488   4% /dev/shm
10:19:57  shm     3994752   683208    3311544  18% /dev/shm
10:19:58  shm     3994752  1338568    2656184  34% /dev/shm
10:20:02  shm     3994752  1600712    2394040  41% /dev/shm
10:20:03  shm     3994752  2125000    1869752  54% /dev/shm
10:20:04  shm     3994752  2649288    1345464  67% /dev/shm
10:20:08  shm     3994752  2518216    1476536  64% /dev/shm
10:20:10  shm     3994752  3173576     821176  80% /dev/shm
10:20:14  shm     3994752  3697864     296888  93% /dev/shm
10:20:15  shm     3994752  3417288     577464  86% /dev/shm
10:20:16  shm     3994752  3697864     296888  93% /dev/shm
10:20:20  shm     3994752  3828936     165816  96% /dev/shm

And at the end, the contents of /dev/shm look like this:

-rw------- 1 postgres postgres  33624064 Dec 16 22:19 PostgreSQL.1005341478
-rw------- 1 postgres postgres   1048576 Dec 16 22:20 PostgreSQL.1011142277
-rw------- 1 postgres postgres   1048576 Dec 16 22:20 PostgreSQL.1047241463
-rw------- 1 postgres postgres  16777216 Dec 16 22:16 PostgreSQL.1094702083
-rw------- 1 postgres postgres 268435456 Dec 16 22:20 PostgreSQL.1143288540
-rw------- 1 postgres postgres 536870912 Dec 16 22:20 PostgreSQL.1180709918
-rw------- 1 postgres postgres      7408 Dec 14 15:43 PostgreSQL.1239805533
-rw------- 1 postgres postgres 134217728 Dec 16 22:20 PostgreSQL.1292496162
-rw------- 1 postgres postgres 268435456 Dec 16 22:20 PostgreSQL.138443773
-rw------- 1 postgres postgres   4194304 Dec 16 22:15 PostgreSQL.1442035225
-rw------- 1 postgres postgres  67108864 Dec 16 22:20 PostgreSQL.147930162
-rw------- 1 postgres postgres  16777216 Dec 16 22:20 PostgreSQL.1525896026
-rw------- 1 postgres postgres  67108864 Dec 16 22:20 PostgreSQL.1541133044
-rw------- 1 postgres postgres  33624064 Dec 16 22:14 PostgreSQL.1736434498
-rw------- 1 postgres postgres 134217728 Dec 16 22:20 PostgreSQL.1845631548
-rw------- 1 postgres postgres  33624064 Dec 16 22:19 PostgreSQL.1952212453
-rw------- 1 postgres postgres 134217728 Dec 16 22:20 PostgreSQL.1965950370
-rw------- 1 postgres postgres   8388608 Dec 16 22:15 PostgreSQL.1983158004
-rw------- 1 postgres postgres  33624064 Dec 16 22:19 PostgreSQL.1997631477
-rw------- 1 postgres postgres  16777216 Dec 16 22:20 PostgreSQL.2071391455
-rw------- 1 postgres postgres   2097152 Dec 16 22:20 PostgreSQL.210551357
-rw------- 1 postgres postgres  67108864 Dec 16 22:20 PostgreSQL.2125755117
-rw------- 1 postgres postgres   8388608 Dec 16 22:14 PostgreSQL.2133152910
-rw------- 1 postgres postgres   2097152 Dec 16 22:20 PostgreSQL.255342242
-rw------- 1 postgres postgres   2097152 Dec 16 22:20 PostgreSQL.306663870
-rw------- 1 postgres postgres 536870912 Dec 16 22:20 PostgreSQL.420982703
-rw------- 1 postgres postgres 134217728 Dec 16 22:20 PostgreSQL.443494372
-rw------- 1 postgres postgres 134217728 Dec 16 22:20 PostgreSQL.457417415
-rw------- 1 postgres postgres   4194304 Dec 16 22:20 PostgreSQL.462376479
-rw------- 1 postgres postgres  16777216 Dec 16 22:16 PostgreSQL.512403457
-rw------- 1 postgres postgres   8388608 Dec 16 22:14 PostgreSQL.546049346
-rw------- 1 postgres postgres    196864 Dec 16 22:13 PostgreSQL.554918510
-rw------- 1 postgres postgres    687584 Dec 16 22:13 PostgreSQL.585813590
-rw------- 1 postgres postgres   4194304 Dec 16 22:15 PostgreSQL.612034010
-rw------- 1 postgres postgres  33624064 Dec 16 22:19 PostgreSQL.635077233
-rw------- 1 postgres postgres      7408 Dec 15 17:28 PostgreSQL.69856210
-rw------- 1 postgres postgres 268435456 Dec 16 22:20 PostgreSQL.785623413
-rw------- 1 postgres postgres   4194304 Dec 16 22:14 PostgreSQL.802559608
-rw------- 1 postgres postgres  67108864 Dec 16 22:20 PostgreSQL.825442833
-rw------- 1 postgres postgres   8388608 Dec 16 22:15 PostgreSQL.827813234
-rw------- 1 postgres postgres 268435456 Dec 16 22:20 PostgreSQL.942923396
-rw------- 1 postgres postgres 536870912 Dec 16 22:20 PostgreSQL.948192559
-rw------- 1 postgres postgres   2097152 Dec 16 22:20 PostgreSQL.968081079

That's a lot of shared segments, considering there are only ~8 workers
for the parallel hash join. And some of the segments are 512MB, so not
exactly tiny/abiding by the work_mem limit :-(

I'm not very familiar with the PHJ internals, but this seems a bit
excessive. I mean, how am I supposed to limit memory usage in these
queries? Why shouldn't this be subject to work_mem?

regards

[1] https://www.postgresql.org/message-id/flat/16104-dc11ed911f1ab9df%40postgresql.org

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
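
Pending an answer to that question, a blunt stopgap is to shrink the
per-process budget and the number of parallel participants for the affected
workload; a sketch, with a hypothetical role name:

-- Hypothetical role "reporting": cap the per-process budget and parallelism
-- for its future sessions (takes effect on new connections).
ALTER ROLE reporting SET work_mem = '64MB';
ALTER ROLE reporting SET max_parallel_workers_per_gather = 2;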
On Tue, Dec 17, 2019 at 10:53 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> Interestingly enough, I ran into the same ERROR (not sure if the same
> root cause) while investigating bug #16104 [1], i.e. on a much simpler
> query (single join).
>
> This particular machine is a bit smaller (only 8GB of RAM and less
> disk space) so I created a smaller table with "just" 1.5B rows:
>
> create table test as select generate_series(1, 1500000000)::bigint i;
> set work_mem = '150MB';
> set max_parallel_workers_per_gather = 8;
> analyze test;
>
> explain select count(*) from test t1 join test t2 using (i);
>
>                                                   QUERY PLAN
> ------------------------------------------------------------------------------------------------------------
>  Finalize Aggregate  (cost=67527436.36..67527436.37 rows=1 width=8)
>    ->  Gather  (cost=67527435.53..67527436.34 rows=8 width=8)
>          Workers Planned: 8
>          ->  Partial Aggregate  (cost=67526435.53..67526435.54 rows=1 width=8)
>                ->  Parallel Hash Join  (cost=11586911.03..67057685.47 rows=187500024 width=0)
>                      Hash Cond: (t1.i = t2.i)
>                      ->  Parallel Seq Scan on test t1  (cost=0.00..8512169.24 rows=187500024 width=8)
>                      ->  Parallel Hash  (cost=8512169.24..8512169.24 rows=187500024 width=8)
>                            ->  Parallel Seq Scan on test t2  (cost=0.00..8512169.24 rows=187500024 width=8)
> (9 rows)
>
> explain analyze select count(*) from test t1 join test t2 using (i);
>
> ERROR: could not resize shared memory segment "/PostgreSQL.1743102822" to 536870912 bytes: No space left on device
>
> Now, work_mem = 150MB might be a bit too high considering the machine
> only has 8GB of RAM (1GB of which is shared_buffers). But that's still
> just 1.2GB of RAM and this is not an OOM. This actually fills the
> /dev/shm mount, which is limited to 4GB on this box:
>
> bench ~ # df | grep shm
> shm      3994752      16  3994736   1% /dev/shm
>
> So somewhere in the parallel hash join, we allocate 4GB of shared segments ...
>
> The filesystem usage from the moment of the query execution to the
> failure looks about like this:
>
> Time      fs    1K-blocks     Used  Available Use% Mounted on
> --------------------------------------------------------------
> 10:13:34  shm     3994752    34744    3960008   1% /dev/shm
> 10:13:35  shm     3994752    35768    3958984   1% /dev/shm
> 10:13:36  shm     3994752    37816    3956936   1% /dev/shm
> 10:13:39  shm     3994752    39864    3954888   1% /dev/shm
> 10:13:42  shm     3994752    41912    3952840   2% /dev/shm
> 10:13:46  shm     3994752    43960    3950792   2% /dev/shm
> 10:13:49  shm     3994752    48056    3946696   2% /dev/shm
> 10:13:56  shm     3994752    52152    3942600   2% /dev/shm
> 10:14:02  shm     3994752    56248    3938504   2% /dev/shm
> 10:14:09  shm     3994752    60344    3934408   2% /dev/shm
> 10:14:16  shm     3994752    68536    3926216   2% /dev/shm
> 10:14:30  shm     3994752    76728    3918024   2% /dev/shm
> 10:14:43  shm     3994752    84920    3909832   3% /dev/shm
> 10:14:43  shm     3994752    84920    3909832   3% /dev/shm
> 10:14:57  shm     3994752    93112    3901640   3% /dev/shm
> 10:15:11  shm     3994752   109496    3885256   3% /dev/shm
> 10:15:38  shm     3994752   125880    3868872   4% /dev/shm
> 10:16:06  shm     3994752   142264    3852488   4% /dev/shm
> 10:19:57  shm     3994752   683208    3311544  18% /dev/shm
> 10:19:58  shm     3994752  1338568    2656184  34% /dev/shm
> 10:20:02  shm     3994752  1600712    2394040  41% /dev/shm
> 10:20:03  shm     3994752  2125000    1869752  54% /dev/shm
> 10:20:04  shm     3994752  2649288    1345464  67% /dev/shm
> 10:20:08  shm     3994752  2518216    1476536  64% /dev/shm
> 10:20:10  shm     3994752  3173576     821176  80% /dev/shm
> 10:20:14  shm     3994752  3697864     296888  93% /dev/shm
> 10:20:15  shm     3994752  3417288     577464  86% /dev/shm
> 10:20:16  shm     3994752  3697864     296888  93% /dev/shm
> 10:20:20  shm     3994752  3828936     165816  96% /dev/shm
>
> And at the end, the contents of /dev/shm look like this:
>
> -rw------- 1 postgres postgres  33624064 Dec 16 22:19 PostgreSQL.1005341478
> -rw------- 1 postgres postgres   1048576 Dec 16 22:20 PostgreSQL.1011142277
> -rw------- 1 postgres postgres   1048576 Dec 16 22:20 PostgreSQL.1047241463
> -rw------- 1 postgres postgres  16777216 Dec 16 22:16 PostgreSQL.1094702083
> -rw------- 1 postgres postgres 268435456 Dec 16 22:20 PostgreSQL.1143288540
> -rw------- 1 postgres postgres 536870912 Dec 16 22:20 PostgreSQL.1180709918
> -rw------- 1 postgres postgres      7408 Dec 14 15:43 PostgreSQL.1239805533
> -rw------- 1 postgres postgres 134217728 Dec 16 22:20 PostgreSQL.1292496162
> -rw------- 1 postgres postgres 268435456 Dec 16 22:20 PostgreSQL.138443773
> -rw------- 1 postgres postgres   4194304 Dec 16 22:15 PostgreSQL.1442035225
> -rw------- 1 postgres postgres  67108864 Dec 16 22:20 PostgreSQL.147930162
> -rw------- 1 postgres postgres  16777216 Dec 16 22:20 PostgreSQL.1525896026
> -rw------- 1 postgres postgres  67108864 Dec 16 22:20 PostgreSQL.1541133044
> -rw------- 1 postgres postgres  33624064 Dec 16 22:14 PostgreSQL.1736434498
> -rw------- 1 postgres postgres 134217728 Dec 16 22:20 PostgreSQL.1845631548
> -rw------- 1 postgres postgres  33624064 Dec 16 22:19 PostgreSQL.1952212453
> -rw------- 1 postgres postgres 134217728 Dec 16 22:20 PostgreSQL.1965950370
> -rw------- 1 postgres postgres   8388608 Dec 16 22:15 PostgreSQL.1983158004
> -rw------- 1 postgres postgres  33624064 Dec 16 22:19 PostgreSQL.1997631477
> -rw------- 1 postgres postgres  16777216 Dec 16 22:20 PostgreSQL.2071391455
> -rw------- 1 postgres postgres   2097152 Dec 16 22:20 PostgreSQL.210551357
> -rw------- 1 postgres postgres  67108864 Dec 16 22:20 PostgreSQL.2125755117
> -rw------- 1 postgres postgres   8388608 Dec 16 22:14 PostgreSQL.2133152910
> -rw------- 1 postgres postgres   2097152 Dec 16 22:20 PostgreSQL.255342242
> -rw------- 1 postgres postgres   2097152 Dec 16 22:20 PostgreSQL.306663870
> -rw------- 1 postgres postgres 536870912 Dec 16 22:20 PostgreSQL.420982703
> -rw------- 1 postgres postgres 134217728 Dec 16 22:20 PostgreSQL.443494372
> -rw------- 1 postgres postgres 134217728 Dec 16 22:20 PostgreSQL.457417415
> -rw------- 1 postgres postgres   4194304 Dec 16 22:20 PostgreSQL.462376479
> -rw------- 1 postgres postgres  16777216 Dec 16 22:16 PostgreSQL.512403457
> -rw------- 1 postgres postgres   8388608 Dec 16 22:14 PostgreSQL.546049346
> -rw------- 1 postgres postgres    196864 Dec 16 22:13 PostgreSQL.554918510
> -rw------- 1 postgres postgres    687584 Dec 16 22:13 PostgreSQL.585813590
> -rw------- 1 postgres postgres   4194304 Dec 16 22:15 PostgreSQL.612034010
> -rw------- 1 postgres postgres  33624064 Dec 16 22:19 PostgreSQL.635077233
> -rw------- 1 postgres postgres      7408 Dec 15 17:28 PostgreSQL.69856210
> -rw------- 1 postgres postgres 268435456 Dec 16 22:20 PostgreSQL.785623413
> -rw------- 1 postgres postgres   4194304 Dec 16 22:14 PostgreSQL.802559608
> -rw------- 1 postgres postgres  67108864 Dec 16 22:20 PostgreSQL.825442833
> -rw------- 1 postgres postgres   8388608 Dec 16 22:15 PostgreSQL.827813234
> -rw------- 1 postgres postgres 268435456 Dec 16 22:20 PostgreSQL.942923396
> -rw------- 1 postgres postgres 536870912 Dec 16 22:20 PostgreSQL.948192559
> -rw------- 1 postgres postgres   2097152 Dec 16 22:20 PostgreSQL.968081079
>
> That's a lot of shared segments, considering there are only ~8 workers
> for the parallel hash join. And some of the segments are 512MB, so not
> exactly tiny/abiding by the work_mem limit :-(
>
> I'm not very familiar with the PHJ internals, but this seems a bit
> excessive. I mean, how am I supposed to limit memory usage in these
> queries? Why shouldn't this be subject to work_mem?

It's subject to work_mem per process (leader + workers). So it would like
to use 150M * 9 = 1350M, but then there are things that we don't measure
at all, including the per batch data as you were complaining about in that
other thread, and here that's quite extreme because the bug in question is
one that reaches large partition counts.

You're also looking at the raw shared memory segments, but there is a level
on top of that which is the DSA allocator. It suffers from fragmentation
like any other general purpose allocator (so maybe it is backed by something
like 2x the memory the client allocated at worst), though unfortunately,
unlike typical allocators, we have to force the OS to really allocate the
memory pages (on Linux only), or it fails with SIGBUS later when the kernel
can't extend a tmpfs file.
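
Spelled out as a quick worked example (leader plus 8 workers at work_mem =
150MB, as in the repro above; the per-batch bookkeeping and DSA fragmentation
mentioned here come on top of this and are not capped by work_mem):

-- 9 participating processes, each entitled to work_mem for the shared hash table.
SELECT pg_size_pretty(9 * pg_size_bytes('150MB')) AS phj_hash_table_budget;
-- => 1350 MB, i.e. the "150M * 9 = 1350M" figure above.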