Re: Out of Memory errors are frustrating as heck! - Mailing list pgsql-performance

From Gunther
Subject Re: Out of Memory errors are frustrating as heck!
Msg-id e3457109-4500-6c59-10fb-0fa4405cc9e4@gusw.net
In response to Re: Out of Memory errors are frustrating as heck!  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-performance
On Tue, Apr 16, 2019 at 11:46:51PM -0500, Justin Pryzby wrote:
I wonder if it'd be useful to compile with 
./configure CFLAGS=-DHJDEBUG=1
Could you try this, too ?

OK, doing it now. Here is what I'm getting in the log file now, and I am surprised I get so few rows here:

2019-04-20 17:12:16.077 UTC [7093] LOG:  database system was shut down at 2019-04-20 17:12:15 UTC
2019-04-20 17:12:16.085 UTC [7091] LOG:  database system is ready to accept connections
Hashjoin 0x118e0c8: initial nbatch = 1, nbuckets = 1024
Hashjoin 0x118e0f8: initial nbatch = 1, nbuckets = 1024
Hashjoin 0x1194e78: initial nbatch = 1, nbuckets = 65536
Hashjoin 0x119b518: initial nbatch = 16, nbuckets = 65536
Hashjoin 0x1194e78: initial nbatch = 1, nbuckets = 65536
Hashjoin 0x119bb38: initial nbatch = 16, nbuckets = 65536
TopMemoryContext: 4347672 total in 9 blocks; 41784 free (19 chunks); 4305888 used
  HandleParallelMessages: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
  pgstat TabStatusArray lookup hash table: 8192 total in 1 blocks; 1456 free (0 chunks); 6736 used
  TopTransactionContext: 8192 total in 1 blocks; 5416 free (2 chunks); 2776 used
  Operator lookup cache: 24576 total in 2 blocks; 10760 free (3 chunks); 13816 used
  TableSpace cache: 8192 total in 1 blocks; 2096 free (0 chunks); 6096 used
  Type information cache: 24352 total in 2 blocks; 2624 free (0 chunks); 21728 used
  RowDescriptionContext: 8192 total in 1 blocks; 6896 free (0 chunks); 1296 used
  MessageContext: 32768 total in 3 blocks; 13488 free (1 chunks); 19280 used
  Operator class cache: 8192 total in 1 blocks; 560 free (0 chunks); 7632 used
  smgr relation table: 32768 total in 3 blocks; 16832 free (8 chunks); 15936 used
  TransactionAbortContext: 32768 total in 1 blocks; 32512 free (0 chunks); 256 used
  Portal hash: 8192 total in 1 blocks; 560 free (0 chunks); 7632 used
  TopPortalContext: 8192 total in 1 blocks; 7664 free (0 chunks); 528 used
    PortalHoldContext: 24632 total in 2 blocks; 7392 free (0 chunks); 17240 used
    PortalContext: 1474560 total in 183 blocks; 6152 free (8 chunks); 1468408 used:
      ExecutorState: 2234501600 total in 266274 blocks; 3696112 free (17274 chunks); 2230805488 used
        HashTableContext: 32768 total in 3 blocks; 17272 free (8 chunks); 15496 used
          HashBatchContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        HashTableContext: 8192 total in 1 blocks; 7320 free (0 chunks); 872 used
          HashBatchContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        HashTableContext: 8192 total in 1 blocks; 7320 free (0 chunks); 872 used
          HashBatchContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        HashTableContext: 8192 total in 1 blocks; 7752 free (0 chunks); 440 used
          HashBatchContext: 90288 total in 4 blocks; 16072 free (6 chunks); 74216 used
        HashTableContext: 8192 total in 1 blocks; 7624 free (0 chunks); 568 used
          HashBatchContext: 90288 total in 4 blocks; 16072 free (6 chunks); 74216 used
        TupleSort main: 286912 total in 8 blocks; 246792 free (39 chunks); 40120 used
        TupleSort main: 286912 total in 8 blocks; 246792 free (39 chunks); 40120 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        HashTableContext: 8454256 total in 6 blocks; 64848 free (32 chunks); 8389408 used
          HashBatchContext: 177003344 total in 5387 blocks; 7936 free (0 chunks); 176995408 used
        TupleSort main: 452880 total in 8 blocks; 126248 free (27 chunks); 326632 used
          Caller tuples: 1048576 total in 8 blocks; 21608 free (14 chunks); 1026968 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
  Relcache by OID: 16384 total in 2 blocks; 3512 free (2 chunks); 12872 used
  CacheMemoryContext: 1101328 total in 14 blocks; 236384 free (1 chunks); 864944 used
    index info: 2048 total in 2 blocks; 680 free (1 chunks); 1368 used: pg_toast_2619_index
    index info: 2048 total in 2 blocks; 968 free (1 chunks); 1080 used: entity_id_fkidx
    index info: 2048 total in 2 blocks; 696 free (1 chunks); 1352 used: entity_id_idx
    index info: 2048 total in 2 blocks; 968 free (1 chunks); 1080 used: act_id_fkidx
    ...
    index info: 2048 total in 2 blocks; 680 free (1 chunks); 1368 used: pg_attribute_relid_attnum_index
    index info: 2048 total in 2 blocks; 952 free (1 chunks); 1096 used: pg_class_oid_index
  WAL record construction: 49768 total in 2 blocks; 6368 free (0 chunks); 43400 used
  PrivateRefCount: 8192 total in 1 blocks; 2624 free (0 chunks); 5568 used
  MdSmgr: 8192 total in 1 blocks; 6152 free (1 chunks); 2040 used
  LOCALLOCK hash: 16384 total in 2 blocks; 4600 free (2 chunks); 11784 used
  Timezones: 104120 total in 2 blocks; 2624 free (0 chunks); 101496 used
  ErrorContext: 8192 total in 1 blocks; 7936 free (4 chunks); 256 used
Grand total: 2430295856 bytes in 272166 blocks; 5223104 free (17571 chunks); 2425072752 used
2019-04-20 17:28:56.887 UTC [7100] ERROR:  out of memory
2019-04-20 17:28:56.887 UTC [7100] DETAIL:  Failed on request of size 32800 in memory context "HashBatchContext".
2019-04-20 17:28:56.887 UTC [7100] STATEMENT:  explain analyze SELECT * FROM reports.v_BusinessOperation;
There are amazingly few entries in the log.
(gdb) break ExecHashIncreaseNumBatches
Breakpoint 1 at 0x6b7bd6: file nodeHash.c, line 884.
(gdb) cont
Continuing.

Breakpoint 1, ExecHashIncreaseNumBatches (hashtable=0x12d0818) at nodeHash.c:884
884             int                     oldnbatch = hashtable->nbatch;
(gdb) cont
Continuing.

Breakpoint 1, ExecHashIncreaseNumBatches (hashtable=0x12d0818) at nodeHash.c:884
884             int                     oldnbatch = hashtable->nbatch;
(gdb) cont
Continuing.

Breakpoint 1, ExecHashIncreaseNumBatches (hashtable=0x12d0818) at nodeHash.c:884
884             int                     oldnbatch = hashtable->nbatch;
(gdb) cont 100
Will ignore next 99 crossings of breakpoint 1.  Continuing.
But now I am surprised to find that the behavior seems to have changed.

First, weirdly enough, now I am not getting any of the HJDEBUG messages any more. And yet my resident memory has already increased to 1.1 GB.
 7100 postgres  20   0 2797036   1.1g 188064 R  17.9 14.9  14:46.00 postgres: postgres integrator [local] EXPLAIN
 7664 postgres  20   0 1271788  16228  14408 D  17.6  0.2   0:01.96 postgres: parallel worker for PID 7100
 7665 postgres  20   0 1271788  16224  14404 R  17.6  0.2   0:01.95 postgres: parallel worker for PID 7100
so why is it all different now? Are we chasing a red herring?

Finally I had another stop, now at the HJDEBUG line:
#0  ExecHashIncreaseNumBatches (hashtable=0x12d0818) at nodeHash.c:904
#1  0x00000000006b93b1 in ExecHashTableInsert (hashtable=0x12d0818, slot=0x12bcf98, hashvalue=234960700) at nodeHash.c:1655
#2  0x00000000006bd600 in ExecHashJoinNewBatch (hjstate=0x12a4340) at nodeHashjoin.c:1051
#3  0x00000000006bc999 in ExecHashJoinImpl (pstate=0x12a4340, parallel=false) at nodeHashjoin.c:539
#4  0x00000000006bca23 in ExecHashJoin (pstate=0x12a4340) at nodeHashjoin.c:565
#5  0x00000000006a191f in ExecProcNodeInstr (node=0x12a4340) at execProcnode.c:461
#6  0x00000000006ceaad in ExecProcNode (node=0x12a4340) at ../../../src/include/executor/executor.h:247
#7  0x00000000006cebe7 in ExecSort (pstate=0x12a4230) at nodeSort.c:107
#8  0x00000000006a191f in ExecProcNodeInstr (node=0x12a4230) at execProcnode.c:461
#9  0x00000000006a18f0 in ExecProcNodeFirst (node=0x12a4230) at execProcnode.c:445
#10 0x00000000006cf25c in ExecProcNode (node=0x12a4230) at ../../../src/include/executor/executor.h:247
#11 0x00000000006cf388 in ExecUnique (pstate=0x12a4040) at nodeUnique.c:73
#12 0x00000000006a191f in ExecProcNodeInstr (node=0x12a4040) at execProcnode.c:461
#13 0x00000000006a18f0 in ExecProcNodeFirst (node=0x12a4040) at execProcnode.c:445
#14 0x000000000069728b in ExecProcNode (node=0x12a4040) at ../../../src/include/executor/executor.h:247
#15 0x0000000000699790 in ExecutePlan (estate=0x12a3da0, planstate=0x12a4040, use_parallel_mode=true, operation=CMD_SELECT,   sendTuples=true, numberTuples=0, direction=ForwardScanDirection, dest=0xe811c0 <donothingDR>, execute_once=true)   at execMain.c:1723
#16 0x0000000000697757 in standard_ExecutorRun (queryDesc=0x1404168, direction=ForwardScanDirection, count=0, execute_once=true)   at execMain.c:364
#17 0x00000000006975f4 in ExecutorRun (queryDesc=0x1404168, direction=ForwardScanDirection, count=0, execute_once=true)   at execMain.c:307
#18 0x000000000060d227 in ExplainOnePlan (plannedstmt=0x1402588, into=0x0, es=0x10b03a8,   queryString=0x10866c0 "explain analyze SELECT * FROM reports.v_BusinessOperation;", params=0x0, queryEnv=0x0,   planduration=0x7fff56a0df00) at explain.c:535
So now I am back to checking who calls AllocSetAlloc:

(gdb) break AllocSetAlloc if (int)strcmp(context->name, "ExecutorState") == 0
Breakpoint 4 at 0x9eab11: file aset.c, line 719.
(gdb) cont
Continuing.

Breakpoint 4, AllocSetAlloc (context=0x12a3c90, size=8272) at aset.c:719
719             AllocSet        set = (AllocSet) context;
(gdb) bt 5
#0  AllocSetAlloc (context=0x12a3c90, size=8272) at aset.c:719
#1  0x00000000009f2e47 in palloc (size=8272) at mcxt.c:938
#2  0x000000000082ae84 in makeBufFileCommon (nfiles=1) at buffile.c:116
#3  0x000000000082af14 in makeBufFile (firstfile=34029) at buffile.c:138
#4  0x000000000082b09f in BufFileCreateTemp (interXact=false) at buffile.c:201
(More stack frames follow...)
(gdb) bt 7
#0  AllocSetAlloc (context=0x12a3c90, size=8272) at aset.c:719
#1  0x00000000009f2e47 in palloc (size=8272) at mcxt.c:938
#2  0x000000000082ae84 in makeBufFileCommon (nfiles=1) at buffile.c:116
#3  0x000000000082af14 in makeBufFile (firstfile=34029) at buffile.c:138
#4  0x000000000082b09f in BufFileCreateTemp (interXact=false) at buffile.c:201
#5  0x00000000006bda31 in ExecHashJoinSaveTuple (tuple=0x86069b8, hashvalue=234960700, fileptr=0x8616a30) at nodeHashjoin.c:1220
#6  0x00000000006b80c0 in ExecHashIncreaseNumBatches (hashtable=0x12d0818) at nodeHash.c:1004
(More stack frames follow...)
(gdb) cont
Continuing.

Breakpoint 4, AllocSetAlloc (context=0x12a3c90, size=8) at aset.c:719
719             AllocSet        set = (AllocSet) context;
(gdb) bt 7
#0  AllocSetAlloc (context=0x12a3c90, size=8) at aset.c:719
#1  0x00000000009f2f5d in palloc0 (size=8) at mcxt.c:969
#2  0x000000000082aea2 in makeBufFileCommon (nfiles=1) at buffile.c:119
#3  0x000000000082af14 in makeBufFile (firstfile=34029) at buffile.c:138
#4  0x000000000082b09f in BufFileCreateTemp (interXact=false) at buffile.c:201
#5  0x00000000006bda31 in ExecHashJoinSaveTuple (tuple=0x86069b8, hashvalue=234960700, fileptr=0x8616a30) at nodeHashjoin.c:1220
#6  0x00000000006b80c0 in ExecHashIncreaseNumBatches (hashtable=0x12d0818) at nodeHash.c:1004
(More stack frames follow...)
(gdb) cont
Continuing.

Breakpoint 4, AllocSetAlloc (context=0x12a3c90, size=4) at aset.c:719
719             AllocSet        set = (AllocSet) context;
(gdb) bt 7
#0  AllocSetAlloc (context=0x12a3c90, size=4) at aset.c:719
#1  0x00000000009f2e47 in palloc (size=4) at mcxt.c:938
#2  0x000000000082af22 in makeBufFile (firstfile=34029) at buffile.c:140
#3  0x000000000082b09f in BufFileCreateTemp (interXact=false) at buffile.c:201
#4  0x00000000006bda31 in ExecHashJoinSaveTuple (tuple=0x86069b8, hashvalue=234960700, fileptr=0x8616a30) at nodeHashjoin.c:1220
#5  0x00000000006b80c0 in ExecHashIncreaseNumBatches (hashtable=0x12d0818) at nodeHash.c:1004
#6  0x00000000006b93b1 in ExecHashTableInsert (hashtable=0x12d0818, slot=0x12bcf98, hashvalue=234960700) at nodeHash.c:1655
(More stack frames follow...)
(gdb) cont
Continuing.

Breakpoint 4, AllocSetAlloc (context=0x12a3c90, size=375) at aset.c:719
719             AllocSet        set = (AllocSet) context;
(gdb) bt 7
#0  AllocSetAlloc (context=0x12a3c90, size=375) at aset.c:719
#1  0x00000000009f2e47 in palloc (size=375) at mcxt.c:938
#2  0x00000000006bdbec in ExecHashJoinGetSavedTuple (hjstate=0x12a4340, file=0x13ca418, hashvalue=0x7fff56a0da54, tupleSlot=0x12bcf98)   at nodeHashjoin.c:1277
#3  0x00000000006bd61f in ExecHashJoinNewBatch (hjstate=0x12a4340) at nodeHashjoin.c:1042
#4  0x00000000006bc999 in ExecHashJoinImpl (pstate=0x12a4340, parallel=false) at nodeHashjoin.c:539
#5  0x00000000006bca23 in ExecHashJoin (pstate=0x12a4340) at nodeHashjoin.c:565
#6  0x00000000006a191f in ExecProcNodeInstr (node=0x12a4340) at execProcnode.c:461
(More stack frames follow...)

Now I am adding a breakpoint on AllocSetFree, and I see that it gets called right after AllocSetAlloc with the same context, presumably freeing the pointer that was previously allocated.
(gdb) finish
Run till exit from #0  AllocSetAlloc (context=0x12a3c90, size=375) at aset.c:719
0x00000000009f2e47 in palloc (size=375) at mcxt.c:938
938             ret = context->methods->alloc(context, size);
Value returned is $1 = (void *) 0x11bd858
(gdb) cont
Continuing.

Breakpoint 5, AllocSetFree (context=0x12a3c90, pointer=0x11bf748) at aset.c:992
992             AllocSet        set = (AllocSet) context;
(gdb) cont
Continuing.

Breakpoint 4, AllocSetAlloc (context=0x12a3c90, size=375) at aset.c:719
719             AllocSet        set = (AllocSet) context;
(gdb) finish
Run till exit from #0  AllocSetAlloc (context=0x12a3c90, size=375) at aset.c:719
0x00000000009f2e47 in palloc (size=375) at mcxt.c:938
938             ret = context->methods->alloc(context, size);
Value returned is $2 = (void *) 0x11bf748
(gdb) cont
Continuing.

Breakpoint 5, AllocSetFree (context=0x12a3c90, pointer=0x11bd858) at aset.c:992
992             AllocSet        set = (AllocSet) context;
See, now the pointer allocated before is being freed. I can already see how I would write a memory leak detection tool: I would keep a cache of a fixed size, say one page, of recently allocated pointers, and when free is called, it would remove the pointer from the cache. Then I would only log an allocated pointer once it has to be evicted from the cache because another one needs to be added, so I don't fill my log file with a bunch of allocations that get freed relatively quickly.
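Just to make the idea concrete, here is a rough, untested sketch of what I have in mind (plain C, not actual PostgreSQL code; the hook names leak_track_alloc/leak_track_free are made up, the idea being to call them from AllocSetAlloc/AllocSetFree or from gdb breakpoint commands):

/*
 * Sketch only: keep a fixed-size table of recently allocated pointers;
 * a free removes its entry again; an entry is only logged as a leak
 * suspect when it has to be evicted to make room, i.e. when it survived
 * longer than the "quickly freed" allocations.
 */
#include <stdio.h>
#include <stddef.h>

#define TRACK_SLOTS 512                     /* roughly "one page" of entries */

typedef struct TrackEntry
{
    const void *ptr;                        /* tracked pointer, NULL = empty */
    size_t      size;                       /* requested allocation size */
} TrackEntry;

static TrackEntry track[TRACK_SLOTS];
static int        next_victim = 0;          /* round-robin eviction cursor */

static void
leak_track_alloc(const void *ptr, size_t size)  /* call after each alloc */
{
    for (int i = 0; i < TRACK_SLOTS; i++)
    {
        if (track[i].ptr == NULL)
        {
            track[i].ptr = ptr;             /* free slot available */
            track[i].size = size;
            return;
        }
    }
    /* table full: evict the next victim and log it as a leak suspect */
    fprintf(stderr, "leak suspect: %p (%zu bytes) still not freed\n",
            track[next_victim].ptr, track[next_victim].size);
    track[next_victim].ptr = ptr;
    track[next_victim].size = size;
    next_victim = (next_victim + 1) % TRACK_SLOTS;
}

static void
leak_track_free(const void *ptr)            /* call from the free path */
{
    for (int i = 0; i < TRACK_SLOTS; i++)
    {
        if (track[i].ptr == ptr)
        {
            track[i].ptr = NULL;            /* short-lived: cancel it out */
            return;
        }
    }
    /* not found: it was already evicted (and logged), or never tracked */
}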

This is very confusing. I just stepped over
Breakpoint 2, ExecHashIncreaseNumBatches (hashtable=0x12d0818) at nodeHash.c:904
904             printf("Hashjoin %p: increasing nbatch to %d because space = %zu\n",
(gdb) next
908             oldcxt = MemoryContextSwitchTo(hashtable->hashCxt);
(gdb) call MemoryContextStats(TopPortalContext)
and checked my log file, and there was nothing before the call to MemoryContextStats(TopPortalContext), so I don't understand where this printf output is ending up. Clearly *some* of it is in the log, but none of the "increasing nbatch" messages are. I think there is something wrong with that HJDEBUG output; presumably the printf goes to stdout, which is fully buffered when redirected to the log file, so the messages only show up once the buffer fills or gets flushed. Oops, my calling that MemoryContextStats really does seem to force the buffer to be flushed, because now I got more:
Hashjoin 0x118e4c8: initial nbatch = 1, nbuckets = 1024
Hashjoin 0x118e4f8: initial nbatch = 1, nbuckets = 1024
Hashjoin 0x1195278: initial nbatch = 1, nbuckets = 65536
Hashjoin 0x119b918: initial nbatch = 16, nbuckets = 65536
Hashjoin 0x1195278: initial nbatch = 1, nbuckets = 65536
Hashjoin 0x119b918: initial nbatch = 16, nbuckets = 65536
...
Hashjoin 0x13a8e68: initial nbatch = 16, nbuckets = 8192
Hashjoin 0x13a8e68: increasing nbatch to 32 because space = 4128933
Hashjoin 0x13a8e68: freed 148 of 10584 tuples, space now 4071106
Hashjoin 0x13a8e68: increasing nbatch to 64 because space = 4128826
Hashjoin 0x13a8e68: freed 544 of 10584 tuples, space now 3916296
Hashjoin 0x13a8e68: increasing nbatch to 128 because space = 4128846
Hashjoin 0x13a8e68: freed 10419 of 10585 tuples, space now 65570
Hashjoin 0x13a8e68: increasing nbatch to 256 because space = 4128829
Hashjoin 0x13a8e68: freed 10308 of 10734 tuples, space now 161815
Hashjoin 0x13a8e68: increasing nbatch to 512 because space = 4128908
Hashjoin 0x13a8e68: freed 398 of 10379 tuples, space now 3977787
Hashjoin 0x13a8e68: increasing nbatch to 1024 because space = 4129008
Hashjoin 0x13a8e68: freed 296 of 10360 tuples, space now 4013423
Hashjoin 0x13a8e68: increasing nbatch to 2048 because space = 4129133
Hashjoin 0x13a8e68: freed 154 of 10354 tuples, space now 4068786
Hashjoin 0x13a8e68: increasing nbatch to 4096 because space = 4129035
Hashjoin 0x13a8e68: freed 10167 of 10351 tuples, space now 72849
Hashjoin 0x242c9b0: initial nbatch = 1, nbuckets = 1024
Hashjoin 0x2443ee0: initial nbatch = 1, nbuckets = 1024
Hashjoin 0x2443aa0: initial nbatch = 1, nbuckets = 1024
Hashjoin 0x2443440: initial nbatch = 1, nbuckets = 65536
Hashjoin 0x2443330: initial nbatch = 16, nbuckets = 65536
Hashjoin 0x13a8e68: increasing nbatch to 8192 because space = 4128997
Hashjoin 0x12d0818: freed 10555 of 10560 tuples, space now 1983
Hashjoin 0x12d0818: increasing nbatch to 16384 because space = 4128957
Hashjoin 0x12d0818: freed 10697 of 10764 tuples, space now 25956
TopPortalContext: 8192 total in 1 blocks; 7664 free (0 chunks); 528 used
...
And you can see here clearly that there is a problem with the printf stdout buffering.

Now finally I am at this stage of the breakpoint firing rapidly
(gdb) cont 100
Will ignore next 99 crossings of breakpoint 1.  Continuing.

Breakpoint 1, ExecHashIncreaseNumBatches (hashtable=0x12d0818) at nodeHash.c:884
884             int                     oldnbatch = hashtable->nbatch;
(gdb) bt 7
#0  ExecHashIncreaseNumBatches (hashtable=0x12d0818) at nodeHash.c:884
#1  0x00000000006b93b1 in ExecHashTableInsert (hashtable=0x12d0818, slot=0x12bcf98, hashvalue=2161368) at nodeHash.c:1655
#2  0x00000000006bd600 in ExecHashJoinNewBatch (hjstate=0x12a4340) at nodeHashjoin.c:1051
#3  0x00000000006bc999 in ExecHashJoinImpl (pstate=0x12a4340, parallel=false) at nodeHashjoin.c:539
#4  0x00000000006bca23 in ExecHashJoin (pstate=0x12a4340) at nodeHashjoin.c:565
#5  0x00000000006a191f in ExecProcNodeInstr (node=0x12a4340) at execProcnode.c:461
#6  0x00000000006ceaad in ExecProcNode (node=0x12a4340) at ../../../src/include/executor/executor.h:247
(More stack frames follow...)
so it's not a red herring after all. But there is nothing new in the logfile from that HJDEBUG. Trying to flush stdout with this call to MemoryContextStats(TopPortalContext):
...   ExecutorState: 782712360 total in 92954 blocks; 3626888 free (3126 chunks); 779085472 used
and still nothing heard from HJDEBUG. So you really can't rely on that HJDEBUG output; it is not working here.
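My suspicion is that the HJDEBUG printf output simply goes to stdout, which is fully buffered when the postmaster's stdout is redirected to a file, so the messages sit in the stdio buffer until it happens to fill. If that is the cause (I have not verified it), something along these lines would make the output show up promptly; setvbuf/fflush are plain stdio, and the helper names are made up, not how the HJDEBUG code is actually structured:

#include <stdio.h>

/* Sketch of a workaround, assuming the missing HJDEBUG output is only a
 * stdio buffering artifact (stdout is fully buffered when redirected). */
static void
hjdebug_unbuffer(void)
{
    /* line-buffered: every '\n'-terminated printf reaches the log file */
    setvbuf(stdout, NULL, _IOLBF, 0);
}

/* ...or, alternatively, flush right after each debug printf: */
static void
hjdebug_note(const void *hashtable, int nbatch)
{
    printf("Hashjoin %p: increasing nbatch to %d\n", hashtable, nbatch);
    fflush(stdout);
}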

To summarize: the memory problem with the rapid firing of ExecHashIncreaseNumBatches is still occurring, as I reported earlier; it wasn't some red herring, but the HJDEBUG output just doesn't work reliably. Also, there might be benefit in creating a little resident memory-leak detector that keeps a single page's worth of AllocSetAlloc pointers, cancels them out with matching AllocSetFree calls, and reports the overflow with a quick little stack trace.

On 4/20/2019 6:53, Justin Pryzby wrote:
The only problem is that I can't test that this actually would trigger the
memory problem, because I can't force the plan to use the right join, it
always reverts to the left join hashing the tmp_q:
I think the table on the "OUTER" side is the one which needs to be iterated
over (not hashed), in order to return each of its rows even if there are no
join partners in the other table.  In your original query, the small table was
being hashed and the large table iterated; maybe that's what's important.

Maybe so. I am still trying to wrap my head around RIGHT vs. LEFT outer joins and why there even is a difference, though.

->  Hash Right Join  (cost=4255031.53..5530808.71 rows=34619 width=1197)
      Hash Cond: (((q.documentinternalid)::text = (documentinformationsubject.documentinternalid)::text) AND ((r.targetinternalid)::text = (documentinformationsubject.actinternalid)::text))
      ->  Hash Right Join  (cost=1341541.37..2612134.36 rows=13 width=341)
            Hash Cond: (((documentinformationsubject_2.documentinternalid)::text = (q.documentinternalid)::text) AND ((documentinformationsubject_2.actinternalid)::text = (q.actinternalid)::text))
            ... from the TINY table tmp_q, the estimate of 13 rows ain't bad
      ->  Hash  (cost=2908913.87..2908913.87 rows=34619 width=930)
            ->  Gather Merge  (cost=2892141.40..2908913.87 rows=34619 width=930)
            ....  from the HUGE tmp_r table, with 100 million rows (estimate of 34619 is grossly wrong, but how could it know?) ...

Now in a right join, we include all rows from the right table, and only those from the left table that match the join key. I wonder why we don't just transform all of those into left joins, then?

Why are we not hashing only the "optional" side?

The plan here seems to tell me without a doubt that it is hashing the big table: 100 million rows get hashed into the hash join, and then we iterate over the tiny table, and then ... now my mind boggles and I just don't know why there are right joins at all.

But I see the plan is building its Hash over the 100 million rows.

which is of course much better, but when tmp_q and tmp_r are the results of
complex stuff that the planner can't estimate, then it gets it wrong, and
then the issue gets triggered because we are hashing on the big tmp_r, not
tmp_q.
I was able to get something maybe promising ?  "Batches: 65536 (originally 1)"

I didn't get "Out of memory" error yet, but did crash the server with this one:
postgres=# explain analyze WITH v AS( SELECT * FROM generate_series(1,99999999)i WHERE i%10<10 AND i%11<11 AND i%12<12 AND i%13<13 AND i%14<14 AND i%15<15 AND i%16<16 AND i%17<17 AND i%18<18 AND i%19<19 AND i%20<20 AND i%21<21 ) SELECT * FROM generate_series(1,99)k JOIN v ON k=i ;

Note, on pg12dev this needs to be "with v AS MATERIALIZED".

postgres=# SET work_mem='128kB';SET client_min_messages =log;SET log_statement_stats=on;
explain(analyze,timing off) WITH v AS( SELECT * FROM generate_series(1,999999)i WHERE i%10<10 AND i%11<11 AND i%12<12 AND i%13<13 AND i%14<14 AND i%15<15 AND i%16<16 AND i%17<17 AND i%18<18 AND i%19<19 AND i%20<20 AND i%21<21 ) SELECT * FROM generate_series(1,99)k JOIN v ON k=i ;
Hash Join  (cost=70.04..83.84 rows=5 width=8) (actual rows=99 loops=1)
  Hash Cond: (k.k = v.i)
  CTE v
    ->  Function Scan on generate_series i  (cost=0.00..70.00 rows=1 width=4) (actual rows=999999 loops=1)
          Filter: (((i % 10) < 10) AND ((i % 11) < 11) AND ((i % 12) < 12) AND ((i % 13) < 13) AND ((i % 14) < 14) AND ((i % 15) < 15) AND ((i % 16) < 16) AND ((i % 17) < 17) AND ((i % 18) < 18) AND ((i % 19) < 19) AND ((i % 20) < 20) AND ((i % 21) < 21))
  ->  Function Scan on generate_series k  (cost=0.00..10.00 rows=1000 width=4) (actual rows=99 loops=1)
  ->  Hash  (cost=0.02..0.02 rows=1 width=4) (actual rows=999999 loops=1)
        Buckets: 4096 (originally 1024)  Batches: 512 (originally 1)  Memory Usage: 101kB
        ->  CTE Scan on v  (cost=0.00..0.02 rows=1 width=4) (actual rows=999999 loops=1)

Yes, I thought that with CTEs and functions one might be able to generate a test case, but I am still not seeing how you can trick the planner into this peculiar join type JOIN_RIGHT, and whether that is requisite for triggering the problem.

Finally, I have tried to make a pstate pretty printer in explain.c:

void
DumpPlanState(PlanState *pstate)
{
    ExplainState *es = NewExplainState();

    ExplainNode(pstate, NIL, NULL, NULL, es);
    puts(es->str->data);
    pfree(es->str->data);
}

but that didn't work, because unfortunately that ExplainNode function is destructive. It would be so nice to refactor this explain code so that there would be a completely conservative function that simply dumps the present pstate with all the information about its estimated and actual situation: how many iterations it has already accomplished, how many it estimates still to do, whether its original estimate panned out or not, etc. This would be tremendously useful for runtime debugging of queries; I think the utility of it can hardly be overstated. I mean, even for end-user applications of some data warehouse, you could probe a long-running query every 5 seconds to see where its execution is. I can hardly think of a more useful low-hanging-fruit feature. I am sure that if PostgreSQL had originally been written in Java, this feature would naturally exist already.
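If I understand correctly, at least part of the destructiveness is that ExplainNode() ends the instrumentation loop on the node it prints. So maybe something like the following would let me peek at a single node without clobbering it; this is an untested sketch of the idea, and it only guards the top node, while a real version would have to walk the whole planstate tree:

void
DumpPlanStateSafely(PlanState *pstate)
{
    ExplainState   *es = NewExplainState();
    Instrumentation saved;
    bool            have_instr = (pstate->instrument != NULL);

    if (have_instr)
        saved = *pstate->instrument;        /* struct copy of the counters */

    ExplainNode(pstate, NIL, NULL, NULL, es);
    puts(es->str->data);

    if (have_instr)
        *pstate->instrument = saved;        /* undo InstrEndLoop() etc. */

    pfree(es->str->data);
}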

regards and Happy Easter,
-Gunther
