Re: Out of Memory errors are frustrating as heck! - Mailing list pgsql-performance
From | Gunther |
---|---|
Subject | Re: Out of Memory errors are frustrating as heck! |
Date | |
Msg-id | 867396ca-9df7-c873-b095-5df85eaafb71@gusw.net Whole thread Raw |
In response to | Re: Out of Memory errors are frustrating as heck! (Gunther <raj@gusw.net>) |
List | pgsql-performance |
It is confirmed, these two call paths are the only ones. At least probably the only ones to occur with enough of a frequency.
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 28576 postgres 20 0 2695304 1.0g 200764 R 11.3 13.8 4:20.13 postgres: postgres integrator [local] EXPLAIN 28580 postgres 20 0 646616 432784 36968 S 98.7 5.5 8:53.28 gdb -p 28576
there is a problem with gdb, it also has a memoy leak and is very expensive with the checking of my conditional breakpoint. So I can't run it all the way through.
Also here captured with
(gdb) call MemoryContextStats(TopPortalContext)
TopPortalContext: 8192 total in 1 blocks; 7664 free (0 chunks); 528 used PortalHoldContext: 24632 total in 2 blocks; 7392 free (0 chunks); 17240 used PortalContext: 1482752 total in 184 blocks; 11216 free (8 chunks); 1471536 used: ExecutorState: 1369337168 total in 163397 blocks; 248840 free (36 chunks); 1369088328 used HashTableContext: 32768 total in 3 blocks; 17304 free (10 chunks); 15464 used HashBatchContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used HashTableContext: 8192 total in 1 blocks; 7320 free (0 chunks); 872 used HashBatchContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used HashTableContext: 8192 total in 1 blocks; 7320 free (0 chunks); 872 used HashBatchContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used HashTableContext: 8192 total in 1 blocks; 7752 free (0 chunks); 440 used HashBatchContext: 90288 total in 4 blocks; 16072 free (6 chunks); 74216 used HashTableContext: 8192 total in 1 blocks; 7624 free (0 chunks); 568 used HashBatchContext: 90288 total in 4 blocks; 16072 free (6 chunks); 74216 used TupleSort main: 32824 total in 2 blocks; 144 free (0 chunks); 32680 used Caller tuples: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used HashTableContext: 8454256 total in 6 blocks; 64848 free (32 chunks); 8389408 used HashBatchContext: 106640 total in 3 blocks; 7936 free (0 chunks); 98704 used TupleSort main: 452880 total in 8 blocks; 126248 free (27 chunks); 326632 used Caller tuples: 4194304 total in 10 blocks; 1496136 free (20 chunks); 2698168 used ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used ... ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used Grand total: 1384601904 bytes in 163660 blocks; 2303840 free (145 chunks); 1382298064 used
there is the developing memory leak. Now let's see if we can trace individual increments ...
(gdb) info break Num Type Disp Enb Address What 1 breakpoint keep y 0x0000000000849030 in AllocSetAlloc at aset.c:718 stop only if (int)strcmp(context->name, "ExecutorState") == 0 && *(int *)$rsp != 0x84e7dd && 0x84e8ad != *(int *)$rsp breakpoint already hit 4 times (gdb) delete 1 (gdb) break AllocSetAlloc if (int)strcmp(context->name, "ExecutorState") == 0 && *(int *)$rsp != 0x84e7dd Breakpoint 2 at 0x849030: file aset.c, line 718. (gdb) cont Continuing. ^CError in testing breakpoint condition: Quit Breakpoint 2, AllocSetAlloc (context=0x2a1d190, size=381) at aset.c:718 718 { (gdb) bt 4 #0 AllocSetAlloc (context=0x2a1d190, size=381) at aset.c:718 #1 0x000000000084e7dd in palloc (size=381) at mcxt.c:938 #2 0x00000000006101bc in ExecHashJoinGetSavedTuple (file=file@entry=0x4b4a198, hashvalue=hashvalue@entry=0x7ffcbf92fe5c, tupleSlot=0x2ae0ab8, hjstate=0x2a1d920) at nodeHashjoin.c:1277 #3 0x0000000000610ca3 in ExecHashJoinNewBatch (hjstate=0x2a1d920) at nodeHashjoin.c:1042 (More stack frames follow...) (gdb) call MemoryContextStats(TopPortalContext)
doesn't show an increase of ExecutorState total:
TopPortalContext: 8192 total in 1 blocks; 7664 free (0 chunks); 528 used PortalHoldContext: 24632 total in 2 blocks; 7392 free (0 chunks); 17240 used PortalContext: 1482752 total in 184 blocks; 11216 free (8 chunks); 1471536 used: ExecutorState: 1369337168 total in 163397 blocks; 248840 free (36 chunks); 1369088328 used
exact same as before:
ExecutorState: 1369337168 total in 163397 blocks; 248840 free (36 chunks); 1369088328 used
but now we get an increase to:
ExecutorState: 1369345496 total in 163398 blocks; 248840 free (36 chunks); 1369096656 used
after I did this:
(gdb) cont Continuing. Breakpoint 2, AllocSetAlloc (context=0x2a1d190, size=8) at aset.c:718 718 { (gdb) bt 4 #0 AllocSetAlloc (context=0x2a1d190, size=8) at aset.c:718 #1 0x000000000084e8ad in palloc0 (size=size@entry=8) at mcxt.c:969 #2 0x0000000000702b63 in makeBufFileCommon (nfiles=nfiles@entry=1) at buffile.c:119 #3 0x0000000000702e4c in makeBufFile (firstfile=163423) at buffile.c:138 (More stack frames follow...) (gdb) call MemoryContextStats(TopPortalContext)
So now we have it confirmed don't we? No! No we have not! We stop at the entrance of the allocate method. So when I interrupted, there was no call yet. Then at the next stop the increase was from the previous.
Continuing ... this now is from a stop at the makeBufFileCommon
ExecutorState: 1369345496 total in 163398 blocks; 248816 free (36 chunks); 1369096680 used
And again, now stopped before
ExecutorState: 1369345496 total in 163398 blocks; 248792 free (36 chunks); 1369096704 used ExecutorState: 1369345496 total in 163398 blocks; 248792 free (36 chunks); 1369096704 usedI don't see a growth between individual invocations. Anyway, these are the two ways to get there:
(gdb) bt 4 #0 AllocSetAlloc (context=0x2a1d190, size=4) at aset.c:718 #1 0x000000000084e7dd in palloc (size=size@entry=4) at mcxt.c:938 #2 0x0000000000702e59 in makeBufFile (firstfile=163423) at buffile.c:140 #3 BufFileCreateTemp (interXact=interXact@entry=false) at buffile.c:201 (More stack frames follow...) (gdb) cont Continuing. Breakpoint 3, AllocSetAlloc (context=0x2a1d190, size=394) at aset.c:718 718 { (gdb) bt 3 #0 AllocSetAlloc (context=0x2a1d190, size=394) at aset.c:718 #1 0x000000000084e7dd in palloc (size=394) at mcxt.c:938 #2 0x00000000006101bc in ExecHashJoinGetSavedTuple (file=file@entry=0x4b4a198, hashvalue=hashvalue@entry=0x7ffcbf92fe5c, tupleSlot=0x2ae0ab8, hjstate=0x2a1d920) at nodeHashjoin.c:1277 (More stack frames follow...)
But now it increased
ExecutorState: 1369353824 total in 163399 blocks; 248792 free (36 chunks); 1369105032 usedIt increases every 3 times I stop at the breakpoint.
pgsql-performance by date: