Re: Out of Memory errors are frustrating as heck! - Mailing list pgsql-performance

From Gunther
Subject Re: Out of Memory errors are frustrating as heck!
Date
Msg-id 867396ca-9df7-c873-b095-5df85eaafb71@gusw.net
Whole thread Raw
In response to Re: Out of Memory errors are frustrating as heck!  (Gunther <raj@gusw.net>)
List pgsql-performance

It is confirmed, these two call paths are the only ones. At least probably the only ones to occur with enough of a frequency.

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
28576 postgres  20   0 2695304   1.0g 200764 R  11.3 13.8   4:20.13 postgres: postgres integrator [local] EXPLAIN
28580 postgres  20   0  646616 432784  36968 S  98.7  5.5   8:53.28 gdb -p 28576

there is a problem with gdb, it also has a memoy leak and is very expensive with the checking of my conditional breakpoint. So I can't run it all the way through.

Also here captured with

(gdb) call MemoryContextStats(TopPortalContext)
TopPortalContext: 8192 total in 1 blocks; 7664 free (0 chunks); 528 used PortalHoldContext: 24632 total in 2 blocks; 7392 free (0 chunks); 17240 used PortalContext: 1482752 total in 184 blocks; 11216 free (8 chunks); 1471536 used:   ExecutorState: 1369337168 total in 163397 blocks; 248840 free (36 chunks); 1369088328 used     HashTableContext: 32768 total in 3 blocks; 17304 free (10 chunks); 15464 used       HashBatchContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used     HashTableContext: 8192 total in 1 blocks; 7320 free (0 chunks); 872 used       HashBatchContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used     HashTableContext: 8192 total in 1 blocks; 7320 free (0 chunks); 872 used       HashBatchContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used     HashTableContext: 8192 total in 1 blocks; 7752 free (0 chunks); 440 used       HashBatchContext: 90288 total in 4 blocks; 16072 free (6 chunks); 74216 used     HashTableContext: 8192 total in 1 blocks; 7624 free (0 chunks); 568 used       HashBatchContext: 90288 total in 4 blocks; 16072 free (6 chunks); 74216 used     TupleSort main: 32824 total in 2 blocks; 144 free (0 chunks); 32680 used       Caller tuples: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used     ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used     HashTableContext: 8454256 total in 6 blocks; 64848 free (32 chunks); 8389408 used       HashBatchContext: 106640 total in 3 blocks; 7936 free (0 chunks); 98704 used     TupleSort main: 452880 total in 8 blocks; 126248 free (27 chunks); 326632 used       Caller tuples: 4194304 total in 10 blocks; 1496136 free (20 chunks); 2698168 used      ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used     ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used     ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used     ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used     ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used     ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used     ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used     ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used     ...     ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used     ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used     ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used     ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used     ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
Grand total: 1384601904 bytes in 163660 blocks; 2303840 free (145 chunks); 1382298064 used
there is the developing memory leak. Now let's see if we can trace individual increments ...
(gdb) info break
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x0000000000849030 in AllocSetAlloc at aset.c:718       stop only if  (int)strcmp(context->name, "ExecutorState") == 0 && *(int *)$rsp != 0x84e7dd && 0x84e8ad != *(int *)$rsp       breakpoint already hit 4 times
(gdb) delete 1
(gdb) break AllocSetAlloc if (int)strcmp(context->name, "ExecutorState") == 0 && *(int *)$rsp != 0x84e7dd
Breakpoint 2 at 0x849030: file aset.c, line 718.
(gdb) cont
Continuing.
^CError in testing breakpoint condition:
Quit

Breakpoint 2, AllocSetAlloc (context=0x2a1d190, size=381) at aset.c:718
718     {
(gdb) bt 4
#0  AllocSetAlloc (context=0x2a1d190, size=381) at aset.c:718
#1  0x000000000084e7dd in palloc (size=381) at mcxt.c:938
#2  0x00000000006101bc in ExecHashJoinGetSavedTuple (file=file@entry=0x4b4a198, hashvalue=hashvalue@entry=0x7ffcbf92fe5c,   tupleSlot=0x2ae0ab8, hjstate=0x2a1d920) at nodeHashjoin.c:1277
#3  0x0000000000610ca3 in ExecHashJoinNewBatch (hjstate=0x2a1d920) at nodeHashjoin.c:1042
(More stack frames follow...)
(gdb)  call MemoryContextStats(TopPortalContext)
doesn't show an increase of ExecutorState total:
TopPortalContext: 8192 total in 1 blocks; 7664 free (0 chunks); 528 used PortalHoldContext: 24632 total in 2 blocks; 7392 free (0 chunks); 17240 used PortalContext: 1482752 total in 184 blocks; 11216 free (8 chunks); 1471536 used:   ExecutorState: 1369337168 total in 163397 blocks; 248840 free (36 chunks); 1369088328 used

exact same as before:

    ExecutorState: 1369337168 total in 163397 blocks; 248840 free (36 chunks); 1369088328 used

but now we get an increase to:

    ExecutorState: 1369345496 total in 163398 blocks; 248840 free (36 chunks); 1369096656 used

after I did this:
(gdb) cont
Continuing.

Breakpoint 2, AllocSetAlloc (context=0x2a1d190, size=8) at aset.c:718
718     {
(gdb) bt 4
#0  AllocSetAlloc (context=0x2a1d190, size=8) at aset.c:718
#1  0x000000000084e8ad in palloc0 (size=size@entry=8) at mcxt.c:969
#2  0x0000000000702b63 in makeBufFileCommon (nfiles=nfiles@entry=1) at buffile.c:119
#3  0x0000000000702e4c in makeBufFile (firstfile=163423) at buffile.c:138
(More stack frames follow...)
(gdb)  call MemoryContextStats(TopPortalContext)
So now we have it confirmed don't we? No! No we have not! We stop at the entrance of the allocate method. So when I interrupted, there was no call yet. Then at the next stop the increase was from the previous.
Continuing ... this now is from a stop at the makeBufFileCommon
    ExecutorState: 1369345496 total in 163398 blocks; 248816 free (36 chunks); 1369096680 used
And again, now stopped before
    ExecutorState: 1369345496 total in 163398 blocks; 248792 free (36 chunks); 1369096704 used
   ExecutorState: 1369345496 total in 163398 blocks; 248792 free (36 chunks); 1369096704 used
I don't see a growth between individual invocations. Anyway, these are the two ways to get there:
(gdb) bt 4
#0  AllocSetAlloc (context=0x2a1d190, size=4) at aset.c:718
#1  0x000000000084e7dd in palloc (size=size@entry=4) at mcxt.c:938
#2  0x0000000000702e59 in makeBufFile (firstfile=163423) at buffile.c:140
#3  BufFileCreateTemp (interXact=interXact@entry=false) at buffile.c:201
(More stack frames follow...)
(gdb) cont
Continuing.

Breakpoint 3, AllocSetAlloc (context=0x2a1d190, size=394) at aset.c:718
718     {
(gdb) bt 3
#0  AllocSetAlloc (context=0x2a1d190, size=394) at aset.c:718
#1  0x000000000084e7dd in palloc (size=394) at mcxt.c:938
#2  0x00000000006101bc in ExecHashJoinGetSavedTuple (file=file@entry=0x4b4a198, hashvalue=hashvalue@entry=0x7ffcbf92fe5c,   tupleSlot=0x2ae0ab8, hjstate=0x2a1d920) at nodeHashjoin.c:1277
(More stack frames follow...)

But now it increased
    ExecutorState: 1369353824 total in 163399 blocks; 248792 free (36 chunks); 1369105032 used
It increases every 3 times I stop at the breakpoint.

-Gunther

pgsql-performance by date:

Previous
From: Gunther
Date:
Subject: Re: Out of Memory errors are frustrating as heck!
Next
From: Tom Lane
Date:
Subject: Re: Out of Memory errors are frustrating as heck!