Thread: Memory context dump considered harmful
Hi all!

We just had a case of a very long-running process of ours that creates a lot of prepared statements through Perl's DBD::Pg running into: https://rt.cpan.org/Public/Bug/Display.html?id=88827

This resulted in millions of prepared statements being created, but not removed, in the affected backends over the course of 1-2 hours, until the backends in question ran out of memory. The "out of memory" condition resulted in one memory context dump generated per occurrence, each consisting of >2M lines(!) (basically a line of CachedPlan/CachedPlanSource per statement/function). In the 20 minutes or so it took monitoring to alert and operations to react, this caused a follow-up incident, because repeated "out of memory" conditions caused over 400M(!!) log lines amounting to some 15GB of data, running the log partition dangerously close to full.

An example memory context dump looks like this:

TopMemoryContext: 582728880 total in 71126 blocks; 6168 free (52 chunks); 582722712 used
TopTransactionContext: 8192 total in 1 blocks; 6096 free (1 chunks); 2096 used
ExecutorState: 8192 total in 1 blocks; 5392 free (0 chunks); 2800 used
ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
SPI Exec: 0 total in 0 blocks; 0 free (0 chunks); 0 used
SPI Proc: 8192 total in 1 blocks; 5416 free (0 chunks); 2776 used
PL/pgSQL function context: 8192 total in 1 blocks; 1152 free (1 chunks); 7040 used
PL/pgSQL function context: 24576 total in 2 blocks; 11400 free (1 chunks); 13176 used
Type information cache: 24576 total in 2 blocks; 11888 free (5 chunks); 12688 used
PL/pgSQL function context: 8192 total in 1 blocks; 1120 free (1 chunks); 7072 used
PL/pgSQL function context: 24576 total in 2 blocks; 10984 free (1 chunks); 13592 used
PL/pgSQL function context: 57344 total in 3 blocks; 29928 free (2 chunks); 27416 used
PL/pgSQL function context: 57344 total in 3 blocks; 28808 free (2 chunks); 28536 used
PL/pgSQL function context: 24576 total in 2 blocks; 5944 free (3 chunks); 18632 used
RI compare cache: 24576 total in 2 blocks; 15984 free (5 chunks); 8592 used
RI query cache: 24576 total in 2 blocks; 11888 free (5 chunks); 12688 used
PL/pgSQL function context: 57344 total in 3 blocks; 31832 free (2 chunks); 25512 used
PL/pgSQL function context: 57344 total in 3 blocks; 29600 free (2 chunks); 27744 used
PL/pgSQL function context: 57344 total in 3 blocks; 39688 free (5 chunks); 17656 used
CFuncHash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
Rendezvous variable hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
PLpgSQL function cache: 24520 total in 2 blocks; 3744 free (0 chunks); 20776 used
Prepared Queries: 125886512 total in 25 blocks; 4764208 free (91 chunks); 121122304 used
TableSpace cache: 8192 total in 1 blocks; 3216 free (0 chunks); 4976 used
Operator lookup cache: 24576 total in 2 blocks; 11888 free (5 chunks); 12688 used
MessageContext: 8192 total in 1 blocks; 6976 free (0 chunks); 1216 used
Operator class cache: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
smgr relation table: 24576 total in 2 blocks; 5696 free (4 chunks); 18880 used
TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0 chunks); 32 used
Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
PortalMemory: 8192 total in 1 blocks; 7888 free (0 chunks); 304 used
PortalHeapMemory: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
ExecutorState: 57344 total in 3 blocks; 21856 free (2 chunks); 35488 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 8192 total in 1 blocks; 8160 free (1 chunks); 32 used
Relcache by OID: 24576 total in 2 blocks; 12832 free (3 chunks); 11744 used
CacheMemoryContext: 42236592 total in 28 blocks; 7160904 free (298 chunks); 35075688 used
CachedPlan: 7168 total in 3 blocks; 1544 free (0 chunks); 5624 used
CachedPlanSource: 7168 total in 3 blocks; 3904 free (1 chunks); 3264 used
CachedPlan: 15360 total in 4 blocks; 6440 free (1 chunks); 8920 used
CachedPlanSource: 7168 total in 3 blocks; 1352 free (0 chunks); 5816 used
CachedPlan: 15360 total in 4 blocks; 6440 free (1 chunks); 8920 used
CachedPlanSource: 7168 total in 3 blocks; 1352 free (0 chunks); 5816 used
CachedPlan: 7168 total in 3 blocks; 1544 free (0 chunks); 5624 used
CachedPlanSource: 7168 total in 3 blocks; 3904 free (1 chunks); 3264 used
CachedPlan: 15360 total in 4 blocks; 6440 free (1 chunks); 8920 used
CachedPlanSource: 7168 total in 3 blocks; 1352 free (0 chunks); 5816 used
CachedPlan: 15360 total in 4 blocks; 6440 free (1 chunks); 8920 used
CachedPlanSource: 7168 total in 3 blocks; 1352 free (0 chunks); 5816 used
CachedPlan: 7168 total in 3 blocks; 1544 free (0 chunks); 5624 used
CachedPlanSource: 7168 total in 3 blocks; 3904 free (1 chunks); 3264 used
CachedPlan: 15360 total in 4 blocks; 6440 free (1 chunks); 8920 used
CachedPlanSource: 7168 total in 3 blocks; 1352 free (0 chunks); 5816 used
CachedPlan: 15360 total in 4 blocks; 6440 free (1 chunks); 8920 used
CachedPlanSource: 7168 total in 3 blocks; 1352 free (0 chunks); 5816 used
CachedPlan: 7168 total in 3 blocks; 1544 free (0 chunks); 5624 used
CachedPlanSource: 7168 total in 3 blocks; 3904 free (1 chunks); 3264 used
CachedPlan: 15360 total in 4 blocks; 6440 free (1 chunks); 8920 used
CachedPlanSource: 7168 total in 3 blocks; 1352 free (0 chunks); 5816 used
CachedPlan: 7168 total in 3 blocks; 2248 free (0 chunks); 4920 used
CachedPlanSource: 3072 total in 2 blocks; 64 free (0 chunks); 3008 used
CachedPlan: 15360 total in 4 blocks; 7576 free (0 chunks); 7784 used
CachedPlanSource: 7168 total in 3 blocks; 1888 free (0 chunks); 5280 used
CachedPlan: 15360 total in 4 blocks; 7576 free (0 chunks); 7784 used
CachedPlanSource: 7168 total in 3 blocks; 1888 free (0 chunks); 5280 used
CachedPlan: 7168 total in 3 blocks; 2248 free (0 chunks); 4920 used
CachedPlanSource: 3072 total in 2 blocks; 64 free (0 chunks); 3008 used
CachedPlan: 15360 total in 4 blocks; 7576 free (0 chunks); 7784 used
... goes on for some 2M lines ...

Given that an "out of memory" situation is already bad enough, essentially mounting a DoS attack on the rest of the system by creating excessive amounts of logs feels even worse. I wonder if we should have a default of capping the dump to say 1k lines or such and only optionally do a full one. The other option would be to find an alternative way of displaying the information, like: if there is a large number of repeated entries, just show the first X of them and say "123456 similar ones".

This is 9.1.14 on Debian Wheezy/amd64 fwiw - but I don't think we have made relevant changes in more recent versions.

regards

Stefan
On 08/20/2015 08:51 AM, Stefan Kaltenbrunner wrote:
> This is 9.1.14 on Debian Wheezy/amd64 fwiw - but I don't think we have
> made relevant changes in more recent versions.

It seems we may also want to consider a way to drop those prepared queries after a period of time of non-use.

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
> I wonder if we should have a default of capping the dump to say 1k lines
> or such and only optionally do a full one.

-1.  It's worked like this for the last fifteen years or thereabouts, and you're the first one to complain.  I suspect some weirdness in your logging setup, rather than any systemic problem that we need to lobotomize our diagnostic output in order to prevent.

(The reason I say "lobotomize" is that there's no particularly good reason to assume that the first N lines will tell you what you need to know.  And the filter rule would have to be *very* stupid, because we can't risk trying to allocate any additional memory to track what we're doing here.)

			regards, tom lane
On 08/20/2015 06:09 PM, Tom Lane wrote:
> Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
>> I wonder if we should have a default of capping the dump to say 1k lines
>> or such and only optionally do a full one.
>
> -1.  It's worked like this for the last fifteen years or thereabouts,
> and you're the first one to complain.  I suspect some weirdness in
> your logging setup, rather than any systemic problem that we
> need to lobotomize our diagnostic output in order to prevent.

not sure what you consider weird in the logging setup here - the context dump is imho borderline internal diagnostic output at a debug level (rather than making sense to an average sysadmin) already, and there is no way to control it. But having (like in our case) the backend dumping 2 million basically identical lines into a general logfile per event seems excessive and rather abusive towards the rest of the system (just from an IO perspective, for example, or from a logfile post-processing tool perspective).

> (The reason I say "lobotomize" is that there's no particularly good
> reason to assume that the first N lines will tell you what you need
> to know.  And the filter rule would have to be *very* stupid, because
> we can't risk trying to allocate any additional memory to track what
> we're doing here.)

I do understand that there might be challenges there, but in the last 15 years machines got way faster and pg got way more capable, and some of those capabilities might need to get revisited in that regard - and while it is very nice that pg survives multiple oom cases pretty nicely, I don't think it is entitled to put an imho unreasonable burden on the rest of the system by writing "insane" amounts of data.

Just from a sysadmin perspective it also means that it is trivial for a misbehaving app to fill up the logfile on a system, because unlike almost all other logging output there seems to be no way to control/disable it on a per role/database level.


Stefan
Hi,

On 08/20/2015 11:04 PM, Stefan Kaltenbrunner wrote:
> On 08/20/2015 06:09 PM, Tom Lane wrote:
>> Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
>>> I wonder if we should have a default of capping the dump to say 1k lines
>>> or such and only optionally do a full one.
>>
>> -1.  It's worked like this for the last fifteen years or
>> thereabouts, and you're the first one to complain.  I suspect some
>> weirdness in your logging setup, rather than any systemic problem
>> that we need to lobotomize our diagnostic output in order to
>> prevent.
>
> not sure what you consider weird in the logging setup here - the
> context dump is imho borderline internal diagnostic output at a
> debug level (rather than making sense to an average sysadmin) already,
> and there is no way to control it. But having (like in our case) the
> backend dumping 2 million basically identical lines into a general
> logfile per event seems excessive and rather abusive towards the rest
> of the system (just from an IO perspective, for example, or from a
> logfile post-processing tool perspective).

I've seen similar issues too, on systems with perfectly common logging setups. So even if your logging setup were strange, it's not the main factor here.

>> (The reason I say "lobotomize" is that there's no particularly
>> good reason to assume that the first N lines will tell you what you
>> need to know.  And the filter rule would have to be *very* stupid,
>> because we can't risk trying to allocate any additional memory to
>> track what we're doing here.)
>
> I do understand that there might be challenges there, but in the last
> 15 years machines got way faster and pg got way more capable, and some
> of those capabilities might need to get revisited in that regard - and
> while it is very nice that pg survives multiple oom cases pretty
> nicely, I don't think it is entitled to put an imho unreasonable burden
> on the rest of the system by writing "insane" amounts of data.
>
> Just from a sysadmin perspective it also means that it is trivial for
> a misbehaving app to fill up the logfile on a system, because unlike
> almost all other logging output there seems to be no way
> to control/disable it on a per role/database level.

IMHO the first thing we might do is provide a log_memory_stats GUC controlling that. I'm not a big fan of adding a GUC for everything, but in this case it seems appropriate, just like the other log_ options.

I also don't think logging just a subset of the stats is a lost cause. Sure, we can't know which of the lines are important, but for example logging just the top-level contexts with a summary of the child contexts would be OK. So something like this:

TopMemoryContext: 582728880 total in 71126 blocks; 6168 free ...
TopTransactionContext: 8192 total in 1 blocks; 6096 free (1 ...
  3 child contexts, 10 blocks, 34873 free
PL/pgSQL function context: 8192 total in 1 blocks; 1152 fre ...
PL/pgSQL function context: 24576 total in 2 blocks; 11400 f ...
Type information cache: 24576 total in 2 blocks; 11888 free ...
PL/pgSQL function context: 8192 total in 1 blocks; 1120 fre ...
PL/pgSQL function context: 24576 total in 2 blocks; 10984 f ...
PL/pgSQL function context: 57344 total in 3 blocks; 29928 f ...
PL/pgSQL function context: 57344 total in 3 blocks; 28808 f ...
PL/pgSQL function context: 24576 total in 2 blocks; 5944 fr ...
RI compare cache: 24576 total in 2 blocks; 15984 free (5 ch ...
RI query cache: 24576 total in 2 blocks; 11888 free (5 chun ...
PL/pgSQL function context: 57344 total in 3 blocks; 31832 f ...
PL/pgSQL function context: 57344 total in 3 blocks; 29600 f ...
PL/pgSQL function context: 57344 total in 3 blocks; 39688 f ...
CFuncHash: 8192 total in 1 blocks; 1680 free (0 chunks); 65 ...
Rendezvous variable hash: 8192 total in 1 blocks; 1680 free ...
PLpgSQL function cache: 24520 total in 2 blocks; 3744 free ...
Prepared Queries: 125886512 total in 25 blocks; 4764208 fre ...
TableSpace cache: 8192 total in 1 blocks; 3216 free (0 chun ...
Operator lookup cache: 24576 total in 2 blocks; 11888 free ...
MessageContext: 8192 total in 1 blocks; 6976 free (0 chunks ...
Operator class cache: 8192 total in 1 blocks; 1680 free (0 ...
smgr relation table: 24576 total in 2 blocks; 5696 free (4 ...
TransactionAbortContext: 32768 total in 1 blocks; 32736 fre ...
Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); ...
PortalMemory: 8192 total in 1 blocks; 7888 free (0 chunks); ...
  4 child contexts, 34 blocks, 87283 free
Relcache by OID: 24576 total in 2 blocks; 12832 free (3 chu ...
CacheMemoryContext: 42236592 total in 28 blocks; 7160904 fr ...
  1487438 child contexts, 14874382 blocks, 3498349 free

instead of the millions of lines with all the details.

So either log_memory_stats = (on|off) or log_memory_stats = (full|short|off) might work.

regards

--
Tomas Vondra                   http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
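[To make the log_memory_stats idea above concrete, here is a rough sketch of what such an enum GUC could look like as a fragment of src/backend/utils/misc/guc.c. Nothing like this exists in the tree; the LOG_MEMORY_STATS_* enum names, the variable name, the PGC_SUSET context and the LOGGING_WHAT grouping are illustrative assumptions only.]

/*
 * Hypothetical sketch of a log_memory_stats = (full|short|off) GUC,
 * written as it might appear inside guc.c (which already includes the
 * headers providing config_enum_entry, PGC_SUSET, LOGGING_WHAT, etc.).
 */
typedef enum
{
	LOG_MEMORY_STATS_OFF,		/* suppress the context dump entirely */
	LOG_MEMORY_STATS_SHORT,		/* top-level contexts plus child summaries */
	LOG_MEMORY_STATS_FULL		/* current behaviour: dump everything */
} LogMemoryStatsOption;

static int	log_memory_stats = LOG_MEMORY_STATS_FULL;

static const struct config_enum_entry log_memory_stats_options[] = {
	{"off", LOG_MEMORY_STATS_OFF, false},
	{"short", LOG_MEMORY_STATS_SHORT, false},
	{"full", LOG_MEMORY_STATS_FULL, false},
	{NULL, 0, false}
};

/* ... plus an entry in guc.c's ConfigureNamesEnum[] table: */
{
	{"log_memory_stats", PGC_SUSET, LOGGING_WHAT,
		gettext_noop("Controls how much of the memory context dump is "
					 "written to the server log on out-of-memory errors."),
		NULL
	},
	&log_memory_stats,
	LOG_MEMORY_STATS_FULL,
	log_memory_stats_options,
	NULL, NULL, NULL
},

[Whatever code ends up emitting the dump would then check this setting and choose between the full dump, a summarized form as sketched above, or no dump at all.]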
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On 08/20/2015 11:04 PM, Stefan Kaltenbrunner wrote:
>> On 08/20/2015 06:09 PM, Tom Lane wrote:
>>> (The reason I say "lobotomize" is that there's no particularly
>>> good reason to assume that the first N lines will tell you what you
>>> need to know.  And the filter rule would have to be *very* stupid,
>>> because we can't risk trying to allocate any additional memory to
>>> track what we're doing here.)

> IMHO the first thing we might do is provide a log_memory_stats GUC
> controlling that. I'm not a big fan of adding a GUC for everything, but
> in this case it seems appropriate, just like the other log_ options.

I don't think this is advisable.  You would have to either keep it turned off and risk not being able to debug OOM situations, or keep it turned on and risk log-volume problems; neither is very satisfactory.

> I also don't think logging just a subset of the stats is a lost cause.
> Sure, we can't know which of the lines are important, but for example
> logging just the top-level contexts with a summary of the child contexts
> would be OK.

I thought a bit more about this.  Generally, what you want to know about a given situation is which contexts have a whole lot of allocations and/or a whole lot of child contexts.  What you suggest above won't work very well if the problem is buried more than about two levels down in the context tree.  But suppose we add a parameter to memory context stats collection that is the maximum number of child contexts to print *per parent context*.  If there are more than that, summarize the rest as per your suggestion.  So any given recursion level might look like

FooContext: m total in n blocks ...
  ChildContext1: m total in n blocks ...
    possible grandchildren...
  ChildContext2: m total in n blocks ...
    possible grandchildren...
  ChildContext3: m total in n blocks ...
    possible grandchildren...
  k more child contexts containing m total in n blocks ...

This would require a fixed amount of extra state per recursion level, so it could be done with a few more parameters/local variables in MemoryContextStats and no need to risk a malloc().

The case where you would lose important data is where the serious bloat is in some specific child context that is after the first N children of its direct parent.  I don't believe I've ever seen a case where that was critical information as long as N isn't too tiny.  I think we could hard-wire N at 100 or something like that and pretty much fix Stefan's complaint, while losing little if any detail in typical cases.  Manual invocation of MemoryContextStats could pass larger values if you really needed it during debugging.

Thoughts?

			regards, tom lane
I wrote:
> I thought a bit more about this.  Generally, what you want to know about
> a given situation is which contexts have a whole lot of allocations
> and/or a whole lot of child contexts.  What you suggest above won't work
> very well if the problem is buried more than about two levels down in
> the context tree.  But suppose we add a parameter to memory context stats
> collection that is the maximum number of child contexts to print *per
> parent context*.  If there are more than that, summarize the rest as per
> your suggestion.

Here's a draft patch along this line.  After playing with it a bit, I think that a wired-in limit of 100 children per level would be fine.  We could imagine adding a GUC for that, but I think it's overkill.

A couple of notes:

Since the summarization mechanism requires passing totals back up anyway, I took the opportunity to add a "grand total" line to the end of the printout.  I'm half tempted to modify that to convert the numbers in the grand total line to kilobytes or even MB, but it could be argued either way.  This assumes that the total memory allocation couldn't exceed the range of size_t, a thing which is not promised by the C standard --- but AFAIK it would be true on any modern machine (and really we were already making such an assumption in AllocSetStats).

Also, if we ever add a memory context type that works fundamentally differently from AllocSet, we might need to rethink what we compute/print as summary stats.  I figure we can cross that bridge when we come to it.

Comments?

			regards, tom lane

diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index febeb6e..e86c7cb 100644
*** a/src/backend/utils/mmgr/aset.c
--- b/src/backend/utils/mmgr/aset.c
*************** static void AllocSetReset(MemoryContext
*** 253,259 ****
  static void AllocSetDelete(MemoryContext context);
  static Size AllocSetGetChunkSpace(MemoryContext context, void *pointer);
  static bool AllocSetIsEmpty(MemoryContext context);
! static void AllocSetStats(MemoryContext context, int level);

  #ifdef MEMORY_CONTEXT_CHECKING
  static void AllocSetCheck(MemoryContext context);
--- 253,260 ----
  static void AllocSetDelete(MemoryContext context);
  static Size AllocSetGetChunkSpace(MemoryContext context, void *pointer);
  static bool AllocSetIsEmpty(MemoryContext context);
! static void AllocSetStats(MemoryContext context, int level, bool print,
!               MemoryContextCounters *totals);

  #ifdef MEMORY_CONTEXT_CHECKING
  static void AllocSetCheck(MemoryContext context);
*************** AllocSetIsEmpty(MemoryContext context)
*** 1228,1237 ****

  /*
   * AllocSetStats
!  *      Displays stats about memory consumption of an allocset.
   */
  static void
! AllocSetStats(MemoryContext context, int level)
  {
      AllocSet    set = (AllocSet) context;
      Size        nblocks = 0;
--- 1229,1243 ----

  /*
   * AllocSetStats
!  *      Compute stats about memory consumption of an allocset.
!  *
!  * level: recursion level (0 at top level); used for print indentation.
!  * print: true to print stats to stderr.
!  * totals: if not NULL, add stats about this allocset into *totals.
   */
  static void
! AllocSetStats(MemoryContext context, int level, bool print,
!               MemoryContextCounters *totals)
  {
      AllocSet    set = (AllocSet) context;
      Size        nblocks = 0;
*************** AllocSetStats(MemoryContext context, int
*** 1239,1247 ****
      Size        totalspace = 0;
      Size        freespace = 0;
      AllocBlock  block;
-     AllocChunk  chunk;
      int         fidx;
-     int         i;

      for (block = set->blocks; block != NULL; block = block->next)
      {
--- 1245,1251 ----
*************** AllocSetStats(MemoryContext context, int
*** 1251,1256 ****
--- 1255,1262 ----
      }
      for (fidx = 0; fidx < ALLOCSET_NUM_FREELISTS; fidx++)
      {
+         AllocChunk  chunk;
+ 
          for (chunk = set->freelist[fidx]; chunk != NULL;
               chunk = (AllocChunk) chunk->aset)
          {
*************** AllocSetStats(MemoryContext context, int
*** 1259,1271 ****
          }
      }

!     for (i = 0; i < level; i++)
!         fprintf(stderr, "  ");
!     fprintf(stderr, "%s: %zu total in %zd blocks; %zu free (%zd chunks); %zu used\n",
!             set->header.name, totalspace, nblocks, freespace, nchunks,
!             totalspace - freespace);
  }

--- 1265,1289 ----
          }
      }

!     if (print)
!     {
!         int         i;
!         for (i = 0; i < level; i++)
!             fprintf(stderr, "  ");
!         fprintf(stderr, "%s: %zu total in %zd blocks; %zu free (%zd chunks); %zu used\n",
!                 set->header.name, totalspace, nblocks, freespace, nchunks,
!                 totalspace - freespace);
!     }
! 
!     if (totals)
!     {
!         totals->nblocks += nblocks;
!         totals->nchunks += nchunks;
!         totals->totalspace += totalspace;
!         totals->freespace += freespace;
!     }
  }

diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 12d29f7..ad1421c 100644
*** a/src/backend/utils/mmgr/mcxt.c
--- b/src/backend/utils/mmgr/mcxt.c
*************** MemoryContext CurTransactionContext = NU
*** 52,58 ****
  MemoryContext PortalContext = NULL;

  static void MemoryContextCallResetCallbacks(MemoryContext context);
! static void MemoryContextStatsInternal(MemoryContext context, int level);

  /*
   * You should not do memory allocations within a critical section, because
--- 52,60 ----
  MemoryContext PortalContext = NULL;

  static void MemoryContextCallResetCallbacks(MemoryContext context);
! static void MemoryContextStatsInternal(MemoryContext context, int level,
!                         bool print, int max_children,
!                         MemoryContextCounters *totals);

  /*
   * You should not do memory allocations within a critical section, because
*************** MemoryContextIsEmpty(MemoryContext conte
*** 477,501 ****
   * MemoryContextStats
   *      Print statistics about the named context and all its descendants.
   *
!  * This is just a debugging utility, so it's not fancy.  The statistics
!  * are merely sent to stderr.
   */
  void
  MemoryContextStats(MemoryContext context)
  {
!     MemoryContextStatsInternal(context, 0);
  }

  static void
! MemoryContextStatsInternal(MemoryContext context, int level)
  {
      MemoryContext child;

      AssertArg(MemoryContextIsValid(context));

!     (*context->methods->stats) (context, level);
!     for (child = context->firstchild; child != NULL; child = child->nextchild)
!         MemoryContextStatsInternal(child, level + 1);
  }

  /*
--- 479,584 ----
   * MemoryContextStats
   *      Print statistics about the named context and all its descendants.
   *
!  * This is just a debugging utility, so it's not very fancy.  However, we do
!  * make some effort to summarize when the output would otherwise be very long.
!  * The statistics are sent to stderr.
   */
  void
  MemoryContextStats(MemoryContext context)
  {
!     /* A hard-wired limit on the number of children is usually good enough */
!     MemoryContextStatsDetail(context, 100);
! }
! 
! /*
!  * MemoryContextStatsDetail
!  *
!  * Entry point for use if you want to vary the number of child contexts shown.
!  */
! void
! MemoryContextStatsDetail(MemoryContext context, int max_children)
! {
!     MemoryContextCounters grand_totals;
! 
!     memset(&grand_totals, 0, sizeof(grand_totals));
! 
!     MemoryContextStatsInternal(context, 0, true, max_children, &grand_totals);
! 
!     fprintf(stderr,
!     "Grand total: %zu bytes in %zd blocks; %zu free (%zd chunks); %zu used\n",
!             grand_totals.totalspace, grand_totals.nblocks,
!             grand_totals.freespace, grand_totals.nchunks,
!             grand_totals.totalspace - grand_totals.freespace);
  }

+ /*
+  * MemoryContextStatsInternal
+  *      One recursion level for MemoryContextStats
+  *
+  * Print this context if print is true, but in any case accumulate counts into
+  * *totals (if given).
+  */
  static void
! MemoryContextStatsInternal(MemoryContext context, int level,
!                         bool print, int max_children,
!                         MemoryContextCounters *totals)
  {
+     MemoryContextCounters local_totals;
      MemoryContext child;
+     int         ichild;

      AssertArg(MemoryContextIsValid(context));

!     /* Examine the context itself */
!     (*context->methods->stats) (context, level, print, totals);
! 
!     /*
!      * Examine children.  If there are more than max_children of them, we do
!      * not print the rest explicitly, but just summarize them.
!      */
!     memset(&local_totals, 0, sizeof(local_totals));
! 
!     for (child = context->firstchild, ichild = 0;
!          child != NULL;
!          child = child->nextchild, ichild++)
!     {
!         if (ichild < max_children)
!             MemoryContextStatsInternal(child, level + 1,
!                                        print, max_children,
!                                        totals);
!         else
!             MemoryContextStatsInternal(child, level + 1,
!                                        false, max_children,
!                                        &local_totals);
!     }
! 
!     /* Deal with excess children */
!     if (ichild > max_children)
!     {
!         if (print)
!         {
!             int         i;
! 
!             for (i = 0; i <= level; i++)
!                 fprintf(stderr, "  ");
!             fprintf(stderr,
!                 "%d more child contexts containing %zu total in %zd blocks; %zu free (%zd chunks); %zu used\n",
!                     ichild - max_children,
!                     local_totals.totalspace,
!                     local_totals.nblocks,
!                     local_totals.freespace,
!                     local_totals.nchunks,
!                     local_totals.totalspace - local_totals.freespace);
!         }
! 
!         if (totals)
!         {
!             totals->nblocks += local_totals.nblocks;
!             totals->nchunks += local_totals.nchunks;
!             totals->totalspace += local_totals.totalspace;
!             totals->freespace += local_totals.freespace;
!         }
!     }
  }

  /*
diff --git a/src/include/nodes/memnodes.h b/src/include/nodes/memnodes.h
index 5e036b9..5e6bb60 100644
*** a/src/include/nodes/memnodes.h
--- b/src/include/nodes/memnodes.h
***************
*** 17,22 ****
--- 17,40 ----

  #include "nodes/nodes.h"

  /*
+  * MemoryContextCounters
+  *      Summarization state for MemoryContextStats collection.
+  *
+  * The set of counters in this struct is biased towards AllocSet; if we ever
+  * add any context types that are based on fundamentally different approaches,
+  * we might need more or different counters here.  A possible API spec then
+  * would be to print only nonzero counters, but for now we just summarize in
+  * the format historically used by AllocSet.
+  */
+ typedef struct MemoryContextCounters
+ {
+     Size        nblocks;        /* Total number of malloc blocks */
+     Size        nchunks;        /* Total number of free chunks */
+     Size        totalspace;     /* Total bytes requested from malloc */
+     Size        freespace;      /* The unused portion of totalspace */
+ } MemoryContextCounters;
+ 
+ /*
   * MemoryContext
   *      A logical context in which memory allocations occur.
   *
*************** typedef struct MemoryContextMethods
*** 44,50 ****
      void        (*delete_context) (MemoryContext context);
      Size        (*get_chunk_space) (MemoryContext context, void *pointer);
      bool        (*is_empty) (MemoryContext context);
!     void        (*stats) (MemoryContext context, int level);
  #ifdef MEMORY_CONTEXT_CHECKING
      void        (*check) (MemoryContext context);
  #endif
--- 62,69 ----
      void        (*delete_context) (MemoryContext context);
      Size        (*get_chunk_space) (MemoryContext context, void *pointer);
      bool        (*is_empty) (MemoryContext context);
!     void        (*stats) (MemoryContext context, int level, bool print,
!                           MemoryContextCounters *totals);
  #ifdef MEMORY_CONTEXT_CHECKING
      void        (*check) (MemoryContext context);
  #endif
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index f0fe0f4..38106d7 100644
*** a/src/include/utils/memutils.h
--- b/src/include/utils/memutils.h
*************** extern MemoryContext GetMemoryChunkConte
*** 104,109 ****
--- 104,110 ----
  extern MemoryContext MemoryContextGetParent(MemoryContext context);
  extern bool MemoryContextIsEmpty(MemoryContext context);
  extern void MemoryContextStats(MemoryContext context);
+ extern void MemoryContextStatsDetail(MemoryContext context, int max_children);
  extern void MemoryContextAllowInCriticalSection(MemoryContext context,
                                    bool allow);
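[A short usage sketch, not part of the patch: with the patch applied, backend or extension code could ask for a less aggressive summarization than the built-in limit of 100 by calling the new entry point directly. The helper name below and the limit of 1000 are arbitrary illustrative choices; MemoryContextStatsDetail() and TopMemoryContext come from the patch and memutils.h.]

#include "postgres.h"
#include "utils/memutils.h"

/*
 * Hypothetical debugging helper: dump memory context stats to stderr,
 * showing up to 1000 children per parent context instead of the
 * hard-wired default of 100 used by MemoryContextStats().
 */
static void
dump_contexts_verbose(void)
{
	MemoryContextStatsDetail(TopMemoryContext, 1000);
}

[The same call could equally be issued by hand from a debugger attached to a suspect backend, which is the "manual invocation ... could pass larger values" escape hatch mentioned upthread.]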
On 08/21/2015 08:37 PM, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> I also don't think logging just a subset of the stats is a lost cause.
>> Sure, we can't know which of the lines are important, but for example
>> logging just the top-level contexts with a summary of the child contexts
>> would be OK.
>
> I thought a bit more about this.  Generally, what you want to know about
> a given situation is which contexts have a whole lot of allocations
> and/or a whole lot of child contexts.  What you suggest above won't work
> very well if the problem is buried more than about two levels down in
> the context tree.  But suppose we add a parameter to memory context stats
> collection that is the maximum number of child contexts to print *per
> parent context*.  If there are more than that, summarize the rest as per
> your suggestion.  So any given recursion level might look like
>
> FooContext: m total in n blocks ...
>   ChildContext1: m total in n blocks ...
>     possible grandchildren...
>   ChildContext2: m total in n blocks ...
>     possible grandchildren...
>   ChildContext3: m total in n blocks ...
>     possible grandchildren...
>   k more child contexts containing m total in n blocks ...
>
> This would require a fixed amount of extra state per recursion level,
> so it could be done with a few more parameters/local variables in
> MemoryContextStats and no need to risk a malloc().
>
> The case where you would lose important data is where the serious
> bloat is in some specific child context that is after the first N
> children of its direct parent.  I don't believe I've ever seen a case
> where that was critical information as long as N isn't too tiny.

Couldn't we make it a bit smarter to handle even cases like this? For example we might first count/sum the child contexts, and then print either all child contexts (if there are only a few of them) or just those that are >5% of the total, 2x the average or something like that.

regards

--
Tomas Vondra                   http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On 08/21/2015 08:37 PM, Tom Lane wrote:
>> ... But suppose we add a parameter to memory context stats
>> collection that is the maximum number of child contexts to print *per
>> parent context*.  If there are more than that, summarize the rest as per
>> your suggestion.
>>
>> The case where you would lose important data is where the serious
>> bloat is in some specific child context that is after the first N
>> children of its direct parent.  I don't believe I've ever seen a case
>> where that was critical information as long as N isn't too tiny.

> Couldn't we make it a bit smarter to handle even cases like this? For
> example we might first count/sum the child contexts, and then print
> either all child contexts (if there are only a few of them) or just
> those that are >5% of the total, 2x the average or something like that.

That seems overly complicated.  In the first place, you couldn't possibly implement that without two passes over the context set, which would be mighty expensive.  (In the situations where this is getting run, most likely portions of the address space have been swapped out, and we'll have to swap them back in again.  Bad enough to iterate over the whole process address space once, but twice?)  In the second place, it would seem to require a much wider API between MemoryContextStats and the per-context-type stats methods, including for example a way to decide which numbers about a context were the most important ones.  I'm already concerned about having had to expose a set of accumulatable numbers at all ... don't want the API to get into their semantics in that kind of detail.

As I said, based on my experience in looking at memory context dumps (and I've seen a few), a show-the-first-N-children heuristic probably will work fine.  If we find out differently we can work on refining it, but I don't think a complex design is warranted at this stage.

One thing we could consider doing to improve the odds that it's fine would be to rearrange things so that child contexts of the same parent are more likely to be "similar" --- for example, break out all relcache entries to be children of a RelCacheContext instead of the generic CacheMemoryContext, likewise for cached plans, etc.  But I'm not yet convinced that'd be worth the trouble.

			regards, tom lane
On 08/22/2015 06:06 PM, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> Couldn't we make it a bit smarter to handle even cases like this? For
>> example we might first count/sum the child contexts, and then print
>> either all child contexts (if there are only a few of them) or just
>> those that are >5% of the total, 2x the average or something like that.
>
> That seems overly complicated.  In the first place, you couldn't
> possibly implement that without two passes over the context set,
> which would be mighty expensive.  (In the situations where this is
> getting run, most likely portions of the address space have been
> swapped out, and we'll have to swap them back in again.  Bad enough to
> iterate over the whole process address space once, but twice?)  In the
> second place, it would seem to require a much wider API between
> MemoryContextStats and the per-context-type stats methods, including
> for example a way to decide which numbers about a context were the
> most important ones.  I'm already concerned about having had to expose
> a set of accumulatable numbers at all ... don't want the API to get
> into their semantics in that kind of detail.

Sure, the two passes seem quite annoying, although I'm not convinced it'd be a problem in practice - most modern systems I've been dealing with recently were configured with the 'no swapping' goal, using only a tiny swap for extreme cases (e.g. 256GB RAM with 4GB of swap).

If only we had some memory accounting in place ;-) Then we wouldn't have to do the two passes ...

> As I said, based on my experience in looking at memory context dumps
> (and I've seen a few), a show-the-first-N-children heuristic
> probably will work fine.  If we find out differently we can work on
> refining it, but I don't think a complex design is warranted at this
> stage.

OK, let's go with the simple approach.

I'm still not quite convinced that having no GUC for turning this off is a good idea, though. From time to time we're dealing with OOM issues on customer systems, with very limited access (e.g. no gdb). In those cases the memory context dump is the only information we have initially, and it's usually quite difficult to get additional info. I agree that the 'first-N-children' heuristic is going to work in most cases, but if it doesn't, it'd be nice to have a way to force "full" stats in an ad-hoc manner (i.e. disable before query, etc.).

OTOH now that I'm thinking about it, most OOM errors I've seen (at least on Linux) were either because of exceptionally large palloc() requests (e.g. because of varlena header corruption, overflow bugs, ...) or because of the OOM killer. In the first case the memory context stats are utterly useless because the issue has nothing to do with other memory contexts; in the latter case you don't get any stats at all because the process is simply killed.

One question regarding the proposed patch though - if I get it right (just looking at the diff), it simply limits the output to the first 100 child contexts at each level independently. So if each of those 100 child contexts has >100 child contexts of its own, we get 100x100 lines. And so on if the hierarchy is deeper. This probably is not addressable without introducing some global counter of printed contexts, and it may not be an issue at all (all the cases I could remember have a single huge context or many sibling contexts).

> One thing we could consider doing to improve the odds that it's fine
> would be to rearrange things so that child contexts of the same
> parent are more likely to be "similar" --- for example, break out
> all relcache entries to be children of a RelCacheContext instead of
> the generic CacheMemoryContext, likewise for cached plans, etc.  But
> I'm not yet convinced that'd be worth the trouble.

That'd be nice, but I see that as an independent improvement - it might improve the odds for internal contexts, but what about contexts coming from user code (e.g. custom aggregates)?

kind regards

--
Tomas Vondra                   http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> One question regarding the proposed patch though - if I get it right
> (just looking at the diff), it simply limits the output to the first 100
> child contexts at each level independently. So if each of those 100
> child contexts has >100 child contexts of its own, we get 100x100
> lines. And so on if the hierarchy is deeper. This probably is not
> addressable without introducing some global counter of printed contexts,
> and it may not be an issue at all (all the cases I could remember have a
> single huge context or many sibling contexts).

Right.  The situation Stefan was complaining of, almost certainly, involved a huge number of children of the same context.  This patch would successfully abbreviate that case, no matter where it happened in the context tree exactly.  In principle, if you were getting that sort of expansion at multiple levels of the context tree concurrently, you could still get a mighty long context dump ... but I really doubt that would happen in practice.  (And if it did happen, an overall limit on the number of contexts printed would hide the fact that it was happening, which wouldn't be desirable.)

>> One thing we could consider doing to improve the odds that it's fine
>> would be to rearrange things so that child contexts of the same
>> parent are more likely to be "similar" --- for example, break out
>> all relcache entries to be children of a RelCacheContext instead of
>> the generic CacheMemoryContext, likewise for cached plans, etc.  But
>> I'm not yet convinced that'd be worth the trouble.

> That'd be nice, but I see that as an independent improvement - it might
> improve the odds for internal contexts, but what about contexts coming
> from user code (e.g. custom aggregates)?

Yeah, cases like custom aggregates would be hard to classify.

			regards, tom lane
On 08/22/2015 06:25 AM, Tomas Vondra wrote:
> On 08/21/2015 08:37 PM, Tom Lane wrote:
>> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>>> I also don't think logging just a subset of the stats is a lost cause.
>>> Sure, we can't know which of the lines are important, but for example
>>> logging just the top-level contexts with a summary of the child contexts
>>> would be OK.
>>
>> I thought a bit more about this.  Generally, what you want to know about
>> a given situation is which contexts have a whole lot of allocations
>> and/or a whole lot of child contexts.  What you suggest above won't work
>> very well if the problem is buried more than about two levels down in
>> the context tree.  But suppose we add a parameter to memory context stats
>> collection that is the maximum number of child contexts to print *per
>> parent context*.  If there are more than that, summarize the rest as per
>> your suggestion.  So any given recursion level might look like
>>
>> FooContext: m total in n blocks ...
>>   ChildContext1: m total in n blocks ...
>>     possible grandchildren...
>>   ChildContext2: m total in n blocks ...
>>     possible grandchildren...
>>   ChildContext3: m total in n blocks ...
>>     possible grandchildren...
>>   k more child contexts containing m total in n blocks ...
>>
>> This would require a fixed amount of extra state per recursion level,
>> so it could be done with a few more parameters/local variables in
>> MemoryContextStats and no need to risk a malloc().
>>
>> The case where you would lose important data is where the serious
>> bloat is in some specific child context that is after the first N
>> children of its direct parent.  I don't believe I've ever seen a case
>> where that was critical information as long as N isn't too tiny.
>
> Couldn't we make it a bit smarter to handle even cases like this? For
> example we might first count/sum the child contexts, and then print
> either all child contexts (if there are only a few of them) or just
> those that are >5% of the total, 2x the average or something like that.

While having that kind of logic would be nice, I don't think it is required. For the case I had, the proposed patch from Tom seems perfectly fine to me - not sure we would want a GUC. From a DBA perspective I don't think anybody needs millions of lines of almost duplicated memory context dumps, and I'm not sure we need it from a developer perspective either (other than the information "there were more than those I printed").

regards

Stefan