Small overhead run time memory trace (Was Re: shall we have a TRACE_MEMORY mode) - Mailing list pgsql-hackers

From Qingqing Zhou
Subject Small overhead run time memory trace (Was Re: shall we have a TRACE_MEMORY mode)
Msg-id e7fjme$1585$1@news.hub.org
In response to shall we have a TRACE_MEMORY mode  ("Qingqing Zhou" <zhouqq@cs.toronto.edu>)
List pgsql-hackers
"Tom Lane" <tgl@sss.pgh.pa.us> wrote
>
> One idea that comes to mind is to have a compile time option to record
> the palloc __FILE__ and __LINE__ in every AllocChunk header.  Then it
> would not be so hard to identify the culprit while trawling through
> memory.  The overhead costs would be so high that you'd never turn it on
> by default though :-(
>
> Another thing to consider is that the proximate location of the palloc
> is frequently *not* very useful.  For instance, if your memory is
> getting eaten by lists, all the palloc traces will point at
> new_tail_cell().  Not much help.  I don't know what to do about that
> ... any ideas?
>

So basically there are two problems with tracing memory usage:
   1. Memory/CPU overhead;
   2. Hidden memory allocation calls.

To address problem 1, I think we can even come up with a run-time solution
(instead of a compile-time option). We can have a USERSET GUC variable
   int    trace_percent \in [0, 100]

When it is 0, the memory trace code is a no-op. This is the setting for
normal running mode, and it adds only two more instructions of overhead to
each palloc(). When it is 100, every allocation is traced. When it is a value
in between, that percentage of allocations is traced. This is good for
*massive* memory leaks, since a random probe could catch the suspect. I think
a very small number will do.
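
A minimal sketch of the sampling check, assuming a plain global in place of
the real GUC machinery and malloc() in place of palloc(). The names
should_trace(), traced_alloc() and the two counters are hypothetical and
exist only to make the behaviour observable:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the proposed GUC; valid range is [0, 100]. */
static int  trace_percent = 0;

/* Counters used only to make the sketch observable. */
static long traced_allocs = 0;
static long total_allocs = 0;

/* Decide whether the current allocation should be traced. */
static bool
should_trace(void)
{
    /* Fast path when tracing is off: one load and one compare-and-branch. */
    if (trace_percent == 0)
        return false;
    if (trace_percent >= 100)
        return true;
    /* Random probe: trace roughly trace_percent percent of the calls. */
    return (rand() % 100) < trace_percent;
}

/* Toy wrapper around malloc(), standing in for palloc(). */
static void *
traced_alloc(size_t size)
{
    assert(trace_percent >= 0 && trace_percent <= 100);
    total_allocs++;
    if (should_trace())
        traced_allocs++;    /* a real tracer would record the call site here */
    return malloc(size);
}
```

With trace_percent set to 0 the only extra work per call is the first test in
should_trace(); the rand() call is reached only when sampling is enabled.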

To reduce the memory overhead, we basically have two options. One is to plug
two uint16 fields into the AllocChunk: one for an index into a separately
maintained __FILE__ list, and one for __LINE__. The other is to maintain
all these traces in a totally separate memory context.
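
The first option could look roughly like the sketch below. It is not the
actual AllocChunk layout: TraceTag, trace_files, MAX_TRACE_FILES and the
helper functions are hypothetical names, and the file table is a simple
linear-search array chosen for brevity:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_TRACE_FILES 1024    /* hypothetical table size */

/* Separately maintained list of source file names. */
static const char *trace_files[MAX_TRACE_FILES];
static uint16_t    trace_nfiles = 0;

/* Four bytes of per-chunk overhead: file index plus line number. */
typedef struct TraceTag
{
    uint16_t    file_idx;
    uint16_t    line;
} TraceTag;

/* Return the index of 'file' in the table, adding it if it is new. */
static uint16_t
trace_file_index(const char *file)
{
    for (uint16_t i = 0; i < trace_nfiles; i++)
        if (strcmp(trace_files[i], file) == 0)
            return i;
    assert(trace_nfiles < MAX_TRACE_FILES);
    /* Only the pointer is stored; __FILE__ literals have static storage. */
    trace_files[trace_nfiles] = file;
    return trace_nfiles++;
}

/* Build the tag that would be stored in the chunk header. */
static TraceTag
make_trace_tag(const char *file, int line)
{
    TraceTag    tag;

    tag.file_idx = trace_file_index(file);
    tag.line = (uint16_t) line; /* assumes no file exceeds 65535 lines */
    return tag;
}

/* Convenience macro capturing the current source location. */
#define TRACE_TAG_HERE() make_trace_tag(__FILE__, __LINE__)
```

Storing an index rather than a pointer to the file name keeps the tag at four
bytes, at the cost of a table lookup whenever a new call site is recorded.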

For problem 2, the only solution AFAICS that works on all 20 platforms is to
redefine the functions that hide the allocation.
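
One way to read "redefine" here is to turn the public name of a helper into a
macro, so that __FILE__ and __LINE__ expand at the caller instead of inside
the helper. The sketch below uses a toy list cell and malloc(); alloc_traced(),
new_cell_traced(), new_cell() and the last_file/last_line variables are all
hypothetical and are not PostgreSQL functions:

```c
#include <assert.h>
#include <stdlib.h>

/* Last recorded call site; a real tracer would store this per chunk. */
static const char *last_file = NULL;
static int         last_line = 0;

/* Toy allocator standing in for palloc(); records the site it is given. */
static void *
alloc_traced(size_t size, const char *file, int line)
{
    last_file = file;
    last_line = line;
    return malloc(size);
}

typedef struct Cell
{
    int          value;
    struct Cell *next;
} Cell;

/* The helper takes its caller's location instead of reporting its own. */
static Cell *
new_cell_traced(int value, const char *file, int line)
{
    Cell       *cell = alloc_traced(sizeof(Cell), file, line);

    assert(cell != NULL);
    cell->value = value;
    cell->next = NULL;
    return cell;
}

/* Redefine the public name so the location expands where it is used. */
#define new_cell(value) new_cell_traced((value), __FILE__, __LINE__)
```

A call such as new_cell(42) then records the file and line of the calling
code, so the trace points at the code building the list rather than at the
helper that happens to call the allocator.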

Regards,
Qingqing



