Re: (full) Memory context dump considered harmful - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: (full) Memory context dump considered harmful
Date
Msg-id 55D8AE66.7010800@2ndquadrant.com
In response to Re: (full) Memory context dump considered harmful  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: (full) Memory context dump considered harmful  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 08/22/2015 06:06 PM, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>>
>> Couldn't we make it a bit smarter to handle even cases like this? For
>> example we might first count/sum the child contexts, and then print
>> either all child contexts (if there are only a few of them) or just
>> those that are >5% of the total, 2x the average or something like that.
>
> That seems overly complicated. In the first place, you couldn't
> possibly implement that without two passes over the context set,
> which would be mighty expensive. (In the situations where this is
> getting run, most likely portions of the address space have been
> swapped out, and we'll have to swap them back in again. Bad enough to
> iterate over the whole process address space once, but twice?) In the
> second place, it would seem to require a much wider API between
> MemoryContextStats and the per-context-type stats methods, including
> for example a way to decide which numbers about a context were the
> most important ones. I'm already concerned about having had to expose
> a set of accumulatable numbers at all ... don't want the API to get
> into their semantics in that kind of detail.

Sure, the two passes seem quite annoying, although I'm not convinced 
it'd be a problem in practice - most modern systems I've been dealing 
with recently were configured with a 'no swapping' goal, using only a 
tiny swap for extreme cases (e.g. 256GB RAM with 4GB of swap).

If only we had some memory accounting in place ;-) Then we wouldn't have 
to do the two passes ...

> As I said, based on my experience in looking at memory context dumps
> (and I've seen a few), a show-the-first-N-children heuristic
> probably will work fine. If we find out differently we can work on
> refining it, but I don't think a complex design is warranted at this
> stage.

OK, let's go with the simple approach.

I'm still not quite convinced that having no GUC for turning this off is 
a good idea, though. From time to time we deal with OOM issues on 
customer systems with very limited access (e.g. no gdb). In those cases 
the memory context dump is the only information we have initially, and 
it's usually quite difficult to get additional info.

I agree that the 'first-N-children' heuristic is going to work in most 
cases, but if it doesn't, it'd be nice to have a way to force "full" 
stats in an ad-hoc manner (i.e. disable the limit before a query, etc.).

OTOH now that I'm thinking about it, most OOM errors I've seen (at least 
on Linux) were caused either by exceptionally large palloc() requests 
(e.g. because of varlena header corruption, overflow bugs, ...) or by 
the OOM killer. In the former case the memory context stats are utterly 
useless because the issue has nothing to do with the other memory 
contexts; in the latter case you don't get any stats at all because the 
process is simply killed.

One question regarding the proposed patch though - if I get it right 
(just looking at the diff), it simply limits the output to the first 100 
child contexts at each level independently. So if each of those 100 
child contexts has >100 child contexts of its own, we get 100x100 
lines, and so on if the hierarchy is deeper. This probably is not 
addressable without introducing some global counter of printed contexts, 
and it may not be an issue at all (all the cases I can remember involved 
either a single huge context or many sibling contexts).

> One thing we could consider doing to improve the odds that it's fine
>  would be to rearrange things so that child contexts of the same
> parent are more likely to be "similar" --- for example, break out
> all relcache entries to be children of a RelCacheContext instead of
> the generic CacheMemoryContext, likewise for cached plans, etc. But
> I'm not yet convinced that'd be worth the trouble.

That'd be nice but I see that as an independent improvement - it might 
improve the odds for internal contexts, but what about contexts coming 
from user code (e.g. custom aggregates)?

kind regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


