Re: (full) Memory context dump considered harmful - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: (full) Memory context dump considered harmful
Date
Msg-id 55D647BB.2070207@2ndquadrant.com
Whole thread Raw
In response to Re: (full) Memory context dump considered harmful  (Stefan Kaltenbrunner <stefan@kaltenbrunner.cc>)
Responses Re: (full) Memory context dump considered harmful
List pgsql-hackers
Hi,

On 08/20/2015 11:04 PM, Stefan Kaltenbrunner wrote:
> On 08/20/2015 06:09 PM, Tom Lane wrote:
>> Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
>>> I wonder if we should have a default of capping the dump to say 1k lines
>>> or such and only optionally do a full one.
>>
>> -1. It's worked like this for the last fifteen years or
>> thereabouts, and you're the first one to complain. I suspect some
>> weirdness in your logging setup, rather than any systemic problem
>> that we need to lobotomize our diagnostic output in order to
>> prevent.
>
> not sure what you consider weird in the logging setup here - the
> context dump is imho borderline on internal diagnostic output at a
> debug level (rather than making sense to an average sysadmin) already
> (and no way to control it). But having (like in our case) the backend
> dumping 2 million basically identical lines into a general logfile
> per event seems excessive and rather abusive towards the rest of the
> system (just from an IO perspective for example or from a log file
> post processing tool perspective)

I've seen similar issues too, on systems with perfectly common logging 
setups. So even if your logging setup would be strange, it's not the 
main factor here.

>> (The reason I say "lobotomize" is that there's no particularly
>> good reason to assume that the first N lines will tell you what you
>> need to know. And the filter rule would have to be *very* stupid,
>> because we can't risk trying to allocate any additional memory to
>> track what we're doing here.)
>
> I do understand that there migt be challenges there but in the last
> 15 years machines got way faster and pg got way capable and some of
> those capabilities might need to get revisited in that regards - and
> while it is very nice that pg survives multiple oom cases pretty
> nicely I dont think it is entitled to put an imho unreasonable burden
> on the rest of the system by writing "insane" amounts of data.
>
> Just from a sysadmin perspective it also means that it is trivial for
> a misbehaving app to fill up the logfile on a system because unlike
> almost all other actual other logging output there seems to be no way
> to control/disabled it on a per role/database level.

IMHO the first thing we might do is provide log_memory_stats GUC 
controlling that. I'm not a big fan of adding GUC for everything, but in 
this case it seems appropriate, just like the other log_ options.

I also don't think logging just subset of the stats is a lost case. 
Sure, we can't know which of the lines are important, but for example 
logging just the top-level contexts with a summary of the child contexts 
would be OK. So something like this:

TopMemoryContext: 582728880 total in 71126 blocks; 6168 free  ...  TopTransactionContext: 8192 total in 1 blocks; 6096
free(1 ...    3 child contexts, 10 blocks, 34873 free  PL/pgSQL function context: 8192 total in 1 blocks; 1152 fre ...
PL/pgSQLfunction context: 24576 total in 2 blocks; 11400 f ...  Type information cache: 24576 total in 2 blocks; 11888
free...  PL/pgSQL function context: 8192 total in 1 blocks; 1120 fre ...  PL/pgSQL function context: 24576 total in 2
blocks;10984 f ...  PL/pgSQL function context: 57344 total in 3 blocks; 29928 f ...  PL/pgSQL function context: 57344
totalin 3 blocks; 28808 f ...  PL/pgSQL function context: 24576 total in 2 blocks; 5944 fr ...  RI compare cache: 24576
totalin 2 blocks; 15984 free (5 ch ...  RI query cache: 24576 total in 2 blocks; 11888 free (5 chun ...  PL/pgSQL
functioncontext: 57344 total in 3 blocks; 31832 f ...  PL/pgSQL function context: 57344 total in 3 blocks; 29600 f ...
PL/pgSQLfunction context: 57344 total in 3 blocks; 39688 f ...  CFuncHash: 8192 total in 1 blocks; 1680 free (0
chunks);65 ...  Rendezvous variable hash: 8192 total in 1 blocks; 1680 free ...  PLpgSQL function cache: 24520 total in
2blocks; 3744 free  ...  Prepared Queries: 125886512 total in 25 blocks; 4764208 fre ...  TableSpace cache: 8192 total
in1 blocks; 3216 free (0 chun ...  Operator lookup cache: 24576 total in 2 blocks; 11888 free  ...  MessageContext:
8192total in 1 blocks; 6976 free (0 chunks ...  Operator class cache: 8192 total in 1 blocks; 1680 free (0  ...  smgr
relationtable: 24576 total in 2 blocks; 5696 free (4  ...  TransactionAbortContext: 32768 total in 1 blocks; 32736 fre
... Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks);  ...  PortalMemory: 8192 total in 1 blocks; 7888 free (0
chunks);...    4 child contexts, 34 blocks, 87283 free  Relcache by OID: 24576 total in 2 blocks; 12832 free (3 chu ...
CacheMemoryContext: 42236592 total in 28 blocks; 7160904 fr ...    1487438 child contexts, 14874382 blocks, 3498349
free

instead of the millions of lines with all the datails.

So either
    log_memory_stats = (on|off)

or
    log_memory_stats = (full|short|off)

might work.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-hackers by date:

Previous
From: David Fetter
Date:
Subject: Re: Declarative partitioning
Next
From: Peter Geoghegan
Date:
Subject: Re: Using quicksort for every external sort run