(full) Memory context dump considered harmful - Mailing list pgsql-hackers

From Stefan Kaltenbrunner
Subject (full) Memory context dump considered harmful
Date
Msg-id 55D5F765.30901@kaltenbrunner.cc
Whole thread Raw
Responses Re: (full) Memory context dump considered harmful
Re: (full) Memory context dump considered harmful
List pgsql-hackers
Hi all!


We just had a case of a very long running process of ours that creates 
does a lot of prepared statements through Perls DBD:Pg running into:

https://rt.cpan.org/Public/Bug/Display.html?id=88827

This resulted in millions of prepared statements created, but not 
removed in the affected backends over the course of 1-2 hours until the 
backends in question ran out of memory.
The "out of memory" condition resulted in one memory context dump 
generated per occurance each consisting of >2M lines(!) (basically a 
line of CachedPlan/CachePlanSource per statement/function).
In the 20 Minutes or it took monitoring to alert and operations to react 
this cause a followup incident because repeated "out of memory" 
conditions caused over 400M(!!) loglines amounting to some 15GB of data 
running the log partition dangerously close to full.

an example memory context dump looks like this:


TopMemoryContext: 582728880 total in 71126 blocks; 6168 free (52 
chunks); 582722712 used  TopTransactionContext: 8192 total in 1 blocks; 6096 free (1 chunks); 
2096 used    ExecutorState: 8192 total in 1 blocks; 5392 free (0 chunks); 2800 used      ExprContext: 8192 total in 1
blocks;8160 free (0 chunks); 32 used    SPI Exec: 0 total in 0 blocks; 0 free (0 chunks); 0 used    SPI Proc: 8192
totalin 1 blocks; 5416 free (0 chunks); 2776 used  PL/pgSQL function context: 8192 total in 1 blocks; 1152 free (1 
 
chunks); 7040 used  PL/pgSQL function context: 24576 total in 2 blocks; 11400 free (1 
chunks); 13176 used  Type information cache: 24576 total in 2 blocks; 11888 free (5 
chunks); 12688 used  PL/pgSQL function context: 8192 total in 1 blocks; 1120 free (1 
chunks); 7072 used  PL/pgSQL function context: 24576 total in 2 blocks; 10984 free (1 
chunks); 13592 used  PL/pgSQL function context: 57344 total in 3 blocks; 29928 free (2 
chunks); 27416 used  PL/pgSQL function context: 57344 total in 3 blocks; 28808 free (2 
chunks); 28536 used  PL/pgSQL function context: 24576 total in 2 blocks; 5944 free (3 
chunks); 18632 used  RI compare cache: 24576 total in 2 blocks; 15984 free (5 chunks); 
8592 used  RI query cache: 24576 total in 2 blocks; 11888 free (5 chunks); 12688 
used  PL/pgSQL function context: 57344 total in 3 blocks; 31832 free (2 
chunks); 25512 used  PL/pgSQL function context: 57344 total in 3 blocks; 29600 free (2 
chunks); 27744 used  PL/pgSQL function context: 57344 total in 3 blocks; 39688 free (5 
chunks); 17656 used  CFuncHash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used  Rendezvous variable hash: 8192
totalin 1 blocks; 1680 free (0 
 
chunks); 6512 used  PLpgSQL function cache: 24520 total in 2 blocks; 3744 free (0 
chunks); 20776 used  Prepared Queries: 125886512 total in 25 blocks; 4764208 free (91 
chunks); 121122304 used  TableSpace cache: 8192 total in 1 blocks; 3216 free (0 chunks); 4976 used  Operator lookup
cache:24576 total in 2 blocks; 11888 free (5 
 
chunks); 12688 used  MessageContext: 8192 total in 1 blocks; 6976 free (0 chunks); 1216 used  Operator class cache:
8192total in 1 blocks; 1680 free (0 chunks); 
 
6512 used  smgr relation table: 24576 total in 2 blocks; 5696 free (4 chunks); 
18880 used  TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0 
chunks); 32 used  Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used  PortalMemory: 8192 total in 1
blocks;7888 free (0 chunks); 304 used    PortalHeapMemory: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
ExecutorState:57344 total in 3 blocks; 21856 free (2 chunks); 
 
35488 used        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used        ExprContext: 8192 total in 1
blocks;8160 free (1 chunks); 32 used  Relcache by OID: 24576 total in 2 blocks; 12832 free (3 chunks); 
 
11744 used  CacheMemoryContext: 42236592 total in 28 blocks; 7160904 free (298 
chunks); 35075688 used    CachedPlan: 7168 total in 3 blocks; 1544 free (0 chunks); 5624 used    CachedPlanSource: 7168
totalin 3 blocks; 3904 free (1 chunks); 
 
3264 used    CachedPlan: 15360 total in 4 blocks; 6440 free (1 chunks); 8920 used    CachedPlanSource: 7168 total in 3
blocks;1352 free (0 chunks); 
 
5816 used    CachedPlan: 15360 total in 4 blocks; 6440 free (1 chunks); 8920 used    CachedPlanSource: 7168 total in 3
blocks;1352 free (0 chunks); 
 
5816 used    CachedPlan: 7168 total in 3 blocks; 1544 free (0 chunks); 5624 used    CachedPlanSource: 7168 total in 3
blocks;3904 free (1 chunks); 
 
3264 used    CachedPlan: 15360 total in 4 blocks; 6440 free (1 chunks); 8920 used    CachedPlanSource: 7168 total in 3
blocks;1352 free (0 chunks); 
 
5816 used    CachedPlan: 15360 total in 4 blocks; 6440 free (1 chunks); 8920 used    CachedPlanSource: 7168 total in 3
blocks;1352 free (0 chunks); 
 
5816 used    CachedPlan: 7168 total in 3 blocks; 1544 free (0 chunks); 5624 used    CachedPlanSource: 7168 total in 3
blocks;3904 free (1 chunks); 
 
3264 used    CachedPlan: 15360 total in 4 blocks; 6440 free (1 chunks); 8920 used    CachedPlanSource: 7168 total in 3
blocks;1352 free (0 chunks); 
 
5816 used    CachedPlan: 15360 total in 4 blocks; 6440 free (1 chunks); 8920 used    CachedPlanSource: 7168 total in 3
blocks;1352 free (0 chunks); 
 
5816 used    CachedPlan: 7168 total in 3 blocks; 1544 free (0 chunks); 5624 used    CachedPlanSource: 7168 total in 3
blocks;3904 free (1 chunks); 
 
3264 used    CachedPlan: 15360 total in 4 blocks; 6440 free (1 chunks); 8920 used    CachedPlanSource: 7168 total in 3
blocks;1352 free (0 chunks); 
 
5816 used    CachedPlan: 7168 total in 3 blocks; 2248 free (0 chunks); 4920 used    CachedPlanSource: 3072 total in 2
blocks;64 free (0 chunks); 3008 used    CachedPlan: 15360 total in 4 blocks; 7576 free (0 chunks); 7784 used
CachedPlanSource:7168 total in 3 blocks; 1888 free (0 chunks); 
 
5280 used    CachedPlan: 15360 total in 4 blocks; 7576 free (0 chunks); 7784 used    CachedPlanSource: 7168 total in 3
blocks;1888 free (0 chunks); 
 
5280 used    CachedPlan: 7168 total in 3 blocks; 2248 free (0 chunks); 4920 used    CachedPlanSource: 3072 total in 2
blocks;64 free (0 chunks); 3008 used    CachedPlan: 15360 total in 4 blocks; 7576 free (0 chunks); 7784 used
 

... goes of for some 2M lines...

given that an "out of memory" situation is already bad enough, but 
essentially mounting a DoS on the rest of the system due to creating 
excessive amounts of logs feels even worse.

I wonder if we should have a default of capping the dump to say 1k lines 
or such and only optionally do a full one.
the other option would be to find an alternative way of displaying the 
information like if there is a large number of repeated entries just 
show the first X of them and say "123456 similiar ones".

This is 9.1.14 on Debian Wheezy/amd64 fwiw - but I dont think we have 
made relevant changes in more recent versions.



regards


Stefan



pgsql-hackers by date:

Previous
From: Greg Stark
Date:
Subject: Re: PostgreSQL for VAX on NetBSD/OpenBSD
Next
From: Corey Huinker
Date:
Subject: Re: Declarative partitioning