Postmaster Out of Memory

From: Jeff Gold

Joe Maldonado and I have a vexing problem with PostgreSQL 7.4.5 (we are
using the PGDG RPMs on a RedHat 9 system).  He posted about this briefly
with the subject "info on strange error messages on postgresql" on this
list, but I'll start from the beginning.  About once per week the
database enters some pathological state where the parent postmaster --
NOT one of the child connections -- appears to run out of memory.  This
causes it to dump megabytes of diagnostic data into the log that I
don't know how to analyze.  Then it does it all over again.  And again and
again and... well, eventually the log grows until it fills as many
gigabytes as the disk has available.

Here is an example from the logfile:

TopMemoryContext: 87552048 total in 10683 blocks; 181280 free (60 chunks); 87370768 used
TopTransactionContext: 57344 total in 3 blocks; 648 free (5 chunks); 56696 used
DeferredTriggerXact: 0 total in 0 blocks; 0 free (0 chunks); 0 used
SPI Plan: 3072 total in 2 blocks; 64 free (0 chunks); 3008 used
SPI Plan: 3072 total in 2 blocks; 64 free (0 chunks); 3008 used
SPI Plan: 3072 total in 2 blocks; 64 free (0 chunks); 3008 used
MessageContext: 8192 total in 1 blocks; 7248 free (1 chunks); 944 used
PortalMemory: 8192 total in 1 blocks; 8040 free (0 chunks); 152 used
PortalHeapMemory: 3072 total in 2 blocks; 1136 free (12 chunks); 1936 used
CacheMemoryContext: 421519360 total in 60 blocks; 830096 free (457 chunks); 420689264 used
current_hostpairs2_idx: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
current_hostpair2_eid_key: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
pg_temp_260223: 2048 total in 1 blocks; 768 free (0 chunks); 1280 used
pg_temp_260225: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
pg_temp_260226: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
pg_toast_888405987_index: 1024 total in 1 blocks; 320 free (0 chunks); 704 used
pg_toast_888405987_index: 1024 total in 1 blocks; 320 free (0 chunks); 704 used
pg_temp_260196: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
pg_toast_888382258_index: 1024 total in 1 blocks; 320 free (0 chunks); 704 used
pg_toast_888382258_index: 1024 total in 1 blocks; 320 free (0 chunks); 704 used
[...]
DynaHashTable: 0 total in 0 blocks; 0 free (0 chunks); 0 used
DynaHashTable: 0 total in 0 blocks; 0 free (0 chunks); 0 used
DynaHashTable: 0 total in 0 blocks; 0 free (0 chunks); 0 used
DynaHashTable: 0 total in 0 blocks; 0 free (0 chunks); 0 used
DynaHashTable: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ErrorContext: 8192 total in 1 blocks; 8176 free (0 chunks); 16 used

Eventually the following is reported in the log:

     [1587] ERROR:  out of memory
     DETAIL:  Failed on request of size 268.

A quick look at the code shows that this comes from the AllocSetAlloc()
routine at line 502 of src/backend/utils/mmgr/aset.c from the 7.4.5
release.  This doesn't tell me much about what happened beyond the fact
that malloc() failed.  I see no evidence in /var/log/messages to
indicate system-wide memory exhaustion, and periodic sampling of the
amount of available swap space shows that there is always at least a
gigabyte of it available.  This suggests that a postmaster process has
simply run out of address space somehow.
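
(For what it's worth, the periodic checks amount to something like the
following; the commands are illustrative rather than exactly what we run:)

   # 1587 is the process reporting the failure in the log above.
   free -m                       # system-wide memory and swap
   grep Vm /proc/1587/status     # that process's own address-space figures
   ulimit -v                     # per-process address-space limit, if any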

In each case the process identifier is the same (1587 in this instance).
  Other processes are unable to establish new connections, but the log
shows that existing connections continue to issue queries.  This leads
us to believe that it is the parent process which is running out of
memory rather than any child forked for a specific connection.  Is this
possible?  What might lead to such a case?

There is some history here.  We originally discovered that the database
would develop long-term problems because system tables weren't being
vacuumed properly, so we decided to deploy pg_autovacuum.  This didn't
work out because we have temporary tables that get periodically deleted.
Every so often pg_autovacuum would notice that a table was missing and
throw up its hands in despair.  We attempted to address this by
implementing a wrapper that restarts it whenever it terminates.

After this change the problem described above began.  We have
subsequently added a delay to the wrapper that doubles in duration each
time pg_autovacuum crashes to avoid hammering the system when something
is wrong.  That might make the problem less catastrophic, but of course
it won't really solve it.  After all, the memory problem is in the
parent postmaster, not pg_autovacuum.
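
Roughly, the wrapper amounts to something like this (simplified sketch;
the real script and the pg_autovacuum options differ):

   #!/bin/sh
   # Simplified sketch of the restart wrapper; options and paths are illustrative.
   delay=60
   while true; do
       pg_autovacuum            # runs until it trips over a missing table and exits
       sleep "$delay"           # back off before restarting ...
       delay=$(($delay * 2))    # ... and double the delay after each crash
   done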

Why should ordinary vacuuming run the system out of memory?  What should
we do to make things work reliably?

                                             Jeff

Re: Postmaster Out of Memory

From: Tom Lane

Jeff Gold <jgold@mazunetworks.com> writes:
> About once per week the
> database enters some pathological state where the parent postmaster --
> NOT one of the child connections -- appears to run out of memory.

I can absolutely, positively say that that dump is not from the parent
postmaster.  It's a backend.  Apparently it's a backend that is creating
an unreasonable number of hashtables --- the default location for
hashtable overhead is TopMemoryContext, and it looks like that explains
this:

> TopMemoryContext: 87552048 total in 10683 blocks; 181280 free (60 chunks); 87370768 used

However this is a bit odd:

> CacheMemoryContext: 421519360 total in 60 blocks; 830096 free (457 chunks); 420689264 used

I can't really think what would blow out CacheMemoryContext like that.

> pg_temp_260196: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
> pg_toast_888382258_index: 1024 total in 1 blocks; 320 free (0 chunks); 704 used
> pg_toast_888382258_index: 1024 total in 1 blocks; 320 free (0 chunks); 704 used
> [...]
> DynaHashTable: 0 total in 0 blocks; 0 free (0 chunks); 0 used
> DynaHashTable: 0 total in 0 blocks; 0 free (0 chunks); 0 used
> DynaHashTable: 0 total in 0 blocks; 0 free (0 chunks); 0 used
> DynaHashTable: 0 total in 0 blocks; 0 free (0 chunks); 0 used
> DynaHashTable: 0 total in 0 blocks; 0 free (0 chunks); 0 used

This is quite uninformative.  Are you suggesting that there are many
many lines about indexes?  Many many lines about DynaHashTable?  Both?

How many tables/indexes do you have in your database, anyway?  Could
this process be running pg_dump, or something else that would try to
touch them all?

            regards, tom lane

Re: Postmaster Out of Memory

From: Tom Lane

Jeff Gold <jgold@mazunetworks.com> writes:
> I presented the start and the end of what seemed to my uninformed eye to
> be the relevant error messages, since posting all 46.7 megabytes seemed
> impolite.  :-)  According to grep there are 122034 lines that include
> the word "index" in any combination of case.  "DynaHashTable" appears a
> mere eleven times, all at the bottom.  There are 640582 lines in total.

OK.  I'm mildly interested to know where the empty DynaHashTables are
coming from, but with only 11 of them, that's not your problem.  What is
clearly the problem is the 640,000 relcache entries.  If you've only got
600-odd tables and indexes in the database, something is way wrong with
that :-(.

I suppose what we are looking at here is some operation that is
invalidating a relcache entry but failing to clear it.  It might be the
TRUNCATE you mention, or it might be something else.  In any case that
process must have done it somewhere around 640,000 times to get into
this state :-(.  Can you say anything about the history of SQL commands
issued to this process --- is there anything it does repetitively?  A
table or index DDL operation is what we are looking for, not mere
inserts/updates ...

            regards, tom lane

Re: Postmaster Out of Memory

From: Tom Lane

Jeff Gold <jgold@mazunetworks.com> writes:
> Hm.  After discussing this with people here we have a hypothesis.  The
> process that issues the TRUNCATE command does something a little
> peculiar: every minute or so twelve distinct functions are overwritten
> using CREATE OR REPLACE FUNCTION.  Perhaps this is causing the
> postmaster backend to fill its relcache.

[ scratches head ... ]  It's hard to see what that would have to do with
the relcache.  I was sort of expecting you to come back and say that you
thought the process might have done 640K TRUNCATEs over its lifespan,
but I guess not?  What about temporary tables?  (Note: you have the list
of tables involved, in the form of that dump you showed --- does
eyeballing the list suggest any pattern to you?  Are there duplicate
table names in the list?)

> To test this Joe created a simple C program that does the same operation
> on slightly changed functions (changed only enough so as not to
> interfere with the real process).  After running this for a little over
> an hour here is what we see from top:

>    PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
> 17033 postgres  16   0  182M 182M  180M R    35.7  4.6  22:33   1 postmaster
> 17032 root       9   0 63056  61M  1336 S     0.5  1.5   0:21   0 test_function

> The RSS value has been steadily growing the entire time, so I think we
> have our cause.

Um, maybe.  On many systems, "top" counts as part of RSS whatever the
process has touched of the postmaster's shared memory pool.  So it's
fairly common to see RSS slowly grow to exceed the size of the shared
memory area, as the process happens to touch one or another slot of the
shared buffer pool.  Given that you are showing "SHARE" as 180M, I think
it's pretty likely you've got only 2MB of actual local storage in that
backend.

If it's a true leak, then letting it run until RSS is well beyond shared
memory will prove it.
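
Something along these lines will log the numbers over time (PID taken
from your top output):

   # Record the backend's RSS and VSZ once a minute; 17033 is the PID shown by top.
   while true; do
       date
       ps -o rss=,vsz= -p 17033
       sleep 60
   done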

            regards, tom lane

Re: Postmaster Out of Memory

From: Tom Lane

Jeff Gold <jgold@mazunetworks.com> writes:
>> I was sort of expecting you to come back and say that you
>> thought the process might have done 640K TRUNCATEs over its lifespan,
>> but I guess not?

> That's possible.  The process does twelve TRUNCATEs every minute.  The
> problem we're talking about seems to occur only when the system has been
> running for quite some time.  Still, that seems like an awfully slow
> rate.  Is it really likely that a postmaster backend would accumulate
> relcache entries in this case?

Well, assuming that TRUNCATE is in fact causing a relcache entry to get
leaked, it's a kilobyte-and-change per entry ... so 12K per minute,
17M or so per day, you'd get to 640000 entries in about 37 days.
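
In round numbers:

   echo $(( 12 * 60 * 24 ))              # 17280 entries leaked per day
   echo $(( 640000 / (12 * 60 * 24) ))   # 37, i.e. about 37 days to reach 640,000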

> What stands out is that nearly all of the entries seem to be pg_*
> tables.  A "grep '^pg_' crash_dump | wc -l" gives 640499 entries, which
> is pretty nearly the entire file.  We are careful not to name our tables
> with the "pg_" prefix, so I believe these are system tables.  This is
> part of what originally made us believe pg_autovacuum was responsible.

I am not completely sure without looking at the code, but I think
TRUNCATE makes a short-lived table by the name of pg_temp_xxx, so
this observation seems to fit with the idea that TRUNCATE's at fault.
You showed some pg_toast entries in your original list, too, but those
could be subsidiaries of the truncated tables.

I think we have a suspect --- will go look.

            regards, tom lane

Re: Postmaster Out of Memory

From: Tom Lane

I wrote:
> I think we have a suspect --- will go look.

Jeff, are you doing CLUSTER operations too?

Some preliminary testing says that:

7.4:
CLUSTER leaks a pg_temp_nnn relcache entry per call; if table
  has toast subtable it also leaks a pg_toast_nnn_index entry per call
TRUNCATE on a table with toast leaks a pg_toast_nnn_index entry per call

8.0:
CLUSTER on a table with toast leaks a pg_toast_nnn_index entry per call

Since you showed both pg_temp_nnn and pg_toast_nnn_index items in your
dump, I surmise that there must be a fair number of CLUSTERs going on.

This is probably all actually just one bug --- 7.4's TRUNCATE used the
CLUSTER mechanism while 8.0 doesn't, so that explains why TRUNCATE isn't
showing a leak anymore.  CLUSTER got whacked around too, but it looks
like we only cured half the leak :-(
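
If you want to double-check on a scratch 7.4 database, a single-session
loop along these lines should show the per-backend growth (table name is
made up; the text column gives it a toast subtable):

   # Every TRUNCATE of a table with a toast subtable should leak one relcache
   # entry in the backend running the statements; watch it with top while it runs.
   psql -d scratch -c "CREATE TABLE leak_test (id int, payload text);"
   for i in $(seq 1 10000); do
       echo "TRUNCATE TABLE leak_test;"
   done | psql -d scratch > /dev/null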

            regards, tom lane

Re: Postmaster Out of Memory

From: Jeff Gold

Tom Lane wrote:
> I can absolutely, positively say that that dump is not from the parent
> postmaster.  It's a backend.

That makes sense.  I'm still a bit puzzled about why new clients can't
connect when the problem happens, though.  Does the parent postmaster
need some resource from one of the backends in order to establish a new
connection?  Perhaps this will allow us to understand which one is
producing these dumps.

> I can't really think what would blow out CacheMemoryContext like that.

I have some new information here.  In at least one case it turns out the
statements that provoke the out of memory error are these:

   TRUNCATE TABLE events.current_hostpair1
   Query failed with message ERROR:  out of memory
   DETAIL:  Failed on request of size 268.

One of the things we did around the time we introduced the wrapper
around pg_autovacuum was to change the way we process certain tables so
that instead of dropping a table and re-creating it we would use
TRUNCATE statements.  This is more efficient in our testing.

Could this practice be running one of the backends out of memory
somehow?  Is there a less cache hungry alternative we could try?

> This is quite uninformative.  Are you suggesting that there are many
> many lines about indexes?  Many many lines about DynaHashTable?  Both?

I presented the start and the end of what seemed to my uninformed eye to
be the relevant error messages, since posting all 46.7 megabytes seemed
impolite.  :-)  According to grep there are 122034 lines that include
the word "index" in any combination of case.  "DynaHashTable" appears a
mere eleven times, all at the bottom.  There are 640582 lines in total.

In other words, I'm not sure which of these messages is important or
what I should be looking for.  Any suggestions?

> How many tables/indexes do you have in your database, anyway?

According to a quick-and-dirty script I hacked together we have 484
tables and 141 indexes outside of the system resources created by
PostgreSQL itself.  Here's the script:

   tables=0
   indexes=0
   for schema in $(echo '\dn' | psql mazu postgres | grep '|' | \
                   grep -v Name | sed 's/|.*//' | grep -v pg_ | \
                   grep -v information_schema); do
     tables=$(($tables + $(echo '\dt' $schema.\* | \
                           psql mazu postgres | grep rows | \
                           sed 's/.*(//;s/ rows).*//') ))
     indexes=$(($indexes + $(echo '\di' $schema.\* | \
                             psql mazu postgres | grep rows | \
                             sed 's/.*(//;s/ rows).*//') ))
   done
   echo tables = $tables
   echo indexes = $indexes
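
(A single query against the system catalogs would presumably give the
same counts; untested sketch, where 'r' is an ordinary table and 'i' an
index:)

   psql -c "
   SELECT c.relkind, count(*)
     FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE c.relkind IN ('r', 'i')
      AND n.nspname !~ '^pg_'
      AND n.nspname <> 'information_schema'
    GROUP BY c.relkind;" mazu postgres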

> Could this process be running pg_dump, or something else that would try to
> touch them all?

We only use pg_dump for manual backup procedures.  There is no evidence
in our logs that anyone attempted that during the time that the problem
occurred.  The process that issues the TRUNCATE command works with only
one of our schemas as far as I can tell.

Thanks for taking the time to respond.

                                                 Jeff

Re: Postmaster Out of Memory

From: Jeff Gold

Tom Lane wrote:
> I suppose what we are looking at here is some operation that is
> invalidating a relcache entry but failing to clear it.

Hm.  After discussing this with people here we have a hypothesis.  The
process that issues the TRUNCATE command does something a little
peculiar: every minute or so twelve distinct functions are overwritten
using CREATE OR REPLACE FUNCTION.  Perhaps this is causing the
postmaster backend to fill its relcache.

To test this Joe created a simple C program that does the same operation
on slightly changed functions (changed only enough so as not to
interfere with the real process).  After running this for a little over
an hour here is what we see from top:

   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
17033 postgres  16   0  182M 182M  180M R    35.7  4.6  22:33   1 postmaster
17032 root       9   0 63056  61M  1336 S     0.5  1.5   0:21   0 test_function

The RSS value has been steadily growing the entire time, so I think we
have our cause.  Is this a postmaster memory leak, or is this behavior
intentional?  Either way, what would be the best way to work around it?
  These functions are created in this way because we want them to point
to different tables at different times.
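
In rough shell terms the test amounts to something like this (function
names and bodies are trivial stand-ins for the real ones):

   # Keep a single session open so any per-backend growth is visible.
   for i in $(seq 1 100000); do
       echo "CREATE OR REPLACE FUNCTION test_fn_$(($i % 12))() RETURNS int AS 'SELECT 1' LANGUAGE sql;"
   done | psql mazu postgres > /dev/null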

Thanks again for pointing us in the right direction on this.

                                                Jeff

Re: Postmaster Out of Memory

From: Jeff Gold

Tom Lane wrote:
> I was sort of expecting you to come back and say that you
> thought the process might have done 640K TRUNCATEs over its lifespan,
> but I guess not?

That's possible.  The process does twelve TRUNCATEs every minute.  The
problem we're talking about seems to occur only when the system has been
running for quite some time.  Still, that seems like an awfully slow
rate.  Is it really likely that a postmaster backend would accumulate
relcache entries in this case?

> What about temporary tables?

We don't use TEMP tables as such, but we have something like thirty tables
that we create, use, drop and eventually re-create.  This gets done
about twice a day or so.

> (Note: you have the list
> of tables involved, in the form of that dump you showed --- does
> eyeballing the list suggest any pattern to you?  Are there duplicate
> table names in the list?)

What stands out is that nearly all of the entries seem to be pg_*
tables.  A "grep '^pg_' crash_dump | wc -l" gives 640499 entries, which
is pretty nearly the entire file.  We are careful not to name our tables
with the "pg_" prefix, so I believe these are system tables.  This is
part of what originally made us believe pg_autovacuum was responsible.

> Given that you are showing "SHARE" as 180M, I think
> it's pretty likely you've got only 2MB of actual local storage in that
> backend.

Ah, good point.  I was assuming that memory was being allocated from a
shared pool, but I have no evidence to support that.

> If it's a true leak, then letting it run until RSS is well beyond shared
> memory will prove it.

We'll give that a try over the weekend and see where it goes.  (We have
since stopped the script because Joe is improving the instrumentation of
the code so that we don't overflow the logs again.)

                                               Jeff

Re: Postmaster Out of Memory

From: Tom Lane

Jeff Gold <jgold@mazunetworks.com> writes:
> [ backend memory leak ]

Found it --- the actual leak is in index_create, not in TRUNCATE or
CLUSTER at all; and it's been there a really long time.
Patch for 7.4 branch attached.

            regards, tom lane

Index: index.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/catalog/index.c,v
retrieving revision 1.219
diff -c -r1.219 index.c
*** index.c    29 Sep 2003 00:05:24 -0000    1.219
--- index.c    25 Jun 2005 16:22:43 -0000
***************
*** 707,719 ****
      }

      /*
!      * Fill in the index strategy structure with information from the
!      * catalogs.  First we must advance the command counter so that we
!      * will see the newly-entered index catalog tuples.
       */
      CommandCounterIncrement();

!     RelationInitIndexAccessInfo(indexRelation);

      /*
       * If this is bootstrap (initdb) time, then we don't actually fill in
--- 707,727 ----
      }

      /*
!      * Advance the command counter so that we can see the newly-entered
!      * catalog tuples for the index.
       */
      CommandCounterIncrement();

!     /*
!      * In bootstrap mode, we have to fill in the index strategy structure
!      * with information from the catalogs.  If we aren't bootstrapping,
!      * then the relcache entry has already been rebuilt thanks to sinval
!      * update during CommandCounterIncrement.
!      */
!     if (IsBootstrapProcessingMode())
!         RelationInitIndexAccessInfo(indexRelation);
!     else
!         Assert(indexRelation->rd_indexcxt != NULL);

      /*
       * If this is bootstrap (initdb) time, then we don't actually fill in

Re: Postmaster Out of Memory

From: Tom Lane

Jeff Gold <jgold@mazunetworks.com> writes:
> Excellent!  Does index_create refer to something that is invoked as a
> consequence of CREATE INDEX?  I'm looking through the code on our side
> and can't find any obvious places where we recreate indexes, but I might
> just be missing something.

TRUNCATE and CLUSTER both rebuild indexes, so they'd also trigger the
leak.

            regards, tom lane

Re: Postmaster Out of Memory

From: Jeff Gold

Tom Lane wrote:
> Found it --- the actual leak is in index_create, not in TRUNCATE or
> CLUSTER at all; and it's been there a really long time.
> Patch for 7.4 branch attached.

Excellent!  Does index_create refer to something that is invoked as a
consequence of CREATE INDEX?  I'm looking through the code on our side
and can't find any obvious places where we recreate indexes, but I might
just be missing something.  Might it be a consequence of repopulating a
recently truncated table instead?  I'll confer with Joe on this new
information tomorrow.  Thanks for taking the time to look into this.

                                         Jeff

Re: Postmaster Out of Memory

From: Jeff Gold

Tom Lane wrote:
> TRUNCATE and CLUSTER both rebuild indexes, so they'd also trigger the
> leak.

Sorry to bug you again, but I have two quick followup questions: (1) is
the leak you discovered fixed on the 8.0 branch? and (2) would closing
the database connection once per day be a reasonable way to work around
the problem in the absence of the patch you forwarded?

                                             Jeff

Re: Postmaster Out of Memory

From: Tom Lane

Jeff Gold <jgold@mazunetworks.com> writes:
> Tom Lane wrote:
>> TRUNCATE and CLUSTER both rebuild indexes, so they'd also trigger the
>> leak.

> Sorry to bug you again, but I have two quick followup questions: (1) is
> the leak you discovered fixed on the 8.0 branch? and (2) would closing
> the database connection once per day be a reasonable way to work around
> the problem in the absence of the patch you forwarded?

It is fixed in CVS-tip of all branches back to 7.3, but no release
incorporating the fix has been made yet.  Yes, closing your session and
starting a fresh one is a usable workaround.

            regards, tom lane