Thread: Postmaster Out of Memory
Joe Maldonado and I have a vexing problem with PostgreSQL 7.4.5 (we are using the PGDG RPMs on a RedHat 9 system). He posted about this briefly with the subject "info on strange error messages on postgresql" on this list, but I'll start from the beginning.

About once per week the database enters some pathological state where the parent postmaster -- NOT one of the child connections -- appears to run out of memory. This causes it to dump megabytes of diagnostic data into the log, none of which I know how to analyze. Then it does it all over again. And again and again and... well, eventually the log grows until it fills as many gigabytes as the disk has available. Here is an example from the logfile:

TopMemoryContext: 87552048 total in 10683 blocks; 181280 free (60 chunks); 87370768 used
TopTransactionContext: 57344 total in 3 blocks; 648 free (5 chunks); 56696 used
DeferredTriggerXact: 0 total in 0 blocks; 0 free (0 chunks); 0 used
SPI Plan: 3072 total in 2 blocks; 64 free (0 chunks); 3008 used
SPI Plan: 3072 total in 2 blocks; 64 free (0 chunks); 3008 used
SPI Plan: 3072 total in 2 blocks; 64 free (0 chunks); 3008 used
MessageContext: 8192 total in 1 blocks; 7248 free (1 chunks); 944 used
PortalMemory: 8192 total in 1 blocks; 8040 free (0 chunks); 152 used
PortalHeapMemory: 3072 total in 2 blocks; 1136 free (12 chunks); 1936 used
CacheMemoryContext: 421519360 total in 60 blocks; 830096 free (457 chunks); 420689264 used
current_hostpairs2_idx: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
current_hostpair2_eid_key: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
pg_temp_260223: 2048 total in 1 blocks; 768 free (0 chunks); 1280 used
pg_temp_260225: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
pg_temp_260226: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
pg_toast_888405987_index: 1024 total in 1 blocks; 320 free (0 chunks); 704 used
pg_toast_888405987_index: 1024 total in 1 blocks; 320 free (0 chunks); 704 used
pg_temp_260196: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
pg_toast_888382258_index: 1024 total in 1 blocks; 320 free (0 chunks); 704 used
pg_toast_888382258_index: 1024 total in 1 blocks; 320 free (0 chunks); 704 used
[...]
DynaHashTable: 0 total in 0 blocks; 0 free (0 chunks); 0 used
DynaHashTable: 0 total in 0 blocks; 0 free (0 chunks); 0 used
DynaHashTable: 0 total in 0 blocks; 0 free (0 chunks); 0 used
DynaHashTable: 0 total in 0 blocks; 0 free (0 chunks); 0 used
DynaHashTable: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ErrorContext: 8192 total in 1 blocks; 8176 free (0 chunks); 16 used

Eventually the following is reported in the log:

[1587] ERROR: out of memory
DETAIL: Failed on request of size 268.

A quick look at the code shows that this comes from the AllocSetAlloc() routine at line 502 of src/backend/utils/mmgr/aset.c in the 7.4.5 release. This doesn't tell me much about what happened beyond the fact that malloc() failed. I see no evidence in /var/log/messages of system-wide memory exhaustion, and periodic sampling of the available swap space shows that there is always at least a gigabyte free. This suggests that a postmaster process has simply run out of address space somehow. In each case the process identifier is the same (1587 in this instance). Other processes are unable to establish new connections, but the log shows that existing connections continue to issue queries.
This leads us to believe that it is the parent process which is running out of memory rather than any child forked for a specific connection. Is this possible? What might lead to such a case?

There is some history here. We originally discovered that the database would develop long-term problems because system tables weren't being vacuumed properly, so we decided to deploy pg_autovacuum. This didn't work out because we have temporary tables that get periodically deleted: every so often pg_autovacuum would notice that a table was missing and throw up its hands in despair. We attempted to address this by implementing a wrapper that restarts it whenever it terminates. After this change the problem described above began. We have subsequently added a delay to the wrapper that doubles in duration each time pg_autovacuum crashes, to avoid hammering the system when something is wrong. That might make the problem less catastrophic, but of course it won't really solve it. After all, the memory problem is in the parent postmaster, not pg_autovacuum.

Why should ordinary vacuuming run the system out of memory? What should we do to make things work reliably?

Jeff
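P.S. In case it helps to see it, the wrapper is roughly the following (a simplified sketch, not the exact script we deployed; pg_autovacuum's connection options are elided and just passed through):

#!/bin/sh
# Restart pg_autovacuum whenever it exits, doubling the delay each time
# so a persistent failure doesn't hammer the system.
delay=60
while true; do
    pg_autovacuum "$@"      # connection options passed straight through
    echo "pg_autovacuum exited; restarting in $delay seconds" >&2
    sleep $delay
    delay=$(($delay * 2))
done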
Jeff Gold <jgold@mazunetworks.com> writes:
> About once per week the
> database enters some pathological state where the parent postmaster --
> NOT one of the child connections -- appears to run out of memory.

I can absolutely, positively say that that dump is not from the parent postmaster. It's a backend.

Apparently it's a backend that is creating an unreasonable number of hashtables --- the default location for hashtable overhead is TopMemoryContext, and it looks like that explains this:

> TopMemoryContext: 87552048 total in 10683 blocks; 181280 free (60
> chunks); 87370768 used

However this is a bit odd:

> CacheMemoryContext: 421519360 total in 60 blocks; 830096 free (457
> chunks); 420689264 used

I can't really think what would blow out CacheMemoryContext like that.

> pg_temp_260196: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
> pg_toast_888382258_index: 1024 total in 1 blocks; 320 free (0 chunks);
> 704 used
> pg_toast_888382258_index: 1024 total in 1 blocks; 320 free (0 chunks);
> 704 used
> [...]
> DynaHashTable: 0 total in 0 blocks; 0 free (0 chunks); 0 used
> DynaHashTable: 0 total in 0 blocks; 0 free (0 chunks); 0 used
> DynaHashTable: 0 total in 0 blocks; 0 free (0 chunks); 0 used
> DynaHashTable: 0 total in 0 blocks; 0 free (0 chunks); 0 used
> DynaHashTable: 0 total in 0 blocks; 0 free (0 chunks); 0 used

This is quite uninformative. Are you suggesting that there are many many lines about indexes? Many many lines about DynaHashTable? Both?

How many tables/indexes do you have in your database, anyway? Could this process be running pg_dump, or something else that would try to touch them all?

        regards, tom lane
Jeff Gold <jgold@mazunetworks.com> writes:
> I presented the start and the end of what seemed to my uninformed eye to
> be the relevant error messages, since posting all 46.7 megabytes seemed
> impolite. :-)  According to grep there are 122034 lines that include
> the word "index" in any combination of case. "DynaHashTable" appears a
> mere eleven times, all at the bottom. There are 640582 lines in total.

OK. I'm mildly interested to know where the empty DynaHashTables are coming from, but with only 11 of them, that's not your problem. What is clearly the problem is the 640,000 relcache entries. If you've only got 600-odd tables and indexes in the database, something is way wrong with that :-(.

I suppose what we are looking at here is some operation that is invalidating a relcache entry but failing to clear it. It might be the TRUNCATE you mention, or it might be something else. In any case that process must have done it somewhere around 640,000 times to get into this state :-(.

Can you say anything about the history of SQL commands issued to this process --- is there anything it does repetitively? A table or index DDL operation is what we are looking for, not mere inserts/updates ...

        regards, tom lane
Jeff Gold <jgold@mazunetworks.com> writes:
> Hm. After discussing this with people here we have a hypothesis. The
> process that issues the TRUNCATE command does something a little
> peculiar: every minute or so twelve distinct functions are overwritten
> using CREATE OR REPLACE FUNCTION. Perhaps this is causing the
> postmaster backend to fill its relcache.

[ scratches head ... ] It's hard to see what that would have to do with the relcache. I was sort of expecting you to come back and say that you thought the process might have done 640K TRUNCATEs over its lifespan, but I guess not? What about temporary tables?

(Note: you have the list of tables involved, in the form of that dump you showed --- does eyeballing the list suggest any pattern to you? Are there duplicate table names in the list?)

> To test this Joe created a simple C program that does the same operation
> on slightly changed functions (changed only enough so as not to
> interfere with the real process). After running this for a little over
> an hour here is what we see from top:

>   PID USER     PRI  NI  SIZE   RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
> 17033 postgres  16   0  182M  182M  180M R    35.7  4.6  22:33   1 postmaster
> 17032 root       9   0 63056   61M  1336 S     0.5  1.5   0:21   0 test_function

> The RSS value has been steadily growing the entire time, so I think we
> have our cause.

Um, maybe. On many systems, "top" counts as part of RSS whatever the process has touched of the postmaster's shared memory pool. So it's fairly common to see RSS slowly grow to exceed the size of the shared memory area, as the process happens to touch one or another slot of the shared buffer pool. Given that you are showing "SHARE" as 180M, I think it's pretty likely you've got only 2MB of actual local storage in that backend.

If it's a true leak, then letting it run until RSS is well beyond shared memory will prove it.

        regards, tom lane
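P.S. If you want a number to compare RSS against, the size of the shared pool can be eyeballed from shared_buffers; 7.4 reports it as a count of buffers, each 8 kB at the default BLCKSZ. A rough check (using the database and user names from your counting script):

# Multiply the reported count by 8 kB (plus some fixed overhead) to
# estimate the shared memory area that top's SHARE column reflects.
psql -c "SHOW shared_buffers;" mazu postgres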
Jeff Gold <jgold@mazunetworks.com> writes:
>> I was sort of expecting you to come back and say that you
>> thought the process might have done 640K TRUNCATEs over its lifespan,
>> but I guess not?

> That's possible. The process does twelve TRUNCATEs every minute. The
> problem we're talking about seems to occur only when the system has been
> running for quite some time. Still, that seems like an awfully slow
> rate. Is it really likely that a postmaster backend would accumulate
> relcache entries in this case?

Well, assuming that TRUNCATE is in fact causing a relcache entry to get leaked, it's a kilobyte-and-change per entry ... so 12K per minute, 17M or so per day, you'd get to 640000 entries in about 37 days.

> What stands out is that nearly all of the entries seem to be pg_*
> tables. A "grep '^pg_' crash_dump | wc -l" gives 640499 entries, which
> is pretty nearly the entire file. We are careful not to name our tables
> with the "pg_" prefix, so I believe these are system tables. This is
> part of what originally made us believe pg_autovacuum was responsible.

I am not completely sure without looking at the code, but I think TRUNCATE makes a short-lived table by the name of pg_temp_xxx, so this observation seems to fit with the idea that TRUNCATE's at fault. You showed some pg_toast entries in your original list, too, but those could be subsidiaries of the truncated tables.

I think we have a suspect --- will go look.

        regards, tom lane
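P.S. Spelling that arithmetic out, assuming roughly a kilobyte leaked per TRUNCATE:

    12 TRUNCATEs/minute * 60 * 24  = 17,280 leaked entries per day  (about 17 MB/day at ~1 kB each)
    640,000 entries / 17,280 per day = about 37 days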
I wrote:
> I think we have a suspect --- will go look.

Jeff, are you doing CLUSTER operations too? Some preliminary testing says that:

7.4:
    CLUSTER leaks a pg_temp_nnn relcache entry per call; if table has
    toast subtable it also leaks a pg_toast_nnn_index entry per call

    TRUNCATE on a table with toast leaks a pg_toast_nnn_index entry
    per call

8.0:
    CLUSTER on a table with toast leaks a pg_toast_nnn_index entry
    per call

Since you showed both pg_temp_nnn and pg_toast_nnn_index items in your dump, I surmise that there must be a fair number of CLUSTERs going on.

This is probably all actually just one bug --- 7.4's TRUNCATE used the CLUSTER mechanism while 8.0 doesn't, so that explains why TRUNCATE isn't showing a leak anymore. CLUSTER got whacked around too, but it looks like we only cured half the leak :-(

        regards, tom lane
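P.S. If you want to see the 7.4 TRUNCATE leak for yourself without waiting 37 days, a loop along these lines should do it (the table and database names are placeholders; the whole batch has to run in one session, since the leak is per-backend):

# The text column forces a toast subtable, which is what triggers the
# leaked pg_toast_nnn_index relcache entry on each TRUNCATE.
psql -c "CREATE TABLE leak_test (id integer, payload text);" test

i=0
while [ $i -lt 10000 ]; do
    echo "TRUNCATE TABLE leak_test;"
    i=$(($i + 1))
done | psql test > /dev/null
# Watch the connected backend's resident size climb with ps/top while this runs.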
Tom Lane wrote:
> I can absolutely, positively say that that dump is not from the parent
> postmaster. It's a backend.

That makes sense. I'm still a bit puzzled about why new clients can't connect when the problem happens, though. Does the parent postmaster need some resource from one of the backends in order to establish a new connection? Perhaps this will allow us to understand which one is producing these dumps.

> I can't really think what would blow out CacheMemoryContext like that.

I have some new information here. In at least one case it turns out the statements that provoke the out of memory error are these:

TRUNCATE TABLE events.current_hostpair1
Query failed with message ERROR: out of memory
DETAIL: Failed on request of size 268.

One of the things we did around the time we introduced the wrapper around pg_autovacuum was to change the way we process certain tables: instead of dropping a table and re-creating it we now use TRUNCATE statements, which is more efficient in our testing. Could this practice be running one of the backends out of memory somehow? Is there a less cache-hungry alternative we could try?

> This is quite uninformative. Are you suggesting that there are many
> many lines about indexes? Many many lines about DynaHashTable? Both?

I presented the start and the end of what seemed to my uninformed eye to be the relevant error messages, since posting all 46.7 megabytes seemed impolite. :-)  According to grep there are 122034 lines that include the word "index" in any combination of case. "DynaHashTable" appears a mere eleven times, all at the bottom. There are 640582 lines in total. In other words, I'm not sure which of these messages is important or what I should be looking for. Any suggestions?

> How many tables/indexes do you have in your database, anyway?

According to a quick-and-dirty script I hacked together we have 484 tables and 141 indexes outside of the system resources created by PostgreSQL itself. Here's the script:

tables=0
indexes=0
for schema in $(echo '\dn' | psql mazu postgres | grep '|' | \
                grep -v Name | sed 's/|.*//' | grep -v pg_ | \
                grep -v information_schema); do
    tables=$(($tables + $(echo '\dt' $schema.\* | \
                          psql mazu postgres | grep rows | \
                          sed 's/.*(//;s/ rows).*//') ))
    indexes=$(($indexes + $(echo '\di' $schema.\* | \
                            psql mazu postgres | grep rows | \
                            sed 's/.*(//;s/ rows).*//') ))
done
echo tables = $tables
echo indexes = $indexes

> Could this process be running pg_dump, or something else that would try to
> touch them all?

We only use pg_dump for manual backup procedures. There is no evidence in our logs that anyone attempted that during the time that the problem occurred. The process that issues the TRUNCATE command works with only one of our schemas as far as I can tell.

Thanks for taking the time to respond.

Jeff
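P.S. It occurs to me that the same counts could probably be pulled straight from the system catalogs instead of scraping psql output; something like this (an untested sketch, assuming I have the relkind letters and schema filters right):

psql -c "
  SELECT c.relkind, count(*)
    FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace
   WHERE c.relkind IN ('r', 'i')
     AND n.nspname NOT IN ('pg_catalog', 'information_schema')
     AND n.nspname NOT LIKE 'pg_toast%'
     AND n.nspname NOT LIKE 'pg_temp%'
   GROUP BY c.relkind;" mazu postgres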
Tom Lane wrote:
> I suppose what we are looking at here is some operation that is
> invalidating a relcache entry but failing to clear it.

Hm. After discussing this with people here we have a hypothesis. The process that issues the TRUNCATE command does something a little peculiar: every minute or so twelve distinct functions are overwritten using CREATE OR REPLACE FUNCTION. Perhaps this is causing the postmaster backend to fill its relcache.

To test this Joe created a simple C program that does the same operation on slightly changed functions (changed only enough so as not to interfere with the real process). After running this for a little over an hour here is what we see from top:

  PID USER     PRI  NI  SIZE   RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
17033 postgres  16   0  182M  182M  180M R    35.7  4.6  22:33   1 postmaster
17032 root       9   0 63056   61M  1336 S     0.5  1.5   0:21   0 test_function

The RSS value has been steadily growing the entire time, so I think we have our cause. Is this a postmaster memory leak, or is this behavior intentional? Either way, what would be the best way to work around it? These functions are created in this way because we want them to point to different tables at different times.

Thanks again for pointing us in the right direction on this.

Jeff
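P.S. For the record, Joe's test amounts to something like the following (a shell sketch of what the C program does; the function name, body, and interval are placeholders, and everything runs through one connection so any backend-local growth accumulates in a single child process):

# Re-issue CREATE OR REPLACE FUNCTION over and over in one session,
# roughly the way the real process rewrites its twelve functions.
while true; do
    echo "CREATE OR REPLACE FUNCTION leak_probe() RETURNS integer"
    echo "    AS 'SELECT 1;' LANGUAGE sql;"
    sleep 5
done | psql mazu postgres > /dev/null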
Tom Lane wrote:
> I was sort of expecting you to come back and say that you
> thought the process might have done 640K TRUNCATEs over its lifespan,
> but I guess not?

That's possible. The process does twelve TRUNCATEs every minute. The problem we're talking about seems to occur only when the system has been running for quite some time. Still, that seems like an awfully slow rate. Is it really likely that a postmaster backend would accumulate relcache entries in this case?

> What about temporary tables?

We don't use TEMP tables, but we have something like thirty tables that we create, use, drop and eventually re-create. This gets done about twice a day or so.

> (Note: you have the list
> of tables involved, in the form of that dump you showed --- does
> eyeballing the list suggest any pattern to you? Are there duplicate
> table names in the list?)

What stands out is that nearly all of the entries seem to be pg_* tables. A "grep '^pg_' crash_dump | wc -l" gives 640499 entries, which is pretty nearly the entire file. We are careful not to name our tables with the "pg_" prefix, so I believe these are system tables. This is part of what originally made us believe pg_autovacuum was responsible.

> Given that you are showing "SHARE" as 180M, I think
> it's pretty likely you've got only 2MB of actual local storage in that
> backend.

Ah, good point. I was assuming that memory was being allocated from a shared pool, but I have no evidence to support that.

> If it's a true leak, then letting it run until RSS is well beyond shared
> memory will prove it.

We'll give that a try over the weekend and see where it goes. (We have since stopped the script because Joe is improving the instrumentation of the code so that we don't overflow the logs again.)

Jeff
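P.S. The plan for the weekend run is just to sample that backend's memory once a minute with something along these lines (the PID is a placeholder for whatever the test backend gets, and the 4 kB page size is an assumption on my part):

# /proc/<pid>/statm reports pages: size resident shared text lib data dirty.
# If resident keeps growing while shared levels off, it's a real leak.
pid=17033                     # placeholder: the test backend's PID
while true; do
    date
    awk '{ printf "resident = %d kB, shared = %d kB\n", $2 * 4, $3 * 4 }' \
        /proc/$pid/statm
    sleep 60
done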
Jeff Gold <jgold@mazunetworks.com> writes:
> [ backend memory leak ]

Found it --- the actual leak is in index_create, not in TRUNCATE or CLUSTER at all; and it's been there a really long time. Patch for 7.4 branch attached.

        regards, tom lane


Index: index.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/catalog/index.c,v
retrieving revision 1.219
diff -c -r1.219 index.c
*** index.c	29 Sep 2003 00:05:24 -0000	1.219
--- index.c	25 Jun 2005 16:22:43 -0000
***************
*** 707,719 ****
  	}

  	/*
! 	 * Fill in the index strategy structure with information from the
! 	 * catalogs.  First we must advance the command counter so that we
! 	 * will see the newly-entered index catalog tuples.
  	 */
  	CommandCounterIncrement();

! 	RelationInitIndexAccessInfo(indexRelation);

  	/*
  	 * If this is bootstrap (initdb) time, then we don't actually fill in
--- 707,727 ----
  	}

  	/*
! 	 * Advance the command counter so that we can see the newly-entered
! 	 * catalog tuples for the index.
  	 */
  	CommandCounterIncrement();

! 	/*
! 	 * In bootstrap mode, we have to fill in the index strategy structure
! 	 * with information from the catalogs.  If we aren't bootstrapping,
! 	 * then the relcache entry has already been rebuilt thanks to sinval
! 	 * update during CommandCounterIncrement.
! 	 */
! 	if (IsBootstrapProcessingMode())
! 		RelationInitIndexAccessInfo(indexRelation);
! 	else
! 		Assert(indexRelation->rd_indexcxt != NULL);

  	/*
  	 * If this is bootstrap (initdb) time, then we don't actually fill in
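P.S. To try this against a 7.4.5 source tree, save the diff to a file and apply it in the directory named in the header, roughly as follows (the file name is just a placeholder):

# Apply the context diff in src/backend/catalog, then rebuild as usual.
cd postgresql-7.4.5/src/backend/catalog
patch < index-leak.diff
cd ../../..
make && make install    # from the top of an already-configured source tree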
Jeff Gold <jgold@mazunetworks.com> writes:
> Excellent! Does index_create refer to something that is invoked as a
> consequence of CREATE INDEX? I'm looking through the code on our side
> and can't find any obvious places where we recreate indexes, but I might
> just be missing something.

TRUNCATE and CLUSTER both rebuild indexes, so they'd also trigger the leak.

        regards, tom lane
Tom Lane wrote:
> Found it --- the actual leak is in index_create, not in TRUNCATE or
> CLUSTER at all; and it's been there a really long time.
> Patch for 7.4 branch attached.

Excellent! Does index_create refer to something that is invoked as a consequence of CREATE INDEX? I'm looking through the code on our side and can't find any obvious places where we recreate indexes, but I might just be missing something. Might it be a consequence of repopulating a recently truncated table instead? I'll confer with Joe on this new information tomorrow.

Thanks for taking the time to look into this.

Jeff
Tom Lane wrote:
> TRUNCATE and CLUSTER both rebuild indexes, so they'd also trigger the
> leak.

Sorry to bug you again, but I have two quick followup questions: (1) is the leak you discovered fixed on the 8.0 branch? and (2) would closing the database connection once per day be a reasonable way to work around the problem in the absence of the patch you forwarded?

Jeff
Jeff Gold <jgold@mazunetworks.com> writes:
> Tom Lane wrote:
>> TRUNCATE and CLUSTER both rebuild indexes, so they'd also trigger the
>> leak.

> Sorry to bug you again, but I have two quick followup questions: (1) is
> the leak you discovered fixed on the 8.0 branch? and (2) would closing
> the database connection once per day be a reasonable way to work around
> the problem in the absence of the patch you forwarded?

It is fixed in CVS-tip of all branches back to 7.3, but there is not any release yet incorporating the fix. Yes, closing your session and starting a fresh one is a usable workaround.

        regards, tom lane