Thread: BUG #5599: Vacuum fails due to index corruption issues
The following bug has been logged online:

Bug reference:      5599
Logged by:          Hitesh Bhambhani
Email address:      hiteshb@asg.com
PostgreSQL version: 8.2.9-1
Operating system:   Microsoft Windows Server 2003, Enterprise Edition
Description:        Vacuum fails due to index corruption issues

Details:

Hi,

We are seeing a problem in our application where the table indexes get corrupted. The application is a Java Webapp with PostgreSQL as the backend. The Webapp kicks off Vacuum at regular intervals. After running the application for a while, one of our customers noted that the Vacuum fails and the Webapp gets very slow. A re-index of the full database works fine and resolves the issue, but the problem re-occurs within a day.

Based on some logs in the Webapp I can see that there were some errors in truncating relations. Once those errors disappear, the index corruption errors start. I'm not sure if there is a connection here. Here is a sample log message from the Webapp that shows the truncate error:

2010-07-08 06:29:54,641 WARN DefaultQuartzScheduler_Worker-4 maintenance.PostgreSqlVacuumer:32 - runVacuumFull(): running VACUUM FULL VERBOSE
2010-07-08 06:29:56,672 ERROR DefaultQuartzScheduler_Worker-4 core.JobRunShell:211 - Job DEFAULT.postgreSqlVacuumJob threw an unhandled Exception: org.springframework.jdbc.BadSqlGrammarException: Hibernate-related JDBC operation; bad SQL grammar []; nested exception is org.postgresql.util.PSQLException: ERROR: could not truncate relation 1663/16403/41274 to 30 blocks: Permission denied

After a couple of such truncate errors, the Vacuum starts failing with 'failed to re-find parent key in index', as seen in the sample error log below:

2010-07-08 06:44:56,060 ERROR DefaultQuartzScheduler_Worker-1 core.JobRunShell:211 - Job DEFAULT.postgreSqlVacuumJob threw an unhandled Exception: org.springframework.jdbc.UncategorizedSQLException: Hibernate-related JDBC operation; uncategorized SQLException for SQL []; SQL state [XX000]; error code [0]; ERROR: failed to re-find parent key in index "pmoinstance_idx_pmotypeid" for deletion target page 30; nested exception is org.postgresql.util.PSQLException: ERROR: failed to re-find parent key in index "pmoinstance_idx_pmotypeid" for deletion target page 30

Here the index "pmoinstance_idx_pmotypeid" is one of several application-specific indexes. Once this index corruption occurs, the Vacuum job keeps failing until a re-index is done. In the long run we don't want to keep re-indexing the database as a scheduled job, so we would like your help getting to the bottom of this.

So how can we avoid these index corruption errors, and are there any known causes? Also, please let me know if there is a direct link between the truncate relation errors, which I saw precede the index corruption errors.

At the time these Webapp logs were collected, the database was not set to produce verbose logging, so I don't have database logs, sorry. Please do let me know what other information I can provide to help you diagnose this situation.

Thanks for your time.

Regards,
Hitesh
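[A minimal sketch of the full-database re-index workaround the reporter describes; 'appdb' is a placeholder database name, not taken from the report.]

    -- Rebuild every index in the database (takes exclusive locks while it runs);
    -- 'appdb' is a placeholder for the actual database name.
    REINDEX DATABASE appdb;

    -- Or rebuild just the index named in the error message:
    REINDEX INDEX pmoinstance_idx_pmotypeid;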
On Wed, Aug 4, 2010 at 10:47 PM, Hitesh Bhambhani <hiteshb@asg.com> wrote:
>
> PostgreSQL version: 8.2.9-1

Firstly, the current release of 8.2 is 8.2.17. There is a long list of bugs fixed in the intervening releases, including one involving vacuum truncating relations. I don't think it's the same problem, but I would recommend updating immediately to 8.2.17.

> 2010-07-08 06:29:54,641 WARN DefaultQuartzScheduler_Worker-4
> maintenance.PostgreSqlVacuumer:32 - runVacuumFull(): running VACUUM FULL
> VERBOSE

Secondly, we don't recommend running VACUUM FULL routinely. It should only be necessary in extraordinary circumstances. Normally a plain VACUUM (or VACUUM ANALYZE or VACUUM VERBOSE) should be sufficient as long as it's being run regularly. Regular VACUUM without the "FULL" has much less impact on the system.

> 2010-07-08 06:29:56,672 ERROR DefaultQuartzScheduler_Worker-4
> core.JobRunShell:211 - Job DEFAULT.postgreSqlVacuumJob threw an unhandled
> Exception:
> org.springframework.jdbc.BadSqlGrammarException: Hibernate-related JDBC
> operation;
> bad SQL grammar []; nested exception is org.postgresql.util.PSQLException:
> ERROR: could not truncate relation 1663/16403/41274 to 30 blocks: Permission
> denied

"Permission denied" smells like a Windows problem with concurrent file operations. Are you sure you're not running any anti-virus software or backup software which could have these files open and prevent Postgres from performing regular file operations on its files? Many people have reported other problems with anti-virus software in particular.

> Also, please let me know if there is a direct link between the truncate
> relation errors, which I saw precede the index corruption errors.

I'm a bit surprised at that, but I haven't checked what exactly happens when an error occurs while truncating files, so I'm not sure whether I should be or not.

--
greg
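[A minimal sketch of the routine maintenance Greg recommends, assuming the scheduled job is changed to run a plain VACUUM instead of VACUUM FULL; the table name is the one that appears later in this thread.]

    -- Reclaim dead tuples for reuse and refresh planner statistics,
    -- without the heavy locking and index bloat of VACUUM FULL.
    VACUUM ANALYZE pmoinstances;

    -- Or database-wide, as a scheduled job would typically run it:
    VACUUM ANALYZE;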
Excerpts from Hitesh Bhambhani's message of mié ago 04 17:47:12 -0400 2010:

> Based on some logs in the Webapp I can see that there were some errors in
> truncating relations. Once those errors disappear, the index corruption
> errors start. I'm not sure if there is a connection here.

There probably is. What kind of relations are the ones you're unable to truncate? Please see in pg_class where relfilenode = '41274', in this case:

> bad SQL grammar []; nested exception is org.postgresql.util.PSQLException:
> ERROR: could not truncate relation 1663/16403/41274 to 30 blocks: Permission
> denied

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
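[A sketch of the catalog lookup Álvaro is asking for, using the relfilenode from the error message above.]

    -- Map the relfilenode from the error message back to a relation name
    -- and kind ('r' = table, 'i' = index) in the system catalog.
    SELECT relname, relkind
    FROM pg_class
    WHERE relfilenode = 41274;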
On Thu, Aug 5, 2010 at 12:55 PM, Hitesh Bhambhani <hitesh.bhambhani@asg.com> wrote:
> [HiteshB] I have noted your recommendation and will work with our Product
> Management to upgrade to the latest and greatest, although we can't change
> the version that the customer has installed (8.2.9-1).

The latest and greatest is 8.4.x, which has lots of new features, including improvements to VACUUM. 9.0 will be out soon and will replace VACUUM FULL entirely, too. However, 8.2.17 is the same as the version you're using, except with dozens of known bugs fixed and security holes patched.

--
greg
Alvaro, thanks for your response.
My comments below...

> From: Alvaro Herrera [mailto:alvherre@commandprompt.com]
> Sent: Wednesday, August 04, 2010 11:30 PM
> There probably is. What kind of relations are the ones you're unable to
> truncate? Please see in pg_class where relfilenode = '41274', in this
> case:

[HiteshB] the relation is called pmoinstance_idx_pmotypeid. This is an index on a table created by our product. The definition for this index is:

CREATE INDEX pmoinstance_idx_pmotypeid ON pmoinstances USING btree (pmotype_id);

Regards,
Hitesh
Greg, thanks for your answers. My comments below...

> From: gsstark@gmail.com [mailto:gsstark@gmail.com] On Behalf Of Greg Stark
> Sent: Wednesday, August 04, 2010 9:35 PM
> Firstly, the current release of 8.2 is 8.2.17. There is a long list of bugs
> fixed in the intervening releases, including one involving vacuum truncating
> relations. I don't think it's the same problem, but I would recommend
> updating immediately to 8.2.17.

[HiteshB] I have noted your recommendation and will work with our Product Management to upgrade to the latest and greatest, although we can't change the version that the customer has installed (8.2.9-1).

> Secondly, we don't recommend running VACUUM FULL routinely. It should only
> be necessary in extraordinary circumstances. Normally a plain VACUUM (or
> VACUUM ANALYZE or VACUUM VERBOSE) should be sufficient as long as it's
> being run regularly. Regular VACUUM without the "FULL" has much less impact
> on the system.

[HiteshB] Point noted. The decision to use VACUUM FULL is something we'll re-examine. Could you give an example of what an 'extraordinary circumstance' would be?

> "Permission denied" smells like a Windows problem with concurrent file
> operations. Are you sure you're not running any anti-virus software or
> backup software which could have these files open and prevent Postgres from
> performing regular file operations on its files? Many people have reported
> other problems with anti-virus software in particular.

[HiteshB] We have worked with the customer to exclude the PostgreSQL directory from their antivirus scans. Hopefully we won't see this permission issue again.

Thanks again.

Regards,
Hitesh
Hitesh Bhambhani <hitesh.bhambhani@asg.com> wrote:

> Could you give an example of what an 'extraordinary circumstance'
> would be?

A normal vacuum will remove old tuples (versions of rows) which can no longer be seen by any transaction, and make that space available for re-use within the PostgreSQL files. It will not normally give space back to the OS, but that's usually a *good* thing, because normally the space will soon be needed again by PostgreSQL, and it would be less efficient to constantly be giving space back and allocating it again.

If you neglect to vacuum aggressively enough, or do a mass UPDATE or DELETE which affects a large percentage of your rows without anticipating that you will need that space again soon, you might want to do aggressive maintenance to shrink the PostgreSQL files. VACUUM FULL will move tuples around within the table to free up space at the end so that it can be released.

But wait -- you probably *still* don't want to use VACUUM FULL, because it is *very* slow and will bloat your indexes, requiring a REINDEX to restore decent performance. CLUSTER will rewrite the table without dead space and will rebuild the indexes -- usually much faster than VACUUM FULL. But CLUSTER needs room for a second copy of the table in order to copy it. If you have a very bloated table which you want to shrink and you don't have room for a second copy of it, *that* is the time to consider VACUUM FULL (usually followed by REINDEX).

If you ever find you *do* need to run VACUUM FULL, you probably need to re-evaluate your maintenance procedures to see how you can avoid having to do it again.

-Kevin
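[A minimal sketch of the two recovery paths Kevin contrasts, using the reporter's table and index names and assuming the 8.2-era CLUSTER syntax.]

    -- Preferred, when there is disk room for a second copy of the table:
    -- rewrite it in index order, dropping dead space and rebuilding all
    -- of its indexes (takes an exclusive lock while it runs).
    CLUSTER pmoinstance_idx_pmotypeid ON pmoinstances;

    -- Last resort, when there is no room for a copy: compact in place,
    -- then rebuild the indexes that VACUUM FULL has bloated.
    VACUUM FULL pmoinstances;
    REINDEX TABLE pmoinstances;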
Hitesh Bhambhani <hitesh.bhambhani@asg.com> writes:
>> From: Alvaro Herrera [mailto:alvherre@commandprompt.com]
>> Sent: Wednesday, August 04, 2010 11:30 PM
>> There probably is. What kind of relations are the ones you're unable to
>> truncate? Please see in pg_class where relfilenode = '41274', in this
>> case:

> [HiteshB] the relation is called pmoinstance_idx_pmotypeid.

So the relation that couldn't be truncated is indeed the same one complained of in the later message.

What it looks like to me is that page 30 was deleted, and then vacuum tried to truncate it off the index altogether, and that failed because of Windows randomness, and then later it's trying to delete page 30 again. Which naturally fails because the parent downlink entry is long gone.

But it's odd that it tries to delete page 30 twice. The page should have been in BTP_DELETED state before the truncate attempt, and therefore should still be dead later, so why's it trying again?

[ thinks for a bit ... ]

I do see a way that could happen. The page could still be live on disk when we reach smgrtruncate(): the update to BTP_DELETED state might only exist in a dirty shared buffer. And lookee here what smgrtruncate does:

    /*
     * Get rid of any buffers for the about-to-be-deleted blocks.  bufmgr will
     * just drop them without bothering to write the contents.
     */
    DropRelFileNodeBuffers(reln->smgr_rnode, forknum, isTemp, nblocks);

So we throw away the BTP_DELETED state update without ever writing it to disk, and then when the truncate fails, the old page state is still out there, ready to confuse us later.

Seems like we need to think harder about recovering from a truncate failure. A few random ideas:

1. Write the dirty buffers before dropping them. Kind of ugly from a performance viewpoint, but simple and safe.

2. Treat truncation failure as a PANIC condition; then WAL replay will fix things. Unpleasant. Actually, worse than unpleasant: I think what the comment in RelationTruncate() is pointing out is that if the failure is repeatable, we'd fail to recover at all.

3. Don't discard the dirty buffers until after successfully truncating. The hard part here is to be sure no other process (like bgwriter) will try to write them in between. I don't see a way to guarantee that, at least not without interlock infrastructure that doesn't exist today. And see also that comment in RelationTruncate().

Seems like the whole problem of coping with truncation failure needs more thought than we've given it.

			regards, tom lane
Excerpts from Tom Lane's message of jue ago 05 11:06:57 -0400 2010:

> Seems like we need to think harder about recovering from a truncate
> failure. A few random ideas:

Ugh.

> 1. Write the dirty buffers before dropping them. Kind of ugly from a
> performance viewpoint, but simple and safe.

I think "simple" is good, considering that this code is gone in 9.0 and HEAD. IMHO investing too much effort on this problem is not worth it. Perhaps it'd be good to come up with a better solution for HEAD:

> 3. Don't discard the dirty buffers until after successfully truncating.
> The hard part here is to be sure no other process (like bgwriter) will
> try to write them in between. I don't see a way to guarantee that,
> at least not without interlock infrastructure that doesn't exist today.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Excerpts from Tom Lane's message of jue ago 05 11:06:57 -0400 2010:
>> 1. Write the dirty buffers before dropping them. Kind of ugly from a
>> performance viewpoint, but simple and safe.

> I think "simple" is good, considering that this code is gone in 9.0 and
> HEAD. IMHO investing too much effort on this problem is not worth it.

Gone? Looks like it's still there to me.

> Perhaps it'd be good to come up with a better solution for HEAD:

Yeah, the panic-on-replay aspect is troublesome. I feel like we need a rethink here. But I agree that solution #1 is the only one that feels safe enough for backpatching.

			regards, tom lane
Excerpts from Tom Lane's message of jue ago 05 12:36:24 -0400 2010:

> Alvaro Herrera <alvherre@commandprompt.com> writes:
>>> 1. Write the dirty buffers before dropping them. Kind of ugly from a
>>> performance viewpoint, but simple and safe.
>
>> I think "simple" is good, considering that this code is gone in 9.0 and
>> HEAD. IMHO investing too much effort on this problem is not worth it.
>
> Gone? Looks like it's still there to me.

I mean that the btree code that does the truncation on vacuum full has been removed. There are other uses for truncation, but it doesn't look to me like they are as problematic ... or are they? Hmm, I guess truncation of heap on lazy vacuum is still a problem precisely because page compaction will be forgotten.

>> Perhaps it'd be good to come up with a better solution for HEAD:
>
> Yeah, the panic-on-replay aspect is troublesome. I feel like we need a
> rethink here. But I agree that solution #1 is the only one that feels
> safe enough for backpatching.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Excerpts from Tom Lane's message of jue ago 05 12:36:24 -0400 2010:
>> Gone? Looks like it's still there to me.

> I mean that the btree code that does the truncation on vacuum full has
> been removed. There are other uses for truncation, but it doesn't look
> to me like they are as problematic ... or are they?

I think what Hitesh reported is a special case of a generic problem.

> Hmm, I guess truncation of heap on lazy vacuum is still a problem
> precisely because page compaction will be forgotten.

Page compaction is the least of it :-( Imagine that we have some rows at the end of a table, we delete them, and we vacuum before the next checkpoint. Vacuum decides it can now truncate away the last pages, but fails to do so. The original page state is still on disk, which means we have lost the fact of the deletion --- the rows are now effectively live again, though their index entries are probably gone.

In any case, the removal of VACUUM FULL didn't completely disable shrinking of btree indexes, did it? I don't recall having removed that.

			regards, tom lane
Excerpts from Tom Lane's message of jue ago 05 13:19:41 -0400 2010:

> In any case, the removal of VACUUM FULL didn't completely disable
> shrinking of btree indexes, did it? I don't recall having removed
> that.

I see no call to RelationTruncate in the btvacuumscan code, but then it was only called in vacuum full before. I'm not sure how it worked previously with only lazy vacuum. Did we simply leave the pages free for later extension?

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Excerpts from Tom Lane's message of jue ago 05 13:19:41 -0400 2010:
>> In any case, the removal of VACUUM FULL didn't completely disable
>> shrinking of btree indexes, did it? I don't recall having removed
>> that.

> I see no call to RelationTruncate in the btvacuumscan code, but then it
> was only called in vacuum full before. I'm not sure how it worked
> previously with only lazy vacuum. Did we simply leave the pages free
> for later extension?

You're right, I misremembered. That code is just plain gone in 9.0:
http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/access/nbtree/nbtree.c.diff?r1=1.174;r2=1.175;f=h

Still, we have a live issue with heap truncation during plain VACUUM. However, the scope of the problem seems a lot less than I was thinking. Maybe write-the-buffers-first is a sufficient long-term solution.

			regards, tom lane
Excerpts from Tom Lane's message of jue ago 05 14:01:15 -0400 2010:

> You're right, I misremembered. That code is just plain gone in 9.0:
> http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/access/nbtree/nbtree.c.diff?r1=1.174;r2=1.175;f=h
>
> Still, we have a live issue with heap truncation during plain VACUUM.
> However, the scope of the problem seems a lot less than I was thinking.

The scope is further reduced by the fact that this only seems to happen on Windows, and then only when the antivirus is messing around with the files.

> Maybe write-the-buffers-first is a sufficient long-term solution.

Yeah, perhaps it is, though it's a pity that a single-platform problem is going to slow down everyone else.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On 05/08/10 21:28, Alvaro Herrera wrote:
> Excerpts from Tom Lane's message of jue ago 05 14:01:15 -0400 2010:
>> Maybe write-the-buffers-first is a sufficient long-term solution.
>
> Yeah, perhaps it is, though it's a pity that a single-platform problem
> is going to slow down everyone else.

How about performing the truncate as usual, but if it fails, filling the truncated portion with zeros instead? Zeroed pages should be handled gracefully, and this would be a very non-invasive fix. Now if the write() to zero the pages fails too, I guess we'll have to panic, but that's not much different from flushing the dirty pages first.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> On 05/08/10 21:28, Alvaro Herrera wrote:
>> Excerpts from Tom Lane's message of jue ago 05 14:01:15 -0400 2010:
>>> Maybe write-the-buffers-first is a sufficient long-term solution.
>>
>> Yeah, perhaps it is, though it's a pity that a single-platform problem
>> is going to slow down everyone else.

> How about performing the truncate as usual, but if it fails, filling the
> truncated portion with zeros instead?

This is blithely assuming that you're going to be able to do anything at all to the file after having failed to truncate it. The permissions problem might be persistent, or you might crash, or several other things could go wrong. If we could do this safely, we could also do the original #3 solution (write the dirty buffers for the to-be-truncated block range only after failing to truncate).

			regards, tom lane
On Thu, Aug 5, 2010 at 7:28 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
>
> The scope is further reduced by the fact that this only seems to happen
> on Windows, and then only when the antivirus is messing around with the
> files.

So I suspect this could be triggered in lots of ways. Imagine a ZFS volume that runs out of space temporarily; even truncate would need additional blocks to replace the metadata. Or a network filesystem like AFS, where your Kerberos tickets have expired and you get a permission failure until they've been renewed. Or, in the case of a very large table being truncated, I suspect there's a CHECK_FOR_INTERRUPTS lying around that can cancel the backend at the wrong time.

It would be nice to have a regression test harness that caused system calls to fail randomly -- the difficult part would be testing the results.

--
greg