Thread: BUG #5599: Vacuum fails due to index corruption issues

BUG #5599: Vacuum fails due to index corruption issues

From
"Hitesh Bhambhani"
Date:
The following bug has been logged online:

Bug reference:      5599
Logged by:          Hitesh Bhambhani
Email address:      hiteshb@asg.com
PostgreSQL version: 8.2.9-1
Operating system:   Microsoft Windows Server 2003, Enterprise Edition
Description:        Vacuum fails due to index corruption issues
Details:

Hi,

We are seeing a problem in our application where the table indexes get
corrupted. This application is a Java Webapp with postgre as the backend.
The Webapp kicks off Vacuum at regular intervals. After running the
application for a while, one of our customers noted that the Vacuum fails
and Webapp gets very slow.

A re-index of the full database works fine and resolves the issue. But it
re-occurs within a day.

Based on some logs in the Webapp I can see that there were some errors in
truncating relations. Once those errors disappear the index corruption
errors start. I'm not sure if there is a connection here.

Here is a sample log message from the Webapp that shows the truncate error:

2010-07-08 06:29:54,641  WARN DefaultQuartzScheduler_Worker-4
maintenance.PostgreSqlVacuumer:32 - runVacuumFull(): running VACUUM FULL
VERBOSE
2010-07-08 06:29:56,672 ERROR DefaultQuartzScheduler_Worker-4
core.JobRunShell:211 - Job DEFAULT.postgreSqlVacuumJob threw an unhandled
Exception:
org.springframework.jdbc.BadSqlGrammarException: Hibernate-related JDBC
operation;
bad SQL grammar []; nested exception is org.postgresql.util.PSQLException:
ERROR: could not truncate relation 1663/16403/41274 to 30 blocks: Permission
denied

After a couple of such truncate errors the Vacuum starts failing due to
'failed to re-find parent key in index', as seen in sample error log below:

2010-07-08 06:44:56,060 ERROR DefaultQuartzScheduler_Worker-1
core.JobRunShell:2
11 - Job DEFAULT.postgreSqlVacuumJob threw an unhandled Exception:
org.springframework.jdbc.UncategorizedSQLException: Hibernate-related JDBC
operation;
uncategorized SQLException for SQL []; SQL state [XX000]; error code [0];
ERROR: failed to re-find parent key in index "pmoinstance_idx_pmotypeid" for
deletion target page 30;
nested exception is org.postgresql.util.PSQLException: ERROR: failed to
re-find parent key in index "pmoinstance_idx_pmotypeid" for deletion target
page 30

Here the index "pmoinstance_idx_pmotypeid" is one of several application
specific indexes.

Once this index corruption issue occurs, the Vacuum job keeps failing until
a re-index is done. In the long run, we don't want to keep re-indexing the
database as a scheduled job so we would like your help to get to the bottom
of this.

So how can we avoid these index corruption errors and are there any known
causes?

Also, please let me know if there is a direct link between the truncate
relation errors that I saw preceded the index corruption errors?

At the time these Webapp logs were collected, the database was set to
produce verbose logging so I don't have database logs, sorry.  Please do let
me know what other information I can provide to help you diagnose this
situation.

Thanks for your time.

Regards,
Hitesh

Re: BUG #5599: Vacuum fails due to index corruption issues

From
Greg Stark
Date:
On Wed, Aug 4, 2010 at 10:47 PM, Hitesh Bhambhani <hiteshb@asg.com> wrote:
>

> PostgreSQL version: 8.2.9-1

Firstly, the current release of 8.2 is 8.2.17. There are a long list
of bugs fixed in those intervening releases including one involving
vacuum truncating relations. I don't think it's the same problem but I
would recommend updating immediately to 8.2.17.

> 2010-07-08 06:29:54,641 =A0WARN DefaultQuartzScheduler_Worker-4
> maintenance.PostgreSqlVacuumer:32 - runVacuumFull(): running VACUUM FULL
> VERBOSE

Secondly we don't recommend running VACUUM FULL routinely. It should
only be necessary in extraordinary circumstances. Normally a plain
VACUUM (or VACUUM ANALYZE or VACUUM VERBOSE) should be sufficient as
long as it's being run regularly. Regular VACUUM without the "FULL"
has much less impact on the system.

> 2010-07-08 06:29:56,672 ERROR DefaultQuartzScheduler_Worker-4
> core.JobRunShell:211 - Job DEFAULT.postgreSqlVacuumJob threw an unhandled
> Exception:
> org.springframework.jdbc.BadSqlGrammarException: Hibernate-related JDBC
> operation;
> bad SQL grammar []; nested exception is org.postgresql.util.PSQLException:
> ERROR: could not truncate relation 1663/16403/41274 to 30 blocks: Permiss=
ion
> denied

"Permission denied" smells like a Windows problem with concurrent file
operations. Are you sure you're not running any anti-virus software or
backup software which could have these files open and prevent Postgres
from performing regular file operations on its files? Many people have
reported other problems with anti-virus software in particular.


> Also, please let me know if there is a direct link between the truncate
> relation errors that I saw preceded the index corruption errors?

I'm a bit surprised at that, but I haven't checked what exactly
happens when an error occurs truncating files myself so I'm not sure
if I should be or not.

--=20
greg

Re: BUG #5599: Vacuum fails due to index corruption issues

From
Alvaro Herrera
Date:
Excerpts from Hitesh Bhambhani's message of mié ago 04 17:47:12 -0400 2010:

> Based on some logs in the Webapp I can see that there were some errors in
> truncating relations. Once those errors disappear the index corruption
> errors start. I'm not sure if there is a connection here.

There probably is.  What kind of relation are the ones unable to
truncate?  Please see in pg_class where relfilenode = '41274' in this
case:

> bad SQL grammar []; nested exception is org.postgresql.util.PSQLException:
> ERROR: could not truncate relation 1663/16403/41274 to 30 blocks: Permission
> denied

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: BUG #5599: Vacuum fails due to index corruption issues

From
Greg Stark
Date:
On Thu, Aug 5, 2010 at 12:55 PM, Hitesh Bhambhani
<hitesh.bhambhani@asg.com> wrote:
> [HiteshB] I have noted your recommendation and will work with our Product Management to upgrade to the latest and
greatest.Although we can't change the version that the customer has installed (8.2.9-1). 
>

The latest and greatest is 8.4.x which has lots of new features
including improvements to VACUUM. 9.0 will be out soon and will
replace VACUUM FULL entirely too.

However 8.2.17 is the same as the version you're using except with
dozens of known bugs fixed and security holes patched.

--
greg

Re: BUG #5599: Vacuum fails due to index corruption issues

From
Hitesh Bhambhani
Date:
QWx2YXJvLCB0aGFua3MgZm9yIHlvdXIgcmVzcG9uc2UuDQpNeSBjb21tZW50
cyBiZWxvdy4uLg0KDQo+IEZyb206IEFsdmFybyBIZXJyZXJhIFttYWlsdG86
YWx2aGVycmVAY29tbWFuZHByb21wdC5jb21dDQo+IFNlbnQ6IFdlZG5lc2Rh
eSwgQXVndXN0IDA0LCAyMDEwIDExOjMwIFBNDQo+IFRoZXJlIHByb2JhYmx5
IGlzLiAgV2hhdCBraW5kIG9mIHJlbGF0aW9uIGFyZSB0aGUgb25lcyB1bmFi
bGUgdG8gdHJ1bmNhdGU/DQo+IFBsZWFzZSBzZWUgaW4gcGdfY2xhc3Mgd2hl
cmUgcmVsZmlsZW5vZGUgPSAnNDEyNzQnIGluIHRoaXMNCj4gY2FzZToNCj4g
DQpbSGl0ZXNoQl0gdGhlIHJlbGF0aW9uIGlzIGNhbGxlZCBwbW9pbnN0YW5j
ZV9pZHhfcG1vdHlwZWlkLiBUaGlzIGlzIGFuIGluZGV4IG9uIGEgdGFibGUg
Y3JlYXRlZCBieSBvdXIgcHJvZHVjdC4NClRoZSBkZWZpbml0aW9uIGZvciB0
aGlzIGluZGV4IGlzOg0KQ1JFQVRFIElOREVYIHBtb2luc3RhbmNlX2lkeF9w
bW90eXBlaWQgICBPTiBwbW9pbnN0YW5jZXMgICBVU0lORyBidHJlZSAgIChw
bW90eXBlX2lkKTsNCg0KDQoNClJlZ2FyZHMsDQpIaXRlc2gNCg==

Re: BUG #5599: Vacuum fails due to index corruption issues

From
Hitesh Bhambhani
Date:
Greg, thanks for your answers.

My comments below...

> From: gsstark@gmail.com [mailto:gsstark@gmail.com] On Behalf Of Greg Stark
> Sent: Wednesday, August 04, 2010 9:35 PM
> Firstly, the current release of 8.2 is 8.2.17. There are a long list of b=
ugs
> fixed in those intervening releases including one involving vacuum trunca=
ting
> relations. I don't think it's the same problem but I would recommend upda=
ting
> immediately to 8.2.17.
>=20
[HiteshB] I have noted your recommendation and will work with our Product M=
anagement to upgrade to the latest and greatest. Although we can't change t=
he version that the customer has installed (8.2.9-1).

>=20
>=20
> Secondly we don't recommend running VACUUM FULL routinely. It should only=
 be
> necessary in extraordinary circumstances. Normally a plain VACUUM (or VAC=
UUM
> ANALYZE or VACUUM VERBOSE) should be sufficient as long as it's being run
> regularly. Regular VACUUM without the "FULL"
> has much less impact on the system.
>=20
[HiteshB] Point noted. The decision to use VACUUM FULL is something we'll r=
e-examine.
Could you give an example of what an 'extraordinary circumstance' would be?

>=20
> "Permission denied" smells like a Windows problem with concurrent file
> operations. Are you sure you're not running any anti-virus software or ba=
ckup
> software which could have these files open and prevent Postgres from
> performing regular file operations on its files? Many people have reported
> other problems with anti-virus software in particular.
>=20
[HiteshB] We have worked with the customer to exclude the postgre directory=
 from their antivirus scans. Hopefully we won't see this Permission issue a=
gain.

Thanks again.
Regards,
Hitesh

Re: BUG #5599: Vacuum fails due to index corruption issues

From
"Kevin Grittner"
Date:
Hitesh Bhambhani <hitesh.bhambhani@asg.com> wrote:

> Could you give an example of what an 'extraordinary circumstance'
> would be?

Normal vacuums will remove old tuples (versions of rows) which can
no longer be seen by any transaction, and make that space available
for re-use within the PostgreSQL files.  It will not normally give
space back to the OS, but that's usually a *good* thing, because
normally the space will soon be needed again by PostgreSQL, and it
would be less efficient to constantly be giving space back and
allocating it again.

If you neglect to vacuum aggressively enough, or do a mass UPDATE or
DELETE which affects a large percentage of your rows, without
anticipating that you will need that space again soon, you might
want to do aggressive maintenance to shrink the PostgreSQL files.
VACUUM FULL will move tuples around within the table to free up
space at the end so that it can be released.  But wait -- you
probably *still* don't want to use VACUUM FULL, because it is *very*
slow and will bloat your indexes, requiring a REINDEX to restore
decent performance.  CLUSTER will rewrite the table without dead
space and will rebuild the indexes -- usually much faster than
VACUUM FULL.  But CLUSTER needs room for a second copy of the table
in order to copy it.  If you have a very bloated table which you
want to shrink and you don't have room for a second copy of it,
*that* is the time to consider VACUUM FULL (usually followed by
REINDEX).

If you ever find you *do* need to run VACUUM FULL, you probably need
to re-evaluate your maintenance procedures to see how you can avoid
having to do it again.

-Kevin

Re: BUG #5599: Vacuum fails due to index corruption issues

From
Tom Lane
Date:
Hitesh Bhambhani <hitesh.bhambhani@asg.com> writes:
>> From: Alvaro Herrera [mailto:alvherre@commandprompt.com]
>> Sent: Wednesday, August 04, 2010 11:30 PM
>> There probably is.  What kind of relation are the ones unable to truncate?
>> Please see in pg_class where relfilenode = '41274' in this
>> case:
>>
> [HiteshB] the relation is called pmoinstance_idx_pmotypeid.

So the relation that couldn't be truncated is indeed the same one
complained of in the later message.

What it looks like to me is that page 30 was deleted, and then vacuum
tried to truncate it off the index altogether, and that failed because
of Windows randomness, and then later it's trying to delete page 30
again.  Which naturally fails because the parent downlink entry is long
gone.  But it's odd that it tries to delete page 30 twice.  The page
should have been in BTP_DELETED state before the truncate attempt, and
therefore should still be dead later, so why's it trying again?

[ thinks for a bit ... ]  I do see a way that could happen.  The page
could still be live on disk when we reach smgrtruncate(): the update
to BTP_DELETED state might only exist in a dirty shared buffer.  And
lookee here what smgrtruncate does:

    /*
     * Get rid of any buffers for the about-to-be-deleted blocks. bufmgr will
     * just drop them without bothering to write the contents.
     */
    DropRelFileNodeBuffers(reln->smgr_rnode, forknum, isTemp, nblocks);

So we throw away the BTP_DELETED state update without ever writing it to
disk, and then when the truncate fails, the old page state is still out
there, ready to confuse us later.

Seems like we need to think harder about recovering from a truncate
failure.  A few random ideas:

1. Write the dirty buffers before dropping them.  Kind of ugly from a
performance viewpoint, but simple and safe.

2. Treat truncation failure as a PANIC condition; then WAL replay will
fix things.  Unpleasant.  Actually, worse than unpleasant: I think what
the comment in RelationTruncate() is pointing out is that if the
failure is repeatable, we'd fail to recover at all.

3. Don't discard the dirty buffers until after successfully truncating.
The hard part here is to be sure no other process (like bgwriter) will
try to write them in between.  I don't see a way to guarantee that,
at least not without interlock infrastructure that doesn't exist today.

And see also that comment in RelationTruncate().  Seems like the whole
problem of coping with truncation failure needs more thought than we've
given it.

            regards, tom lane

Re: BUG #5599: Vacuum fails due to index corruption issues

From
Alvaro Herrera
Date:
Excerpts from Tom Lane's message of jue ago 05 11:06:57 -0400 2010:

> Seems like we need to think harder about recovering from a truncate
> failure.  A few random ideas:

Ugh.

> 1. Write the dirty buffers before dropping them.  Kind of ugly from a
> performance viewpoint, but simple and safe.

I think "simple" is good, considering that this code is gone in 9.0 and
HEAD.  IMHO investing too much effort on this problem is not worth it.


Perhaps it'd be good to come up with a better solution for HEAD:

> 3. Don't discard the dirty buffers until after successfully truncating.
> The hard part here is to be sure no other process (like bgwriter) will
> try to write them in between.  I don't see a way to guarantee that,
> at least not without interlock infrastructure that doesn't exist today.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: BUG #5599: Vacuum fails due to index corruption issues

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Excerpts from Tom Lane's message of jue ago 05 11:06:57 -0400 2010:
>> 1. Write the dirty buffers before dropping them.  Kind of ugly from a
>> performance viewpoint, but simple and safe.

> I think "simple" is good, considering that this code is gone in 9.0 and
> HEAD.  IMHO investing too much effort on this problem is not worth it.

Gone?  Looks like it's still there to me.

> Perhaps it'd be good to come up with a better solution for HEAD:

Yeah, the panic-on-replay aspect is troublesome.  I feel like we need a
rethink here.  But I agree that solution #1 is the only one that feels
safe enough for backpatching.

            regards, tom lane

Re: BUG #5599: Vacuum fails due to index corruption issues

From
Alvaro Herrera
Date:
Excerpts from Tom Lane's message of jue ago 05 12:36:24 -0400 2010:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Excerpts from Tom Lane's message of jue ago 05 11:06:57 -0400 2010:
> >> 1. Write the dirty buffers before dropping them.  Kind of ugly from a
> >> performance viewpoint, but simple and safe.
>
> > I think "simple" is good, considering that this code is gone in 9.0 and
> > HEAD.  IMHO investing too much effort on this problem is not worth it.
>
> Gone?  Looks like it's still there to me.

I mean the btree code that does the truncation on vacuum full is
truncated.  There are other uses for truncation, but it doesn't look to
that they are as problematic ... or are they?

Hmm, I guess truncation of heap on lazy vacuum is still a problem
precisely because page compaction will be forgotten.

> > Perhaps it'd be good to come up with a better solution for HEAD:
>
> Yeah, the panic-on-replay aspect is troublesome.  I feel like we need a
> rethink here.  But I agree that solution #1 is the only one that feels
> safe enough for backpatching.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: BUG #5599: Vacuum fails due to index corruption issues

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Excerpts from Tom Lane's message of jue ago 05 12:36:24 -0400 2010:
>> Gone?  Looks like it's still there to me.

> I mean the btree code that does the truncation on vacuum full is
> truncated.  There are other uses for truncation, but it doesn't look to
> that they are as problematic ... or are they?

I think what Hitesh reported is a special case of a generic problem.

> Hmm, I guess truncation of heap on lazy vacuum is still a problem
> precisely because page compaction will be forgotten.

Page compaction is the least of it :-(

Imagine that we have some rows at the end of a table, we delete them,
we vacuum before the next checkpoint.  Vacuum decides it can now
truncate away the last pages, but fails to do so.  The original page
state is still on disk, which means we have lost the fact of the
deletion --- the rows are now effectively live again, though their
index entries are probably gone.

In any case, the removal of VACUUM FULL didn't completely disable
shrinking of btree indexes did it?  I don't recall having removed
that.

            regards, tom lane

Re: BUG #5599: Vacuum fails due to index corruption issues

From
Alvaro Herrera
Date:
Excerpts from Tom Lane's message of jue ago 05 13:19:41 -0400 2010:

> In any case, the removal of VACUUM FULL didn't completely disable
> shrinking of btree indexes did it?  I don't recall having removed
> that.

I see no call to RelationTruncate in the btvacuumscan code, but then it
was only called in vacuum full before.  I'm not sure how it worked
previously with only lazy vacuum.  Did we simply leave the pages as free
for a later extension?

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: BUG #5599: Vacuum fails due to index corruption issues

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Excerpts from Tom Lane's message of jue ago 05 13:19:41 -0400 2010:
>> In any case, the removal of VACUUM FULL didn't completely disable
>> shrinking of btree indexes did it?  I don't recall having removed
>> that.

> I see no call to RelationTruncate in the btvacuumscan code, but then it
> was only called in vacuum full before.  I'm not sure how it worked
> previously with only lazy vacuum.  Did we simply leave the pages as free
> for a later extension?

You're right, I misremembered.  That code is just plain gone in 9.0:
http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/access/nbtree/nbtree.c.diff?r1=1.174;r2=1.175;f=h

Still, we have a live issue with heap truncation during plain VACUUM.
However, the scope of the problem seems a lot less than I was thinking.
Maybe write-the-buffers-first is a sufficient longterm solution.

            regards, tom lane

Re: BUG #5599: Vacuum fails due to index corruption issues

From
Alvaro Herrera
Date:
Excerpts from Tom Lane's message of jue ago 05 14:01:15 -0400 2010:

> You're right, I misremembered.  That code is just plain gone in 9.0:
> http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/access/nbtree/nbtree.c.diff?r1=1.174;r2=1.175;f=h
>
> Still, we have a live issue with heap truncation during plain VACUUM.
> However, the scope of the problem seems a lot less than I was thinking.

The scope is further reduced by the fact that this only seems to happen
on Windows, and then only when the antivirus is messing around with the
files.

> Maybe write-the-buffers-first is a sufficient longterm solution.

Yeah, perhaps it is, though it's a pity that a single platform problem
is going to slow down everyone else.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: BUG #5599: Vacuum fails due to index corruption issues

From
Heikki Linnakangas
Date:
On 05/08/10 21:28, Alvaro Herrera wrote:
> Excerpts from Tom Lane's message of jue ago 05 14:01:15 -0400 2010:
>> Maybe write-the-buffers-first is a sufficient longterm solution.
>
> Yeah, perhaps it is, though it's a pity that a single platform problem
> is going to slow down everyone else.

How about performing the truncate as usual, but if it fails, fill the
truncated portion with zeros instead? Zeroed pages should be handled
gracefully, and this would be a very non-invasive fix. Now if the
write() to zero the pages fails too, I guess we'll have to panic, but
that's not much different from flushing the dirty pages first.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: BUG #5599: Vacuum fails due to index corruption issues

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> On 05/08/10 21:28, Alvaro Herrera wrote:
>> Excerpts from Tom Lane's message of jue ago 05 14:01:15 -0400 2010:
>>> Maybe write-the-buffers-first is a sufficient longterm solution.
>>
>> Yeah, perhaps it is, though it's a pity that a single platform problem
>> is going to slow down everyone else.

> How about performing the truncate as usual, but if it fails, fill the
> truncated portion with zeros instead?

This is blithely assuming that you're going to be able to do anything at
all to the file after having failed to truncate it.  The permissions
problem might be persistent, or you might crash, or several other
things.  If we could do this safely, we could also do the original #3
solution (write the dirty buffers for the to-be-truncated block range
only after failing to truncate).

            regards, tom lane

Re: BUG #5599: Vacuum fails due to index corruption issues

From
Greg Stark
Date:
On Thu, Aug 5, 2010 at 7:28 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
>
> The scope is further reduced by the fact that this only seems to happen
> on Windows, and then only when the antivirus is messing around with the
> files.

So I suspect this could be triggered lots of ways. Imagine a ZFS
volume that runs out of space temporarily. Even truncate would need
additional blocks to replace the meta information. Or a network
filesystem like AFS where your kerberos tickets have expired and you
get a permission failure until they've been renewed. Or, in the case
of a very large table being truncated I suspect there's a
CHECK_FOR_INTERRUPTS lying around that can cancel the backend at the
wrong time.

It would be nice to have a regression test harness that caused system
calls to fail randomly -- the difficult part would be testing the
results.

--
greg