BUG #5599: Vacuum fails due to index corruption issues - Mailing list pgsql-bugs

From Hitesh Bhambhani
Subject BUG #5599: Vacuum fails due to index corruption issues
Date
Msg-id 201008042147.o74LlCge057659@wwwmaster.postgresql.org
Whole thread Raw
Responses Re: BUG #5599: Vacuum fails due to index corruption issues  (Greg Stark <gsstark@mit.edu>)
Re: BUG #5599: Vacuum fails due to index corruption issues  (Alvaro Herrera <alvherre@commandprompt.com>)
List pgsql-bugs
The following bug has been logged online:

Bug reference:      5599
Logged by:          Hitesh Bhambhani
Email address:      hiteshb@asg.com
PostgreSQL version: 8.2.9-1
Operating system:   Microsoft Windows Server 2003, Enterprise Edition
Description:        Vacuum fails due to index corruption issues
Details:

Hi,

We are seeing a problem in our application where the table indexes get
corrupted. This application is a Java Webapp with postgre as the backend.
The Webapp kicks off Vacuum at regular intervals. After running the
application for a while, one of our customers noted that the Vacuum fails
and Webapp gets very slow.

A re-index of the full database works fine and resolves the issue. But it
re-occurs within a day.

Based on some logs in the Webapp I can see that there were some errors in
truncating relations. Once those errors disappear the index corruption
errors start. I'm not sure if there is a connection here.

Here is a sample log message from the Webapp that shows the truncate error:

2010-07-08 06:29:54,641  WARN DefaultQuartzScheduler_Worker-4
maintenance.PostgreSqlVacuumer:32 - runVacuumFull(): running VACUUM FULL
VERBOSE
2010-07-08 06:29:56,672 ERROR DefaultQuartzScheduler_Worker-4
core.JobRunShell:211 - Job DEFAULT.postgreSqlVacuumJob threw an unhandled
Exception:
org.springframework.jdbc.BadSqlGrammarException: Hibernate-related JDBC
operation;
bad SQL grammar []; nested exception is org.postgresql.util.PSQLException:
ERROR: could not truncate relation 1663/16403/41274 to 30 blocks: Permission
denied

After a couple of such truncate errors the Vacuum starts failing due to
'failed to re-find parent key in index', as seen in sample error log below:

2010-07-08 06:44:56,060 ERROR DefaultQuartzScheduler_Worker-1
core.JobRunShell:2
11 - Job DEFAULT.postgreSqlVacuumJob threw an unhandled Exception:
org.springframework.jdbc.UncategorizedSQLException: Hibernate-related JDBC
operation;
uncategorized SQLException for SQL []; SQL state [XX000]; error code [0];
ERROR: failed to re-find parent key in index "pmoinstance_idx_pmotypeid" for
deletion target page 30;
nested exception is org.postgresql.util.PSQLException: ERROR: failed to
re-find parent key in index "pmoinstance_idx_pmotypeid" for deletion target
page 30

Here the index "pmoinstance_idx_pmotypeid" is one of several application
specific indexes.

Once this index corruption issue occurs, the Vacuum job keeps failing until
a re-index is done. In the long run, we don't want to keep re-indexing the
database as a scheduled job so we would like your help to get to the bottom
of this.

So how can we avoid these index corruption errors and are there any known
causes?

Also, please let me know if there is a direct link between the truncate
relation errors that I saw preceded the index corruption errors?

At the time these Webapp logs were collected, the database was set to
produce verbose logging so I don't have database logs, sorry.  Please do let
me know what other information I can provide to help you diagnose this
situation.

Thanks for your time.

Regards,
Hitesh

pgsql-bugs by date:

Previous
From: Robert Haas
Date:
Subject: Re: [HACKERS] Drop one-argument string_agg? (was Re: string_agg delimiter having no effect with order by)
Next
From: Greg Stark
Date:
Subject: Re: [HACKERS] Drop one-argument string_agg? (was Re: string_agg delimiter having no effect with order by)