Home > mailing lists

Re: Autovacuum and invalid page header - Mailing list pgsql-admin

From	Kevin Grittner
Subject	Re: Autovacuum and invalid page header
Date	May 13, 2010 11:25:18
Msg-id	4BEBC57002000025000316A3@gw.wicourts.gov Whole thread Raw
In response to	Autovacuum and invalid page header (Ireneusz Pluta <ipluta@wp.pl>)
Responses	Re: Autovacuum and invalid page header Re: Autovacuum and invalid page header
List	pgsql-admin

Tree view

Ireneusz Pluta <ipluta@wp.pl> wrote:

> many different autovacuums on the same table cycling in start-fail
> scenario because of invalid page headers of some indexes of the
> table. Manual VACUUM VERBOSE said me that, now I can also see that
> in related ERROR-CONTEXT log message pairs. I dropped the damaged
> indexes and now that autovacuum seems to continue without
> repeating itself.

> First, it seems that an unnoticed damage of a relation, causing
> autovacuum failures, which might initially be harmless for the
> whole cluster (as far as only damaged relation and its disk space
> is concerned), may lead to a critical situation when a wraparound
> gets close, and autovacuum is in prevenitng mode can't freeze its
> xids. The forced autovacuum insists on vacuuming the table which
> can not be vacuumed and locks itself there. At the same time,
> autovacuum get sticked with a database of the damaged relation,
> and "forgets" about other databases which might need me vacuumed.
> From this point of view, just one damaged table might be a single
> point of failure of the whole cluster. Is there any way, other
> than tracing logs, to prevent such a situation?

That does sound like an area where there might be room for
improvement within PostgreSQL; however, I strongly recommend that
you have *some* sort of periodic VACUUM ANALYZE VERBOSE of any
active database, and that you scan the results (we use grep) to look
for problems.  We send an email to the DBA team if errors or warning
show in the VACUUM ANALYZE VERBOSE.

> Another issue, rather loosely related to the topic. I suspect that
> the cluster may have more invalid page headers like that. They
> might be caused by past bad sector failures of one of the drives
> of my raid array. So I think it might be a good idea to check the
> whole cluster, page by page, for invalid page headers. Is there
> any ready tool, which, for instance when given a path to database
> cluster, would traverse all cluster directories and files and
> check all page headers? I probably answered myself - manual VACUUM
> [VERBOSE] would do - but it fails when finds the first invalid
> header, so I would have. That's why I am looking for something
> only scanning and reporting invalid headers.

I don't know of anything, but you might try searching pgfoundry.

> Yet another issue: how could that happen[?]

That's the big question.  You want to try very hard to answer it,
because otherwise you'll probably be going through this all again
soon.  It could be RAM, RAID controller, OS, or a bad drive.  It
could also be an abnormal system shutdown (OS crash or power loss)
if you have an unsafe configuration.  If you don't want to see
problems like this, don't run with fsync or full_page_writes set to
"off".

-Kevin

pgsql-admin by date:

From: "Donald Fraser"
Date: 12 May 2010, 17:21:57
Subject: Re: Shutdown fails with both 'fast' and 'immediate'

From: Ray Stell
Date: 13 May 2010, 11:25:56
Subject: Re: upgrade proc with pitr standby

Re: Autovacuum and invalid page header - Mailing list pgsql-admin

Previous

Next