Re: ERROR: invalid page in block 1226710 of relation base/16750/27244 - Mailing list pgsql-general

From Tom Lane
Subject Re: ERROR: invalid page in block 1226710 of relation base/16750/27244
Date
Msg-id 22806.1445453184@sss.pgh.pa.us
Whole thread Raw
In response to ERROR: invalid page in block 1226710 of relation base/16750/27244  (bricklen <bricklen@gmail.com>)
Responses Re: ERROR: invalid page in block 1226710 of relation base/16750/27244  (bricklen <bricklen@gmail.com>)
List pgsql-general
bricklen <bricklen@gmail.com> writes:
> We have run into some corruption in one of our production tables. We know
> the cause (a compute node was moved), but now we need to fix the data. We
> have backups, but at this point they are nearly a day old, so recovering
> from them is a last-resort and will incur significant downtime.
> We are running 9.3.9

> Following the steps at
> http://blog.endpoint.com/2010/06/tracking-down-database-corruption-with.html

> I get the following output for ctid, id, other_id, tstamp:

>  (690651,42) |    318698967 |   347978007 | 2015-10-20 01:55:41.757+00
>  (690651,43) |    318698968 |   347978008 | 2015-10-20 01:55:41.663+00
>  (690651,44) |    318698969 |   347978009 | 2015-10-20 01:55:42.005+00
> ERROR:  invalid page in block 1226710 of relation base/16750/27244

> It appears 690652 is what would be dd'd if that's the route we take. Is
> that accurate?

I'm confused by the block mentioned in the error message not having
anything to do with the TID sequence.  I wonder whether it refers to an
index not the table proper.  What query were you using to get this output,
exactly?  Have you confirmed which relation has relfilenode 27244?

> Because the message indicates the corruption is in the table's page, not
> the page header, according to the docs zero_damaged_pages probably won't
> work.

I should think that zero_damaged_pages would work fine, if the problem
is indeed in the base table.  But if it's in an index, a REINDEX would be
a better answer.

Another thing to keep in mind here, if you've got replication slaves,
is that I'm not sure whether the effects of zero_damaged_pages would
propagate to slaves.  Have you investigated the possibility that some
slave has an uncorrupt copy that you could dd into place in the master?

> Is this the correct command if option #2 is chosen? Can it be executed
> against a running cluster?
> dd if=/dev/zero of=database/16750/27244 bs=8192 seek=690652 count=1
> conv=notrunc

Uh, no, you're not accounting for the fact that such an offset wouldn't be
in the first segment file of the relation.

            regards, tom lane


pgsql-general by date:

Previous
From: Jonathan Vanasco
Date:
Subject: temporary indexes?
Next
From: Adrian Klaver
Date:
Subject: Re: temporary indexes?