Re: Data corruption zero a file - help!! - Mailing list pgsql-general

From Noel Faux
Subject Re: Data corruption zero a file - help!!
Date
Msg-id 440F7334.1030304@med.monash.edu.au
In response to Re: Data corruption zero a file - help!!  (Michael Fuhr <mike@fuhr.org>)
Responses Re: Data corruption zero a file - help!!  (Michael Fuhr <mike@fuhr.org>)
List pgsql-general
Ok it worked but we ran into another bad block :(
vacuumdb: vacuuming of database "monashprotein" failed: ERROR:  invalid
page header in block 9022937 of relation "gap"

So the command we used was:
dd bs=8k seek=110025 conv=notrunc count=1 if=/dev/zero \
   of=/usr/local/postgresql/postgresql-7.4.8/data/base/37958/111685332.68

I've tried to work out the formula for finding the file to fix (i.e. which 111685332.* file) and the value to seek to, but as a complete novice I'm lost. Any pointers would be a great help.  We checked the block size and it's 8192.
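
For reference, the arithmetic being asked about can be sketched as below. It assumes the default layout, in which a relation is split into 1 GB segment files (131072 blocks of 8192 bytes each): the first segment has no suffix and segment N carries the suffix .N. If the server was built with a different segment or block size, the constants need to change. The function name locate_block is made up for this illustration.

```shell
#!/bin/sh
# Assumptions (not confirmed for this installation): 8192-byte blocks
# and 1 GB segment files, i.e. 131072 blocks per segment.
BLOCK_SIZE=8192
SEGMENT_BYTES=1073741824
BLOCKS_PER_SEGMENT=$((SEGMENT_BYTES / BLOCK_SIZE))

# locate_block BLOCKNO
# Prints "SEGMENT OFFSET": the segment file number and the block
# offset inside that segment (the value to give dd as skip= or seek=).
locate_block() {
    echo "$(( $1 / BLOCKS_PER_SEGMENT )) $(( $1 % BLOCKS_PER_SEGMENT ))"
}

# The block number reported by the vacuumdb error above.
locate_block 9022937
```

Under those assumptions block 9022937 comes out as segment 68, offset 110041, which would mean the same 111685332.68 file with seek=110041. It is worth confirming with pg_filedump that this block really is the damaged one before zeroing anything.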

Cheers
Noel

Michael Fuhr wrote:
On Tue, Mar 07, 2006 at 01:41:44PM +1100, Noel Faux wrote: 
Here is the output from the pg_filedump; is there anything which looks 
suss and where would we re-zero the data, if that's the next step:   
[...] 
Block 110025 ********************************************************
<Header> -----
Block Offset: 0x35b92000         Offsets: Lower       0 (0x0000)
Block: Size    0  Version   24            Upper       2 (0x0002)
LSN:  logid      0 recoff 0x00000000      Special     0 (0x0000)
Items:    0                   Free Space:    2
Length (including item array): 24

Error: Invalid header information.
0000: 00000000 00000000 00000000 00000200  ................
0010: 00001800 af459a00                    .....E..

<Data> ------
Empty block - no items listed

<Special Section> -----
Error: Invalid special section encountered.
Error: Special section points off page. Unable to dump contents.

Looks like we've successfully identified the bad block; contrast
these header values and the hex dump with the good blocks and you
can see at a glance that this one is different.  It might be
interesting to you (but probably not to us, so don't send the output)
to see if the block's contents are recognizable, as though they
came from some unrelated file (which might suggest an OS bug).
Check your local documentation to see what od/hd/hexdump/whatever
options will give you an ASCII dump and use dd to fetch the page
and pipe it into that command.  Try this (substitute the hd command
with whatever works on your system):

dd bs=8k skip=110025 count=1 if=/path/file | hd

Even if you don't care about the block's current contents, you might
want to redirect dd's output to a file to save a copy of the block
in case you do ever want to examine it further.  And it would be
prudent to verify that the data shown by the above dd command matches
the data in the pg_filedump output before doing anything destructive.
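
A minimal sketch of saving a copy of one block and dumping it as ASCII follows. It runs against a scratch file created with mktemp rather than a real data file, and the helper name save_block is hypothetical. od -c is used for the dump here; substitute whatever dump tool your system provides.

```shell
#!/bin/sh
# save_block FILE BLOCKNO OUTFILE
# Copy a single 8192-byte block out of FILE into OUTFILE (read-only on FILE).
save_block() {
    dd bs=8192 skip="$2" count=1 if="$1" of="$3" 2>/dev/null
}

# Demonstration on scratch files, not on a real relation file.
scratch=$(mktemp)
saved=$(mktemp)

# Build three blocks of zeros, then put the text "marker" at the start of block 1.
dd bs=8192 count=3 if=/dev/zero of="$scratch" 2>/dev/null
printf 'marker' | dd bs=8192 seek=1 conv=notrunc of="$scratch" 2>/dev/null

# Save block 1 and show the start of it as an ASCII dump.
save_block "$scratch" 1 "$saved"
od -c "$saved" | head -n 2
```

The saved copy can then be compared by eye against the hex dump that pg_filedump printed for the same block.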

When you're ready to zero the file, shut down the postmaster and
run a command like the following (but keep reading before doing
so):

dd bs=8k seek=110025 conv=notrunc count=1 if=/dev/zero of=/path/file

Before running that command I would strongly advise reading the dd
manual page on your system to make sure the options are correct and
that you understand them.  I'd also suggest practicing on a test
table: create a table, populate it with arbitrary data, pick a page
to zero, identify the file and block, run a command like the above,
and verify that the table is intact except for the missing block.
Make *sure* you know what you're doing and that the above command
works before running it -- if you botch it you might lose a 1G file
instead of an 8K block.
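
One way to rehearse the dd options before touching a real table file is to try them on a throwaway file first. The sketch below does that with scratch files from mktemp; the helper name zero_block is hypothetical, and this only exercises dd, it does not replace the test-table exercise described above.

```shell
#!/bin/sh
# zero_block FILE BLOCKNO
# Overwrite one 8192-byte block of FILE with zeros, in place.
# conv=notrunc stops dd from truncating FILE after the written block.
zero_block() {
    dd bs=8192 seek="$2" conv=notrunc count=1 if=/dev/zero of="$1" 2>/dev/null
}

scratch=$(mktemp)
backup=$(mktemp)

# Four blocks filled with the letter x, so zeroed bytes are easy to spot.
dd bs=8192 count=4 if=/dev/zero 2>/dev/null | tr '\0' 'x' > "$scratch"
cp "$scratch" "$backup"

# Zero block 2 (the third block) of the scratch file.
zero_block "$scratch" 2

# The size must be unchanged and exactly one block's worth of bytes must differ.
size=$(wc -c < "$scratch" | tr -d ' ')
changed=$(cmp -l "$backup" "$scratch" | wc -l | tr -d ' ')
first=$(cmp -l "$backup" "$scratch" | head -n 1 | awk '{print $1}')
echo "size=$size changed_bytes=$changed first_changed_byte=$first"
```

With the four-block scratch file this should report a size of 32768, 8192 changed bytes, and a first changed byte of 16385 (cmp counts bytes from 1, so that is the start of block 2). Repeating the experiment without conv=notrunc shows the failure mode warned about above: the scratch file is cut off after the written block.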

In one of his messages Tom Lane suggested vacuuming the table after
zeroing the bad block to see if vacuum discovers any other bad
blocks.  During the vacuum you should see a message like this:

WARNING:  relation "foo" page 110025 is uninitialized --- fixing

If you see any other errors or warnings then please post them.
 
