Re: Data corruption zero a file - help!! - Mailing list pgsql-general

From Michael Fuhr
Subject Re: Data corruption zero a file - help!!
Msg-id 20060307044503.GA18411@winnie.fuhr.org
In response to Data corruption zero a file - help!!  (Noel Faux <noel.faux@med.monash.edu.au>)
Responses Re: Data corruption zero a file - help!!  (Noel Faux <noel.faux@med.monash.edu.au>)
List pgsql-general
On Tue, Mar 07, 2006 at 01:41:44PM +1100, Noel Faux wrote:
> Here is the output from the pg_filedump; is there anything which looks
> suss and where would we re-zero the data, if that's the next step:
[...]
> Block 110025 ********************************************************
> <Header> -----
> Block Offset: 0x35b92000         Offsets: Lower       0 (0x0000)
> Block: Size    0  Version   24            Upper       2 (0x0002)
> LSN:  logid      0 recoff 0x00000000      Special     0 (0x0000)
> Items:    0                   Free Space:    2
> Length (including item array): 24
>
> Error: Invalid header information.
>
>  0000: 00000000 00000000 00000000 00000200  ................
>  0010: 00001800 af459a00                    .....E..
>
> <Data> ------
> Empty block - no items listed
>
> <Special Section> -----
> Error: Invalid special section encountered.
> Error: Special section points off page. Unable to dump contents.

Looks like we've successfully identified the bad block; contrast
these header values and the hex dump with the good blocks and you
can see at a glance that this one is different.  It might be
interesting to you (but probably not to us, so don't send the output)
to see if the block's contents are recognizable, as though they
came from some unrelated file (which might suggest an OS bug).
Check your local documentation to see what od/hd/hexdump/whatever
options will give you an ASCII dump and use dd to fetch the page
and pipe it into that command.  Try this (substitute the hd command
with whatever works on your system):

dd bs=8k skip=110025 count=1 if=/path/file | hd
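
A quick arithmetic check, sketched under the assumption that the file
uses the default 8k block size: dd's skip counts blocks of bs bytes,
so block 110025 should begin at the byte offset that pg_filedump
reports as "Block Offset".  The variable names are only illustrative.

```shell
# Block number from the pg_filedump output; block size from the dd command.
block=110025
bs=8192                     # 8k, assumed to be the block size of this installation
offset=$((block * bs))      # byte offset of the block within the file
printf 'block %d starts at byte %d (0x%x)\n' "$block" "$offset" "$offset"
# prints: block 110025 starts at byte 901324800 (0x35b92000)
```

The 0x35b92000 matches the Block Offset in the quoted header, which
suggests that skip=110025 with bs=8k addresses the same page.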

Even if you don't care about the block's current contents, you might
want to redirect dd's output to a file to save a copy of the block
in case you do ever want to examine it further.  And it would be
prudent to verify that the data shown by the above dd command matches
the data in the pg_filedump output before doing anything destructive.
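
One way to do both, as a rough sketch: the helper names save_block
and show_header and the backup file name are made up for this
example, and /path/file is the same placeholder as above.  The od
options used here are the POSIX ones; check your own manual page.

```shell
# save_block FILE BLOCKNO OUT
# Copy one 8k block from FILE into OUT; FILE is only read, never written.
save_block() {
    dd bs=8192 skip="$2" count=1 if="$1" of="$3" 2>/dev/null
}

# show_header FILE
# Print the first 24 bytes in hex (24 is the header length pg_filedump reported).
show_header() {
    od -A x -t x1 -N 24 "$1"
}

# Example with placeholder paths:
#   save_block /path/file 110025 block-110025.bak
#   show_header block-110025.bak
```

The bytes printed by show_header can then be compared by eye with
the two hex lines in the pg_filedump output.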

When you're ready to zero the block, shut down the postmaster and
run a command like the following (but keep reading before doing
so):

dd bs=8k seek=110025 conv=notrunc count=1 if=/dev/zero of=/path/file

Before running that command I would strongly advise reading the dd
manual page on your system to make sure the options are correct and
that you understand them.  I'd also suggest practicing on a test
table: create a table, populate it with arbitrary data, pick a page
to zero, identify the file and block, run a command like the above,
and verify that the table is intact except for the missing block.
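Make *sure* you know what you're doing and that the above command
works before running it: if you botch it (for example by leaving
out conv=notrunc, which lets dd truncate the file right after the
block it writes) you might lose a 1G file instead of an 8K block.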
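
Before even touching a test table, the dd options themselves can be
tried on a throwaway file.  This is only a sketch on a scratch file
made with mktemp, not on a PostgreSQL relation, and the four-block
size and the choice of block 2 are arbitrary:

```shell
f=$(mktemp)                 # scratch file, unrelated to any database

# Build a 4-block file in which every byte is 'x'.
dd bs=8192 count=4 if=/dev/zero 2>/dev/null | tr '\000' 'x' > "$f"
size_before=$(wc -c < "$f" | tr -d ' ')

# Zero block 2 only; conv=notrunc keeps the blocks that follow it.
dd bs=8192 seek=2 conv=notrunc count=1 if=/dev/zero of="$f" 2>/dev/null

size_after=$(wc -c < "$f" | tr -d ' ')
# Count the bytes that are still non-zero: three blocks' worth should remain.
intact=$(tr -d '\000' < "$f" | wc -c | tr -d ' ')
echo "size before: $size_before  after: $size_after  non-zero bytes: $intact"

rm -f "$f"
```

The size should be the same before and after, and the non-zero byte
count should drop by exactly one block.  Repeating the run without
conv=notrunc shows how the file gets cut short instead.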

In one of his messages Tom Lane suggested vacuuming the table after
zeroing the bad block to see if vacuum discovers any other bad
blocks.  During the vacuum you should see a message like this:

WARNING:  relation "foo" page 110025 is uninitialized --- fixing

If you see any other errors or warnings then please post them.

--
Michael Fuhr
