Thread: BUG #5077: Corrupted Table

BUG #5077: Corrupted Table

From
"Bryan McLemore"
Date:
The following bug has been logged online:

Bug reference:      5077
Logged by:          Bryan McLemore
Email address:      kaelten@gmail.com
PostgreSQL version: 8.4.0
Operating system:   Ubuntu 64 bit
Description:        Corrupted Table
Details:

Today a table became corrupted and I started getting:

"invalid page header in block 900 of relation pg_tblspc/32041/138911/187737"


on all selects on a given table.

RhodiumToad & StuckMojo (and a few others) helped me track it down.  The
page in question looked like this:

http://pgsql.privatepaste.com/83JfmQGtS5

RhodiumToad gave me this command to repair the table:

printf '\x00\x01\x40\x03\x00\x20' | dd of=pg_tblspc/32041/138911/187737 bs=1 conv=notrunc seek=7372812 count=6
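
For readers following the arithmetic in that command, here is a minimal
sketch of where seek=7372812 and the six bytes appear to come from. It
assumes the default 8192-byte block size and that pd_lower sits 12 bytes
into the page header (after an 8-byte pd_lsn, 2-byte pd_tli and 2-byte
pd_flags) on a little-endian machine; neither assumption is stated in the
report, and the function and constant names are purely illustrative.

```python
import struct

BLCKSZ = 8192          # assumption: default PostgreSQL block size
PD_LOWER_OFFSET = 12   # assumption: pd_lsn (8) + pd_tli (2) + pd_flags (2)


def header_field_offset(block_number, field_offset=PD_LOWER_OFFSET,
                        block_size=BLCKSZ):
    """Byte offset within the relation file of a header field of a block."""
    return block_number * block_size + field_offset


# Block 900 is the one named in the error message.
seek = header_field_offset(900)

# The six bytes piped into dd, read as three little-endian 16-bit values.
patch = bytes([0x00, 0x01, 0x40, 0x03, 0x00, 0x20])
pd_lower, pd_upper, pd_special = struct.unpack("<HHH", patch)

print(seek)                            # 7372812
print(pd_lower, pd_upper, pd_special)  # 256 832 8192
```

Under those assumptions the command overwrites pd_lower, pd_upper and
pd_special of block 900 and touches nothing else (bs=1 count=6 with
conv=notrunc).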

The reason they asked me to report this is that it appears this occurred when
a disk filled up while pg_dump was running.


On this system df -h shows:

/dev/sda1              65G   23G   39G  38% /
varrun                4.0G   48K  4.0G   1% /var/run
varlock               4.0G     0  4.0G   0% /var/lock
udev                  4.0G   40K  4.0G   1% /dev
devshm                4.0G     0  4.0G   0% /dev/shm
/dev/sdb1             136G   23G  107G  18% /data


/dev/sda1 is where the pgdata directory is.
/dev/sdb1 is where the tablespace is.

/dev/sda1 is the drive that filled up while running a pg_dump.

If there is any additional info I can provide please let me know.

Re: BUG #5077: Corrupted Table

From
Andrew Gierth
Date:
>>>>> "Bryan" == "Bryan McLemore" <kaelten@gmail.com> writes:

 Bryan> "invalid page header in block 900 of relation pg_tblspc/32041/138911/187737"

 Bryan> http://pgsql.privatepaste.com/83JfmQGtS5

Privatepaste urls do expire, so for the record here is the relevant
part of the data in question:

00000000  82 00 00 00 50 01 72 8a  01 00 04 00 00 00 84 03  |....P.r.........|
00000010  02 00 04 20 13 01 d9 00  a8 8e a2 01 d0 8d a2 01  |... ............|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 00 00 00  16 00 01 00 17 00 01 00  |................|
00000040  18 00 01 00 19 00 01 00  1a 00 01 00 00 00 00 00  |................|
00000050  1b 00 01 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000060  00 00 00 00 00 00 00 00  00 00 00 00 e8 9e 24 02  |..............$.|
00000070  60 9e 0c 01 d0 9d 1c 01  10 9d 74 01 48 9c 84 01  |`.........t.H...|
00000080  c0 9b 0c 01 e8 9a a2 01  30 9a 6c 01 50 99 bc 01  |........0.l.P...|
00000090  1c 00 01 00 1d 00 01 00  1e 00 01 00 28 00 01 00  |............(...|
000000a0  29 00 01 00 2a 00 01 00  2b 00 01 00 2c 00 01 00  |)...*...+...,...|
000000b0  2d 00 01 00 30 98 32 02  68 97 8c 01 80 96 cc 01  |-...0.2.h.......|
000000c0  d8 95 44 01 98 94 74 02  c0 93 a8 01 a8 92 24 02  |..D...t.......$.|
000000d0  20 92 0c 01 90 91 1c 01  d0 90 74 01 08 90 84 01  | .........t.....|
000000e0  80 8f 0c 01 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

The data appears intact other than invalid values for pd_lower and pd_special
(and possibly pd_upper; I wasn't sure about that one).
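
As an illustration only, the first 24 bytes of the dump above can be decoded
with a short script. This is a sketch that assumes the 8.4-era page header
layout (pd_lsn, pd_tli, pd_flags, pd_lower, pd_upper, pd_special,
pd_pagesize_version, pd_prune_xid) on a little-endian machine. The
looks_sane() check is a simplified ordering test written for this example,
not the server's own validation code, and all names are hypothetical.

```python
import struct
from collections import namedtuple

# Assumed layout: two 32-bit halves of pd_lsn, six 16-bit fields, one 32-bit xid.
HEADER_FORMAT = "<IIHHHHHHI"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 24 bytes with this format

PageHeader = namedtuple(
    "PageHeader",
    "xlogid xrecoff tli flags lower upper special pagesize_version prune_xid",
)


def parse_header(raw):
    """Decode the leading bytes of a page into the assumed header fields."""
    return PageHeader(*struct.unpack(HEADER_FORMAT, raw[:HEADER_SIZE]))


def looks_sane(h, block_size=8192):
    """Simplified check: header <= pd_lower <= pd_upper <= pd_special <= block."""
    return HEADER_SIZE <= h.lower <= h.upper <= h.special <= block_size


# First 24 bytes of the hex dump quoted in this message.
dumped = bytes.fromhex(
    "82 00 00 00 50 01 72 8a 01 00 04 00 00 00 84 03"
    " 02 00 04 20 13 01 d9 00"
)
damaged = parse_header(dumped)
print(damaged.lower, damaged.upper, damaged.special)  # 0 900 2
print(looks_sane(damaged))                            # False

# Apply the six replacement bytes from the dd command at offset 12.
patch = bytes([0x00, 0x01, 0x40, 0x03, 0x00, 0x20])
repaired = parse_header(dumped[:12] + patch + dumped[18:])
print(looks_sane(repaired))                           # True
```

Read this way, pd_lower is 0 and pd_special is 2, which fits the description
of those two fields as invalid, while pd_pagesize_version decodes to 0x2004
(page size 8192, layout version 4).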

 Bryan> The reason they asked me to report this is that it appears
 Bryan> this occurred when a disk filled up while pg_dump was running.

I have no idea whether the disk full was the cause of this, but there
was no evidence in the page data of a hardware failure, so it could do
with investigation. (I don't know of any external cause that could damage
pd_lower while leaving the rest of the page intact.)

I did ask Bryan on IRC to make a copy of his data directory before doing
the fix.

--
Andrew (irc:RhodiumToad)

Re: BUG #5077: Corrupted Table

From
Kaelten
Date:
I do have said copies and would be glad to help in whatever ways I am able
to.

Bryan McLemore
Kaelten



On Wed, Sep 23, 2009 at 9:38 PM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:
>>>>>> "Bryan" == "Bryan McLemore" <kaelten@gmail.com> writes:
>
>  Bryan> "invalid page header in block 900 of relation pg_tblspc/32041/138911/187737"
>
>  Bryan> http://pgsql.privatepaste.com/83JfmQGtS5
>
> Privatepaste urls do expire, so for the record here is the relevant
> part of the data in question:
>
> 00000000  82 00 00 00 50 01 72 8a  01 00 04 00 00 00 84 03  |....P.r.........|
> 00000010  02 00 04 20 13 01 d9 00  a8 8e a2 01 d0 8d a2 01  |... ............|
> 00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> 00000030  00 00 00 00 00 00 00 00  16 00 01 00 17 00 01 00  |................|
> 00000040  18 00 01 00 19 00 01 00  1a 00 01 00 00 00 00 00  |................|
> 00000050  1b 00 01 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> 00000060  00 00 00 00 00 00 00 00  00 00 00 00 e8 9e 24 02  |..............$.|
> 00000070  60 9e 0c 01 d0 9d 1c 01  10 9d 74 01 48 9c 84 01  |`.........t.H...|
> 00000080  c0 9b 0c 01 e8 9a a2 01  30 9a 6c 01 50 99 bc 01  |........0.l.P...|
> 00000090  1c 00 01 00 1d 00 01 00  1e 00 01 00 28 00 01 00  |............(...|
> 000000a0  29 00 01 00 2a 00 01 00  2b 00 01 00 2c 00 01 00  |)...*...+...,...|
> 000000b0  2d 00 01 00 30 98 32 02  68 97 8c 01 80 96 cc 01  |-...0.2.h.......|
> 000000c0  d8 95 44 01 98 94 74 02  c0 93 a8 01 a8 92 24 02  |..D...t.......$.|
> 000000d0  20 92 0c 01 90 91 1c 01  d0 90 74 01 08 90 84 01  | .........t.....|
> 000000e0  80 8f 0c 01 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> 000000f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
>
> The data appears intact other than invalid values for pd_lower and pd_special
> (and possibly pd_upper, wasn't sure about that one).
>
>  Bryan> The reason they asked me to report this is that it appears
>  Bryan> this occurred when a disk filled up while pg_dump was running.
>
> I have no idea whether the disk full was the cause of this, but there
> was no evidence in the page data of a hardware failure, so it could do
> with investigation. (I don't know of any external cause that could damage
> pd_lower while leaving the rest of the page intact.)
>
> I did ask Bryan on IRC to make a copy of his data directory before doing
> the fix.
>
> --
> Andrew (irc:RhodiumToad)
>