Thread: "invalid page header in block 597621 of relation..." error

"invalid page header in block 597621 of relation..." error

From
Adam Witney
Date:
Hi,

I just had this error in my database:

bugasbase2=# SELECT count(*) from mba_data_base;
ERROR:  invalid page header in block 597621 of relation "mba_data_base"

Any ideas whats going on? Am a bit worried as this is my production
database.

Thanks for any assistance

Adam


--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


Re: "invalid page header in block 597621 of relation..." error

From
Tom Lane
Date:
Adam Witney <awitney@sgul.ac.uk> writes:
> bugasbase2=# SELECT count(*) from mba_data_base;
> ERROR:  invalid page header in block 597621 of relation "mba_data_base"

Sounds like a data corruption problem :-(.  Do you want to pull out that
page and see what's in it?  Something like

    dd bs=8k skip=73333 count=1 if=relationfile | od -x

where you need to use oid2name or something similar to determine which
file contains that relation, and then append ".4" because block 597621
would be in the fifth segment of the file.  (I got 73333 from
"select 597621 % 131072")

            regards, tom lane

Re: "invalid page header in block 597621 of relation..."error

From
Adam Witney
Date:
On 23/11/05 8:55 pm, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> Adam Witney <awitney@sgul.ac.uk> writes:
>> bugasbase2=# SELECT count(*) from mba_data_base;
>> ERROR:  invalid page header in block 597621 of relation "mba_data_base"
>
> Sounds like a data corruption problem :-(.  Do you want to pull out that
> page and see what's in it?  Something like
>
> dd bs=8k skip=73333 count=1 if=relationfile | od -x
>
> where you need to use oid2name or something similar to determine which
> file contains that relation, and then append ".4" because block 597621
> would be in the fifth segment of the file.  (I got 73333 from
> "select 597621 % 131072")

Thanks for the help.... Here is the output:

adam@bugsdb:/opt$ dd bs=8k skip=73333 count=1 if=134401991.4 | od -x
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
1+0 records in
1+0 records out
0010000 1d9e 201c 0fa0 0000 0010 0000 0000 000b
0010020 0ca6 19fb 1797 0ab4 000a 0000 0000 0001
0010040 01af 0000 000a 0000 0000 0001 0ca7 0000
0010060 0012 0000 0000 0010 0002 1190 068f 0c9a
0010100 0391 0000 0012 0000 0000 000f 0002 10aa
0010120 0971 06f3 184c 0000 0010 0000 ffff 0010
0010140 0d14 0f78 1c5c 066d 0012 0000 0000 000f
0010160 0009 0ff5 26c1 077a 0c8a 0000 0012 0000
0010200 0000 000f 0009 0ff5 26c1 077a 0c8a 0000
0010220 0008 0000 0000 0000 0002 0000 0000 0000
0010240 0000 0000 0000 0000 0000 0000 0000 0000
0010260 0000 0000 0000 0000 0001 0000 0000 0000
0010300 0000 0000 0008 0000 7059 3376 0004 0000
0010320 1d53 003b 0021 0000 0000 0000 0009 1e75
0010340 0006 004d 0813 ff28 fedf fffb ff87 1803
0010360 0000 0000 221c 0ea7 e6dc 009c 1439 0000
0010400 0017 0000 4353 522d 7461 6f69 305f 2036
0010420 3228 4537 3031 0029 0012 0000 0000 000e
0010440 0056 0a6a 1a0a 1a0a 1a2c 0000 0012 0000
0010460 0000 000e 004e 1b26 0dda 03e0 238c 0000
0010500 000a 0000 0000 0001 004c 0000 000a 0000
0010520 0000 0001 004a 0000 0012 0000 0000 000e
0010540 002c 2208 12b6 2232 0ce4 0000 0012 0000
0010560 0000 000f 0027 2448 0aad 0001 1dec 0000
0010600 0019 0000 000b 0000 0007 0000 0001 0000
0010620 000c 0000 0000 0001 0199 1388 000c 0000
0010640 0000 0001 0ca7 1388 0012 0000 0000 000e
0010660 0073 16c9 0d05 0d05 0ce4 0000 0012 0000
0010700 0000 000e 0047 2404 1e70 081d 23f0 0000
0010720 000a 0000 0000 0001 003c 0000 000a 0000
0010740 0000 0001 011a 0000 000a 0000 0000 0001
0010760 1438 0000 000c 0000 0001 0001 0002 0890
0011000 000a 0000 ffff 0002 1d4c 0000 0008 0000
0011020 0000 0001 000a 0000 0000 0001 003c 0000
0011040 0010 0000 ffff 0010 2591 04b3 1a9d 07e3
0011060 0008 0000 0000 0001 000a 0000 0000 0001
0011100 0009 0000 0012 0000 0000 0010 0001 22c2
0011120 131e 07da 21f5 0000 0012 0000 0000 4010
0011140 0001 1c60 234b 2081 0363 0000 0010 0000
0011160 ffff 000f 1e06 0395 1980 164e 0012 0000
0011200 0000 000d 019b 08d8 234b 2081 03e8 0000
0011220 0012 0000 0000 000d 0ca6 1c91 237a 0d8f
0011240 0fa0 0000 000c 0000 0000 0001 0199 1388
0011260 000c 0000 0000 0001 0ca7 1388 0012 0000
0011300 0000 0010 0001 22c2 131e 07da 21f5 0000
0011320 0012 0000 0000 4010 0001 1c60 234b 2081
0011340 0363 0000 0010 0000 ffff 000f 1e06 0395
0011360 1980 164e 0012 0000 0000 000f 0008 2032
0011400 0390 1032 0b5e 0000 0012 0000 0000 000f
0011420 0008 2032 0390 1032 0b5e 0000 0008 0000
0011440 0000 0000 0002 0000 0000 0000 0000 0000
0011460 0000 0000 0000 0000 0000 0000 0000 0000
0011500 0000 0000 0001 0000 0000 0000 0000 0000
0011520 0008 0000 7059 3376 0004 0000 1d53 003b
0011540 0021 0000 0000 0000 0009 1e75 0005 004d
0011560 0813 ff28 fedf fffb ff87 1803 0000 0000
0011600 221b 0ea7 e6db 009c 1439 0000 0017 0000
0011620 4353 522d 7461 6f69 305f 2035 3228 4137
0011640 3031 0029 0012 0000 0000 000e 005b 229f
0011660 18db 18db 1900 0000 0012 0000 0000 000e
0011700 004f 07c6 01f1 1404 0ed8 0000 000a 0000
0011720 0000 0001 0057 0000 000a 0000 0000 0001
0011740 0045 0000 0012 0000 0000 000d 0033 1608
0011760 1186 1eec 0fa0 0000 0012 0000 0000 000e
0012000 002a 113e 066c 136d 2134 0000 0019 0000
0012020 000a 0000 0007 0000 0001 0000 000c 0000
0012040 0000 0001 018c 1388 000c 0000 0000 0001
0012060 0ca7 1388 000c 0000 0000 0001 0073 1388
0012100 0012 0000 0000 000f 0032 02a5 25bd 00a9
0012120 1356 0000 000a 0000 0000 0001 0058 0000
0012140 000a 0000 0000 0001 00c9 0000 000a 0000
0012160 0000 0001 1f96 0000 000c 0000 0001 0001
0012200 0001 171f 0010 0000 ffff 0010 1eb1 0594
0012220 1652 0b29 0008 0000 0000 0001 000a 0000
0012240 0000 0001 0058 0000 000a 0000 0000 0001
0012260 0001 0000 0008 0000 0000 0001 000a 0000
0012300 0000 0001 000b 0000 0012 0000 0000 000f
0012320 0002 0207 1d03 15de 18a6 0000 0012 0000
0012340 0000 0010 0001 2436 1511 1a71 1510 0000
0012360 0010 0000 ffff 0010 1b8a 0a20 0cd1 249b
0012400 0012 0000 0000 000e 018a 1661 11fe 0c9e
0012420 11f8 0000 0010 0000 0000 000c 0ca6 1f0d
0012440 1cef 1a3e 000c 0000 0000 0001 018c 1388
0012460 000c 0000 0000 0001 0ca7 1388 0012 0000
0012500 0000 000f 0002 0207 1d03 15de 18a6 0000
0012520 0012 0000 0000 0010 0001 2436 1511 1a71
0012540 1510 0000 0010 0000 ffff 0010 1b8a 0a20
0012560 0cd1 249b 0012 0000 0000 000f 000a 188f
0012600 07f6 09d3 0d20 0000 0012 0000 0000 000f
0012620 000a 188f 07f6 09d3 0d20 0000 0008 0000
0012640 0000 0000 0002 0000 0000 0000 0000 0000
0012660 0000 0000 0000 0000 0000 0000 0000 0000
0012700 0000 0000 0001 0000 0000 0000 0000 0000
0012720 0008 0000 7059 3376 0004 0000 1d53 003b
0012740 0021 0000 0000 0000 0009 1e75 0004 004d
0012760 0813 ff28 fedf fffb ff87 1803 0000 0000
0013000 221a 0ea7 e6da 009c 1439 0000 0016 0000
0013020 4353 522d 7461 6f69 305f 2034 3228 4d37
0013040 2932 0000 0012 0000 0000 000e 0058 0339
0013060 1ae0 1cab 11f8 0000 0012 0000 0000 000e
0013100 0052 014b 1d6f 154a 0960 0000 000a 0000
0013120 0000 0001 0054 0000 000a 0000 0000 0001
0013140 004d 0000 0012 0000 0000 000e 002c 0f2c
0013160 0b57 0801 26ac 0000 0012 0000 0000 000f
0013200 0028 00a6 0b9a 13b0 01cc 0000 0019 0000
0013220 0009 0000 0007 0000 0001 0000 000a 0000
0013240 0000 0001 0179 0000 000a 0000 0000 0001
0013260 0ca5 0000 0012 0000 0000 000e 005e 0165
0013300 0594 1652 0b54 0000 0012 0000 0000 000d
0013320 0045 0487 106a 2319 1388 0000 000a 0000
0013340 0000 0001 006d 0000 000a 0000 0000 0001
0013360 00d3 0000 000a 0000 0000 0001 2581 0000
0013400 000c 0000 0001 0001 0001 1c8d 000a 0000
0013420 0000 0001 0001 0000 0008 0000 0000 0001
0013440 000a 0000 0000 0001 006d 0000 000a 0000
0013460 0000 0001 0001 0000 0008 0000 0000 0001
0013500 000a 0000 0000 0001 000c 0000 0012 0000
0013520 0000 0010 0001 25e3 1314 0a3e 0e3e 0000
0013540 0010 0000 ffff 4010 1965 0312 0555 0125
0013560 0012 0000 0000 400f 0001 2193 2165 115a
0013600 0f96 0000 0012 0000 0000 000e 0179 1965
0013620 0312 0555 012c 0000 0012 0000 0000 000d
0013640 0ca6 2193 2165 115a 0fa0 0000 000a 0000
0013660 0000 0001 0179 0000 000a 0000 0000 0001
0013700 0ca5 0000 0012 0000 0000 0010 0001 25e3
0013720 1314 0a3e 0e3e 0000 0010 0000 ffff 4010
0013740 1965 0312 0555 0125 0012 0000 0000 400f
0013760 0001 2193 2165 115a 0f96 0000 0012 0000
0014000 0000 000f 000b 20d4 0aaa 014b 118a 0000
0014020 0012 0000 0000 000f 000b 20d4 0aaa 014b
0014040 118a 0000 0008 0000 0000 0000 0002 0000
0014060 0000 0000 0000 0000 0000 0000 0000 0000
0014100 0000 0000 0000 0000 0000 0000 0001 0000
0014120 0000 0000 0000 0000 0008 0000 7059 3376
0014140 0004 0000 1d53 003b 0021 0000 0000 0000
0014160 0009 1e75 0003 004d 0813 ff28 fedf fffb
0014200 ff87 1803 0000 0000 2219 0ea7 e6d9 009c
0014220 1439 0000 0016 0000 4353 522d 7461 6f69
0014240 305f 2033 3228 4937 2932 0000 0012 0000
0014260 0000 000e 0061 12a4 1c68 1c68 1c84 0000
0014300 0012 0000 0000 000e 0050 1809 210d 1508
0014320 17d4 0000 000a 0000 0000 0001 0056 0000
0014340 000a 0000 0000 0001 004b 0000 0012 0000
0014360 0000 000e 002d 059a 147f 2017 1004 0000
0014400 0012 0000 0000 000e 0029 2326 1c5b 2064
0014420 206c 0000 0019 0000 0008 0000 0007 0000
0014440 0001 0000 000c 0000 0000 0001 0165 1388
0014460 000c 0000 0000 0001 0ca6 1388 0012 0000
0014500 0000 000e 0054 0341 0d05 0d05 0ce4 0000
0014520 0012 0000 0000 000f 003f 06c6 1b1a 1e4b
0014540 041a 0000 000a 0000 0000 0001 0058 0000
0014560 000a 0000 0000 0001 0111 0000 000a 0000
0014600 0000 0001 2182 0000 000c 0000 0001 0001
0014620 0002 07d8 0010 0000 ffff 0010 1eb1 0594
0014640 1652 0b29 0008 0000 0000 0001 000a 0000
0014660 0000 0001 0058 0000 000a 0000 0000 0001
0014700 0001 0000 0008 0000 0000 0001 000a 0000
0014720 0000 0001 000b 0000 0012 0000 0000 000f
0014740 0002 2696 1b71 19ea 1860 0000 0012 0000
0014760 0000 400f 0002 2566 162e 093e 055a 0000
0015000 0010 0000 ffff 4010 109c 0a91 1f8a 0590
0015020 0012 0000 0000 000e 0168 11de 162e 093e
0015040 0578 0000 0010 0000 0000 000c 0ca6 2424
0015060 0a91 1f8a 000c 0000 0000 0001 0165 1388
0015100 000c 0000 0000 0001 0ca6 1388 0012 0000
0015120 0000 000f 0002 2696 1b71 19ea 1860 0000
0015140 0012 0000 0000 400f 0002 2566 162e 093e
0015160 055a 0000 0010 0000 ffff 4010 109c 0a91
0015200 1f8a 0590 0012 0000 0000 000f 000a 188f
0015220 07f6 09d3 0d20 0000 0012 0000 0000 000f
0015240 000a 188f 07f6 09d3 0d20 0000 0008 0000
0015260 0000 0000 0002 0000 0000 0000 0000 0000
0015300 0000 0000 0000 0000 0000 0000 0000 0000
0015320 0000 0000 0001 0000 0000 0000 0000 0000
0015340 0008 0000 7059 3376 0004 0000 1d53 003b
0015360 0021 0000 0000 0000 0009 1e75 0002 004d
0015400 0813 ff28 fedf fffb ff87 1803 0000 0000
0015420 2218 0ea7 e6d8 009c 1439 0000 0016 0000
0015440 4353 522d 7461 6f69 305f 2032 3228 4537
0015460 2932 0000 000c 0000 0000 0001 0047 1b58
0015500 0012 0000 0000 000e 0050 19ce 025e 025e
0015520 0258 0000 000c 0000 0000 0001 0041 1388
0015540 000a 0000 0000 0001 004b 0000 0012 0000
0015560 0000 000e 0020 24e5 1349 085b 1130 0000
0015600 0012 0000 0000 000e 0028 0150 2612 1f93
0015620 0dac 0000 0019 0000 0007 0000 0007 0000
0015640 0001 0000 000c 0000 0000 0001 0154 1388
0015660 000c 0000 0000 0001 0ca3 1388 0012 0000
0015700 0000 000d 0041 2357 182e 1299 2328 0000
0015720 0012 0000 0000 000e 0031 0ec7 1e9d 20ba
0015740 0ed8 0000 000a 0000 0000 0001 003c 0000
0015760 000a 0000 0000 0001 00a5 0000 000a 0000
0016000 0000 0001 10ce 0000 000c 0000 0001 0001
0016020 0001 0ced 000a 0000 ffff 0002 1d4c 0000
0016040 0008 0000 0000 0001 000a 0000 0000 0001
0016060 003c 0000 0010 0000 ffff 0010 2591 04b3
0016100 1a9d 07e3 0008 0000 0000 0001 000a 0000
0016120 0000 0001 0009 0000 0012 0000 0000 000f
0016140 0004 04c4 0255 1e79 1c5c 0000 0012 0000
0016160 0000 400f 0002 075c 1b6a 0ccd 2562 0000
0016200 0012 0000 0000 4010 0003 1342 1afb 05ac
0016220 0b09 0000 0012 0000 0000 000e 0156 1ae4
0016240 1b6a 0ccd 2580 0000 0012 0000 0000 000d
0016260 0ca6 26ca 1afb 05ac 0bb8 0000 000c 0000
0016300 0000 0001 0154 1388 000c 0000 0000 0001
0016320 0ca3 1388 0012 0000 0000 000f 0004 04c4
0016340 0255 1e79 1c5c 0000 0012 0000 0000 400f
0016360 0002 075c 1b6a 0ccd 2562 0000 0012 0000
0016400 0000 4010 0003 1342 1afb 05ac 0b09 0000
0016420 0012 0000 0000 000f 0008 2032 0390 1032
0016440 0b5e 0000 0012 0000 0000 000f 0008 2032
0016460 0390 1032 0b5e 0000 0008 0000 0000 0000
0016500 0002 0000 0000 0000 0000 0000 0000 0000
0016520 0000 0000 0000 0000 0000 0000 0000 0000
0016540 0001 0000 0001 0000 0000 0000 0008 0000
0016560 7059 3376 0004 0000 1d53 003b 0021 0000
0016600 0000 0000 0009 1e75 0001 004d 0813 ff28
0016620 fedf fffb ff87 1803 0000 0000 2217 0ea7
0016640 e6d7 009c 1439 0000 0016 0000 4353 522d
0016660 7461 6f69 305f 2031 3228 4137 2932 0000
0016700 0012 0000 0000 000d 005a 0154 2382 2382
0016720 2328 0000 0012 0000 0000 000e 0049 12ae
0016740 17c6 255d 0898 0000 000a 0000 0000 0001
0016760 0052 0000 000c 0000 0000 0001 0040 1388
0017000 0012 0000 0000 000f 0021 1c70 024f 22b6
0017020 1f04 0000 0012 0000 0000 000e 002a 0a17
0017040 21f5 1c6d 0898 0000 0019 0000 0006 0000
0017060 0007 0000 0001 0000 000c 0000 0000 0001
0017100 0148 1388 000c 0000 0000 0001 0ca7 1388
0017120 0012 0000 0000 000e 0054 18db 18db 18db
0017140 1900 0000 0012 0000 0000 000e 0026 17e7
0017160 0fbd 213b 11f8 0000 000a 0000 0000 0001
0017200 0058 0000 000a 0000 0000 0001 00b8 0000
0017220 000a 0000 0000 0001 1ef3 0000 000c 0000
0017240 0001 0001 0001 0dc0 0010 0000 ffff 0010
0017260 1eb1 0594 1652 0b29 0008 0000 0000 0001
0017300 000a 0000 0000 0001 0058 0000 000a 0000
0017320 0000 0001 0001 0000 0008 0000 0000 0001
0017340 000a 0000 0000 0001 000b 0000 0012 0000
0017360 0000 000f 0001 11e8 2467 248b 0014 0000
0017400 0012 0000 0000 0010 0001 0f29 16a4 26a4
0017420 1f56 0000 0010 0000 ffff 0010 117a 172e
0017440 09c1 0198 0012 0000 0000 000d 0147 045e
0017460 106b 006b 07d0 0000 0010 0000 0000 000c
0017500 0ca7 020d 0fe1 1d4f 000c 0000 0000 0001
0017520 0148 1388 000c 0000 0000 0001 0ca7 1388
0017540 0012 0000 0000 000f 0001 11e8 2467 248b
0017560 0014 0000 0012 0000 0000 0010 0001 0f29
0017600 16a4 26a4 1f56 0000 0010 0000 ffff 0010
0017620 117a 172e 09c1 0198 0012 0000 0000 000f
0017640 000a 188f 07f6 09d3 0d20 0000 0012 0000
0017660 0000 000f 000a 188f 07f6 09d3 0d20 0000
0017700 0008 0000 0000 0000 0002 0000 0000 0000
0017720 0000 0000 0000 0000 0000 0000 0000 0000
0017740 0000 0000 0000 0000 0001 0000 0000 0000
0017760 0000 0000 0008 0000 7059 3376 0004 0000
0020000


Unfortunately I have no idea what any of that means!

Does it help?

Thanks again

Adam



--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


Re: "invalid page header in block 597621 of relation..."error

From
Tom Lane
Date:
Adam Witney <awitney@sgul.ac.uk> writes:
> Thanks for the help.... Here is the output:

> adam@bugsdb:/opt$ dd bs=8k skip=73333 count=1 if=134401991.4 | od -x
> 0000000 0000 0000 0000 0000 0000 0000 0000 0000
> *
> 0010000 1d9e 201c 0fa0 0000 0010 0000 0000 000b
> 0010020 0ca6 19fb 1797 0ab4 000a 0000 0000 0001
> 0010040 01af 0000 000a 0000 0000 0001 0ca7 0000
> 0010060 0012 0000 0000 0010 0002 1190 068f 0c9a
> ...

> Unfortunately I have no idea what any of that means!

The second half of the page looks reasonable, but the first half
is all zeroes :-(.  (dd uses "*" to mean "same as above".)

It's unlikely that this is Postgres' fault; I can't think of any
plausible pathology within PG that would so carefully zero out just
half of a page.  What seems more likely is that the block size on the
underlying filesystem is 4K, and that either a kernel bug or a disk
drive error has caused the system to drop the contents of one block.
If I had to bet with no additional info, I'd bet on kernel bug.  What's
the platform exactly, and what filesystem are you using?

            regards, tom lane

Re: "invalid page header in block 597621 of relation..."error

From
Adam Witney
Date:
On 23/11/05 9:36 pm, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> Adam Witney <awitney@sgul.ac.uk> writes:
>> Thanks for the help.... Here is the output:
>
>> adam@bugsdb:/opt$ dd bs=8k skip=73333 count=1 if=134401991.4 | od -x
>> 0000000 0000 0000 0000 0000 0000 0000 0000 0000
>> *
>> 0010000 1d9e 201c 0fa0 0000 0010 0000 0000 000b
>> 0010020 0ca6 19fb 1797 0ab4 000a 0000 0000 0001
>> 0010040 01af 0000 000a 0000 0000 0001 0ca7 0000
>> 0010060 0012 0000 0000 0010 0002 1190 068f 0c9a
>> ...
>
>> Unfortunately I have no idea what any of that means!
>
> The second half of the page looks reasonable, but the first half
> is all zeroes :-(.  (dd uses "*" to mean "same as above".)
>
> It's unlikely that this is Postgres' fault; I can't think of any
> plausible pathology within PG that would so carefully zero out just
> half of a page.  What seems more likely is that the block size on the
> underlying filesystem is 4K, and that either a kernel bug or a disk
> drive error has caused the system to drop the contents of one block.
> If I had to bet with no additional info, I'd bet on kernel bug.  What's
> the platform exactly, and what filesystem are you using?

Linux bugsdb 2.4.26 #1 SMP Wed May 5 12:08:48 BST 2004 i686 unknown

/dev/md2 on /pg_data type xfs (rw,noatime)

/dev/md2 is a software RAID5 device. Also

 PostgreSQL 7.4.8 on i686-pc-linux-gnu, compiled by GCC 2.95.4

This table is only ever COPY'd to from data files, no updates or deletes, if
I could find out which data file this bit comes from I could just reupload
that file... Is it possible to tell what the data actually is from the data
I sent?

Thanks again

Adam


--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


Re: "invalid page header in block 597621 of relation..."error

From
Tom Lane
Date:
Adam Witney <awitney@sgul.ac.uk> writes:
> This table is only ever COPY'd to from data files, no updates or deletes, if
> I could find out which data file this bit comes from I could just reupload
> that file... Is it possible to tell what the data actually is from the data
> I sent?

You might try dumping the page with "od -c" instead of "-x" and looking
to see if there are any recognizable strings.  Also try the same on the
preceding or following pages.  Or try dumping what's on the preceding
and following pages, eg
    select * from tab where ctid = '(597620,1)';
    select * from tab where ctid = '(597620,2)';
    ...
(run the item number up high enough to be sure you've seen all the live
rows on the page).

To get the file back in a usable state before you add more data, you'll
want to zero the bad block out completely, and then do a VACUUM to see
if there are any other damaged pages.

            regards, tom lane

Re: "invalid page header in block 597621 of relation..."error

From
Adam Witney
Date:
On 23/11/05 9:55 pm, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> Adam Witney <awitney@sgul.ac.uk> writes:
>> This table is only ever COPY'd to from data files, no updates or deletes, if
>> I could find out which data file this bit comes from I could just reupload
>> that file... Is it possible to tell what the data actually is from the data
>> I sent?
>
> You might try dumping the page with "od -c" instead of "-x" and looking
> to see if there are any recognizable strings.  Also try the same on the
> preceding or following pages.  Or try dumping what's on the preceding
> and following pages, eg
> select * from tab where ctid = '(597620,1)';
> select * from tab where ctid = '(597620,2)';
> ...
> (run the item number up high enough to be sure you've seen all the live
> rows on the page).
>
> To get the file back in a usable state before you add more data, you'll
> want to zero the bad block out completely, and then do a VACUUM to see
> if there are any other damaged pages.

Aha, the select statements above made it easy to identify the correct file.

Whats the best way to zero the bad block? Could I not just delete that data
and reload it?

Thanks again

Adam


--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


Re: "invalid page header in block 597621 of relation..."error

From
Tom Lane
Date:
Adam Witney <awitney@sgul.ac.uk> writes:
> Whats the best way to zero the bad block?

Probably dd from /dev/zero, along the lines of

    dd bs=8k seek=597621 count=1 conv=notrunc if=/dev/zero of=relation

(check this before you apply it ;-)).  You probably should stop the
postmaster while doing this, in case it has a cached copy of the page.

            regards, tom lane

Re: "invalid page header in block 597621 of relation..."error

From
Adam Witney
Date:
On 23/11/05 10:20 pm, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> Adam Witney <awitney@sgul.ac.uk> writes:
>> Whats the best way to zero the bad block?
>
> Probably dd from /dev/zero, along the lines of
>
> dd bs=8k seek=597621 count=1 conv=notrunc if=/dev/zero of=relation
>
> (check this before you apply it ;-)).  You probably should stop the
> postmaster while doing this, in case it has a cached copy of the page.

Just wanted to clarify, should this not be

dd bs=8k seek=73333 count=1 conv=notrunc if=/dev/zero of=134401991.4

I thought that the 597621 blocks were split up over the . .1 .2 .3 .4 files?

Thanks again

Adam


--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


Re: "invalid page header in block 597621 of relation..."error

From
Tom Lane
Date:
Adam Witney <awitney@sgul.ac.uk> writes:
> Just wanted to clarify, should this not be
> dd bs=8k seek=73333 count=1 conv=notrunc if=/dev/zero of=134401991.4

Looks reasonable.

            regards, tom lane

Re: "invalid page header in block 597621 of relation..."error

From
Adam Witney
Date:
On 24/11/05 2:48 pm, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> Adam Witney <awitney@sgul.ac.uk> writes:
>> Just wanted to clarify, should this not be
>> dd bs=8k seek=73333 count=1 conv=notrunc if=/dev/zero of=134401991.4
>
> Looks reasonable.
>
> regards, tom lane

Excellent thanks. I have run it and copied the changed file back into the
right place and run vacuum:

bugasbase2=# vacuum;
WARNING:  relation "mba_data_base" page 597621 is uninitialized --- fixing
WARNING:  relation "mba_data_base" page 640793 is uninitialized --- fixing
WARNING:  relation "mba_data_base" page 640794 is uninitialized --- fixing
WARNING:  relation "mba_data_base" page 640795 is uninitialized --- fixing
WARNING:  relation "mba_data_base" page 640796 is uninitialized --- fixing
WARNING:  relation "mba_data_base" page 640797 is uninitialized --- fixing
WARNING:  relation "mba_data_base" page 640798 is uninitialized --- fixing
VACUUM

Does this look reasonable? Or should I be looking for other problems? There
are some different pages being fixed here from the ones that were shown
before (597621), they all belong to a different dataset so looks like I have
to reupload that one as well.

Thanks again for your help

Adam


--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


Re: "invalid page header in block 597621 of relation..."error

From
Tom Lane
Date:
Adam Witney <awitney@sgul.ac.uk> writes:
> bugasbase2=# vacuum;
> WARNING:  relation "mba_data_base" page 597621 is uninitialized --- fixing

This is the expected result of what you did.

> WARNING:  relation "mba_data_base" page 640793 is uninitialized --- fixing
> WARNING:  relation "mba_data_base" page 640794 is uninitialized --- fixing
> WARNING:  relation "mba_data_base" page 640795 is uninitialized --- fixing
> WARNING:  relation "mba_data_base" page 640796 is uninitialized --- fixing
> WARNING:  relation "mba_data_base" page 640797 is uninitialized --- fixing
> WARNING:  relation "mba_data_base" page 640798 is uninitialized --- fixing

That's a bit odd.  There are scenarios where all-zero pages can
legitimately appear in a PG file --- specifically, if PG extends
the table and the OS completes that task, but then there's a crash
before PG gets to write any data into the new page.  Conceivably a
crash during a bulk data load process could result in half a dozen
such pages together, but it seems improbable.  Try looking at the
data on the preceding and following pages --- does it look like there's
something missing?

            regards, tom lane

Re: "invalid page header in block 597621 of relation..."error

From
Adam Witney
Date:
On 24/11/05 3:52 pm, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> Adam Witney <awitney@sgul.ac.uk> writes:
>> bugasbase2=# vacuum;
>> WARNING:  relation "mba_data_base" page 597621 is uninitialized --- fixing
>
> This is the expected result of what you did.
>
>> WARNING:  relation "mba_data_base" page 640793 is uninitialized --- fixing
>> WARNING:  relation "mba_data_base" page 640794 is uninitialized --- fixing
>> WARNING:  relation "mba_data_base" page 640795 is uninitialized --- fixing
>> WARNING:  relation "mba_data_base" page 640796 is uninitialized --- fixing
>> WARNING:  relation "mba_data_base" page 640797 is uninitialized --- fixing
>> WARNING:  relation "mba_data_base" page 640798 is uninitialized --- fixing
>
> That's a bit odd.  There are scenarios where all-zero pages can
> legitimately appear in a PG file --- specifically, if PG extends
> the table and the OS completes that task, but then there's a crash
> before PG gets to write any data into the new page.  Conceivably a
> crash during a bulk data load process could result in half a dozen
> such pages together, but it seems improbable.  Try looking at the
> data on the preceding and following pages --- does it look like there's
> something missing?

If you mean by that, this:

select * from mba_data_base where ctid = '(640792,12)';
select * from mba_data_base where ctid = '(640799,1)';

Then the data looks normal... Of course everything in between that is now
blank.

Is that what you meant?

Thanks

Adam


--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


Re: "invalid page header in block 597621 of relation..."error

From
Tom Lane
Date:
Adam Witney <awitney@sgul.ac.uk> writes:
> If you mean by that, this:

> select * from mba_data_base where ctid = '(640792,12)';
> select * from mba_data_base where ctid = '(640799,1)';

> Then the data looks normal... Of course everything in between that is now
> blank.

The question is, can you tell whether any data is actually missing?
In the crash scenario I was describing, no committed data would be lost.
If these blocks went zero because of filesystem misfeasance, however,
you might have lost data ...

            regards, tom lane

Re: "invalid page header in block 597621 of relation..."error

From
Adam Witney
Date:
On 24/11/05 4:19 pm, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> Adam Witney <awitney@sgul.ac.uk> writes:
>> If you mean by that, this:
>
>> select * from mba_data_base where ctid = '(640792,12)';
>> select * from mba_data_base where ctid = '(640799,1)';
>
>> Then the data looks normal... Of course everything in between that is now
>> blank.
>
> The question is, can you tell whether any data is actually missing?
> In the crash scenario I was describing, no committed data would be lost.
> If these blocks went zero because of filesystem misfeasance, however,
> you might have lost data ...

Well each of these datasets are about 20,000 rows each... So I can tell
which one is in (640792,12) and in (640799,1), they have the same dataset id
value. Im assuming the missing ones in the middle will therefore be from the
same dataset as well then... So I know which files did not get uploaded
properly and I can re-upload them....

Is that what you were referring to?

Thanks again

Adam


--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


Re: "invalid page header in block 597621 of relation..."error

From
Tom Lane
Date:
Adam Witney <awitney@sgul.ac.uk> writes:
> On 24/11/05 4:19 pm, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>> The question is, can you tell whether any data is actually missing?

> Well each of these datasets are about 20,000 rows each... So I can tell
> which one is in (640792,12) and in (640799,1), they have the same dataset id
> value. Im assuming the missing ones in the middle will therefore be from the
> same dataset as well then... So I know which files did not get uploaded
> properly and I can re-upload them....

But can you tell whether there *were* any in the middle?  It might be
worth counting the rows currently present for that dataset, then
reloading and counting again.

            regards, tom lane

Re: "invalid page header in block 597621 of relation..."error

From
Adam Witney
Date:
On 24/11/05 4:42 pm, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> Adam Witney <awitney@sgul.ac.uk> writes:
>> On 24/11/05 4:19 pm, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>>> The question is, can you tell whether any data is actually missing?
>
>> Well each of these datasets are about 20,000 rows each... So I can tell
>> which one is in (640792,12) and in (640799,1), they have the same dataset id
>> value. Im assuming the missing ones in the middle will therefore be from the
>> same dataset as well then... So I know which files did not get uploaded
>> properly and I can re-upload them....
>
> But can you tell whether there *were* any in the middle?  It might be
> worth counting the rows currently present for that dataset, then
> reloading and counting again.

Ah yes I see what you mean. I have already done a count(*) on them:

bugasbase2=# select count(*) from mba_data_base where bioassay_id in (5176);
 count
-------
 20000
(1 row)

bugasbase2=# select count(*) from mba_data_base where bioassay_id in (5177);
 count
-------
 19988
(1 row)

bugasbase2=# select count(*) from mba_data_base where bioassay_id in (5203);
 count
-------
 19928

And the two affected datasets (5177, 5203) are short on rows compared to a
correct one (5176)

Does this help identifying what went wrong?

Thanks

Adam


--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


Re: "invalid page header in block 597621 of relation..."error

From
Tom Lane
Date:
Adam Witney <awitney@sgul.ac.uk> writes:
> Does this help identifying what went wrong?

At this point I think there's no question that your filesystem is
dropping blocks :-(.  Might want to check for available kernel updates,
or contemplate changing to a different filesystem.

            regards, tom lane

Re: "invalid page header in block 597621 of relation..."error

From
Adam Witney
Date:
On 24/11/05 5:28 pm, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> Adam Witney <awitney@sgul.ac.uk> writes:
>> Does this help identifying what went wrong?
>
> At this point I think there's no question that your filesystem is
> dropping blocks :-(.  Might want to check for available kernel updates,
> or contemplate changing to a different filesystem.

Well I have been considering moving up to the 2.6.x kernels, this has made
my other machines more reliable (they are not db machines though)

Thanks again for all your help

adam


--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


Re: "invalid page header in block 597621 of relation..."error

From
"Qingqing Zhou"
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> wrote
>
> At this point I think there's no question that your filesystem is
> dropping blocks :-(.

It is very interesting to follow this thread. But at this point, can you
explain more why "there is no question" is file system's fault?

Thanks,
Qingqing



Re: "invalid page header in block 597621 of relation..."error

From
Adam Witney
Date:
On 24/11/05 5:27 pm, "Adam Witney" <awitney@sgul.ac.uk> wrote:

> On 24/11/05 5:28 pm, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>
>> Adam Witney <awitney@sgul.ac.uk> writes:
>>> Does this help identifying what went wrong?
>>
>> At this point I think there's no question that your filesystem is
>> dropping blocks :-(.  Might want to check for available kernel updates,
>> or contemplate changing to a different filesystem.

Ok I think this problem is not entirely fixed, my backups are failing now
with this:

pg_dump: ERROR:  unexpected chunk number 5153 (expected 21) for toast value
245334402
pg_dump: SQL command to dump the contents of table "measured_bioassay_base"
failed: PQendcopy() failed.
pg_dump: Error message from server: ERROR:  unexpected chunk number 5153
(expected 21) for toast value 245334402

measured_bioassay_base is always inserted at the same time as mba_data_base
(the table where I had the problem before) and it has a text field which is
very large.. I guess this is where the TOAST comes in.

Any ideas what is going on here?

Thanks again for any help

Adam


--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


Re: "invalid page header in block 597621

From
Lincoln Yeoh
Date:
Could it be faulty hardware?

Run memtest86? Test your drives?

At 10:49 AM 11/26/2005 +0000, Adam Witney wrote:


>Any ideas what is going on here?
>
>Thanks again for any help
>
>Adam



Re: "invalid page header in block 597621 of relation..."error

From
Tom Lane
Date:
Adam Witney <awitney@sgul.ac.uk> writes:
> pg_dump: ERROR:  unexpected chunk number 5153 (expected 21) for toast value
> 245334402

> measured_bioassay_base is always inserted at the same time as mba_data_base
> (the table where I had the problem before) and it has a text field which is
> very large..

How large is "very large" ... on the order of 10Mb?  If so I'd say this
is the same problem as we saw in your table --- a lot of consecutive
rows have gone missing.  You could examine the toast table to confirm
or deny this.

            regards, tom lane

Re: "invalid page header in block 597621 of relation..."error

From
Adam Witney
Date:
On 26/11/05 4:14 pm, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> Adam Witney <awitney@sgul.ac.uk> writes:
>> pg_dump: ERROR:  unexpected chunk number 5153 (expected 21) for toast value
>> 245334402
>
>> measured_bioassay_base is always inserted at the same time as mba_data_base
>> (the table where I had the problem before) and it has a text field which is
>> very large..
>
> How large is "very large" ... on the order of 10Mb?  If so I'd say this
> is the same problem as we saw in your table --- a lot of consecutive
> rows have gone missing.  You could examine the toast table to confirm
> or deny this.

Yes, around 10Mb is about right.

I deleted the two datasets in mba_data_base that were affected by the empty
pages, I also deleted the relevant two rows in measured_bioassay_base... But
maybe it didn't do the right thing with the toast table for these two rows?
Where would I need to look to see what the problem is?

Thanks again for the help.

Adam


--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


Re: "invalid page header in block 597621 of relation..."error

From
Tom Lane
Date:
Adam Witney <awitney@sgul.ac.uk> writes:
> I deleted the two datasets in mba_data_base that were affected by the empty
> pages, I also deleted the relevant two rows in measured_bioassay_base... But
> maybe it didn't do the right thing with the toast table for these two rows?

Evidently the missing data in the toast table is associated with yet a
different dataset.

I'd suggest first looking into the toast table to see if you can confirm
that the missing data corresponds to a swath of zeroed-out pages.  If
that's the case then it gives even more urgency to the need to find out
what's going wrong with your filesystem (or possibly your disk drive,
but my gut feel is that this is a kernel filesystem problem).

The other thing you'd need to do is figure out which dataset you have to
reload.  A tedious way to do this is something like
    select sum(length(bigfield)) from maintable where dataset = 'xxx';
for various values of xxx until you see the error.

            regards, tom lane

Re: "invalid page header in block 597621 of relation..."error

From
Adam Witney
Date:
On 26/11/05 4:48 pm, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> Adam Witney <awitney@sgul.ac.uk> writes:
>> I deleted the two datasets in mba_data_base that were affected by the empty
>> pages, I also deleted the relevant two rows in measured_bioassay_base... But
>> maybe it didn't do the right thing with the toast table for these two rows?
>
> Evidently the missing data in the toast table is associated with yet a
> different dataset.
>
> I'd suggest first looking into the toast table to see if you can confirm
> that the missing data corresponds to a swath of zeroed-out pages.  If
> that's the case then it gives even more urgency to the need to find out
> what's going wrong with your filesystem (or possibly your disk drive,
> but my gut feel is that this is a kernel filesystem problem).
>
> The other thing you'd need to do is figure out which dataset you have to
> reload.  A tedious way to do this is something like
> select sum(length(bigfield)) from maintable where dataset = 'xxx';
> for various values of xxx until you see the error.

Well I tracked down which row went wrong and deleted that dataset also, the
backups worked fine and it seems to be ok now.

Not really sure what caused all this, all these datasets (190 in total) went
in in one batch the other day, so for some reason 3 of them got screwed up.

Anyway I have upgraded the box to linux 2.6.14.3, so I will keep an eye on
it and see how things go.

Thanks again for your help Tom

Adam


--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


Re: "invalid page header in block 597621 of relation..."error

From
"Jim C. Nasby"
Date:
On Thu, Nov 24, 2005 at 02:59:28PM -0500, Qingqing Zhou wrote:
>
> "Tom Lane" <tgl@sss.pgh.pa.us> wrote
> >
> > At this point I think there's no question that your filesystem is
> > dropping blocks :-(.
>
> It is very interesting to follow this thread. But at this point, can you
> explain more why "there is no question" is file system's fault?

Not to put words in Tom's mouth, but as he said there's very few ways
that PostgreSQL will leave a blank page laying around, and the tests he
had the OP perform show that this almost certainly isn't one of those
cases. That means something other than PostgreSQL is dropping data.
Since it's apparently multiples of 4k it's reasonable to suspect the
kernel or the filesystem; it's pretty unlikely it's the drives.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461