Thread: "invalid page header in block 597621 of relation..." error
Hi, I just had this error in my database: bugasbase2=# SELECT count(*) from mba_data_base; ERROR: invalid page header in block 597621 of relation "mba_data_base" Any ideas whats going on? Am a bit worried as this is my production database. Thanks for any assistance Adam -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
Adam Witney <awitney@sgul.ac.uk> writes: > bugasbase2=# SELECT count(*) from mba_data_base; > ERROR: invalid page header in block 597621 of relation "mba_data_base" Sounds like a data corruption problem :-(. Do you want to pull out that page and see what's in it? Something like dd bs=8k skip=73333 count=1 if=relationfile | od -x where you need to use oid2name or something similar to determine which file contains that relation, and then append ".4" because block 597621 would be in the fifth segment of the file. (I got 73333 from "select 597621 % 131072") regards, tom lane
On 23/11/05 8:55 pm, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: > Adam Witney <awitney@sgul.ac.uk> writes: >> bugasbase2=# SELECT count(*) from mba_data_base; >> ERROR: invalid page header in block 597621 of relation "mba_data_base" > > Sounds like a data corruption problem :-(. Do you want to pull out that > page and see what's in it? Something like > > dd bs=8k skip=73333 count=1 if=relationfile | od -x > > where you need to use oid2name or something similar to determine which > file contains that relation, and then append ".4" because block 597621 > would be in the fifth segment of the file. (I got 73333 from > "select 597621 % 131072") Thanks for the help.... Here is the output: adam@bugsdb:/opt$ dd bs=8k skip=73333 count=1 if=134401991.4 | od -x 0000000 0000 0000 0000 0000 0000 0000 0000 0000 * 1+0 records in 1+0 records out 0010000 1d9e 201c 0fa0 0000 0010 0000 0000 000b 0010020 0ca6 19fb 1797 0ab4 000a 0000 0000 0001 0010040 01af 0000 000a 0000 0000 0001 0ca7 0000 0010060 0012 0000 0000 0010 0002 1190 068f 0c9a 0010100 0391 0000 0012 0000 0000 000f 0002 10aa 0010120 0971 06f3 184c 0000 0010 0000 ffff 0010 0010140 0d14 0f78 1c5c 066d 0012 0000 0000 000f 0010160 0009 0ff5 26c1 077a 0c8a 0000 0012 0000 0010200 0000 000f 0009 0ff5 26c1 077a 0c8a 0000 0010220 0008 0000 0000 0000 0002 0000 0000 0000 0010240 0000 0000 0000 0000 0000 0000 0000 0000 0010260 0000 0000 0000 0000 0001 0000 0000 0000 0010300 0000 0000 0008 0000 7059 3376 0004 0000 0010320 1d53 003b 0021 0000 0000 0000 0009 1e75 0010340 0006 004d 0813 ff28 fedf fffb ff87 1803 0010360 0000 0000 221c 0ea7 e6dc 009c 1439 0000 0010400 0017 0000 4353 522d 7461 6f69 305f 2036 0010420 3228 4537 3031 0029 0012 0000 0000 000e 0010440 0056 0a6a 1a0a 1a0a 1a2c 0000 0012 0000 0010460 0000 000e 004e 1b26 0dda 03e0 238c 0000 0010500 000a 0000 0000 0001 004c 0000 000a 0000 0010520 0000 0001 004a 0000 0012 0000 0000 000e 0010540 002c 2208 12b6 2232 0ce4 0000 0012 0000 0010560 0000 000f 0027 2448 0aad 0001 1dec 0000 0010600 0019 0000 000b 0000 0007 0000 0001 0000 0010620 000c 0000 0000 0001 0199 1388 000c 0000 0010640 0000 0001 0ca7 1388 0012 0000 0000 000e 0010660 0073 16c9 0d05 0d05 0ce4 0000 0012 0000 0010700 0000 000e 0047 2404 1e70 081d 23f0 0000 0010720 000a 0000 0000 0001 003c 0000 000a 0000 0010740 0000 0001 011a 0000 000a 0000 0000 0001 0010760 1438 0000 000c 0000 0001 0001 0002 0890 0011000 000a 0000 ffff 0002 1d4c 0000 0008 0000 0011020 0000 0001 000a 0000 0000 0001 003c 0000 0011040 0010 0000 ffff 0010 2591 04b3 1a9d 07e3 0011060 0008 0000 0000 0001 000a 0000 0000 0001 0011100 0009 0000 0012 0000 0000 0010 0001 22c2 0011120 131e 07da 21f5 0000 0012 0000 0000 4010 0011140 0001 1c60 234b 2081 0363 0000 0010 0000 0011160 ffff 000f 1e06 0395 1980 164e 0012 0000 0011200 0000 000d 019b 08d8 234b 2081 03e8 0000 0011220 0012 0000 0000 000d 0ca6 1c91 237a 0d8f 0011240 0fa0 0000 000c 0000 0000 0001 0199 1388 0011260 000c 0000 0000 0001 0ca7 1388 0012 0000 0011300 0000 0010 0001 22c2 131e 07da 21f5 0000 0011320 0012 0000 0000 4010 0001 1c60 234b 2081 0011340 0363 0000 0010 0000 ffff 000f 1e06 0395 0011360 1980 164e 0012 0000 0000 000f 0008 2032 0011400 0390 1032 0b5e 0000 0012 0000 0000 000f 0011420 0008 2032 0390 1032 0b5e 0000 0008 0000 0011440 0000 0000 0002 0000 0000 0000 0000 0000 0011460 0000 0000 0000 0000 0000 0000 0000 0000 0011500 0000 0000 0001 0000 0000 0000 0000 0000 0011520 0008 0000 7059 3376 0004 0000 1d53 003b 0011540 0021 0000 0000 0000 0009 1e75 0005 004d 0011560 0813 ff28 fedf fffb ff87 1803 0000 0000 0011600 221b 0ea7 e6db 009c 1439 0000 0017 0000 0011620 4353 522d 7461 6f69 305f 2035 3228 4137 0011640 3031 0029 0012 0000 0000 000e 005b 229f 0011660 18db 18db 1900 0000 0012 0000 0000 000e 0011700 004f 07c6 01f1 1404 0ed8 0000 000a 0000 0011720 0000 0001 0057 0000 000a 0000 0000 0001 0011740 0045 0000 0012 0000 0000 000d 0033 1608 0011760 1186 1eec 0fa0 0000 0012 0000 0000 000e 0012000 002a 113e 066c 136d 2134 0000 0019 0000 0012020 000a 0000 0007 0000 0001 0000 000c 0000 0012040 0000 0001 018c 1388 000c 0000 0000 0001 0012060 0ca7 1388 000c 0000 0000 0001 0073 1388 0012100 0012 0000 0000 000f 0032 02a5 25bd 00a9 0012120 1356 0000 000a 0000 0000 0001 0058 0000 0012140 000a 0000 0000 0001 00c9 0000 000a 0000 0012160 0000 0001 1f96 0000 000c 0000 0001 0001 0012200 0001 171f 0010 0000 ffff 0010 1eb1 0594 0012220 1652 0b29 0008 0000 0000 0001 000a 0000 0012240 0000 0001 0058 0000 000a 0000 0000 0001 0012260 0001 0000 0008 0000 0000 0001 000a 0000 0012300 0000 0001 000b 0000 0012 0000 0000 000f 0012320 0002 0207 1d03 15de 18a6 0000 0012 0000 0012340 0000 0010 0001 2436 1511 1a71 1510 0000 0012360 0010 0000 ffff 0010 1b8a 0a20 0cd1 249b 0012400 0012 0000 0000 000e 018a 1661 11fe 0c9e 0012420 11f8 0000 0010 0000 0000 000c 0ca6 1f0d 0012440 1cef 1a3e 000c 0000 0000 0001 018c 1388 0012460 000c 0000 0000 0001 0ca7 1388 0012 0000 0012500 0000 000f 0002 0207 1d03 15de 18a6 0000 0012520 0012 0000 0000 0010 0001 2436 1511 1a71 0012540 1510 0000 0010 0000 ffff 0010 1b8a 0a20 0012560 0cd1 249b 0012 0000 0000 000f 000a 188f 0012600 07f6 09d3 0d20 0000 0012 0000 0000 000f 0012620 000a 188f 07f6 09d3 0d20 0000 0008 0000 0012640 0000 0000 0002 0000 0000 0000 0000 0000 0012660 0000 0000 0000 0000 0000 0000 0000 0000 0012700 0000 0000 0001 0000 0000 0000 0000 0000 0012720 0008 0000 7059 3376 0004 0000 1d53 003b 0012740 0021 0000 0000 0000 0009 1e75 0004 004d 0012760 0813 ff28 fedf fffb ff87 1803 0000 0000 0013000 221a 0ea7 e6da 009c 1439 0000 0016 0000 0013020 4353 522d 7461 6f69 305f 2034 3228 4d37 0013040 2932 0000 0012 0000 0000 000e 0058 0339 0013060 1ae0 1cab 11f8 0000 0012 0000 0000 000e 0013100 0052 014b 1d6f 154a 0960 0000 000a 0000 0013120 0000 0001 0054 0000 000a 0000 0000 0001 0013140 004d 0000 0012 0000 0000 000e 002c 0f2c 0013160 0b57 0801 26ac 0000 0012 0000 0000 000f 0013200 0028 00a6 0b9a 13b0 01cc 0000 0019 0000 0013220 0009 0000 0007 0000 0001 0000 000a 0000 0013240 0000 0001 0179 0000 000a 0000 0000 0001 0013260 0ca5 0000 0012 0000 0000 000e 005e 0165 0013300 0594 1652 0b54 0000 0012 0000 0000 000d 0013320 0045 0487 106a 2319 1388 0000 000a 0000 0013340 0000 0001 006d 0000 000a 0000 0000 0001 0013360 00d3 0000 000a 0000 0000 0001 2581 0000 0013400 000c 0000 0001 0001 0001 1c8d 000a 0000 0013420 0000 0001 0001 0000 0008 0000 0000 0001 0013440 000a 0000 0000 0001 006d 0000 000a 0000 0013460 0000 0001 0001 0000 0008 0000 0000 0001 0013500 000a 0000 0000 0001 000c 0000 0012 0000 0013520 0000 0010 0001 25e3 1314 0a3e 0e3e 0000 0013540 0010 0000 ffff 4010 1965 0312 0555 0125 0013560 0012 0000 0000 400f 0001 2193 2165 115a 0013600 0f96 0000 0012 0000 0000 000e 0179 1965 0013620 0312 0555 012c 0000 0012 0000 0000 000d 0013640 0ca6 2193 2165 115a 0fa0 0000 000a 0000 0013660 0000 0001 0179 0000 000a 0000 0000 0001 0013700 0ca5 0000 0012 0000 0000 0010 0001 25e3 0013720 1314 0a3e 0e3e 0000 0010 0000 ffff 4010 0013740 1965 0312 0555 0125 0012 0000 0000 400f 0013760 0001 2193 2165 115a 0f96 0000 0012 0000 0014000 0000 000f 000b 20d4 0aaa 014b 118a 0000 0014020 0012 0000 0000 000f 000b 20d4 0aaa 014b 0014040 118a 0000 0008 0000 0000 0000 0002 0000 0014060 0000 0000 0000 0000 0000 0000 0000 0000 0014100 0000 0000 0000 0000 0000 0000 0001 0000 0014120 0000 0000 0000 0000 0008 0000 7059 3376 0014140 0004 0000 1d53 003b 0021 0000 0000 0000 0014160 0009 1e75 0003 004d 0813 ff28 fedf fffb 0014200 ff87 1803 0000 0000 2219 0ea7 e6d9 009c 0014220 1439 0000 0016 0000 4353 522d 7461 6f69 0014240 305f 2033 3228 4937 2932 0000 0012 0000 0014260 0000 000e 0061 12a4 1c68 1c68 1c84 0000 0014300 0012 0000 0000 000e 0050 1809 210d 1508 0014320 17d4 0000 000a 0000 0000 0001 0056 0000 0014340 000a 0000 0000 0001 004b 0000 0012 0000 0014360 0000 000e 002d 059a 147f 2017 1004 0000 0014400 0012 0000 0000 000e 0029 2326 1c5b 2064 0014420 206c 0000 0019 0000 0008 0000 0007 0000 0014440 0001 0000 000c 0000 0000 0001 0165 1388 0014460 000c 0000 0000 0001 0ca6 1388 0012 0000 0014500 0000 000e 0054 0341 0d05 0d05 0ce4 0000 0014520 0012 0000 0000 000f 003f 06c6 1b1a 1e4b 0014540 041a 0000 000a 0000 0000 0001 0058 0000 0014560 000a 0000 0000 0001 0111 0000 000a 0000 0014600 0000 0001 2182 0000 000c 0000 0001 0001 0014620 0002 07d8 0010 0000 ffff 0010 1eb1 0594 0014640 1652 0b29 0008 0000 0000 0001 000a 0000 0014660 0000 0001 0058 0000 000a 0000 0000 0001 0014700 0001 0000 0008 0000 0000 0001 000a 0000 0014720 0000 0001 000b 0000 0012 0000 0000 000f 0014740 0002 2696 1b71 19ea 1860 0000 0012 0000 0014760 0000 400f 0002 2566 162e 093e 055a 0000 0015000 0010 0000 ffff 4010 109c 0a91 1f8a 0590 0015020 0012 0000 0000 000e 0168 11de 162e 093e 0015040 0578 0000 0010 0000 0000 000c 0ca6 2424 0015060 0a91 1f8a 000c 0000 0000 0001 0165 1388 0015100 000c 0000 0000 0001 0ca6 1388 0012 0000 0015120 0000 000f 0002 2696 1b71 19ea 1860 0000 0015140 0012 0000 0000 400f 0002 2566 162e 093e 0015160 055a 0000 0010 0000 ffff 4010 109c 0a91 0015200 1f8a 0590 0012 0000 0000 000f 000a 188f 0015220 07f6 09d3 0d20 0000 0012 0000 0000 000f 0015240 000a 188f 07f6 09d3 0d20 0000 0008 0000 0015260 0000 0000 0002 0000 0000 0000 0000 0000 0015300 0000 0000 0000 0000 0000 0000 0000 0000 0015320 0000 0000 0001 0000 0000 0000 0000 0000 0015340 0008 0000 7059 3376 0004 0000 1d53 003b 0015360 0021 0000 0000 0000 0009 1e75 0002 004d 0015400 0813 ff28 fedf fffb ff87 1803 0000 0000 0015420 2218 0ea7 e6d8 009c 1439 0000 0016 0000 0015440 4353 522d 7461 6f69 305f 2032 3228 4537 0015460 2932 0000 000c 0000 0000 0001 0047 1b58 0015500 0012 0000 0000 000e 0050 19ce 025e 025e 0015520 0258 0000 000c 0000 0000 0001 0041 1388 0015540 000a 0000 0000 0001 004b 0000 0012 0000 0015560 0000 000e 0020 24e5 1349 085b 1130 0000 0015600 0012 0000 0000 000e 0028 0150 2612 1f93 0015620 0dac 0000 0019 0000 0007 0000 0007 0000 0015640 0001 0000 000c 0000 0000 0001 0154 1388 0015660 000c 0000 0000 0001 0ca3 1388 0012 0000 0015700 0000 000d 0041 2357 182e 1299 2328 0000 0015720 0012 0000 0000 000e 0031 0ec7 1e9d 20ba 0015740 0ed8 0000 000a 0000 0000 0001 003c 0000 0015760 000a 0000 0000 0001 00a5 0000 000a 0000 0016000 0000 0001 10ce 0000 000c 0000 0001 0001 0016020 0001 0ced 000a 0000 ffff 0002 1d4c 0000 0016040 0008 0000 0000 0001 000a 0000 0000 0001 0016060 003c 0000 0010 0000 ffff 0010 2591 04b3 0016100 1a9d 07e3 0008 0000 0000 0001 000a 0000 0016120 0000 0001 0009 0000 0012 0000 0000 000f 0016140 0004 04c4 0255 1e79 1c5c 0000 0012 0000 0016160 0000 400f 0002 075c 1b6a 0ccd 2562 0000 0016200 0012 0000 0000 4010 0003 1342 1afb 05ac 0016220 0b09 0000 0012 0000 0000 000e 0156 1ae4 0016240 1b6a 0ccd 2580 0000 0012 0000 0000 000d 0016260 0ca6 26ca 1afb 05ac 0bb8 0000 000c 0000 0016300 0000 0001 0154 1388 000c 0000 0000 0001 0016320 0ca3 1388 0012 0000 0000 000f 0004 04c4 0016340 0255 1e79 1c5c 0000 0012 0000 0000 400f 0016360 0002 075c 1b6a 0ccd 2562 0000 0012 0000 0016400 0000 4010 0003 1342 1afb 05ac 0b09 0000 0016420 0012 0000 0000 000f 0008 2032 0390 1032 0016440 0b5e 0000 0012 0000 0000 000f 0008 2032 0016460 0390 1032 0b5e 0000 0008 0000 0000 0000 0016500 0002 0000 0000 0000 0000 0000 0000 0000 0016520 0000 0000 0000 0000 0000 0000 0000 0000 0016540 0001 0000 0001 0000 0000 0000 0008 0000 0016560 7059 3376 0004 0000 1d53 003b 0021 0000 0016600 0000 0000 0009 1e75 0001 004d 0813 ff28 0016620 fedf fffb ff87 1803 0000 0000 2217 0ea7 0016640 e6d7 009c 1439 0000 0016 0000 4353 522d 0016660 7461 6f69 305f 2031 3228 4137 2932 0000 0016700 0012 0000 0000 000d 005a 0154 2382 2382 0016720 2328 0000 0012 0000 0000 000e 0049 12ae 0016740 17c6 255d 0898 0000 000a 0000 0000 0001 0016760 0052 0000 000c 0000 0000 0001 0040 1388 0017000 0012 0000 0000 000f 0021 1c70 024f 22b6 0017020 1f04 0000 0012 0000 0000 000e 002a 0a17 0017040 21f5 1c6d 0898 0000 0019 0000 0006 0000 0017060 0007 0000 0001 0000 000c 0000 0000 0001 0017100 0148 1388 000c 0000 0000 0001 0ca7 1388 0017120 0012 0000 0000 000e 0054 18db 18db 18db 0017140 1900 0000 0012 0000 0000 000e 0026 17e7 0017160 0fbd 213b 11f8 0000 000a 0000 0000 0001 0017200 0058 0000 000a 0000 0000 0001 00b8 0000 0017220 000a 0000 0000 0001 1ef3 0000 000c 0000 0017240 0001 0001 0001 0dc0 0010 0000 ffff 0010 0017260 1eb1 0594 1652 0b29 0008 0000 0000 0001 0017300 000a 0000 0000 0001 0058 0000 000a 0000 0017320 0000 0001 0001 0000 0008 0000 0000 0001 0017340 000a 0000 0000 0001 000b 0000 0012 0000 0017360 0000 000f 0001 11e8 2467 248b 0014 0000 0017400 0012 0000 0000 0010 0001 0f29 16a4 26a4 0017420 1f56 0000 0010 0000 ffff 0010 117a 172e 0017440 09c1 0198 0012 0000 0000 000d 0147 045e 0017460 106b 006b 07d0 0000 0010 0000 0000 000c 0017500 0ca7 020d 0fe1 1d4f 000c 0000 0000 0001 0017520 0148 1388 000c 0000 0000 0001 0ca7 1388 0017540 0012 0000 0000 000f 0001 11e8 2467 248b 0017560 0014 0000 0012 0000 0000 0010 0001 0f29 0017600 16a4 26a4 1f56 0000 0010 0000 ffff 0010 0017620 117a 172e 09c1 0198 0012 0000 0000 000f 0017640 000a 188f 07f6 09d3 0d20 0000 0012 0000 0017660 0000 000f 000a 188f 07f6 09d3 0d20 0000 0017700 0008 0000 0000 0000 0002 0000 0000 0000 0017720 0000 0000 0000 0000 0000 0000 0000 0000 0017740 0000 0000 0000 0000 0001 0000 0000 0000 0017760 0000 0000 0008 0000 7059 3376 0004 0000 0020000 Unfortunately I have no idea what any of that means! Does it help? Thanks again Adam -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
Adam Witney <awitney@sgul.ac.uk> writes: > Thanks for the help.... Here is the output: > adam@bugsdb:/opt$ dd bs=8k skip=73333 count=1 if=134401991.4 | od -x > 0000000 0000 0000 0000 0000 0000 0000 0000 0000 > * > 0010000 1d9e 201c 0fa0 0000 0010 0000 0000 000b > 0010020 0ca6 19fb 1797 0ab4 000a 0000 0000 0001 > 0010040 01af 0000 000a 0000 0000 0001 0ca7 0000 > 0010060 0012 0000 0000 0010 0002 1190 068f 0c9a > ... > Unfortunately I have no idea what any of that means! The second half of the page looks reasonable, but the first half is all zeroes :-(. (dd uses "*" to mean "same as above".) It's unlikely that this is Postgres' fault; I can't think of any plausible pathology within PG that would so carefully zero out just half of a page. What seems more likely is that the block size on the underlying filesystem is 4K, and that either a kernel bug or a disk drive error has caused the system to drop the contents of one block. If I had to bet with no additional info, I'd bet on kernel bug. What's the platform exactly, and what filesystem are you using? regards, tom lane
On 23/11/05 9:36 pm, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: > Adam Witney <awitney@sgul.ac.uk> writes: >> Thanks for the help.... Here is the output: > >> adam@bugsdb:/opt$ dd bs=8k skip=73333 count=1 if=134401991.4 | od -x >> 0000000 0000 0000 0000 0000 0000 0000 0000 0000 >> * >> 0010000 1d9e 201c 0fa0 0000 0010 0000 0000 000b >> 0010020 0ca6 19fb 1797 0ab4 000a 0000 0000 0001 >> 0010040 01af 0000 000a 0000 0000 0001 0ca7 0000 >> 0010060 0012 0000 0000 0010 0002 1190 068f 0c9a >> ... > >> Unfortunately I have no idea what any of that means! > > The second half of the page looks reasonable, but the first half > is all zeroes :-(. (dd uses "*" to mean "same as above".) > > It's unlikely that this is Postgres' fault; I can't think of any > plausible pathology within PG that would so carefully zero out just > half of a page. What seems more likely is that the block size on the > underlying filesystem is 4K, and that either a kernel bug or a disk > drive error has caused the system to drop the contents of one block. > If I had to bet with no additional info, I'd bet on kernel bug. What's > the platform exactly, and what filesystem are you using? Linux bugsdb 2.4.26 #1 SMP Wed May 5 12:08:48 BST 2004 i686 unknown /dev/md2 on /pg_data type xfs (rw,noatime) /dev/md2 is a software RAID5 device. Also PostgreSQL 7.4.8 on i686-pc-linux-gnu, compiled by GCC 2.95.4 This table is only ever COPY'd to from data files, no updates or deletes, if I could find out which data file this bit comes from I could just reupload that file... Is it possible to tell what the data actually is from the data I sent? Thanks again Adam -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
Adam Witney <awitney@sgul.ac.uk> writes: > This table is only ever COPY'd to from data files, no updates or deletes, if > I could find out which data file this bit comes from I could just reupload > that file... Is it possible to tell what the data actually is from the data > I sent? You might try dumping the page with "od -c" instead of "-x" and looking to see if there are any recognizable strings. Also try the same on the preceding or following pages. Or try dumping what's on the preceding and following pages, eg select * from tab where ctid = '(597620,1)'; select * from tab where ctid = '(597620,2)'; ... (run the item number up high enough to be sure you've seen all the live rows on the page). To get the file back in a usable state before you add more data, you'll want to zero the bad block out completely, and then do a VACUUM to see if there are any other damaged pages. regards, tom lane
On 23/11/05 9:55 pm, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: > Adam Witney <awitney@sgul.ac.uk> writes: >> This table is only ever COPY'd to from data files, no updates or deletes, if >> I could find out which data file this bit comes from I could just reupload >> that file... Is it possible to tell what the data actually is from the data >> I sent? > > You might try dumping the page with "od -c" instead of "-x" and looking > to see if there are any recognizable strings. Also try the same on the > preceding or following pages. Or try dumping what's on the preceding > and following pages, eg > select * from tab where ctid = '(597620,1)'; > select * from tab where ctid = '(597620,2)'; > ... > (run the item number up high enough to be sure you've seen all the live > rows on the page). > > To get the file back in a usable state before you add more data, you'll > want to zero the bad block out completely, and then do a VACUUM to see > if there are any other damaged pages. Aha, the select statements above made it easy to identify the correct file. Whats the best way to zero the bad block? Could I not just delete that data and reload it? Thanks again Adam -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
Adam Witney <awitney@sgul.ac.uk> writes: > Whats the best way to zero the bad block? Probably dd from /dev/zero, along the lines of dd bs=8k seek=597621 count=1 conv=notrunc if=/dev/zero of=relation (check this before you apply it ;-)). You probably should stop the postmaster while doing this, in case it has a cached copy of the page. regards, tom lane
On 23/11/05 10:20 pm, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: > Adam Witney <awitney@sgul.ac.uk> writes: >> Whats the best way to zero the bad block? > > Probably dd from /dev/zero, along the lines of > > dd bs=8k seek=597621 count=1 conv=notrunc if=/dev/zero of=relation > > (check this before you apply it ;-)). You probably should stop the > postmaster while doing this, in case it has a cached copy of the page. Just wanted to clarify, should this not be dd bs=8k seek=73333 count=1 conv=notrunc if=/dev/zero of=134401991.4 I thought that the 597621 blocks were split up over the . .1 .2 .3 .4 files? Thanks again Adam -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
Adam Witney <awitney@sgul.ac.uk> writes: > Just wanted to clarify, should this not be > dd bs=8k seek=73333 count=1 conv=notrunc if=/dev/zero of=134401991.4 Looks reasonable. regards, tom lane
On 24/11/05 2:48 pm, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: > Adam Witney <awitney@sgul.ac.uk> writes: >> Just wanted to clarify, should this not be >> dd bs=8k seek=73333 count=1 conv=notrunc if=/dev/zero of=134401991.4 > > Looks reasonable. > > regards, tom lane Excellent thanks. I have run it and copied the changed file back into the right place and run vacuum: bugasbase2=# vacuum; WARNING: relation "mba_data_base" page 597621 is uninitialized --- fixing WARNING: relation "mba_data_base" page 640793 is uninitialized --- fixing WARNING: relation "mba_data_base" page 640794 is uninitialized --- fixing WARNING: relation "mba_data_base" page 640795 is uninitialized --- fixing WARNING: relation "mba_data_base" page 640796 is uninitialized --- fixing WARNING: relation "mba_data_base" page 640797 is uninitialized --- fixing WARNING: relation "mba_data_base" page 640798 is uninitialized --- fixing VACUUM Does this look reasonable? Or should I be looking for other problems? There are some different pages being fixed here from the ones that were shown before (597621), they all belong to a different dataset so looks like I have to reupload that one as well. Thanks again for your help Adam -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
Adam Witney <awitney@sgul.ac.uk> writes: > bugasbase2=# vacuum; > WARNING: relation "mba_data_base" page 597621 is uninitialized --- fixing This is the expected result of what you did. > WARNING: relation "mba_data_base" page 640793 is uninitialized --- fixing > WARNING: relation "mba_data_base" page 640794 is uninitialized --- fixing > WARNING: relation "mba_data_base" page 640795 is uninitialized --- fixing > WARNING: relation "mba_data_base" page 640796 is uninitialized --- fixing > WARNING: relation "mba_data_base" page 640797 is uninitialized --- fixing > WARNING: relation "mba_data_base" page 640798 is uninitialized --- fixing That's a bit odd. There are scenarios where all-zero pages can legitimately appear in a PG file --- specifically, if PG extends the table and the OS completes that task, but then there's a crash before PG gets to write any data into the new page. Conceivably a crash during a bulk data load process could result in half a dozen such pages together, but it seems improbable. Try looking at the data on the preceding and following pages --- does it look like there's something missing? regards, tom lane
On 24/11/05 3:52 pm, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: > Adam Witney <awitney@sgul.ac.uk> writes: >> bugasbase2=# vacuum; >> WARNING: relation "mba_data_base" page 597621 is uninitialized --- fixing > > This is the expected result of what you did. > >> WARNING: relation "mba_data_base" page 640793 is uninitialized --- fixing >> WARNING: relation "mba_data_base" page 640794 is uninitialized --- fixing >> WARNING: relation "mba_data_base" page 640795 is uninitialized --- fixing >> WARNING: relation "mba_data_base" page 640796 is uninitialized --- fixing >> WARNING: relation "mba_data_base" page 640797 is uninitialized --- fixing >> WARNING: relation "mba_data_base" page 640798 is uninitialized --- fixing > > That's a bit odd. There are scenarios where all-zero pages can > legitimately appear in a PG file --- specifically, if PG extends > the table and the OS completes that task, but then there's a crash > before PG gets to write any data into the new page. Conceivably a > crash during a bulk data load process could result in half a dozen > such pages together, but it seems improbable. Try looking at the > data on the preceding and following pages --- does it look like there's > something missing? If you mean by that, this: select * from mba_data_base where ctid = '(640792,12)'; select * from mba_data_base where ctid = '(640799,1)'; Then the data looks normal... Of course everything in between that is now blank. Is that what you meant? Thanks Adam -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
Adam Witney <awitney@sgul.ac.uk> writes: > If you mean by that, this: > select * from mba_data_base where ctid = '(640792,12)'; > select * from mba_data_base where ctid = '(640799,1)'; > Then the data looks normal... Of course everything in between that is now > blank. The question is, can you tell whether any data is actually missing? In the crash scenario I was describing, no committed data would be lost. If these blocks went zero because of filesystem misfeasance, however, you might have lost data ... regards, tom lane
On 24/11/05 4:19 pm, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: > Adam Witney <awitney@sgul.ac.uk> writes: >> If you mean by that, this: > >> select * from mba_data_base where ctid = '(640792,12)'; >> select * from mba_data_base where ctid = '(640799,1)'; > >> Then the data looks normal... Of course everything in between that is now >> blank. > > The question is, can you tell whether any data is actually missing? > In the crash scenario I was describing, no committed data would be lost. > If these blocks went zero because of filesystem misfeasance, however, > you might have lost data ... Well each of these datasets are about 20,000 rows each... So I can tell which one is in (640792,12) and in (640799,1), they have the same dataset id value. Im assuming the missing ones in the middle will therefore be from the same dataset as well then... So I know which files did not get uploaded properly and I can re-upload them.... Is that what you were referring to? Thanks again Adam -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
Adam Witney <awitney@sgul.ac.uk> writes: > On 24/11/05 4:19 pm, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: >> The question is, can you tell whether any data is actually missing? > Well each of these datasets are about 20,000 rows each... So I can tell > which one is in (640792,12) and in (640799,1), they have the same dataset id > value. Im assuming the missing ones in the middle will therefore be from the > same dataset as well then... So I know which files did not get uploaded > properly and I can re-upload them.... But can you tell whether there *were* any in the middle? It might be worth counting the rows currently present for that dataset, then reloading and counting again. regards, tom lane
On 24/11/05 4:42 pm, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: > Adam Witney <awitney@sgul.ac.uk> writes: >> On 24/11/05 4:19 pm, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: >>> The question is, can you tell whether any data is actually missing? > >> Well each of these datasets are about 20,000 rows each... So I can tell >> which one is in (640792,12) and in (640799,1), they have the same dataset id >> value. Im assuming the missing ones in the middle will therefore be from the >> same dataset as well then... So I know which files did not get uploaded >> properly and I can re-upload them.... > > But can you tell whether there *were* any in the middle? It might be > worth counting the rows currently present for that dataset, then > reloading and counting again. Ah yes I see what you mean. I have already done a count(*) on them: bugasbase2=# select count(*) from mba_data_base where bioassay_id in (5176); count ------- 20000 (1 row) bugasbase2=# select count(*) from mba_data_base where bioassay_id in (5177); count ------- 19988 (1 row) bugasbase2=# select count(*) from mba_data_base where bioassay_id in (5203); count ------- 19928 And the two affected datasets (5177, 5203) are short on rows compared to a correct one (5176) Does this help identifying what went wrong? Thanks Adam -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
Adam Witney <awitney@sgul.ac.uk> writes: > Does this help identifying what went wrong? At this point I think there's no question that your filesystem is dropping blocks :-(. Might want to check for available kernel updates, or contemplate changing to a different filesystem. regards, tom lane
On 24/11/05 5:28 pm, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: > Adam Witney <awitney@sgul.ac.uk> writes: >> Does this help identifying what went wrong? > > At this point I think there's no question that your filesystem is > dropping blocks :-(. Might want to check for available kernel updates, > or contemplate changing to a different filesystem. Well I have been considering moving up to the 2.6.x kernels, this has made my other machines more reliable (they are not db machines though) Thanks again for all your help adam -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
"Tom Lane" <tgl@sss.pgh.pa.us> wrote > > At this point I think there's no question that your filesystem is > dropping blocks :-(. It is very interesting to follow this thread. But at this point, can you explain more why "there is no question" is file system's fault? Thanks, Qingqing
On 24/11/05 5:27 pm, "Adam Witney" <awitney@sgul.ac.uk> wrote: > On 24/11/05 5:28 pm, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: > >> Adam Witney <awitney@sgul.ac.uk> writes: >>> Does this help identifying what went wrong? >> >> At this point I think there's no question that your filesystem is >> dropping blocks :-(. Might want to check for available kernel updates, >> or contemplate changing to a different filesystem. Ok I think this problem is not entirely fixed, my backups are failing now with this: pg_dump: ERROR: unexpected chunk number 5153 (expected 21) for toast value 245334402 pg_dump: SQL command to dump the contents of table "measured_bioassay_base" failed: PQendcopy() failed. pg_dump: Error message from server: ERROR: unexpected chunk number 5153 (expected 21) for toast value 245334402 measured_bioassay_base is always inserted at the same time as mba_data_base (the table where I had the problem before) and it has a text field which is very large.. I guess this is where the TOAST comes in. Any ideas what is going on here? Thanks again for any help Adam -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
Could it be faulty hardware? Run memtest86? Test your drives? At 10:49 AM 11/26/2005 +0000, Adam Witney wrote: >Any ideas what is going on here? > >Thanks again for any help > >Adam
Adam Witney <awitney@sgul.ac.uk> writes: > pg_dump: ERROR: unexpected chunk number 5153 (expected 21) for toast value > 245334402 > measured_bioassay_base is always inserted at the same time as mba_data_base > (the table where I had the problem before) and it has a text field which is > very large.. How large is "very large" ... on the order of 10Mb? If so I'd say this is the same problem as we saw in your table --- a lot of consecutive rows have gone missing. You could examine the toast table to confirm or deny this. regards, tom lane
On 26/11/05 4:14 pm, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: > Adam Witney <awitney@sgul.ac.uk> writes: >> pg_dump: ERROR: unexpected chunk number 5153 (expected 21) for toast value >> 245334402 > >> measured_bioassay_base is always inserted at the same time as mba_data_base >> (the table where I had the problem before) and it has a text field which is >> very large.. > > How large is "very large" ... on the order of 10Mb? If so I'd say this > is the same problem as we saw in your table --- a lot of consecutive > rows have gone missing. You could examine the toast table to confirm > or deny this. Yes, around 10Mb is about right. I deleted the two datasets in mba_data_base that were affected by the empty pages, I also deleted the relevant two rows in measured_bioassay_base... But maybe it didn't do the right thing with the toast table for these two rows? Where would I need to look to see what the problem is? Thanks again for the help. Adam -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
Adam Witney <awitney@sgul.ac.uk> writes: > I deleted the two datasets in mba_data_base that were affected by the empty > pages, I also deleted the relevant two rows in measured_bioassay_base... But > maybe it didn't do the right thing with the toast table for these two rows? Evidently the missing data in the toast table is associated with yet a different dataset. I'd suggest first looking into the toast table to see if you can confirm that the missing data corresponds to a swath of zeroed-out pages. If that's the case then it gives even more urgency to the need to find out what's going wrong with your filesystem (or possibly your disk drive, but my gut feel is that this is a kernel filesystem problem). The other thing you'd need to do is figure out which dataset you have to reload. A tedious way to do this is something like select sum(length(bigfield)) from maintable where dataset = 'xxx'; for various values of xxx until you see the error. regards, tom lane
On 26/11/05 4:48 pm, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: > Adam Witney <awitney@sgul.ac.uk> writes: >> I deleted the two datasets in mba_data_base that were affected by the empty >> pages, I also deleted the relevant two rows in measured_bioassay_base... But >> maybe it didn't do the right thing with the toast table for these two rows? > > Evidently the missing data in the toast table is associated with yet a > different dataset. > > I'd suggest first looking into the toast table to see if you can confirm > that the missing data corresponds to a swath of zeroed-out pages. If > that's the case then it gives even more urgency to the need to find out > what's going wrong with your filesystem (or possibly your disk drive, > but my gut feel is that this is a kernel filesystem problem). > > The other thing you'd need to do is figure out which dataset you have to > reload. A tedious way to do this is something like > select sum(length(bigfield)) from maintable where dataset = 'xxx'; > for various values of xxx until you see the error. Well I tracked down which row went wrong and deleted that dataset also, the backups worked fine and it seems to be ok now. Not really sure what caused all this, all these datasets (190 in total) went in in one batch the other day, so for some reason 3 of them got screwed up. Anyway I have upgraded the box to linux 2.6.14.3, so I will keep an eye on it and see how things go. Thanks again for your help Tom Adam -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
On Thu, Nov 24, 2005 at 02:59:28PM -0500, Qingqing Zhou wrote: > > "Tom Lane" <tgl@sss.pgh.pa.us> wrote > > > > At this point I think there's no question that your filesystem is > > dropping blocks :-(. > > It is very interesting to follow this thread. But at this point, can you > explain more why "there is no question" is file system's fault? Not to put words in Tom's mouth, but as he said there's very few ways that PostgreSQL will leave a blank page laying around, and the tests he had the OP perform show that this almost certainly isn't one of those cases. That means something other than PostgreSQL is dropping data. Since it's apparently multiples of 4k it's reasonable to suspect the kernel or the filesystem; it's pretty unlikely it's the drives. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461