Thread: Corrupted index
During a vacuum, I ran into this error: vacuumdb: vacuuming of database "friend" failed: ERROR: invalid page header in block 41661 of relation "friend_pkey" I've read the posts on this newsgroup and it's clear that I have to REINDEX to fix this. The bigger question is -- why did this happen in the first place? I'm using open_sync as my WAL_SYNC_METHOD. Is it better to use fsync? Running Suse 9.0 on a Quad Opteron, with direct attached storage. The RAID card is set to write through all of the writes, so I don't think this is a case where the RAID card is lying about a commit. -Akash
Akash Garg <akash.garg@gmail.com> writes: > During a vacuum, I ran into this error: > vacuumdb: vacuuming of database "friend" failed: ERROR: invalid page > header in block 41661 of relation "friend_pkey" > I've read the posts on this newsgroup and it's clear that I have to > REINDEX to fix this. The bigger question is -- why did this happen in > the first place? Good question. If you haven't REINDEXed yet, it would be worthwhile to get a hex dump (use "od -x") of that page of the index first. Speculation with no data to look at is a bit pointless ... regards, tom lane
Akash Garg <akash.garg@gmail.com> writes: > I'll try running the od command -- I'm just a little confused on where > I run it. I tried running od -x 41661 but that doesn't yield any > results. I'm assuming I have to run this command on the actual index > file itself -- how do I do this? See contrib/oid2name, or read the docs at http://developer.postgresql.org/docs/postgres/storage.html to learn how to figure out which file is the index. (In pre-8.0 PG, ignore what those docs say about tablespaces, but the rest of the info goes back a ways.) The short answer is you need the database's OID from pg_database, and the index's relfilenode from pg_class. regards, tom lane
Tom, I'll try running the od command -- I'm just a little confused on where I run it. I tried running od -x 41661 but that doesn't yield any results. I'm assuming I have to run this command on the actual index file itself -- how do I do this? Thanks, Akash On 6/23/05, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Akash Garg <akash.garg@gmail.com> writes: > > During a vacuum, I ran into this error: > > vacuumdb: vacuuming of database "friend" failed: ERROR: invalid page > > header in block 41661 of relation "friend_pkey" > > > I've read the posts on this newsgroup and it's clear that I have to > > REINDEX to fix this. The bigger question is -- why did this happen in > > the first place? > > Good question. If you haven't REINDEXed yet, it would be worthwhile > to get a hex dump (use "od -x") of that page of the index first. > Speculation with no data to look at is a bit pointless ... > > regards, tom lane >
I used oid2name to find the index files: 168807081 168807081.1 168807081.2 168807081.3 168807081.4 Now how do I run the od command to find the block in question? Thanks, Akash On 6/23/05, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Akash Garg <akash.garg@gmail.com> writes: > > I'll try running the od command -- I'm just a little confused on where > > I run it. I tried running od -x 41661 but that doesn't yield any > > results. I'm assuming I have to run this command on the actual index > > file itself -- how do I do this? > > See contrib/oid2name, or read the docs at > http://developer.postgresql.org/docs/postgres/storage.html > to learn how to figure out which file is the index. (In pre-8.0 PG, > ignore what those docs say about tablespaces, but the rest of the info > goes back a ways.) The short answer is you need the database's OID > from pg_database, and the index's relfilenode from pg_class. > > regards, tom lane >
Akash Garg <akash.garg@gmail.com> writes: > I used oid2name to find the index files: > 168807081 > 168807081.1 > 168807081.2 > 168807081.3 > 168807081.4 > Now how do I run the od command to find the block in question? Rather than doing the math by hand, let dd do it: dd bs=8k skip=41661 count=1 </path/to/indexfile | od -x This assumes your PG was built with the default 8k block size, but if it wasn't you'd presumably know that ... regards, tom lane
Akash Garg <akash.garg@gmail.com> writes: > Ok, I ran that command on the index files -- they are attached. I'm a bit confused --- you mean you extracted block 41661 from each of the index's segments? If so, only the first one is actually relevant here. > I notice that in file2, file2 and file3, I notice a pattern like this: > 0002660 8980 0020 8970 0020 8960 0020 8950 0020 > 0002700 8940 0020 8930 0020 8920 0020 8910 0020 > 0002720 0900 0020 0000 0000 0000 0000 0000 0000 > 0002740 0000 0000 0000 0000 0000 0000 0000 0000 > * > 0004400 000f 611c 003e 0010 70d9 0b69 0000 0000 > 0004420 000f 5f97 0003 0010 70d8 0b69 0000 0000 > 0004440 000f 6118 0077 0010 70d7 0b69 0000 0000 > 0004460 000f 6118 0075 0010 70d6 0b69 0000 0000 That's what it should look like --- "*" is od's notation for "more of the same", in this case lines containing all zeroes. Those pages look exactly like what I'd expect a PG index page to look like. The file1 extract, however, is pure text and not PG data of any kind. The first part of it looks like 00000000: 732c 205c 725c 6e77 6861 7420 646f 6573 s, \r\nwhat does 00000010: 2061 2068 756d 616e 6974 6172 6961 6e20 a humanitarian 00000020: 6561 743f 205c 725c 6e5c 725c 6e53 6f6d eat? \r\n\r\nSom 00000030: 6574 696d 6573 2c20 4920 7468 696e 6b20 etimes, I think 00000040: 616c 6c20 7468 6520 666f 6c6b 7320 7768 all the folks wh 00000050: 6f20 6772 6577 2075 7020 7370 6561 6b69 o grew up speaki 00000060: 6e67 2045 6e67 6c69 7368 205c 725c 6e73 ng English \r\ns 00000070: 686f 756c 6420 6265 2063 6f6d 6d69 7474 hould be committ 00000080: 6564 2074 6f20 616e 2061 7379 6c75 6d20 ed to an asylum 00000090: 666f 7220 7468 6520 7665 7262 616c 6c79 for the verbally 000000a0: 2069 6e73 616e 652e 205c 725c 6e5c 725c insane. \r\n\r\ 000000b0: 6e49 6e20 7768 6174 206f 7468 6572 206c nIn what other l and it goes downhill from there (something out of a spam folder maybe?) What it looks like to me is that a page of an entirely unrelated file got dropped into the Postgres index. This suggests either a disk drive error (writing someplace else than it was commanded to) or a kernel bug (writing the wrong buffer). I'd suggest some disk-drive testing as well as checking into bug fixes for your kernel. I'm pretty well convinced that this wasn't Postgres' fault. regards, tom lane