Re: duplicate primary index in bayes db from SpamAssassin - Mailing list pgsql-admin
From | Tom Lane |
---|---|
Subject | Re: duplicate primary index in bayes db from SpamAssassin |
Date | |
Msg-id | 21948.1209587149@sss.pgh.pa.us Whole thread Raw |
In response to | Re: duplicate primary index in bayes db from SpamAssassin (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: duplicate primary index in bayes db from SpamAssassin
|
List | pgsql-admin |
I wrote: > This latest bit with many fewer index entries than table entries is just > weird, though. The 8.1.10 VACUUM bug could have led to having *more* > index entries than table entries, but not the other way around. > Before you zap the database, would you be willing to make the actual > database files (not a dump, but the stored files) available? I'd > be interested to take a closer look at that index. I finally got a chance to do some analysis, and what I find is multiple instances of index corruption. There are 18 index pages that are all zeroes: Bogus pd_special pointer on page 65663 Bogus pd_special pointer on page 65847 Bogus pd_special pointer on page 65932 Bogus pd_special pointer on page 65948 Bogus pd_special pointer on page 65956 Bogus pd_special pointer on page 65967 Bogus pd_special pointer on page 65974 Bogus pd_special pointer on page 66066 Bogus pd_special pointer on page 66456 Bogus pd_special pointer on page 66775 Bogus pd_special pointer on page 67979 Bogus pd_special pointer on page 68126 Bogus pd_special pointer on page 68501 Bogus pd_special pointer on page 68565 Bogus pd_special pointer on page 68854 Bogus pd_special pointer on page 70321 Bogus pd_special pointer on page 70326 Bogus pd_special pointer on page 75066 It is possible to have a zero page in an index, but only if a backend extended the index by one page and then crashed before it could write a WAL record about the added page. Unless you've been having a whole lot of PANIC crashes you didn't mention, there's something else going on here. I also found a bunch of places where the sibling links between pages are wrong/inconsistent: block 1174 has left link 27874, but block 27874 has right link 66918 block 1309 has right link 59817, but block 59817 has left link 46980 block 2476 has right link 27885, but block 27885 has left link 71959 block 4489 has right link 59819, but block 59819 has left link 61853 block 7591 has right link 25594, but block 25594 has left link 27816 block 20986 has right link 1174, but block 1174 has left link 27874 block 24616 has left link 1309, but block 1309 has right link 59817 block 25416 has left link 5155, but block 5155 has right link 59809 block 27484 has left link 4489, but block 4489 has right link 59819 block 27816 has left link 7591, but block 7591 has right link 25594 block 43322 has left link 2476, but block 2476 has right link 27885 block 59809 has right link 25416, but block 25416 has left link 5155 block 59810 has left link 15449, but block 15449 has right link 39432 block 59810 has right link 23744, but block 23744 has left link 40043 block 59811 has left link 10693, but block 10693 has right link 24259 block 59811 has right link 24259, but block 24259 has left link 10693 block 59812 has left link 17191, but block 17191 has right link 68457 block 59812 has right link 24704, but block 24704 has left link 68457 block 59813 has left link 20391, but block 20391 has right link 21998 block 59813 has right link 27337, but block 27337 has left link 52006 block 59814 has left link 6918, but block 6918 has right link 17551 block 59814 has right link 17551, but block 17551 has left link 6918 block 59815 has left link 16727, but block 16727 has right link 32141 block 59815 has right link 22466, but block 22466 has left link 32141 THe fact that blocks between 59809 and 59819 are involved in the majority of these cases is interesting. I'm not sure what to make of it, but it suggests that some single event messed up a number of block splits or deletions, rather than that there's an infrequent bug in those operations. There are also odd inconsistencies between the tree levels: there are 64142 blocks in tree level 0 but only 64136 downlinks in level 1, and there are four places where a downlink's next key fails to match the high key of the child page. So this index has definitely suffered multiple events where a split or deletion didn't work right. On the whole I'm thinking that you may have an intermittent hardware problem, especially since we've not seen comparable symptoms reported by anyone else. regards, tom lane
pgsql-admin by date: