Thread: index corruption?
This is the second time I've seen this, on 7.3.2.

This particular table is empty. I'm trying to read it in a perl script. The problem doesn't reproduce regularly (I have a script that creates the database by copying table data from another database).

This is the error in the pgsql log:

2003-02-13 16:21:42 [8843] ERROR: Index external_signstops_pkey is not a btree
2003-02-13 16:21:42 [8843] ERROR: current transaction is aborted, queries ignored until end of transaction block

Any ideas?

--
Laurette Cisneros, L.D.
The Database Group
(510) 420-3137
NextBus Information Systems, Inc.
www.nextbus.com
----------------------------------
"No man is wise enough by himself"
-- Titus Maccius Plautus (254 BC - 184 BC), Miles Gloriosus

Laurette Cisneros <laurette@nextbus.com> writes:
> This is the error in the pgsql log:
> 2003-02-13 16:21:42 [8843] ERROR: Index external_signstops_pkey is not a
> btree

This says that one of two fields that should never change, in fixed positions in the first block of a btree index, didn't have the right values. I am not aware of any PG bugs that could overwrite those fields. I think the most likely bet is that you've got hardware issues ... have you run memory and disk diagnostics lately?

regards, tom lane

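For readers who want to see what this kind of sanity check looks like, here is a minimal sketch in Python. It is not PostgreSQL's actual code. It assumes the check compares a magic number (and optionally a version number) read from a fixed offset in the first block of the index file. The thread does not say which two fields are involved, and the offset, byte order, and function name below are illustrative assumptions that vary by PostgreSQL version and platform. The constant 0x053162 is, to my knowledge, the value of BTREE_MAGIC in PostgreSQL's nbtree.h.

```python
import struct

BTREE_MAGIC = 0x053162  # believed to match BTREE_MAGIC in PostgreSQL's nbtree.h


def looks_like_btree(first_block, meta_offset=24, expected_version=None, byteorder="<"):
    """Check fixed fields in the first block of an index file.

    meta_offset, byteorder and expected_version are assumptions for this
    sketch; the real layout depends on the PostgreSQL version and platform.
    Returns a (bool, reason) pair.
    """
    if len(first_block) < meta_offset + 8:
        return False, "block too short"
    # Two consecutive 32-bit unsigned integers: magic, then version.
    magic, version = struct.unpack_from(byteorder + "II", first_block, meta_offset)
    if magic != BTREE_MAGIC:
        return False, f"bad magic 0x{magic:06x}"
    if expected_version is not None and version != expected_version:
        return False, f"unexpected version {version}"
    return True, "ok"


if __name__ == "__main__":
    # Synthetic 8 kB block: 24 header bytes, then magic and a made-up version.
    good = bytes(24) + struct.pack("<II", BTREE_MAGIC, 1) + bytes(8192 - 32)
    zeroed = bytes(8192)  # e.g. a block that was overwritten with zeroes
    print(looks_like_btree(good))
    print(looks_like_btree(zeroed))
```

A block that fails this kind of test has had its fixed fields overwritten by something, which is why the error points toward hardware or a stray write rather than ordinary index activity.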
On Feb 13, 2003, Tom Lane wrote:
>
> Laurette Cisneros <laurette@nextbus.com> writes:
> > This is the error in the pgsql log:
> > 2003-02-13 16:21:42 [8843] ERROR: Index external_signstops_pkey is
> > not a btree
>
> This says that one of two fields that should never change, in fixed
> positions in the first block of a btree index, didn't have the right
> values. I am not aware of any PG bugs that could overwrite those
> fields. I think the most likely bet is that you've got hardware
> issues ... have you run memory and disk diagnostics lately?

I am seeing this same problem on two separate machines, one brand new, one older. Not sure yet what is causing it, but seems pretty unlikely that it is hardware-related.

Ed

On Monday March 31 2003 3:38, Ed L. wrote:
> On Feb 13, 2003, Tom Lane wrote:
> > Laurette Cisneros <laurette@nextbus.com> writes:
> > > This is the error in the pgsql log:
> > > 2003-02-13 16:21:42 [8843] ERROR: Index external_signstops_pkey is
> > > not a btree
> >
> > This says that one of two fields that should never change, in fixed
> > positions in the first block of a btree index, didn't have the right
> > values. I am not aware of any PG bugs that could overwrite those
> > fields. I think the most likely bet is that you've got hardware
> > issues ... have you run memory and disk diagnostics lately?
>
> I am seeing this same problem on two separate machines, one brand new,
> one older. Not sure yet what is causing it, but seems pretty unlikely
> that it is hardware-related.

I am dabbling for the first time with a (crashing) C trigger, so that may be the culprit here.

Ed

"Ed L." <pgsql@bluepolka.net> writes:
>> I am seeing this same problem on two separate machines, one brand new,
>> one older. Not sure yet what is causing it, but seems pretty unlikely
>> that it is hardware-related.

> I am dabbling for the first time with a (crashing) C trigger, so that may be
> the culprit here.

Could well be, although past experience has been that crashes in C code seldom lead directly to disk corruption. (First, the bogus code has to overwrite a shared disk buffer. If you follow what I consider the better path of not making your shared buffers a large fraction of the address space, the odds of a wild store happening to hit a disk buffer aren't high. Second, once it's corrupted a shared buffer, it has to contrive to cause that buffer to get written out before the core dump occurs --- in most cases, the fact that the postmaster abandons the contents of shared memory after a backend crash protects us from this kind of failure.)

When you find the problem, please take note of whether there's something involved that increases the chances of corruption getting to disk. We might want to try to do something about it ...

regards, tom lane

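The first point in that argument can be put into rough numbers. The sketch below is only a back-of-the-envelope illustration: it assumes a wild store lands at a uniformly random address in a 32-bit address space, which real bugs do not do, and the buffer counts are hypothetical values chosen for the example rather than figures from this thread.

```python
def wild_store_hit_probability(shared_buffers, block_size=8192,
                               address_space_bytes=2**32):
    """Fraction of the address space occupied by shared disk buffers.

    Under the (unrealistic) assumption that a stray write hits a uniformly
    random address, this is the chance it lands inside a shared buffer.
    All parameter values are illustrative assumptions.
    """
    return shared_buffers * block_size / address_space_bytes


if __name__ == "__main__":
    # Hypothetical settings: a small pool and a much larger one.
    for n_buffers in (64, 65536):
        p = wild_store_hit_probability(n_buffers)
        print(f"{n_buffers} buffers: {p:.6%} of the address space")
```

The ratio grows linearly with the size of the buffer pool, which is the reason a small pool relative to the address space makes an accidental hit less likely. The second condition, that the damaged buffer is written out before the crash, lowers the odds further and is not modelled here.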
On Monday March 31 2003 3:54, Tom Lane wrote:
> "Ed L." <pgsql@bluepolka.net> writes:
> >> I am seeing this same problem on two separate machines, one brand new,
> >> one older. Not sure yet what is causing it, but seems pretty unlikely
> >> that it is hardware-related.
> >
> > I am dabbling for the first time with a (crashing) C trigger, so that
> > may be the culprit here.
>
> Could well be, although past experience has been that crashes in C code
> seldom lead directly to disk corruption. (First, the bogus code has to
> overwrite a shared disk buffer. If you follow what I consider the
> better path of not making your shared buffers a large fraction of the
> address space, the odds of a wild store happening to hit a disk buffer
> aren't high. Second, once it's corrupted a shared buffer, it has to
> contrive to cause that buffer to get written out before the core dump
> occurs --- in most cases, the fact that the postmaster abandons the
> contents of shared memory after a backend crash protects us from this
> kind of failure.)
>
> When you find the problem, please take note of whether there's something
> involved that increases the chances of corruption getting to disk. We
> might want to try to do something about it ...

It is definitely due to some rogue trigger code. Not sure what exactly, but if I remove a certain code segment the problem disappears.

Ed

On Mon, 31 Mar 2003, Ed L. wrote:
> On Feb 13, 2003, Tom Lane wrote:
> >
> > Laurette Cisneros <laurette@nextbus.com> writes:
> > > This is the error in the pgsql log:
> > > 2003-02-13 16:21:42 [8843] ERROR: Index external_signstops_pkey is
> > > not a btree
> >
> > This says that one of two fields that should never change, in fixed
> > positions in the first block of a btree index, didn't have the right
> > values. I am not aware of any PG bugs that could overwrite those
> > fields. I think the most likely bet is that you've got hardware
> > issues ... have you run memory and disk diagnostics lately?
>
> I am seeing this same problem on two separate machines, one brand new, one
> older. Not sure yet what is causing it, but seems pretty unlikely that it
> is hardware-related.

Until you've tested them, the likelihood is unimportant. If you've tested the boxes, and the memory tests good and the hard drives test good, then there is still likely to be another explanation, like a runaway kernel bug that is writing somewhere it shouldn't every fifth eon or two.

If you haven't tested the boxes, their reliability is part of the NULL set. :-)

On Monday March 31 2003 4:15, Ed L. wrote:
> On Monday March 31 2003 3:54, Tom Lane wrote:
> > "Ed L." <pgsql@bluepolka.net> writes:
> > >> I am seeing this same problem on two separate machines, one brand
> > >> new, one older. Not sure yet what is causing it, but seems pretty
> > >> unlikely that it is hardware-related.
> > >
> > > I am dabbling for the first time with a (crashing) C trigger, so that
> > > may be the culprit here.
> >
> > Could well be, although past experience has been that crashes in C code
> > seldom lead directly to disk corruption. (First, the bogus code has to
> > overwrite a shared disk buffer. If you follow what I consider the
> > better path of not making your shared buffers a large fraction of the
> > address space, the odds of a wild store happening to hit a disk buffer
> > aren't high. Second, once it's corrupted a shared buffer, it has to
> > contrive to cause that buffer to get written out before the core dump
> > occurs --- in most cases, the fact that the postmaster abandons the
> > contents of shared memory after a backend crash protects us from this
> > kind of failure.)
> >
> > When you find the problem, please take note of whether there's
> > something involved that increases the chances of corruption getting to
> > disk. We might want to try to do something about it ...

Well, I fixed it but cannot now remember exactly what change did it amidst a bunch of rewrites of some existing stuff, and I cannot get back to that state from here. :( It was definitely arising from some funky C trigger code of my own making.

Ed