Jeff Boes <jboes@nexcerpt.com> writes:
> On Mon, 2003-08-18 at 09:53, Tom Lane wrote:
>> Always the same OID, or different ones? Does that OID actually exist in
>> pg_class? Can you tell us exactly what SQL command(s) are producing the
>> error? (If not, better turn on query logging so you can find out.)
> Different OIDs, and they do not exist in pg_class (it's the OID of that
> table's row, right?)
Right. My best guess is that you are seeing some weird failure in temp
table creation ... do you use lots of temp tables?
> I'd turn on query logging, but since we're getting these about every 3-7
> days, I'm not sure that would be the most effective use of all that disk
Perhaps you can recycle the logs every few hours?
BTW, the symptom sounds the same as the one that led up to the discovery
of this bug:
2003-07-29 18:18 tgl
* src/backend/access/nbtree/: nbtsearch.c (REL7_3_STABLE),
nbtsearch.c (REL7_2_STABLE), nbtsearch.c: Fix longstanding error in
_bt_search(): should moveright at top of loop not bottom.
Otherwise we fail to moveright when the root page was split while
we were "in flight" to it. This is not a significant problem when
the root is above the leaf level, but if the root was also a leaf
(i.e., a single-page index just got split) we may return the wrong
leaf page to the caller, resulting in failure to find a key that is
in fact present. Bug has existed at least since 7.1, probably
forever.
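The sketch below is a toy Python model of a Lehman-Yao style B-link-tree descent. It is not PostgreSQL's C code. It shows why it matters where the moveright step sits in the search loop. `Page`, `move_right`, `search_fixed` and `search_buggy` are hypothetical names. The model also assumes a simplified rule: a search moves right whenever the key is greater than a page's high key. To simulate the race in the commit message, the searcher is handed a pointer to the old single-page root after that page has already been split.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Page:
    # Hypothetical toy page: leaf keys, or separator keys for an internal page.
    keys: List[int]
    children: List["Page"] = field(default_factory=list)  # empty => leaf
    high_key: Optional[int] = None   # upper bound for this page; None = rightmost
    right: Optional["Page"] = None   # right-sibling link (B-link tree)

    @property
    def is_leaf(self) -> bool:
        return not self.children

def move_right(page: Page, key: int) -> Page:
    # Follow right-links while the key lies beyond this page's high key,
    # i.e. the page was split after we obtained a pointer to it.
    while page.high_key is not None and key > page.high_key:
        page = page.right
    return page

def child_for(page: Page, key: int) -> Page:
    # children[i] holds keys <= keys[i]; the last child holds the rest.
    for i, sep in enumerate(page.keys):
        if key <= sep:
            return page.children[i]
    return page.children[-1]

def search_fixed(root: Page, key: int) -> Page:
    page = root
    while True:
        page = move_right(page, key)  # top of loop: also applied to the root
        if page.is_leaf:
            return page
        page = child_for(page, key)

def search_buggy(root: Page, key: int) -> Page:
    page = root
    while not page.is_leaf:
        page = child_for(page, key)
        page = move_right(page, key)  # bottom of loop: never applied to the root
    return page

# The searcher fetched the root pointer while the index was one leaf page...
stale_root = Page(keys=[10, 20, 30, 40])
# ...but before it read that page, the leaf was split and a new root was added.
right = Page(keys=[30, 40])
stale_root.keys = [10, 20]
stale_root.high_key = 20
stale_root.right = right
new_root = Page(keys=[20], children=[stale_root, right])

print(40 in search_fixed(stale_root, 40).keys)  # True
print(40 in search_buggy(stale_root, 40).keys)  # False: wrong leaf returned
```

When the check sits at the bottom of the loop, a root that is itself a leaf never gets checked. The searcher therefore stays on the left half and fails to find key 40, even though it is present. This is the "wrong leaf page" failure the commit message describes. When the root is above the leaf level, both versions still apply the check at every lower page.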
However, I doubt that that is your problem. The moveright bug could
only lead to a pg_class lookup failure if a lookup occurred while
pg_class' OID index was being split from one page to two, which is an
event that happens at most once in the lifetime of an index (before
7.4 anyway). Unless you frequently create new databases, or frequently
reindex pg_class, I don't see how you would see that bug with any
reproducibility. (We were only able to track down the bug because
the regression tests evolved to a point where they caused it with
nontrivial probability.)
regards, tom lane