Re: Vacuum Error - Mailing list pgsql-general

From Tom Lane
Subject Re: Vacuum Error
Date
Msg-id 5931.1073949623@sss.pgh.pa.us
Whole thread Raw
In response to Re: Vacuum Error  ("Andrew Bartley" <abartley@evolvosystems.com>)
Responses Re: Vacuum Error  (Kragen Sitaker <kragen+pgsql@airwave.com>)
List pgsql-general
"Andrew Bartley" <abartley@evolvosystems.com> writes:
> I believe this error usually comes about due to OID wrapping.

No; an OID collision would have occurred when you tried to create a
table.  If two tables are present in pg_class then they have different
OIDs, and shouldn't have any conflicts in pg_statistic.

Since my last message I've thought of a possibly plausible explanation:
the bt_moveright bug that was isolated just a few months ago.  If two
backends were analyzing some tables (not necessarily the same table) at
the same time, and one of them caused the first root-page split in
pg_statistic_relid_att_index, it would be possible for the other one to
miss spotting an existing row for the OID/attnum it wanted to insert.
I think this could lead to two valid entries for the same OID/attnum in
pg_statistic, and consequently a persistent error on every subsequent
attempt to analyze that table.

This seems moderately plausible because pg_statistic_relid_att_index
would fit in a single btree page up till about 300-400 entries, which
is about right for a moderate-sized database (I see 299 pg_statistic
entries in the current regression database, for example).  First split
could easily happen in a database that had been running for awhile.

The relevant CVS log entry is

2003-07-29 18:18  tgl

    * src/backend/access/nbtree/: nbtsearch.c (REL7_3_STABLE),
    nbtsearch.c (REL7_2_STABLE), nbtsearch.c: Fix longstanding error in
    _bt_search(): should moveright at top of loop not bottom.
    Otherwise we fail to moveright when the root page was split while
    we were "in flight" to it.  This is not a significant problem when
    the root is above the leaf level, but if the root was also a leaf
    (ie, a single-page index just got split) we may return the wrong
    leaf page to the caller, resulting in failure to find a key that is
    in fact present.  Bug has existed at least since 7.1, probably
    forever.

(Note that although the patch was committed into 7.2 series, there has
been no 7.2 release since then.  You could pull REL7_2_STABLE tip if you
wanted to build a 7.2-series server with this fix in place.)

            regards, tom lane

pgsql-general by date:

Previous
From: Alex Satrapa
Date:
Subject: Protection From Inference (was Re: Drawbacks of using BYTEA for PK?)
Next
From: Stephan Szabo
Date:
Subject: Re: Column qualifier issue