Thread: BUG #2667: vacuuming a 70GB table causes a "crash"

BUG #2667: vacuuming a 70GB table causes a "crash"

From
"Stephen R. van den Berg"
Date:
The following bug has been logged online:

Bug reference:      2667
Logged by:          Stephen R. van den Berg
Email address:      srb@cuci.nl
PostgreSQL version: 8.1.4
Operating system:   Linux 2.6.16.20 Debian (pg: 8.1.4-6)
Description:        vacuuming a 70GB table causes a "crash"
Details:

datalog=# vacuum verbose fnetsense;
INFO:  vacuuming "public.fnetsense"
PANIC:  right sibling is not next child in "fnetsense_utstamp_key"
server closed the connection unexpectedly

Previously, the database was fully packed, no deletetions, just insertions
at the end (kind of a logging DB).
Then I deleted around 50GB worth of data (the first 50GB).
That worked just fine, then I ran vacuum the first
time (on this table).  It ran, and did 3 or 4 passes
successfully, then it bombed reporting the
error above.  Then I tried running vacuum again, same
error as above again.

Restarted the server, didn't help.  Error is persistent
now, DB is still working, don't know how to repair the
DB.

The table being vacuumed is inserted into in realtime
at a rate of 23 new entries per second.

This is the table definition of the table in question:
              Table "public.fnetsense"
            Column             |  Type   | Modifiers
-------------------------------+---------+-----------
 nsb                           | integer | not null
 utstamp                       | integer | not null
 ifoutoctets_1                 | bigint  | not null
 ifoutucastpkts_1              | bigint  | not null
 ifinoctets_1                  | bigint  | not null
 ifinucastpkts_1               | bigint  | not null
 opampsdownchannelcolor        | "char"  | not null
 opampsdownchannelmains        | "char"  | not null
 opampsupchannelcolor          | "char"  | not null
 opampssigqcorrecteds          | "char"  | not null
 opampssigquncorrectables      | "char"  | not null
 opampssigqsignalwater         | "char"  | not null
 opampssigqmicrobeachsand      | "char"  | not null
 opampsfostatustxmains         | "char"  | not null
 opampsfostatustttimeouts      | "char"  | not null
 opampsfotsfostatusvalue       | "char"  | not null
 opampsfotsfostatussignalwater | "char"  | not null
 opampsfotsfostatusrxmains     | "char"  | not null
 fotsnloss                     | "char"  | not null
 fotsnpadj                     | "char"  | not null
 fotsrnginvminreqs             | "char"  | not null
Indexes:
    "fnetsense_nsbutstamp_key" btree (nsb, utstamp)
    "fnetsense_utstamp_key" btree (utstamp)
Foreign-key constraints:
    "fnetsense_nsb_fkey" FOREIGN KEY (nsb) REFERENCES netsensebase(nsb)
      ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED

Re: BUG #2667: vacuuming a 70GB table causes a "crash"

From
Tom Lane
Date:
"Stephen R. van den Berg" <srb@cuci.nl> writes:
> datalog=# vacuum verbose fnetsense;
> INFO:  vacuuming "public.fnetsense"
> PANIC:  right sibling is not next child in "fnetsense_utstamp_key"
> server closed the connection unexpectedly

Were you running with full_page_writes off during some earlier use of
this database?  There is a known failure mode that can produce this
situation in 8.1.x for x < 4, see thread here:

http://archives.postgresql.org/pgsql-bugs/2006-04/msg00135.php

> Restarted the server, didn't help.  Error is persistent
> now, DB is still working, don't know how to repair the
> DB.

REINDEX that index.

            regards, tom lane

Re: BUG #2667: vacuuming a 70GB table causes a "crash"

From
"Stephen R. van den Berg"
Date:
On 9/30/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Stephen R. van den Berg" <srb@cuci.nl> writes:
> > datalog=# vacuum verbose fnetsense;
> > INFO:  vacuuming "public.fnetsense"
> > PANIC:  right sibling is not next child in "fnetsense_utstamp_key"
> > server closed the connection unexpectedly

> Were you running with full_page_writes off during some earlier use of
> this database?  There is a known failure mode that can produce this
> situation in 8.1.x for x < 4, see thread here:

My postgres.conf file says:

full_page_writes = off

As I understand it, the 8.1.4 postgres ignores this setting?
--
Sincerely,
                Stephen R. van den Berg (AKA BuGless).

Re: BUG #2667: vacuuming a 70GB table causes a "crash"

From
Tom Lane
Date:
"Stephen R. van den Berg" <srb@cuci.nl> writes:
> On 9/30/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Were you running with full_page_writes off during some earlier use of
>> this database?

> My postgres.conf file says:
> full_page_writes = off
> As I understand it, the 8.1.4 postgres ignores this setting?

Indeed.  But I'm speculating that the damage was done by an earlier
8.1.x release and you've only just now discovered it.

            regards, tom lane