BUG #4929: Corrupted pg_class, possibly truncate/rollback related - Mailing list pgsql-bugs

From Robert Treat
Subject BUG #4929: Corrupted pg_class, possibly truncate/rollback related
Date
Msg-id 200907202246.n6KMkE46057083@wwwmaster.postgresql.org
Whole thread Raw
Responses Re: BUG #4929: Corrupted pg_class, possibly truncate/rollback related  (Andrew Gierth <andrew@tao11.riddles.org.uk>)
Re: BUG #4929: Corrupted pg_class, possibly truncate/rollback related  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
The following bug has been logged online:

Bug reference:      4929
Logged by:          Robert Treat
Email address:      xzilla@users.sourceforge.net
PostgreSQL version: 8.3.1
Operating system:   Linux 2.6.18-53.1.14.el5 #1 SMP x86_64 GNU/Linux
(Centos)
Description:        Corrupted pg_class, possibly truncate/rollback related
Details:

Howdy,

I wanted to see if a situation I've encountered is a known issue or perhaps
worth more investigation. I'm working on a Postgres 8.3.1 system, where I've
uncovered some catalog corruption. The problems started with a failing
pg_dump...

pg_dump: SQL command failed
pg_dump: Error message from server: ERROR:  more than one row returned by a
subquery used as an expression
pg_dump: The command was: SELECT tableoid, oid, typname, typnamespace,
(SELECT rolname FROM pg_catalog.pg_roles WHERE oid = typowner) as rolname,
typinput::oid as typinput, typoutput::oid as typoutput
, typelem, typrelid, CASE WHEN typrelid = 0 THEN ' '::"char" ELSE (SELECT
relkind FROM pg_class WHERE oid = typrelid) END as typrelkind, typtype,
typisdefined, typname[0] = '_' AND typelem != 0 AND (SELECT typarray FROM
pg_type te WHERE oid = pg_type.typelem) = oid AS isarray FROM pg_type


I tracked this down to this problem in pg_class

pe2=# select xmin, xmax, cmin, cmax, ctid, oid, relnamespace, relname,
reltype, relowner, relfilenode, relpages, reltuples from pg_class  where oid
= 23708;
    xmin    |    xmax    | cmin | cmax |   ctid   |  oid  | relnamespace |
relname  | reltype | relowner | relfilenode | relpages |  reltuples
------------+------------+------+------+----------+-------+--------------+--
--------+---------+----------+-------------+----------+-------------
 3291061347 |          0 |    6 |    6 | (1118,2) | 23708 |        23681 |
prooln_m |   23710 |       10 |      654963 |   114055 | 7.42746e+06
          2 | 3291061347 |    6 |    6 | (23,39)  | 23708 |        23681 |
prooln_m |   23710 |       10 |      181519 |   104401 |  6.5017e+06


So, clearly this is bad. It seems we've gotten some level of corruption on
disk. The most perculiar bits of information around this system are that we
tend to have long running vacuum jobs (multiple days), and we recently did a
truncate + rollback within a transaction on the table in question. I've also
noticed that the file on disk for the 181519 row is not actually there. Oh,
and this does run on lvm, though we haven't used the lvm feature set for a
long time. So, worth investigating?

pgsql-bugs by date:

Previous
From: Jaime Casanova
Date:
Subject: Re: fix: plpgsql: return query and dropped columns problem
Next
From: Andrew Gierth
Date:
Subject: Re: BUG #4929: Corrupted pg_class, possibly truncate/rollback related