Re: zombie primary key lurches out of database to devour the brains of the unwary - Mailing list pgsql-general

From Tom Lane
Subject Re: zombie primary key lurches out of database to devour the brains of the unwary
Date
Msg-id 19417.1094134292@sss.pgh.pa.us
Whole thread Raw
In response to Re: zombie primary key lurches out of database to devour the brains of the unwary  (Kevin Murphy <murphy@genome.chop.edu>)
List pgsql-general
Kevin Murphy <murphy@genome.chop.edu> writes:
> On Aug 31, 2004, at 4:41 PM, Tom Lane wrote:
>> That's strange.  Maybe some sort of catalog corruption?  Try "psql -E"
>> to see the queries issued by \d, and then execute them by hand to see
>> what you get.

> egenome_test=# SELECT i.indisunique, i.indisprimary, a.amname,
> c2.relname,
>    pg_catalog.pg_get_expr(i.indpred, i.indrelid)
> FROM pg_catalog.pg_index i, pg_catalog.pg_class c, pg_catalog.pg_class
> c2, pg_catalog.pg_am a
> WHERE i.indexrelid = c.oid AND c.oid = '112337075' AND c.relam = a.oid
> AND i.indrelid = c2.oid
> ;
> egenome_test-# egenome_test-# egenome_test-# egenome_test-#
> egenome_test-#  indisunique | indisprimary | amname | relname |
> pg_get_expr
> -------------+--------------+--------+---------+-------------
> (0 rows)

Looking at psql's describe code, it will indeed abort and print nothing
at all when this query returns zero rows.  So the question now reduces
to why this didn't find anything.  The theories that come to mind are
(a) one of the relevant catalog rows is in fact missing, or (b) one of
the indexes that are used is corrupted so that the needed row isn't
found.  From the prior queries we already know the index's pg_class and
pg_index rows are there, and it's highly unlikely that pg_am would be
damaged since it's essentially read-only, so that means the index's
parent table's pg_class row is deleted or unfindable.

Corruption of pg_class's OID index is certainly not impossible, but I'd
think it'd lead to worse problems than this.  My bet is on an incomplete
DROP operation --- ie, the parent table was dropped but for some reason
its child index wasn't.  Corruption of the pg_depend link from the one
to the other could have done it.  If you've had any database crashes
lately, then it's possible that the ultimate culprit is the transaction-
vs-checkpoint sync bug we identified a couple weeks ago.  That can lead
to whole or partial loss of committed transactions :-(

            regards, tom lane

pgsql-general by date:

Previous
From: Graeme Hinchliffe
Date:
Subject: C Function causes backend to die in 7.4.3
Next
From: Tom Lane
Date:
Subject: Re: C Function causes backend to die in 7.4.3