Thread: zombie primary key lurches out of database to devour the brains of the unwary

zombie primary key lurches out of database to devour the brains of the unwary

From
Kevin Murphy
Date:
Using PG 7.4.5 on Mac OS X 10.3 ...

I have a primary key that I can't destroy and can't create.  One weird
symptom is that when I use \d in psql to attempt to display the
constraint, there is no output at all!  Normally, psql either shows the
constraint or reports that the constraint doesn't exist.

(BTW, I know that the default clauses below are kind of weird, but I am
using PG to process data for a database that doesn't use NULL and uses
0 and empty strings as default values.  The DDL is generated
automatically, so the defaults don't all make sense, but they should be
harmless.  Don't worry; the app is almost ported to PG ;-)

In the transcript below, snp_main_chr22 is the table, and there is
supposed to be a primary key snp_main_chr22_pk on the refsnp_id column:

egenome_test=# egenome_test=# \d snp_main_chr22
                     Table "build.snp_main_chr22"
    Column    |         Type          |           Modifiers
-------------+-----------------------+-------------------------------
  refsnp_id   | integer               | default 0
  variation   | character varying(10) | default ''::character varying
  het         | character varying(20) | default ''::character varying
  validated   | character varying(5)  | default ''::character varying
  chr         | character varying(2)  | default ''::character varying
  assay_size  | integer               | default 0
  pop_size    | integer               | default 0
  seq_pos     | integer               | default 0
  transcribed | character varying(1)  | default ''::character varying
egenome_test=#

egenome_test=# alter table snp_main_chr22 drop constraint
snp_main_chr22_pk;
ERROR:  constraint "snp_main_chr22_pk" does not exist

egenome_test=# drop table snp_main_chr22 cascade;
DROP TABLE

egenome_test=# \d snp_main_chr22_pk

egenome_test=# \d snp_main_chr22_pk_gibberish
Did not find any relation named "snp_main_chr22_pk_gibberish".

egenome_test=# CREATE TABLE snp_main_chr22 (
refsnp_id                      integer DEFAULT 0,
variation                      varchar(10) DEFAULT '',
het                            varchar(20) DEFAULT '',
validated                      varchar(5) DEFAULT '',
chr                            varchar(2) DEFAULT '',
assay_size                     integer DEFAULT 0,
pop_size                       integer DEFAULT 0,
seq_pos                        integer DEFAULT 0,
transcribed                    varchar(1) DEFAULT ''
);
egenome_test(# egenome_test(# egenome_test(# egenome_test(#
egenome_test(# egenome_test(# egenome_test(# egenome_test(#
egenome_test(# egenome_test(# CREATE TABLE

egenome_test=# ALTER TABLE snp_main_chr22
   ADD CONSTRAINT snp_main_chr22_pk
   PRIMARY KEY (refsnp_id)
   ;
egenome_test-# egenome_test-# egenome_test-# NOTICE:  ALTER TABLE / ADD
PRIMARY KEY will create implicit index "snp_main_chr22_pk" for table
"snp_main_chr22"
ERROR:  relation "snp_main_chr22_pk" already exists

egenome_test=# egenome_test=# \d snp_main_chr22
                     Table "build.snp_main_chr22"
    Column    |         Type          |           Modifiers
-------------+-----------------------+-------------------------------
  refsnp_id   | integer               | default 0
  variation   | character varying(10) | default ''::character varying
  het         | character varying(20) | default ''::character varying
  validated   | character varying(5)  | default ''::character varying
  chr         | character varying(2)  | default ''::character varying
  assay_size  | integer               | default 0
  pop_size    | integer               | default 0
  seq_pos     | integer               | default 0
  transcribed | character varying(1)  | default ''::character varying
egenome_test=#


Thanks for any advice.

Kevin Murphy


Kevin Murphy <murphy@genome.chop.edu> writes:
> I have a primary key that I can't destroy and can't create.  One weird
> symptom is that when I use \d in psql to attempt to display the
> constraint, there is no output at all!

That's strange.  Maybe some sort of catalog corruption?  Try "psql -E"
to see the queries issued by \d, and then execute them by hand to see
what you get.

> egenome_test=# alter table snp_main_chr22 drop constraint
> snp_main_chr22_pk;
> ERROR:  constraint "snp_main_chr22_pk" does not exist

Note that this only says the table doesn't have a constraint by that
name.  There could for instance be a non-constraint-associated index
or table by that name.  I'm not sure why you're not seeing it in \d,
though, if that were the case.

            regards, tom lane

Re: zombie primary key lurches out of database to devour the brains of the unwary

From
Kevin Murphy
Date:
On Aug 31, 2004, at 4:41 PM, Tom Lane wrote:
>> I have a primary key that I can't destroy and can't create.  One weird
>> symptom is that when I use \d in psql to attempt to display the
>> constraint, there is no output at all!
>
> That's strange.  Maybe some sort of catalog corruption?  Try "psql -E"
> to see the queries issued by \d, and then execute them by hand to see
> what you get.

First off, I dumped and restored the database, and the problem seems to have
been resolved.  (Before doing so, I restarted the database, which did _not_
help).

Tom, I'm not sure if my problem is still of interest to you, but I have a
backup of my data directory from the time of the problem (database was shut
down during copy), although it is pretty big.

Re: the catalog corruption, I used '\set ECHO_HIDDEN' and '\d
snp_main_chr22_pk', and then manually ran the commands:

egenome_test=# SELECT c.oid,
   n.nspname,
   c.relname
FROM pg_catalog.pg_class c
      LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE pg_catalog.pg_table_is_visible(c.oid)
       AND c.relname ~ '^snp_main_chr22_pk$'
ORDER BY 2, 3;

egenome_test-# egenome_test-# egenome_test-# egenome_test-#
egenome_test-# egenome_test-# egenome_test-#     oid    | nspname |
  relname
-----------+---------+-------------------
  112337075 | build   | snp_main_chr22_pk
(1 row)

egenome_test=# egenome_test=# SELECT relhasindex, relkind, relchecks,
reltriggers, relhasrules
FROM pg_catalog.pg_class WHERE oid = '112337075'

egenome_test-# egenome_test-# egenome_test-# ;
  relhasindex | relkind | relchecks | reltriggers | relhasrules
-------------+---------+-----------+-------------+-------------
  f           | i       |         0 |           0 | f
(1 row)

egenome_test=# SELECT a.attname,
   pg_catalog.format_type(a.atttypid, a.atttypmod),
   (SELECT substring(d.adsrc for 128) FROM pg_catalog.pg_attrdef d
    WHERE d.adrelid = a.attrelid AND d.adnum = a.attnum AND a.atthasdef),
   a.attnotnull, a.attnum
FROM pg_catalog.pg_attribute a, pg_catalog.pg_index i
WHERE a.attrelid = '112337075' AND a.attnum > 0 AND NOT a.attisdropped
AND a.attrelid = i.indexrelid
ORDER BY a.attnum
;
egenome_test-# egenome_test-# egenome_test(# egenome_test-#
egenome_test-# egenome_test-# egenome_test-# egenome_test-#   attname
| format_type | ?column? | attnotnull |\
  attnum
-----------+-------------+----------+------------+--------
  refsnp_id | integer     |          | f          |      1
(1 row)

egenome_test=# SELECT i.indisunique, i.indisprimary, a.amname,
c2.relname,
   pg_catalog.pg_get_expr(i.indpred, i.indrelid)
FROM pg_catalog.pg_index i, pg_catalog.pg_class c, pg_catalog.pg_class
c2, pg_catalog.pg_am a
WHERE i.indexrelid = c.oid AND c.oid = '112337075' AND c.relam = a.oid
AND i.indrelid = c2.oid
;
egenome_test-# egenome_test-# egenome_test-# egenome_test-#
egenome_test-#  indisunique | indisprimary | amname | relname |
pg_get_expr
-------------+--------------+--------+---------+-------------
(0 rows)

egenome_test=#

(No \d results)

-Kevin Murphy

Kevin Murphy <murphy@genome.chop.edu> writes:
> On Aug 31, 2004, at 4:41 PM, Tom Lane wrote:
>> That's strange.  Maybe some sort of catalog corruption?  Try "psql -E"
>> to see the queries issued by \d, and then execute them by hand to see
>> what you get.

> egenome_test=# SELECT i.indisunique, i.indisprimary, a.amname,
> c2.relname,
>    pg_catalog.pg_get_expr(i.indpred, i.indrelid)
> FROM pg_catalog.pg_index i, pg_catalog.pg_class c, pg_catalog.pg_class
> c2, pg_catalog.pg_am a
> WHERE i.indexrelid = c.oid AND c.oid = '112337075' AND c.relam = a.oid
> AND i.indrelid = c2.oid
> ;
> egenome_test-# egenome_test-# egenome_test-# egenome_test-#
> egenome_test-#  indisunique | indisprimary | amname | relname |
> pg_get_expr
> -------------+--------------+--------+---------+-------------
> (0 rows)

Looking at psql's describe code, it will indeed abort and print nothing
at all when this query returns zero rows.  So the question now reduces
to why this didn't find anything.  The theories that come to mind are
(a) one of the relevant catalog rows is in fact missing, or (b) one of
the indexes that are used is corrupted so that the needed row isn't
found.  From the prior queries we already know the index's pg_class and
pg_index rows are there, and it's highly unlikely that pg_am would be
damaged since it's essentially read-only, so that means the index's
parent table's pg_class row is deleted or unfindable.

Corruption of pg_class's OID index is certainly not impossible, but I'd
think it'd lead to worse problems than this.  My bet is on an incomplete
DROP operation --- ie, the parent table was dropped but for some reason
its child index wasn't.  Corruption of the pg_depend link from the one
to the other could have done it.  If you've had any database crashes
lately, then it's possible that the ultimate culprit is the transaction-
vs-checkpoint sync bug we identified a couple weeks ago.  That can lead
to whole or partial loss of committed transactions :-(

            regards, tom lane