Thread: zombie primary key lurches out of database to devour the brains of the unwary
zombie primary key lurches out of database to devour the brains of the unwary
From
Kevin Murphy
Date:
Using PG 7.4.5 on Mac OS X 10.3 ... I have a primary key that I can't destroy and can't create. One weird symptom is that when I use \d in psql to attempt to display the constraint, there is no output at all! Normally, psql either shows the constraint or reports that the constraint doesn't exist. (BTW, I know that the default clauses below are kind of weird, but I am using PG to process data for a database that doesn't use NULL and uses 0 and empty strings as default values. The DDL is generated automatically, so the defaults don't all make sense, but they should be harmless. Don't worry; the app is almost ported to PG ;-) In the transcript below, snp_main_chr22 is the table, and there is supposed to be a primary key snp_main_chr22_pk on the refsnp_id column: egenome_test=# egenome_test=# \d snp_main_chr22 Table "build.snp_main_chr22" Column | Type | Modifiers -------------+-----------------------+------------------------------- refsnp_id | integer | default 0 variation | character varying(10) | default ''::character varying het | character varying(20) | default ''::character varying validated | character varying(5) | default ''::character varying chr | character varying(2) | default ''::character varying assay_size | integer | default 0 pop_size | integer | default 0 seq_pos | integer | default 0 transcribed | character varying(1) | default ''::character varying egenome_test=# egenome_test=# alter table snp_main_chr22 drop constraint snp_main_chr22_pk; ERROR: constraint "snp_main_chr22_pk" does not exist egenome_test=# drop table snp_main_chr22 cascade; DROP TABLE egenome_test=# \d snp_main_chr22_pk egenome_test=# \d snp_main_chr22_pk_gibberish Did not find any relation named "snp_main_chr22_pk_gibberish". egenome_test=# CREATE TABLE snp_main_chr22 ( refsnp_id integer DEFAULT 0, variation varchar(10) DEFAULT '', het varchar(20) DEFAULT '', validated varchar(5) DEFAULT '', chr varchar(2) DEFAULT '', assay_size integer DEFAULT 0, pop_size integer DEFAULT 0, seq_pos integer DEFAULT 0, transcribed varchar(1) DEFAULT '' ); egenome_test(# egenome_test(# egenome_test(# egenome_test(# egenome_test(# egenome_test(# egenome_test(# egenome_test(# egenome_test(# egenome_test(# CREATE TABLE egenome_test=# ALTER TABLE snp_main_chr22 ADD CONSTRAINT snp_main_chr22_pk PRIMARY KEY (refsnp_id) ; egenome_test-# egenome_test-# egenome_test-# NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "snp_main_chr22_pk" for table "snp_main_chr22" ERROR: relation "snp_main_chr22_pk" already exists egenome_test=# egenome_test=# \d snp_main_chr22 Table "build.snp_main_chr22" Column | Type | Modifiers -------------+-----------------------+------------------------------- refsnp_id | integer | default 0 variation | character varying(10) | default ''::character varying het | character varying(20) | default ''::character varying validated | character varying(5) | default ''::character varying chr | character varying(2) | default ''::character varying assay_size | integer | default 0 pop_size | integer | default 0 seq_pos | integer | default 0 transcribed | character varying(1) | default ''::character varying egenome_test=# Thanks for any advice. Kevin Murphy
Re: zombie primary key lurches out of database to devour the brains of the unwary
From
Tom Lane
Date:
Kevin Murphy <murphy@genome.chop.edu> writes: > I have a primary key that I can't destroy and can't create. One weird > symptom is that when I use \d in psql to attempt to display the > constraint, there is no output at all! That's strange. Maybe some sort of catalog corruption? Try "psql -E" to see the queries issued by \d, and then execute them by hand to see what you get. > egenome_test=# alter table snp_main_chr22 drop constraint > snp_main_chr22_pk; > ERROR: constraint "snp_main_chr22_pk" does not exist Note that this only says the table doesn't have a constraint by that name. There could for instance be a non-constraint-associated index or table by that name. I'm not sure why you're not seeing it in \d, though, if that were the case. regards, tom lane
Re: zombie primary key lurches out of database to devour the brains of the unwary
From
Kevin Murphy
Date:
On Aug 31, 2004, at 4:41 PM, Tom Lane wrote: >> I have a primary key that I can't destroy and can't create. One weird >> symptom is that when I use \d in psql to attempt to display the >> constraint, there is no output at all! > > That's strange. Maybe some sort of catalog corruption? Try "psql -E" > to see the queries issued by \d, and then execute them by hand to see > what you get. First off, I dumped and restored the database, and the problem seems to have been resolved. (Before doing so, I restarted the database, which did _not_ help). Tom, I'm not sure if my problem is still of interest to you, but I have a backup of my data directory from the time of the problem (database was shut down during copy), although it is pretty big. Re: the catalog corruption, I used '\set ECHO_HIDDEN' and '\d snp_main_chr22_pk', and then manually ran the commands: egenome_test=# SELECT c.oid, n.nspname, c.relname FROM pg_catalog.pg_class c LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace WHERE pg_catalog.pg_table_is_visible(c.oid) AND c.relname ~ '^snp_main_chr22_pk$' ORDER BY 2, 3; egenome_test-# egenome_test-# egenome_test-# egenome_test-# egenome_test-# egenome_test-# egenome_test-# oid | nspname | relname -----------+---------+------------------- 112337075 | build | snp_main_chr22_pk (1 row) egenome_test=# egenome_test=# SELECT relhasindex, relkind, relchecks, reltriggers, relhasrules FROM pg_catalog.pg_class WHERE oid = '112337075' egenome_test-# egenome_test-# egenome_test-# ; relhasindex | relkind | relchecks | reltriggers | relhasrules -------------+---------+-----------+-------------+------------- f | i | 0 | 0 | f (1 row) egenome_test=# SELECT a.attname, pg_catalog.format_type(a.atttypid, a.atttypmod), (SELECT substring(d.adsrc for 128) FROM pg_catalog.pg_attrdef d WHERE d.adrelid = a.attrelid AND d.adnum = a.attnum AND a.atthasdef), a.attnotnull, a.attnum FROM pg_catalog.pg_attribute a, pg_catalog.pg_index i WHERE a.attrelid = '112337075' AND a.attnum > 0 AND NOT a.attisdropped AND a.attrelid = i.indexrelid ORDER BY a.attnum ; egenome_test-# egenome_test-# egenome_test(# egenome_test-# egenome_test-# egenome_test-# egenome_test-# egenome_test-# attname | format_type | ?column? | attnotnull |\ attnum -----------+-------------+----------+------------+-------- refsnp_id | integer | | f | 1 (1 row) egenome_test=# SELECT i.indisunique, i.indisprimary, a.amname, c2.relname, pg_catalog.pg_get_expr(i.indpred, i.indrelid) FROM pg_catalog.pg_index i, pg_catalog.pg_class c, pg_catalog.pg_class c2, pg_catalog.pg_am a WHERE i.indexrelid = c.oid AND c.oid = '112337075' AND c.relam = a.oid AND i.indrelid = c2.oid ; egenome_test-# egenome_test-# egenome_test-# egenome_test-# egenome_test-# indisunique | indisprimary | amname | relname | pg_get_expr -------------+--------------+--------+---------+------------- (0 rows) egenome_test=# (No \d results) -Kevin Murphy
Re: zombie primary key lurches out of database to devour the brains of the unwary
From
Tom Lane
Date:
Kevin Murphy <murphy@genome.chop.edu> writes: > On Aug 31, 2004, at 4:41 PM, Tom Lane wrote: >> That's strange. Maybe some sort of catalog corruption? Try "psql -E" >> to see the queries issued by \d, and then execute them by hand to see >> what you get. > egenome_test=# SELECT i.indisunique, i.indisprimary, a.amname, > c2.relname, > pg_catalog.pg_get_expr(i.indpred, i.indrelid) > FROM pg_catalog.pg_index i, pg_catalog.pg_class c, pg_catalog.pg_class > c2, pg_catalog.pg_am a > WHERE i.indexrelid = c.oid AND c.oid = '112337075' AND c.relam = a.oid > AND i.indrelid = c2.oid > ; > egenome_test-# egenome_test-# egenome_test-# egenome_test-# > egenome_test-# indisunique | indisprimary | amname | relname | > pg_get_expr > -------------+--------------+--------+---------+------------- > (0 rows) Looking at psql's describe code, it will indeed abort and print nothing at all when this query returns zero rows. So the question now reduces to why this didn't find anything. The theories that come to mind are (a) one of the relevant catalog rows is in fact missing, or (b) one of the indexes that are used is corrupted so that the needed row isn't found. From the prior queries we already know the index's pg_class and pg_index rows are there, and it's highly unlikely that pg_am would be damaged since it's essentially read-only, so that means the index's parent table's pg_class row is deleted or unfindable. Corruption of pg_class's OID index is certainly not impossible, but I'd think it'd lead to worse problems than this. My bet is on an incomplete DROP operation --- ie, the parent table was dropped but for some reason its child index wasn't. Corruption of the pg_depend link from the one to the other could have done it. If you've had any database crashes lately, then it's possible that the ultimate culprit is the transaction- vs-checkpoint sync bug we identified a couple weeks ago. That can lead to whole or partial loss of committed transactions :-( regards, tom lane