Thread: BUG #13935: Duplicate row in pg_constraint table which is not accessible via indexes
BUG #13935: Duplicate row in pg_constraint table which is not accessible via indexes
From
bear2k@mail.ru
Date:
The following bug has been logged on the website: Bug reference: 13935 Logged by: Alexey Makhmutov Email address: bear2k@mail.ru PostgreSQL version: 9.4.4 Operating system: Linux x86_64 Description: We are seeing a very strange situation - table pg_constrint has a duplicate row for particular primary key constraint which is not accessible via indexes but is visible during segment scan. There wasn't any specific manipulation with catalog data, however the database is used with standby node, so switchover from one instance to another is performed from time to time. Here is the PG version: # select version(); version --------------------------------------------------------------------------------------------------------------- PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-11), 64-bit (1 row) First, select rows with segment scan: # select oid,ctid,xmin,xmax,conrelid,contype from pg_constraint where conrelid::int + 0 = 50621; oid | ctid | xmin | xmax | conrelid | contype --------+---------+---------+------+----------+--------- 301952 | (6,136) | 4883898 | 0 | 50621 | p 300466 | (7,1) | 4786734 | 0 | 50621 | p (2 rows) # explain select oid,ctid,xmin,xmax,conrelid,contype from pg_constraint where conrelid::int + 0 = 50621; QUERY PLAN --------------------------------------------------------------- Seq Scan on pg_constraint (cost=0.00..11.81 rows=1 width=23) Filter: (((conrelid)::integer + 0) = 50621) (2 rows) Here we can see two primary key constraints for the same relation, which is very strange by itself. Now, let's try to access the same information by using index scan: # select oid,ctid,xmin,xmax,conrelid,contype from pg_constraint where conrelid = 50621; oid | ctid | xmin | xmax | conrelid | contype --------+---------+---------+------+----------+--------- 301952 | (6,136) | 4883898 | 0 | 50621 | p (1 row) # explain select oid,ctid,xmin,xmax,conrelid,contype from pg_constraint where conrelid = 50621; QUERY PLAN --------------------------------------------------------------------------------------------------- Index Scan using pg_constraint_conrelid_index on pg_constraint (cost=0.14..4.16 rows=1 width=23) Index Cond: (conrelid = 50621::oid) (2 rows) We can see only one row now. Moreover, if we try to access that row by its oid it will be still not accessible via index: # select oid,ctid,xmin,xmax,conrelid,contype from pg_constraint where oid in (301952, 300466); oid | ctid | xmin | xmax | conrelid | contype --------+---------+---------+------+----------+--------- 301952 | (6,136) | 4883898 | 0 | 50621 | p (1 row) # explain select oid,ctid,xmin,xmax,conrelid,contype from pg_constraint where oid in (301952, 300466); QUERY PLAN -------------------------------------------------------------------------------------- Bitmap Heap Scan on pg_constraint (cost=4.31..7.33 rows=2 width=23) Recheck Cond: (oid = ANY ('{301952,300466}'::oid[])) -> Bitmap Index Scan on pg_constraint_oid_index (cost=0.00..4.30 rows=2 width=0) Index Cond: (oid = ANY ('{301952,300466}'::oid[])) (4 rows) However, if accessed by full segment scan - it is visible: # set enable_bitmapscan = off; SET # set enable_indexscan = off; SET # select oid,ctid,xmin,xmax,conrelid,contype from pg_constraint where oid in (301952, 300466); oid | ctid | xmin | xmax | conrelid | contype --------+---------+---------+------+----------+--------- 301952 | (6,136) | 4883898 | 0 | 50621 | p 300466 | (7,1) | 4786734 | 0 | 50621 | p (2 rows) # explain select oid,ctid,xmin,xmax,conrelid,contype from pg_constraint where oid in (301952, 300466); QUERY PLAN --------------------------------------------------------------- Seq Scan on pg_constraint (cost=0.00..11.18 rows=2 width=23) Filter: (oid = ANY ('{301952,300466}'::oid[])) (2 rows) >From my perspective this is something which shouldn't be observed in the system, so it looks like some kind of bug. Is there any known problem which may lead to such behavior? Thanks, Alexey Makhmutov
Re: BUG #13935: Duplicate row in pg_constraint table which is not accessible via indexes
From
Alvaro Herrera
Date:
bear2k@mail.ru wrote: > First, select rows with segment scan: > > # select oid,ctid,xmin,xmax,conrelid,contype from pg_constraint where > conrelid::int + 0 = 50621; > oid | ctid | xmin | xmax | conrelid | contype > --------+---------+---------+------+----------+--------- > 301952 | (6,136) | 4883898 | 0 | 50621 | p > 300466 | (7,1) | 4786734 | 0 | 50621 | p > (2 rows) We've had reports of duplicate rows appearing in user tables, where only one copy is visible through indexes, but most updates of pg_constraint don't go through the normal executor, but rather direct catalog manipulation of DDL implementation, so I'd be surprising if it's the same thing. Also, the OIDs of the two rows are different, so really the two physical rows are not update copies of the same logical row. I wonder if you got into this state by adding primary keys concurrently somehow. That should of course not be allowed, but of course there is no UNIQUE constraint on conrelid itself, so the normal unique-ification code in btree does not fire for this situation. Perhaps we have a bug in some ALTER TABLE command. Can you tell us how you created the PK in the affected table? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #13935: Duplicate row in pg_constraint table which is not accessible via indexes
From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > I wonder if you got into this state by adding primary keys concurrently > somehow. That should of course not be allowed, but of course there is > no UNIQUE constraint on conrelid itself, so the normal unique-ification > code in btree does not fire for this situation. I thought maybe Simon's changes to reduce lock levels in ALTER TABLE had gone too far, but simple experiment shows that ALTER ADD PRIMARY KEY commands still block each other (and then the second one fails as expected). So there isn't an obvious hole here. Given that the OIDs are different, it seems more likely that this is the result of a primary key being dropped and then re-created, and later somehow the commit state of the original row got reverted. regards, tom lane
Re: BUG #13935: Duplicate row in pg_constraint table which is not accessible via indexes
From
Andres Freund
Date:
On 2016-02-08 16:32:22 +0000, bear2k@mail.ru wrote: > We are seeing a very strange situation - table pg_constrint has a duplicate > row for particular primary key constraint which is not accessible via > indexes but is visible during segment scan. There wasn't any specific > manipulation with catalog data, however the database is used with standby > node, so switchover from one instance to another is performed from time to > time. Did your have any failovers or restore from backup, at some point in the history of the cluster? If so, what were the exact procedures you used? Greetings, Andres Freund
Re[2]: [BUGS] BUG #13935: Duplicate row in pg_constraint table which is not accessible via indexes
From
Alexey Makhmutov
Date:
> > I wonder if you got into this state by adding primary keys concurrently > > somehow. That should of course not be allowed, but of course there is > > no UNIQUE constraint on conrelid itself, so the normal unique-ification > > code in btree does not fire for this situation. > > I thought maybe Simon's changes to reduce lock levels in ALTER TABLE > had gone too far, but simple experiment shows that ALTER ADD PRIMARY KEY > commands still block each other (and then the second one fails as > expected). So there isn't an obvious hole here. > > Given that the OIDs are different, it seems more likely that this is the > result of a primary key being dropped and then re-created, and later > somehow the commit state of the original row got reverted. Yes, sure - sorry for the misleading phrasing. These two rows are not completely identical - their OIDs are different andreference to the supporting index is different: # select oid,ctid,xmin,xmax,conrelid,contype,conindid from pg_constraint where conrelid::int+0=50621; oid | ctid | xmin | xmax | conrelid | contype | conindid --------+---------+---------+------+----------+---------+---------- 301952 | (6,136) | 4883898 | 0 | 50621 | p | 301951 300466 | (7,1) | 4786734 | 0 | 50621 | p | 300465 (2 rows) Of course, only one index exists - the one referenced by indexed row in pg_constraint. So, yes - it looks like this ‘phantom’row wasn’t properly deleted. This table is recreated in two steps - first, a script is executed via psql, which drops and recreate table structure usingslightly weird PL/PgSQL fragment: do $$ declare begin begin execute 'drop table this_table'; exception when undefined_table then null; end; begin execute 'create table this_table ( part_id NUMERIC(20), restart_id CHARACTER VARYING(250), restart_info BYTEA )'; exception when duplicate_table then null; end; end $$ language 'plpgsql'; Script invocation is wrapped into begin; .. commit; command. The script doesn't create PK. And then Java application performs table modification (in separate transaction): alter table this_table add key character varying(4000); alter table this_table add session_binary bytea; alter table this_table add insert_time timestamp; alter table this_table add constraint this_table_pk primary key (key); create index this_table_insert_time on this_table(insert_time); These two steps are repeated on patch reinstallation, so this table was for sure dropped and recreated multiple times. Thanks, Alexey Makhmutov