Thread: [BUG] Fix DETACH with FK pointing to a partitioned table fails
[BUG] Fix DETACH with FK pointing to a partitioned table fails
From
Jehan-Guillaume de Rorthais
Date:
Hi, (patch proposal below). Consider a table with a FK pointing to a partitioned table. CREATE TABLE p ( id bigint PRIMARY KEY ) PARTITION BY list (id); CREATE TABLE p_1 PARTITION OF p FOR VALUES IN (1); CREATE TABLE r_1 ( id bigint PRIMARY KEY, p_id bigint NOT NULL, FOREIGN KEY (p_id) REFERENCES p (id) ); Now, attach this table "refg_1" as partition of another one having the same FK: CREATE TABLE r ( id bigint PRIMARY KEY, p_id bigint NOT NULL, FOREIGN KEY (p_id) REFERENCES p (id) ) PARTITION BY list (id); ALTER TABLE r ATTACH PARTITION r_1 FOR VALUES IN (1); The old sub-FKs (below 18289) created in this table to enforce the action triggers on referenced partitions are not deleted when the table becomes a partition. Because of this, we have additional and useless triggers on the referenced partitions and we can not DETACH this partition on the referencing side anymore: => ALTER TABLE r DETACH PARTITION r_1; ERROR: could not find ON INSERT check triggers of foreign key constraint 18289 => SELECT c.oid, conparentid, conrelid::regclass, confrelid::regclass, t.tgfoid::regproc FROM pg_constraint c JOIN pg_trigger t ON t.tgconstraint = c.oid WHERE confrelid::regclass = 'p_1'::regclass; oid │ conparentid │ conrelid │ confrelid │ tgfoid ───────┼─────────────┼──────────┼───────────┼──────────────────────── 18289 │ 18286 │ r_1 │ p_1 │ "RI_FKey_noaction_del" 18289 │ 18286 │ r_1 │ p_1 │ "RI_FKey_noaction_upd" 18302 │ 18299 │ r │ p_1 │ "RI_FKey_noaction_del" 18302 │ 18299 │ r │ p_1 │ "RI_FKey_noaction_upd" (4 rows) The legitimate constraint and triggers here are 18302. The old sub-FK 18289 having 18286 as parent should have gone during the ATTACH PARTITION. Please, find in attachment a patch dropping old "sub-FK" during the ATTACH PARTITION command and adding a regression test about it. At the very least, it help understanding the problem and sketch a possible solution. Regards,
Attachment
On 2023-Jul-05, Jehan-Guillaume de Rorthais wrote: > ALTER TABLE r ATTACH PARTITION r_1 FOR VALUES IN (1); > > The old sub-FKs (below 18289) created in this table to enforce the action > triggers on referenced partitions are not deleted when the table becomes a > partition. Because of this, we have additional and useless triggers on the > referenced partitions and we can not DETACH this partition on the referencing > side anymore: Oh, hm, interesting. Thanks for the report and patch. I found a couple of minor issues with it (most serious one: nkeys should be 3, not 2; also sysscan should use conrelid index), but I'll try and complete it so that it's ready for 2023-08-10's releases. Regards -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
I think old "sub-FK" should not be dropped, that will be violates foreign key constraint. For example :
postgres=# insert into r values(1,1);
INSERT 0 1
postgres=# ALTER TABLE r DETACH PARTITION r_1;
ALTER TABLE
INSERT 0 1
postgres=# ALTER TABLE r DETACH PARTITION r_1;
ALTER TABLE
postgres=# delete from p_1 where id = 1;
DELETE 1
DELETE 1
postgres=# select * from r_1;
id | p_id
----+------
1 | 1
(1 row)
id | p_id
----+------
1 | 1
(1 row)
If I run above SQLs on pg12.12, it will report error below:
postgres=# delete from p_1 where id = 1;
ERROR: update or delete on table "p_1" violates foreign key constraint "r_1_p_id_fkey1" on table "r_1"
DETAIL: Key (id)=(1) is still referenced from table "r_1".
ERROR: update or delete on table "p_1" violates foreign key constraint "r_1_p_id_fkey1" on table "r_1"
DETAIL: Key (id)=(1) is still referenced from table "r_1".
Alvaro Herrera <alvherre@alvh.no-ip.org> 于2023年7月31日周一 20:58写道:
On 2023-Jul-05, Jehan-Guillaume de Rorthais wrote:
> ALTER TABLE r ATTACH PARTITION r_1 FOR VALUES IN (1);
>
> The old sub-FKs (below 18289) created in this table to enforce the action
> triggers on referenced partitions are not deleted when the table becomes a
> partition. Because of this, we have additional and useless triggers on the
> referenced partitions and we can not DETACH this partition on the referencing
> side anymore:
Oh, hm, interesting. Thanks for the report and patch. I found a couple
of minor issues with it (most serious one: nkeys should be 3, not 2;
also sysscan should use conrelid index), but I'll try and complete it so
that it's ready for 2023-08-10's releases.
Regards
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
On 2023-Aug-03, tender wang wrote: > I think old "sub-FK" should not be dropped, that will be violates foreign > key constraint. Yeah, I've been playing more with the patch and it is definitely not doing the right things. Just eyeballing the contents of pg_trigger and pg_constraint for partitions added by ALTER...ATTACH shows that the catalog contents are inconsistent with those added by CREATE TABLE PARTITION OF. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
I think the code to determine that fk of a partition is inherited or not is not enough.
For example, in this case, foreign key r_1_p_id_fkey1 is not inherited from parent.
If conform->conparentid(in DetachPartitionFinalize func) is valid, we should recheck confrelid(pg_constraint) field.
I try to fix this problem in the attached patch.
Any thoughts.
Alvaro Herrera <alvherre@alvh.no-ip.org> 于2023年8月3日周四 17:02写道:
On 2023-Aug-03, tender wang wrote:
> I think old "sub-FK" should not be dropped, that will be violates foreign
> key constraint.
Yeah, I've been playing more with the patch and it is definitely not
doing the right things. Just eyeballing the contents of pg_trigger and
pg_constraint for partitions added by ALTER...ATTACH shows that the
catalog contents are inconsistent with those added by CREATE TABLE
PARTITION OF.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Attachment
Oversight the DetachPartitionFinalize(), I found another bug below:
postgres=# CREATE TABLE p ( id bigint PRIMARY KEY ) PARTITION BY list (id);
CREATE TABLE
postgres=# CREATE TABLE p_1 PARTITION OF p FOR VALUES IN (1);
CREATE TABLE
postgres=# CREATE TABLE r_1 (
postgres(# id bigint PRIMARY KEY,
postgres(# p_id bigint NOT NULL
postgres(# );
CREATE TABLE
postgres=# CREATE TABLE r (
postgres(# id bigint PRIMARY KEY,
postgres(# p_id bigint NOT NULL,
postgres(# FOREIGN KEY (p_id) REFERENCES p (id)
postgres(# ) PARTITION BY list (id);
CREATE TABLE
postgres=# ALTER TABLE r ATTACH PARTITION r_1 FOR VALUES IN (1);
ALTER TABLE
postgres=# ALTER TABLE r DETACH PARTITION r_1;
ALTER TABLE
postgres=# insert into r_1 values(1,1);
ERROR: insert or update on table "r_1" violates foreign key constraint "r_p_id_fkey"
DETAIL: Key (p_id)=(1) is not present in table "p".
CREATE TABLE
postgres=# CREATE TABLE p_1 PARTITION OF p FOR VALUES IN (1);
CREATE TABLE
postgres=# CREATE TABLE r_1 (
postgres(# id bigint PRIMARY KEY,
postgres(# p_id bigint NOT NULL
postgres(# );
CREATE TABLE
postgres=# CREATE TABLE r (
postgres(# id bigint PRIMARY KEY,
postgres(# p_id bigint NOT NULL,
postgres(# FOREIGN KEY (p_id) REFERENCES p (id)
postgres(# ) PARTITION BY list (id);
CREATE TABLE
postgres=# ALTER TABLE r ATTACH PARTITION r_1 FOR VALUES IN (1);
ALTER TABLE
postgres=# ALTER TABLE r DETACH PARTITION r_1;
ALTER TABLE
postgres=# insert into r_1 values(1,1);
ERROR: insert or update on table "r_1" violates foreign key constraint "r_p_id_fkey"
DETAIL: Key (p_id)=(1) is not present in table "p".
After detach operation, r_1 is normal relation and the inherited foreign key 'r_p_id_fkey' should be removed.
tender wang <tndrwang@gmail.com> 于2023年8月3日周四 17:34写道:
I think the code to determine that fk of a partition is inherited or not is not enough.For example, in this case, foreign key r_1_p_id_fkey1 is not inherited from parent.If conform->conparentid(in DetachPartitionFinalize func) is valid, we should recheck confrelid(pg_constraint) field.I try to fix this problem in the attached patch.Any thoughts.Alvaro Herrera <alvherre@alvh.no-ip.org> 于2023年8月3日周四 17:02写道:On 2023-Aug-03, tender wang wrote:
> I think old "sub-FK" should not be dropped, that will be violates foreign
> key constraint.
Yeah, I've been playing more with the patch and it is definitely not
doing the right things. Just eyeballing the contents of pg_trigger and
pg_constraint for partitions added by ALTER...ATTACH shows that the
catalog contents are inconsistent with those added by CREATE TABLE
PARTITION OF.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Oversight the DetachPartitionFinalize() again, I found the root cause why 'r_p_id_fkey' wat not removed.
DetachPartitionFinalize() call the GetParentedForeignKeyRefs() func to get tuple from pg_constraint that will be delete but failed.
according to the comments, the GetParentedForeignKeyRefs() func get the tuple reference me not I reference others.
I try to fix this bug :
i. ConstraintSetParentConstraint() should not be called in DetachPartitionFinalize(), because after conparentid was set to 0,
we can not find inherited foreign keys.
ii. create another function like GetParentedForeignKeyRefs(), but the ScanKey should be conrelid field not confrelid.
I quickly test on my above solution in my env, can be solve above issue.
tender wang <tndrwang@gmail.com> 于2023年8月4日周五 17:04写道:
Oversight the DetachPartitionFinalize(), I found another bug below:postgres=# CREATE TABLE p ( id bigint PRIMARY KEY ) PARTITION BY list (id);
CREATE TABLE
postgres=# CREATE TABLE p_1 PARTITION OF p FOR VALUES IN (1);
CREATE TABLE
postgres=# CREATE TABLE r_1 (
postgres(# id bigint PRIMARY KEY,
postgres(# p_id bigint NOT NULL
postgres(# );
CREATE TABLE
postgres=# CREATE TABLE r (
postgres(# id bigint PRIMARY KEY,
postgres(# p_id bigint NOT NULL,
postgres(# FOREIGN KEY (p_id) REFERENCES p (id)
postgres(# ) PARTITION BY list (id);
CREATE TABLE
postgres=# ALTER TABLE r ATTACH PARTITION r_1 FOR VALUES IN (1);
ALTER TABLE
postgres=# ALTER TABLE r DETACH PARTITION r_1;
ALTER TABLE
postgres=# insert into r_1 values(1,1);
ERROR: insert or update on table "r_1" violates foreign key constraint "r_p_id_fkey"
DETAIL: Key (p_id)=(1) is not present in table "p".After detach operation, r_1 is normal relation and the inherited foreign key 'r_p_id_fkey' should be removed.tender wang <tndrwang@gmail.com> 于2023年8月3日周四 17:34写道:I think the code to determine that fk of a partition is inherited or not is not enough.For example, in this case, foreign key r_1_p_id_fkey1 is not inherited from parent.If conform->conparentid(in DetachPartitionFinalize func) is valid, we should recheck confrelid(pg_constraint) field.I try to fix this problem in the attached patch.Any thoughts.Alvaro Herrera <alvherre@alvh.no-ip.org> 于2023年8月3日周四 17:02写道:On 2023-Aug-03, tender wang wrote:
> I think old "sub-FK" should not be dropped, that will be violates foreign
> key constraint.
Yeah, I've been playing more with the patch and it is definitely not
doing the right things. Just eyeballing the contents of pg_trigger and
pg_constraint for partitions added by ALTER...ATTACH shows that the
catalog contents are inconsistent with those added by CREATE TABLE
PARTITION OF.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
The foreign key still works even though partition was detached. Is this behavior expected?
I can't find the answer in the document. If it is expected behavior , please ignore the bug I reported a few days ago.
tender wang <tndrwang@gmail.com> 于2023年8月4日周五 17:04写道:
Oversight the DetachPartitionFinalize(), I found another bug below:postgres=# CREATE TABLE p ( id bigint PRIMARY KEY ) PARTITION BY list (id);
CREATE TABLE
postgres=# CREATE TABLE p_1 PARTITION OF p FOR VALUES IN (1);
CREATE TABLE
postgres=# CREATE TABLE r_1 (
postgres(# id bigint PRIMARY KEY,
postgres(# p_id bigint NOT NULL
postgres(# );
CREATE TABLE
postgres=# CREATE TABLE r (
postgres(# id bigint PRIMARY KEY,
postgres(# p_id bigint NOT NULL,
postgres(# FOREIGN KEY (p_id) REFERENCES p (id)
postgres(# ) PARTITION BY list (id);
CREATE TABLE
postgres=# ALTER TABLE r ATTACH PARTITION r_1 FOR VALUES IN (1);
ALTER TABLE
postgres=# ALTER TABLE r DETACH PARTITION r_1;
ALTER TABLE
postgres=# insert into r_1 values(1,1);
ERROR: insert or update on table "r_1" violates foreign key constraint "r_p_id_fkey"
DETAIL: Key (p_id)=(1) is not present in table "p".After detach operation, r_1 is normal relation and the inherited foreign key 'r_p_id_fkey' should be removed.tender wang <tndrwang@gmail.com> 于2023年8月3日周四 17:34写道:I think the code to determine that fk of a partition is inherited or not is not enough.For example, in this case, foreign key r_1_p_id_fkey1 is not inherited from parent.If conform->conparentid(in DetachPartitionFinalize func) is valid, we should recheck confrelid(pg_constraint) field.I try to fix this problem in the attached patch.Any thoughts.Alvaro Herrera <alvherre@alvh.no-ip.org> 于2023年8月3日周四 17:02写道:On 2023-Aug-03, tender wang wrote:
> I think old "sub-FK" should not be dropped, that will be violates foreign
> key constraint.
Yeah, I've been playing more with the patch and it is definitely not
doing the right things. Just eyeballing the contents of pg_trigger and
pg_constraint for partitions added by ALTER...ATTACH shows that the
catalog contents are inconsistent with those added by CREATE TABLE
PARTITION OF.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
On 2023-Aug-07, tender wang wrote: > The foreign key still works even though partition was detached. Is this > behavior expected? Well, there's no reason for it not to, right? For example, if you detach a partition and then attach it again, you don't have to scan the partition on attach, because you know the constraint has remained valid all along. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "In fact, the basic problem with Perl 5's subroutines is that they're not crufty enough, so the cruft leaks out into user-defined code instead, by the Conservation of Cruft Principle." (Larry Wall, Apocalypse 6)
Re: [BUG] Fix DETACH with FK pointing to a partitioned table fails
From
Jehan-Guillaume de Rorthais
Date:
On Thu, 3 Aug 2023 11:02:43 +0200 Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > On 2023-Aug-03, tender wang wrote: > > > I think old "sub-FK" should not be dropped, that will be violates foreign > > key constraint. > > Yeah, I've been playing more with the patch and it is definitely not > doing the right things. Just eyeballing the contents of pg_trigger and > pg_constraint for partitions added by ALTER...ATTACH shows that the > catalog contents are inconsistent with those added by CREATE TABLE > PARTITION OF. Well, as stated in my orignal message, at the patch helps understanding the problem and sketch a possible solution. It definitely is not complete. After DETACHing the table, we surely needs to check everything again and recreating what is needed to keep the FK consistent. But should we keep the FK after DETACH? Did you check the two other discussions related to FK, self-FK & partition? Unfortunately, as Tender experienced, the more we dig the more we find bugs. Moreover, the second one might seems unsolvable and deserve a closer look. See: * FK broken after DETACHing referencing part https://www.postgresql.org/message-id/20230420144344.40744130%40karst * Issue attaching a table to a partitioned table with an auto-referenced foreign key https://www.postgresql.org/message-id/20230707175859.17c91538%40karst
Hi
Is there any conclusion to this issue?
Jehan-Guillaume de Rorthais <jgdr@dalibo.com> 于2023年8月10日周四 23:03写道:
On Thu, 3 Aug 2023 11:02:43 +0200
Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> On 2023-Aug-03, tender wang wrote:
>
> > I think old "sub-FK" should not be dropped, that will be violates foreign
> > key constraint.
>
> Yeah, I've been playing more with the patch and it is definitely not
> doing the right things. Just eyeballing the contents of pg_trigger and
> pg_constraint for partitions added by ALTER...ATTACH shows that the
> catalog contents are inconsistent with those added by CREATE TABLE
> PARTITION OF.
Well, as stated in my orignal message, at the patch helps understanding the
problem and sketch a possible solution. It definitely is not complete.
After DETACHing the table, we surely needs to check everything again and
recreating what is needed to keep the FK consistent.
But should we keep the FK after DETACH? Did you check the two other discussions
related to FK, self-FK & partition? Unfortunately, as Tender experienced, the
more we dig the more we find bugs. Moreover, the second one might seems
unsolvable and deserve a closer look. See:
* FK broken after DETACHing referencing part
https://www.postgresql.org/message-id/20230420144344.40744130%40karst
* Issue attaching a table to a partitioned table with an auto-referenced
foreign key
https://www.postgresql.org/message-id/20230707175859.17c91538%40karst
On 2023-Oct-25, tender wang wrote: > Hi > Is there any conclusion to this issue? None yet. I intend to work on this at some point, hopefully soon. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Hi Alvaro,
I re-analyzed this issue, and here is my analysis process.
step 1: CREATE TABLE p ( id bigint PRIMARY KEY )
PARTITION BY list (id);
step2: CREATE TABLE p_1 PARTITION OF p FOR VALUES IN (1);
step3: CREATE TABLE r_1 (
id bigint PRIMARY KEY,
p_id bigint NOT NULL,
FOREIGN KEY (p_id) REFERENCES p (id)
);
After above step 3 operations, we have below catalog tuples:
postgres=# select oid, relname from pg_class where relname = 'p';
oid | relname
-------+---------
16384 | p
(1 row)
postgres=# select oid, relname from pg_class where relname = 'p_1';
oid | relname
-------+---------
16389 | p_1
(1 row)
postgres=# select oid, relname from pg_class where relname = 'r_1';
oid | relname
-------+---------
16394 | r_1
(1 row)
postgres=# select oid, conname,conrelid,conparentid,confrelid from pg_constraint where conrelid = 16394;
oid | conname | conrelid | conparentid | confrelid
-------+-------------------+----------+-------------+-----------
16397 | r_1_p_id_not_null | 16394 | 0 | 0
16399 | r_1_pkey | 16394 | 0 | 0
16400 | r_1_p_id_fkey | 16394 | 0 | 16384
16403 | r_1_p_id_fkey1 | 16394 | 16400 | 16389
(4 rows)
postgres=# select oid, tgrelid, tgparentid, tgconstrrelid,tgconstrindid,tgconstraint from pg_trigger where tgconstraint = 16403;
oid | tgrelid | tgparentid | tgconstrrelid | tgconstrindid | tgconstraint
-------+---------+------------+---------------+---------------+--------------
16404 | 16389 | 16401 | 16394 | 16392 | 16403
16405 | 16389 | 16402 | 16394 | 16392 | 16403
(2 rows)
postgres=# select oid, tgrelid, tgparentid, tgconstrrelid,tgconstrindid,tgconstraint from pg_trigger where tgconstraint = 16400;
oid | tgrelid | tgparentid | tgconstrrelid | tgconstrindid | tgconstraint
-------+---------+------------+---------------+---------------+--------------
16401 | 16384 | 0 | 16394 | 16387 | 16400
16402 | 16384 | 0 | 16394 | 16387 | 16400
16406 | 16394 | 0 | 16384 | 16387 | 16400
16407 | 16394 | 0 | 16384 | 16387 | 16400
(4 rows)
Because table p is partitioned table and it has one child table p_1. So when r_1 add foreign key constraint, according to addFkRecurseReferenced(),
each partition should have one pg_constraint row(e.g. r_1_p_id_fkey1).
After called addFkRecurseReferenced() in ATAddForeignKeyConstraint(), addFkRecurseReferencing() will be called, in which
it will add INSERT check trigger and UPDATE check trigger for r_1_p_id_fkey but not for r_1_p_id_fkey1.
So when detach r_1 from r, according to DetachPartitionFinalize(), the inherited fks should unlink relationship from parent.
The created INSERT and UPDATE check triggers should unlink relationship link fks. But just like I said above, the r_1_p_id_fkey1
actually doesn't have INSERT check trigger.
I slightly modified the previous patch,but I didn't add test case, because I found another issue.
After done ALTER TABLE r ATTACH PARTITION r_1 FOR VALUES IN (1);
I run the oidjoins.sql and has warnings as belwo:
psql:/tender/postgres/src/test/regress/sql/oidjoins.sql:49: WARNING: FK VIOLATION IN pg_trigger({tgparentid}): ("(0,3)",16401)
psql:/tender/postgres/src/test/regress/sql/oidjoins.sql:49: WARNING: FK VIOLATION IN pg_trigger({tgparentid}): ("(0,4)",16402)
psql:/tender/postgres/src/test/regress/sql/oidjoins.sql:49: WARNING: FK VIOLATION IN pg_trigger({tgparentid}): ("(0,4)",16402)
postgres=# select oid, tgrelid, tgparentid, tgconstrrelid,tgconstrindid,tgconstraint from pg_trigger where oid >= 16384;
oid | tgrelid | tgparentid | tgconstrrelid | tgconstrindid | tgconstraint
-------+---------+------------+---------------+---------------+--------------
16404 | 16389 | 16401 | 16394 | 16392 | 16403
16405 | 16389 | 16402 | 16394 | 16392 | 16403
16415 | 16384 | 0 | 16408 | 16387 | 16414
16416 | 16384 | 0 | 16408 | 16387 | 16414
16418 | 16389 | 16415 | 16408 | 16392 | 16417
16419 | 16389 | 16416 | 16408 | 16392 | 16417
16420 | 16408 | 0 | 16384 | 16387 | 16414
16421 | 16408 | 0 | 16384 | 16387 | 16414
16406 | 16394 | 16420 | 16384 | 16387 | 16400
16407 | 16394 | 16421 | 16384 | 16387 | 16400
(10 rows)
oid | tgrelid | tgparentid | tgconstrrelid | tgconstrindid | tgconstraint
-------+---------+------------+---------------+---------------+--------------
16404 | 16389 | 16401 | 16394 | 16392 | 16403
16405 | 16389 | 16402 | 16394 | 16392 | 16403
16415 | 16384 | 0 | 16408 | 16387 | 16414
16416 | 16384 | 0 | 16408 | 16387 | 16414
16418 | 16389 | 16415 | 16408 | 16392 | 16417
16419 | 16389 | 16416 | 16408 | 16392 | 16417
16420 | 16408 | 0 | 16384 | 16387 | 16414
16421 | 16408 | 0 | 16384 | 16387 | 16414
16406 | 16394 | 16420 | 16384 | 16387 | 16400
16407 | 16394 | 16421 | 16384 | 16387 | 16400
(10 rows)
oid = 16401 and oid = 16402 has been deleted.
The two trigger tuples are deleted in tryAttachPartitionForeignKey called by CloneFkReferencing.
/*
* Looks good! Attach this constraint. The action triggers in the new
* partition become redundant -- the parent table already has equivalent
* ones, and those will be able to reach the partition. Remove the ones
* in the partition. We identify them because they have our constraint
* OID, as well as being on the referenced rel.
*/
* Looks good! Attach this constraint. The action triggers in the new
* partition become redundant -- the parent table already has equivalent
* ones, and those will be able to reach the partition. Remove the ones
* in the partition. We identify them because they have our constraint
* OID, as well as being on the referenced rel.
*/
The attached patch can't fix above issue. I'm not sure about the impact of this issue. Maybe redundant triggers no need removed.
But it surely make oidjoings.sql fail if I add test case into v2 patch, so I don't add test case in v2 patch.
No test case is not good patch. I just share my idea about this issue. Hope to get your reply.
Alvaro Herrera <alvherre@alvh.no-ip.org> 于2023年10月25日周三 20:13写道:
On 2023-Oct-25, tender wang wrote:
> Hi
> Is there any conclusion to this issue?
None yet. I intend to work on this at some point, hopefully soon.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Attachment
Hi Alvaro,
Recently, Alexander reported the same issue on [1]. And before that, another same issue was reported on [2].
So I try to re-work those issues. In my last email on this thread, I said that
"
I slightly modified the previous patch,but I didn't add test case, because I found another issue.
After done ALTER TABLE r ATTACH PARTITION r_1 FOR VALUES IN (1);
I run the oidjoins.sql and has warnings as belwo:
psql:/tender/postgres/src/test/regress/sql/oidjoins.sql:49: WARNING: FK VIOLATION IN pg_trigger({tgparentid}): ("(0,3)",16401)
psql:/tender/postgres/src/test/regress/sql/oidjoins.sql:49: WARNING: FK VIOLATION IN pg_trigger({tgparentid}): ("(0,4)",16402)
psql:/tender/postgres/src/test/regress/sql/oidjoins.sql:49: WARNING: FK VIOLATION IN pg_trigger({tgparentid}): ("(0,4)",16402)
"
And I gave the explanation:
"
The two trigger tuples are deleted in tryAttachPartitionForeignKey called by CloneFkReferencing.
/*
* Looks good! Attach this constraint. The action triggers in the new
* partition become redundant -- the parent table already has equivalent
* ones, and those will be able to reach the partition. Remove the ones
* in the partition. We identify them because they have our constraint
* OID, as well as being on the referenced rel.
*/
* Looks good! Attach this constraint. The action triggers in the new
* partition become redundant -- the parent table already has equivalent
* ones, and those will be able to reach the partition. Remove the ones
* in the partition. We identify them because they have our constraint
* OID, as well as being on the referenced rel.
*/
"
I try to fix above fk violation. I have two ideas.
i. Do not remove redundant, but when detaching parittion, the action trigger on referenced side will be create again.
I have consider about this situation.
ii. We still remove redundant, and the remove the child action trigger, too. If we do this way.
Should we create action trigger recursively on referced side when detaching partition.
I can't decide which one is better. And I'm not sure that keep this FK VIOLATION will cause some problem.
I rebase and send v3 patch, which only fix NOT FOUND INSERT CHECK TRIGGER.
Tender Wang
Attachment
Hello, I think the fix for the check triggers should be as the attached. Very close to what you did, but you were skipping some operations that needed to be kept. AFAICS this patch works correctly for the posted cases. I haven't looked at the action triggers yet; I think we need to create one trigger for each partition of the referenced side, so we need to loop instead of doing a single one. I find this pair of queries useful; they show which constraints exist and which triggers belong to each. We need to make the constraints and triggers match after a detach right as things would be if the just-detached partition were an individual table having the same foreign key. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "People get annoyed when you try to debug them." (Larry Wall)
Attachment
Alvaro Herrera <alvherre@alvh.no-ip.org> 于2024年7月19日周五 21:18写道:
Hello,
I think the fix for the check triggers should be as the attached. Very
close to what you did, but you were skipping some operations that needed
to be kept. AFAICS this patch works correctly for the posted cases.
After applying the attached, the r_1_p_id_fkey1 will have redundant action
triggers, as below:
postgres=# select oid, conname, contype, conrelid, conindid,conparentid, confrelid,conislocal,coninhcount, connoinherit from pg_constraint where oid = 16402;
oid | conname | contype | conrelid | conindid | conparentid | confrelid | conislocal | coninhcount | connoinherit
-------+----------------+---------+----------+----------+-------------+-----------+------------+-------------+--------------
16402 | r_1_p_id_fkey1 | f | 16394 | 16392 | 0 | 16389 | t | 0 | f
(1 row)
postgres=# select oid, tgrelid, tgparentid, tgconstrrelid, tgconstrindid, tgconstraint from pg_trigger where tgconstraint = 16402;
oid | tgrelid | tgparentid | tgconstrrelid | tgconstrindid | tgconstraint
-------+---------+------------+---------------+---------------+--------------
16403 | 16389 | 16400 | 16394 | 16392 | 16402
16404 | 16389 | 16401 | 16394 | 16392 | 16402
16422 | 16389 | 0 | 16394 | 16392 | 16402
16423 | 16389 | 0 | 16394 | 16392 | 16402
(4 rows)
oid | conname | contype | conrelid | conindid | conparentid | confrelid | conislocal | coninhcount | connoinherit
-------+----------------+---------+----------+----------+-------------+-----------+------------+-------------+--------------
16402 | r_1_p_id_fkey1 | f | 16394 | 16392 | 0 | 16389 | t | 0 | f
(1 row)
postgres=# select oid, tgrelid, tgparentid, tgconstrrelid, tgconstrindid, tgconstraint from pg_trigger where tgconstraint = 16402;
oid | tgrelid | tgparentid | tgconstrrelid | tgconstrindid | tgconstraint
-------+---------+------------+---------------+---------------+--------------
16403 | 16389 | 16400 | 16394 | 16392 | 16402
16404 | 16389 | 16401 | 16394 | 16392 | 16402
16422 | 16389 | 0 | 16394 | 16392 | 16402
16423 | 16389 | 0 | 16394 | 16392 | 16402
(4 rows)
Tender Wang
Alvaro Herrera <alvherre@alvh.no-ip.org> 于2024年7月19日周五 21:18写道:
I find this pair of queries useful; they show which constraints exist
and which triggers belong to each. We need to make the constraints and
triggers match after a detach right as things would be if the
just-detached partition were an individual table having the same foreign
key.
I don't find the useful queries in your last email. Can you provide them.
Thanks.
Tender Wang
On Mon, Jul 22, 2024 at 1:52 PM Tender Wang <tndrwang@gmail.com> wrote: > > > > Alvaro Herrera <alvherre@alvh.no-ip.org> 于2024年7月19日周五 21:18写道: >> >> Hello, >> >> I think the fix for the check triggers should be as the attached. Very >> close to what you did, but you were skipping some operations that needed >> to be kept. AFAICS this patch works correctly for the posted cases. > > > After applying the attached, the r_1_p_id_fkey1 will have redundant action > triggers, as below: > postgres=# select oid, conname, contype, conrelid, conindid,conparentid, confrelid,conislocal,coninhcount, connoinheritfrom pg_constraint where oid = 16402; > oid | conname | contype | conrelid | conindid | conparentid | confrelid | conislocal | coninhcount | connoinherit > -------+----------------+---------+----------+----------+-------------+-----------+------------+-------------+-------------- > 16402 | r_1_p_id_fkey1 | f | 16394 | 16392 | 0 | 16389 | t | 0 | f > (1 row) > > postgres=# select oid, tgrelid, tgparentid, tgconstrrelid, tgconstrindid, tgconstraint from pg_trigger where tgconstraint= 16402; > oid | tgrelid | tgparentid | tgconstrrelid | tgconstrindid | tgconstraint > -------+---------+------------+---------------+---------------+-------------- > 16403 | 16389 | 16400 | 16394 | 16392 | 16402 > 16404 | 16389 | 16401 | 16394 | 16392 | 16402 > 16422 | 16389 | 0 | 16394 | 16392 | 16402 > 16423 | 16389 | 0 | 16394 | 16392 | 16402 > (4 rows) > Yes, seems Alvaro has mentioned that he hasn't looked at the action triggers, in the attached patch, I add some logic that first check if there exists action triggers, if yes, just update their Parent Trigger to InvalidOid. > > -- > Tender Wang -- Regards Junwang Zhao
Attachment
On Fri, Jul 26, 2024 at 2:36 PM Junwang Zhao <zhjwpku@gmail.com> wrote: > > On Mon, Jul 22, 2024 at 1:52 PM Tender Wang <tndrwang@gmail.com> wrote: > > > > > > > > Alvaro Herrera <alvherre@alvh.no-ip.org> 于2024年7月19日周五 21:18写道: > >> > >> Hello, > >> > >> I think the fix for the check triggers should be as the attached. Very > >> close to what you did, but you were skipping some operations that needed > >> to be kept. AFAICS this patch works correctly for the posted cases. > > > > > > After applying the attached, the r_1_p_id_fkey1 will have redundant action > > triggers, as below: > > postgres=# select oid, conname, contype, conrelid, conindid,conparentid, confrelid,conislocal,coninhcount, connoinheritfrom pg_constraint where oid = 16402; > > oid | conname | contype | conrelid | conindid | conparentid | confrelid | conislocal | coninhcount | connoinherit > > -------+----------------+---------+----------+----------+-------------+-----------+------------+-------------+-------------- > > 16402 | r_1_p_id_fkey1 | f | 16394 | 16392 | 0 | 16389 | t | 0 | f > > (1 row) > > > > postgres=# select oid, tgrelid, tgparentid, tgconstrrelid, tgconstrindid, tgconstraint from pg_trigger where tgconstraint= 16402; > > oid | tgrelid | tgparentid | tgconstrrelid | tgconstrindid | tgconstraint > > -------+---------+------------+---------------+---------------+-------------- > > 16403 | 16389 | 16400 | 16394 | 16392 | 16402 > > 16404 | 16389 | 16401 | 16394 | 16392 | 16402 > > 16422 | 16389 | 0 | 16394 | 16392 | 16402 > > 16423 | 16389 | 0 | 16394 | 16392 | 16402 > > (4 rows) > > > > Yes, seems Alvaro has mentioned that he hasn't looked at the > action triggers, in the attached patch, I add some logic that > first check if there exists action triggers, if yes, just update > their Parent Trigger to InvalidOid. > > > > > -- > > Tender Wang > > > > -- > Regards > Junwang Zhao There is a bug report[0] Tender comments might be the same issue as this one, but I tried Alvaro's and mine patch, neither could solve that problem, I did not tried Tender's earlier patch thought. I post the test script below in case you are interested. CREATE TABLE t1 (a int, PRIMARY KEY (a)); CREATE TABLE t (a int, PRIMARY KEY (a), FOREIGN KEY (a) REFERENCES t1) PARTITION BY LIST (a); ALTER TABLE t ATTACH PARTITION t1 FOR VALUES IN (1); ALTER TABLE t DETACH PARTITION t1; ALTER TABLE t ATTACH PARTITION t1 FOR VALUES IN (1); [0] https://www.postgresql.org/message-id/18541-628a61bc267cd2d3@postgresql.org -- Regards Junwang Zhao
On 2024-Jul-26, Junwang Zhao wrote: > There is a bug report[0] Tender comments might be the same > issue as this one, but I tried Alvaro's and mine patch, neither > could solve that problem, I did not tried Tender's earlier patch > thought. I post the test script below in case you are interested. Yeah, I've been looking at this whole debacle this week and after looking at it more closely, I realized that the overall problem requires a much more invasive solution -- namely, that on DETACH, if the referenced table is partitioned, we need to create additional pg_constraint entries from the now-standalone table (was partition) to each of the partitions of the referenced table; and also add action triggers to each of those. Without that, the constraint is incomplete and doesn't work (as reported multiple times already). One thing I have not yet tried is what if the partition being detach is also partitioned. I mean, do we need to handle each sub-partition explicitly in some way? I think the answer is no, but it needs tests. I have written the patch to do this on detach, and AFAICS it works well, though it changes the behavior of some existing tests (IIRC related to self-referencing FKs). Also, the next problem is making sure that ATTACH deals with it correctly. I'm on this bit today. Self-referencing FKs seem to have additional problems :-( The queries I was talking about are these \set tables ''''prim.*''',''forign.*''',''''lone'''' select oid, conparentid, contype, conname, conrelid::regclass, confrelid::regclass, conkey, confkey, conindid::regclass frompg_constraint where contype = 'f' and (conrelid::regclass::text ~ any (array[:tables]) or confrelid::regclass::text ~any (array[:tables])) order by contype, conrelid, confrelid; select tgconstraint, oid, tgrelid::regclass, tgconstrrelid::regclass,tgname, tgparentid, tgconstrindid::regclass, tgfoid::regproc from pg_trigger where tgconstraint in(select oid from pg_constraint where conrelid::regclass::text ~ any (array[:tables]) or confrelid::regclass::text ~ any(array[:tables])) order by tgconstraint, tgrelid::regclass::text, tgfoid; Written as a single line in psql they let you quickly see all the constraints and their associated triggers, so for instance you can see whether this sequence create table prim (a int primary key) partition by list (a); create table prim1 partition of prim for values in (1); create table prim2 partition of prim for values in (2); create table forign (a int references prim) partition by list (a); create table forign1 partition of forign for values in (1); create table forign2 partition of forign for values in (2); alter table forign detach partition forign1; produces the same set of constraints and triggers as this other sequence create table prim (a int primary key) partition by list (a); create table prim1 partition of prim for values in (1); create table prim2 partition of prim for values in (2); create table forign (a int references prim) partition by list (a); create table forign2 partition of forign for values in (2); create table forign1 (a int references prim); The patch is more or less like the attached, far from ready. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ Syntax error: function hell() needs an argument. Please choose what hell you want to involve.
Attachment
Junwang Zhao <zhjwpku@gmail.com> 于2024年7月26日周五 14:57写道:
There is a bug report[0] Tender comments might be the same
issue as this one, but I tried Alvaro's and mine patch, neither
could solve that problem, I did not tried Tender's earlier patch
thought. I post the test script below in case you are interested.
My earlier patch should handle Alexander reported case. But I did not do more
test. I'm not sure that wether or not has dismatching between pg_constraint and pg_trigger.
I aggred with Alvaro said that "requires a much more invasive solution".
--
Tender Wang
On 2024-Jul-26, Tender Wang wrote: > Junwang Zhao <zhjwpku@gmail.com> 于2024年7月26日周五 14:57写道: > > > There is a bug report[0] Tender comments might be the same issue as > > this one, but I tried Alvaro's and mine patch, neither could solve > > that problem, I did not tried Tender's earlier patch thought. I post > > the test script below in case you are interested. > > My earlier patch should handle Alexander reported case. But I did not > do more test. I'm not sure that wether or not has dismatching between > pg_constraint and pg_trigger. > > I aggred with Alvaro said that "requires a much more invasive > solution". Here's the patch which, as far as I can tell, fixes all the reported problems (other than the one in bug 18541, for which I proposed an unrelated fix in that thread[1]). If you can double-check, I would very much appreciate that. Also, I think the test cases the patch adds reflect the provided examples sufficiently, but if we're still failing to cover some, please let me know. As I understand, this fix needs to be applied all the way back to 12, because the overall functionality is that old. However, in branches 14 and back, the patch doesn't apply cleanly, because of the changes we made in commit f4566345cf40 :-( I'm tempted to fix it in branches 15, 16, 17, master now and potentially backpatch later, to avoid dragging things along further. It's taken long enough already. [1] https://postgr.es/m/202408072222.icgykv5yrql5@alvherre.pgsql -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ "World domination is proceeding according to plan" (Andrew Morton)
Attachment
Alvaro Herrera <alvherre@alvh.no-ip.org> 于2024年8月8日周四 06:50写道:
On 2024-Jul-26, Tender Wang wrote:
> Junwang Zhao <zhjwpku@gmail.com> 于2024年7月26日周五 14:57写道:
>
> > There is a bug report[0] Tender comments might be the same issue as
> > this one, but I tried Alvaro's and mine patch, neither could solve
> > that problem, I did not tried Tender's earlier patch thought. I post
> > the test script below in case you are interested.
>
> My earlier patch should handle Alexander reported case. But I did not
> do more test. I'm not sure that wether or not has dismatching between
> pg_constraint and pg_trigger.
>
> I aggred with Alvaro said that "requires a much more invasive
> solution".
Here's the patch which, as far as I can tell, fixes all the reported
problems (other than the one in bug 18541, for which I proposed an
unrelated fix in that thread[1]). If you can double-check, I would very
much appreciate that. Also, I think the test cases the patch adds
reflect the provided examples sufficiently, but if we're still failing
to cover some, please let me know.
Thanks for your work. I will take some time to look it in detail.
Tender Wang
Alvaro Herrera <alvherre@alvh.no-ip.org> 于2024年8月8日周四 06:50写道:
On 2024-Jul-26, Tender Wang wrote:
> Junwang Zhao <zhjwpku@gmail.com> 于2024年7月26日周五 14:57写道:
>
> > There is a bug report[0] Tender comments might be the same issue as
> > this one, but I tried Alvaro's and mine patch, neither could solve
> > that problem, I did not tried Tender's earlier patch thought. I post
> > the test script below in case you are interested.
>
> My earlier patch should handle Alexander reported case. But I did not
> do more test. I'm not sure that wether or not has dismatching between
> pg_constraint and pg_trigger.
>
> I aggred with Alvaro said that "requires a much more invasive
> solution".
Here's the patch which, as far as I can tell, fixes all the reported
problems (other than the one in bug 18541, for which I proposed an
unrelated fix in that thread[1]). If you can double-check, I would very
much appreciate that. Also, I think the test cases the patch adds
reflect the provided examples sufficiently, but if we're still failing
to cover some, please let me know.
I did a lot of tests, and did not report error and did not find any warnings using oidjoins.sql.
+1
Tender Wang
Alvaro Herrera <alvherre@alvh.no-ip.org> 于2024年8月8日周四 06:50写道:
On 2024-Jul-26, Tender Wang wrote:
> Junwang Zhao <zhjwpku@gmail.com> 于2024年7月26日周五 14:57写道:
>
> > There is a bug report[0] Tender comments might be the same issue as
> > this one, but I tried Alvaro's and mine patch, neither could solve
> > that problem, I did not tried Tender's earlier patch thought. I post
> > the test script below in case you are interested.
>
> My earlier patch should handle Alexander reported case. But I did not
> do more test. I'm not sure that wether or not has dismatching between
> pg_constraint and pg_trigger.
>
> I aggred with Alvaro said that "requires a much more invasive
> solution".
Here's the patch which, as far as I can tell, fixes all the reported
problems (other than the one in bug 18541, for which I proposed an
unrelated fix in that thread[1]). If you can double-check, I would very
much appreciate that. Also, I think the test cases the patch adds
reflect the provided examples sufficiently, but if we're still failing
to cover some, please let me know.
As I understand, this fix needs to be applied all the way back to 12,
because the overall functionality is that old. However, in branches 14
and back, the patch doesn't apply cleanly, because of the changes we
made in commit f4566345cf40 :-( I'm tempted to fix it in branches 15,
16, 17, master now and potentially backpatch later, to avoid dragging
things along further. It's taken long enough already.
I haven't seen this patch on master. Is there something that we fotget to handle?
Tender Wang
On 2024-Aug-20, Tender Wang wrote: > > As I understand, this fix needs to be applied all the way back to 12, > > because the overall functionality is that old. However, in branches 14 > > and back, the patch doesn't apply cleanly, because of the changes we > > made in commit f4566345cf40 :-( I'm tempted to fix it in branches 15, > > 16, 17, master now and potentially backpatch later, to avoid dragging > > things along further. It's taken long enough already. > > I haven't seen this patch on master. Is there something that we fotget to > handle? I haven't pushed it yet, mostly because of being unsure about not doing anything for the oldest branches (14 and back). -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Java is clearly an example of money oriented programming" (A. Stepanov)
Alvaro Herrera <alvherre@alvh.no-ip.org> 于2024年8月20日周二 10:25写道:
On 2024-Aug-20, Tender Wang wrote:
> > As I understand, this fix needs to be applied all the way back to 12,
> > because the overall functionality is that old. However, in branches 14
> > and back, the patch doesn't apply cleanly, because of the changes we
> > made in commit f4566345cf40 :-( I'm tempted to fix it in branches 15,
> > 16, 17, master now and potentially backpatch later, to avoid dragging
> > things along further. It's taken long enough already.
>
> I haven't seen this patch on master. Is there something that we fotget to
> handle?
I haven't pushed it yet, mostly because of being unsure about not doing
anything for the oldest branches (14 and back).
I only did codes and tests on master. I'm not sure how much complicated it would be
to fix this issue on branches 14 and back.
Tender Wang
Re: [BUG] Fix DETACH with FK pointing to a partitioned table fails
From
Jehan-Guillaume de Rorthais
Date:
On Wed, 7 Aug 2024 18:50:10 -0400 Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > On 2024-Jul-26, Tender Wang wrote: > > > Junwang Zhao <zhjwpku@gmail.com> 于2024年7月26日周五 14:57写道: > > > > > There is a bug report[0] Tender comments might be the same issue as > > > this one, but I tried Alvaro's and mine patch, neither could solve > > > that problem, I did not tried Tender's earlier patch thought. I post > > > the test script below in case you are interested. > > > > My earlier patch should handle Alexander reported case. But I did not > > do more test. I'm not sure that wether or not has dismatching between > > pg_constraint and pg_trigger. > > > > I aggred with Alvaro said that "requires a much more invasive > > solution". > > Here's the patch which, as far as I can tell, fixes all the reported > problems (other than the one in bug 18541, for which I proposed an > unrelated fix in that thread[1]). If you can double-check, I would very > much appreciate that. Also, I think the test cases the patch adds > reflect the provided examples sufficiently, but if we're still failing > to cover some, please let me know. I'm back on this issue as well. I start poking at this patch to review it, test it, challenge it and then report here. I'll try to check if some other issues might have lost/forgot on they way as well. In the meantime, thank you Alvaro, Tender and Junwang for your work, time, research and patches!
On 2024-Aug-20, Jehan-Guillaume de Rorthais wrote: > I'm back on this issue as well. I start poking at this patch to review it, > test it, challenge it and then report here. > > I'll try to check if some other issues might have lost/forgot on they way as > well. Thanks, much appreciated, looking forward to your feedback. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
Alvaro Herrera <alvherre@alvh.no-ip.org> 于2024年8月22日周四 06:00写道:
On 2024-Aug-19, Alvaro Herrera wrote:
> I haven't pushed it yet, mostly because of being unsure about not doing
> anything for the oldest branches (14 and back).
Last night, after much mulling on this, it occurred to me that one easy
way out of this problem for the old branches, without having to write
more code, is to simply remove the constraint from the partition when
it's detached (but only if they reference a partitioned relation). It's
not a great solution, but at least we're no longer leaving bogus catalog
entries around. That would be like the attached patch, which was cut
from 14 and applies cleanly to 12 and 13. I'd throw in a couple of
tests and call it a day.
I apply the v14 patch on branch REL_14_STABLE. I run this thread issue and I
find below error.
postgres=# CREATE TABLE p ( id bigint PRIMARY KEY ) PARTITION BY list (id);CREATE TABLE p_1 PARTITION OF p FOR VALUES IN (1);
CREATE TABLE r_1 (
id bigint PRIMARY KEY,
p_id bigint NOT NULL,
FOREIGN KEY (p_id) REFERENCES p (id)
);
CREATE TABLE r (
id bigint PRIMARY KEY,
p_id bigint NOT NULL,
FOREIGN KEY (p_id) REFERENCES p (id)
) PARTITION BY list (id);
ALTER TABLE r ATTACH PARTITION r_1 FOR VALUES IN (1);
ALTER TABLE r DETACH PARTITION r_1;
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
ALTER TABLE
ERROR: cache lookup failed for constraint 16400
I haven't look into details to find out where cause above error.
Tender Wang
Tender Wang <tndrwang@gmail.com> 于2024年8月22日周四 11:19写道:
Alvaro Herrera <alvherre@alvh.no-ip.org> 于2024年8月22日周四 06:00写道:On 2024-Aug-19, Alvaro Herrera wrote:
> I haven't pushed it yet, mostly because of being unsure about not doing
> anything for the oldest branches (14 and back).
Last night, after much mulling on this, it occurred to me that one easy
way out of this problem for the old branches, without having to write
more code, is to simply remove the constraint from the partition when
it's detached (but only if they reference a partitioned relation). It's
not a great solution, but at least we're no longer leaving bogus catalog
entries around. That would be like the attached patch, which was cut
from 14 and applies cleanly to 12 and 13. I'd throw in a couple of
tests and call it a day.I apply the v14 patch on branch REL_14_STABLE. I run this thread issue and Ifind below error.postgres=# CREATE TABLE p ( id bigint PRIMARY KEY ) PARTITION BY list (id);
CREATE TABLE p_1 PARTITION OF p FOR VALUES IN (1);
CREATE TABLE r_1 (
id bigint PRIMARY KEY,
p_id bigint NOT NULL,
FOREIGN KEY (p_id) REFERENCES p (id)
);
CREATE TABLE r (
id bigint PRIMARY KEY,
p_id bigint NOT NULL,
FOREIGN KEY (p_id) REFERENCES p (id)
) PARTITION BY list (id);
ALTER TABLE r ATTACH PARTITION r_1 FOR VALUES IN (1);
ALTER TABLE r DETACH PARTITION r_1;
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
ALTER TABLEERROR: cache lookup failed for constraint 16400
I guess it is because cascade dropping, the oid=16400 has been deleted.
Adding a list that remember dropped constraint, if we find the parent of constraint
is in the list, we skip.
By the way, I run above SQL sequences on REL_14_STABLE without your partch.
I didn't find reporting error, and running oidjoins.sql didn't report warnings.
Do I miss something?
Tender Wang
Alvaro Herrera <alvherre@alvh.no-ip.org> 于2024年8月23日周五 02:41写道:
On 2024-Aug-22, Tender Wang wrote:
> I apply the v14 patch on branch REL_14_STABLE. I run this thread issue and I
> find below error.
> [...]
> ERROR: cache lookup failed for constraint 16400
>
> I haven't look into details to find out where cause above error.
Right, we try to drop the constraint twice. We can dodge this by
collecting all constraints to drop in the loop and process them in a
single performMultipleDeletions, as in the attached v14-2.
Can we move the CommandCounterIncrement() in
if (get_rel_relkind(fk->confrelid) == RELKIND_PARTITIONED_TABLE) block
to be close to performMultipleDeletions().
Others look good to me.
TBH I think it's a bit infuriating that we lose the constraint (which
was explicitly declared) because of ATTACH/DETACH.
Agree.
Do you think it is friendly to users if we add hints that inform them the constraint was dropped?
Tender Wang
Re: [BUG] Fix DETACH with FK pointing to a partitioned table fails
From
Jehan-Guillaume de Rorthais
Date:
Hi, On Tue, 20 Aug 2024 23:09:27 -0400 Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > On 2024-Aug-20, Jehan-Guillaume de Rorthais wrote: > > > I'm back on this issue as well. I start poking at this patch to review it, > > test it, challenge it and then report here. > > > > I'll try to check if some other issues might have lost/forgot on they way as > > well. > > Thanks, much appreciated, looking forward to your feedback. Sorry, it took me a while to come back to you on this topic. It has been hard to untangle subjects, reproductions and patch… There's three distinct issues/thread: * Constraint & trigger catalog cleanup [1] (this thread) * FK broken after DETACH [2] * Maintenance consideration about self referencing FK between partitions [3] 0. Splitting in two commits Your patch addresses two bugs: * one for the constraint & trigger catalog cleanup; * one for the FK broken after DETACH. These issues are unrelated, therefore I am wondering if it would be better to split their resolution in two different patches. Last year, I reported them in two different threads [1][2]. The first with implementation consideration, the second with a demo/proposal/draft fix. Unfortunately, this discussion about the first bug slipped to the second one when Tender stumbled on this bug as well and reported it. But, both bugs can be triggered independently, and have distinct fixes. Finally, splitting the patch might help setting finer patch co-authoring. I know my patch for [2] was a draft and somewhat trivial, but I spend a fair amount of time to report, then produce a draft patch, so I was wondering if it would be candidate to a co-author flag on this (small, humble and refactored by you) patch? I'm definitely not involved (yet) in the second part though. 1. Constraint & trigger catalog cleanup [1] I have been focusing on the current master branch and haven't taken into consideration backpatching related issues yet. When I first studied this bug and reported it, I held on writing a patch because it seemed it would duplicate some existing code. I wrote: > I poked around DetachPartitionFinalize() to try to find a way to fix this, > but it looks like it would duplicate a bunch of code from other code path > (eg. from CloneFkReferenced). My proposal was to clean everything related to the old FK and use some existing code path to create a fresh and cleaner one. This requires some refactoring in existing code, but we would win a common path of code between create/attach/detach, a cleaner catalog and easier code maintenance. I've finally been able to write a PoC that implement this by calling addFkRecurseReferenced() from DetachPartitionFinalize(). I can't join it here because it is currently an ugly draft and I still have some work to do. But I would really like to have a little more time (one or two days) to explore this avenue further before you commit yours, if you don't mind? Or maybe you already have considered this avenue and rejected it? 2. FK broken after DETACH [2] Comparing your patch to my draft from [2], I just have a question about the refactoring. Fencing the constraint/trigger removal inside a conditional RELKIND_PARTITIONED_TABLE block of code was obvious. It avoids some useless catalog scan compared to my draft patch. Also, the "contype == CONSTRAINT_FOREIGN" I had sounds safe to remove. However, is it clean/light enough to add the "conparentid == fk->conoid" in the scan key as I did? I'm not sure it saves anything else but the small conditional block you inserted inside the loop, but I wonder if there's a serious concern about this anyway? Last, considering the tests, I think we should add some rows in the tables, to make sure the FK is correctly enforced after DETACH. Something like: CREATE SCHEMA fkpart12 CREATE TABLE fk_p ( id bigint PRIMARY KEY ) PARTITION BY list (id) CREATE TABLE fk_p_1 PARTITION OF fk_p FOR VALUES IN (1) CREATE TABLE fk_p_2 PARTITION OF fk_p FOR VALUES IN (2) CREATE TABLE fk_r_1 ( id bigint PRIMARY KEY, p_id bigint NOT NULL) CREATE TABLE fk_r_2 ( id bigint PRIMARY KEY, p_id bigint NOT NULL) CREATE TABLE fk_r ( id bigint PRIMARY KEY, p_id bigint NOT NULL, FOREIGN KEY (p_id) REFERENCES fk_p (id) ) PARTITION BY list (id); SET search_path TO fkpart12; INSERT INTO fk_p VALUES (1); ALTER TABLE fk_r ATTACH PARTITION fk_r_2 FOR VALUES IN (2); ALTER TABLE fk_r ATTACH PARTITION fk_r_1 FOR VALUES IN (1); \d fk_r_1 INSERT INTO fk_r VALUES (1,1); ALTER TABLE fk_r DETACH PARTITION fk_r_1; \d fk_r_1 INSERT INTO c_1 VALUES (2,2); -- fails as EXPECTED DELETE FROM p; -- should fails but was buggy ALTER TABLE fk_r ATTACH PARTITION fk_r_1 FOR VALUES IN (1); \d fk_r_1 3. Self referencing FK between partitions [3] You added to your commit message: verify: 20230707175859.17c91538@karst I'm not sure what the "verify" flag means. Unfortunately, your patch doesn't help on this topic. This bug really needs more discussion and design consideration. I have thought about this problem and haven't found any solution that don't involve breaking the current core behavior. It really looks like an impossible bug to fix without dropping the constraint itself. On both side. Maybe the only sane behavior would be to forbid detaching the partition if it would break the constraint. But let's discuss this on the related thread, should we? Thank you for reading me all the way down to the bottom! Regards, [1] https://www.postgresql.org/message-id/20230705233028.2f554f73%40karst [2] https://www.postgresql.org/message-id/20230420144344.40744130%40karst [3] https://www.postgresql.org/message-id/20230707175859.17c91538%40karst
Jehan-Guillaume de Rorthais <jgdr@dalibo.com> 于2024年9月3日周二 05:02写道:
Hi,
On Tue, 20 Aug 2024 23:09:27 -0400
Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> On 2024-Aug-20, Jehan-Guillaume de Rorthais wrote:
>
> > I'm back on this issue as well. I start poking at this patch to review it,
> > test it, challenge it and then report here.
> >
> > I'll try to check if some other issues might have lost/forgot on they way as
> > well.
>
> Thanks, much appreciated, looking forward to your feedback.
Sorry, it took me a while to come back to you on this topic. It has been hard to
untangle subjects, reproductions and patch…
There's three distinct issues/thread:
* Constraint & trigger catalog cleanup [1] (this thread)
* FK broken after DETACH [2]
* Maintenance consideration about self referencing FK between partitions [3]
The third issue has been fixed, and codes have been pushed. Because of my misunderstanding,
It should not be here.
0. Splitting in two commits
Your patch addresses two bugs:
* one for the constraint & trigger catalog cleanup;
* one for the FK broken after DETACH.
These issues are unrelated, therefore I am wondering if it would be better
to split their resolution in two different patches.
Last year, I reported them in two different threads [1][2]. The first with
implementation consideration, the second with a demo/proposal/draft fix.
Unfortunately, this discussion about the first bug slipped to the second one
when Tender stumbled on this bug as well and reported it. But, both bugs can
be triggered independently, and have distinct fixes.
It's ok that these two issues are fixed together. It is because current codes don't handle better
when the referenced side is the partition table.
Finally, splitting the patch might help setting finer patch co-authoring. I
know my patch for [2] was a draft and somewhat trivial, but I spend a fair
amount of time to report, then produce a draft patch, so I was wondering if
it would be candidate to a co-author flag on this (small, humble and
refactored by you) patch?
I'm definitely not involved (yet) in the second part though.
1. Constraint & trigger catalog cleanup [1]
I have been focusing on the current master branch and haven't taken into
consideration backpatching related issues yet.
When I first studied this bug and reported it, I held on writing a patch
because it seemed it would duplicate some existing code. I wrote:
> I poked around DetachPartitionFinalize() to try to find a way to fix this,
> but it looks like it would duplicate a bunch of code from other code path
> (eg. from CloneFkReferenced).
My proposal was to clean everything related to the old FK and use some
existing code path to create a fresh and cleaner one. This requires some
refactoring in existing code, but we would win a common path of code between
create/attach/detach, a cleaner catalog and easier code maintenance.
I've finally been able to write a PoC that implement this by calling
addFkRecurseReferenced() from DetachPartitionFinalize(). I can't join
it here because it is currently an ugly draft and I still have some work
to do. But I would really like to have a little more time (one or two days) to
explore this avenue further before you commit yours, if you don't mind? Or
maybe you already have considered this avenue and rejected it?
2. FK broken after DETACH [2]
Comparing your patch to my draft from [2], I just have a question about the
refactoring.
Fencing the constraint/trigger removal inside a conditional
RELKIND_PARTITIONED_TABLE block of code was obvious. It avoids some useless
catalog scan compared to my draft patch.
Also, the "contype == CONSTRAINT_FOREIGN" I had sounds safe to remove.
However, is it clean/light enough to add the "conparentid == fk->conoid" in
the scan key as I did? I'm not sure it saves anything else but the small
conditional block you inserted inside the loop, but I wonder if there's a
serious concern about this anyway?
Last, considering the tests, I think we should add some rows in the tables,
to make sure the FK is correctly enforced after DETACH. Something like:
CREATE SCHEMA fkpart12
CREATE TABLE fk_p ( id bigint PRIMARY KEY ) PARTITION BY list (id)
CREATE TABLE fk_p_1 PARTITION OF fk_p FOR VALUES IN (1)
CREATE TABLE fk_p_2 PARTITION OF fk_p FOR VALUES IN (2)
CREATE TABLE fk_r_1 ( id bigint PRIMARY KEY, p_id bigint NOT NULL)
CREATE TABLE fk_r_2 ( id bigint PRIMARY KEY, p_id bigint NOT NULL)
CREATE TABLE fk_r ( id bigint PRIMARY KEY, p_id bigint NOT NULL,
FOREIGN KEY (p_id) REFERENCES fk_p (id)
) PARTITION BY list (id);
SET search_path TO fkpart12;
INSERT INTO fk_p VALUES (1);
ALTER TABLE fk_r ATTACH PARTITION fk_r_2 FOR VALUES IN (2);
ALTER TABLE fk_r ATTACH PARTITION fk_r_1 FOR VALUES IN (1);
\d fk_r_1
INSERT INTO fk_r VALUES (1,1);
ALTER TABLE fk_r DETACH PARTITION fk_r_1;
\d fk_r_1
INSERT INTO c_1 VALUES (2,2); -- fails as EXPECTED
DELETE FROM p; -- should fails but was buggy
ALTER TABLE fk_r ATTACH PARTITION fk_r_1 FOR VALUES IN (1);
\d fk_r_1
3. Self referencing FK between partitions [3]
You added to your commit message:
verify: 20230707175859.17c91538@karst
I'm not sure what the "verify" flag means. Unfortunately, your patch doesn't
help on this topic.
This bug really needs more discussion and design consideration. I have
thought about this problem and haven't found any solution that don't involve
breaking the current core behavior. It really looks like an impossible bug to
fix without dropping the constraint itself. On both side. Maybe the only sane
behavior would be to forbid detaching the partition if it would break the
constraint.
But let's discuss this on the related thread, should we?
Thank you for reading me all the way down to the bottom!
Regards,
[1] https://www.postgresql.org/message-id/20230705233028.2f554f73%40karst
[2] https://www.postgresql.org/message-id/20230420144344.40744130%40karst
[3] https://www.postgresql.org/message-id/20230707175859.17c91538%40karst
Tender Wang
Re: [BUG] Fix DETACH with FK pointing to a partitioned table fails
From
Jehan-Guillaume de Rorthais
Date:
Hi Tender, On Tue, 3 Sep 2024 10:16:44 +0800 Tender Wang <tndrwang@gmail.com> wrote: > Jehan-Guillaume de Rorthais <jgdr@dalibo.com> 于2024年9月3日周二 05:02写道: […] > > * Constraint & trigger catalog cleanup [1] (this thread) > > * FK broken after DETACH [2] > > * Maintenance consideration about self referencing FK between partitions > > [3] > > > > The third issue has been fixed, and codes have been pushed. Because of my > misunderstanding, > It should not be here. I just retried the SQL scenario Guillaume gave on both master and master with Alvaro's patch. See: https://www.postgresql.org/message-id/flat/CAECtzeWHCA%2B6tTcm2Oh2%2Bg7fURUJpLZb-%3DpRXgeWJ-Pi%2BVU%3D_w%40mail.gmail.com It doesn't seem fixed at all. Maybe you are mixing up with another thread/issue? > > 0. Splitting in two commits > > > > […] > > > > Unfortunately, this discussion about the first bug slipped to the second > > one when Tender stumbled on this bug as well and reported it. But, both > > bugs can be triggered independently, and have distinct fixes. > > It's ok that these two issues are fixed together. It is because current > codes don't handle better when the referenced side is the partition table. I don't feel the same. Mixing two discussions and fixes together in the same thread and commit makes life harder. Last year, when you found the other bug, I tried to point you to the right thread to avoid mixing subjects: https://www.postgresql.org/message-id/20230810170345.26e41b05%40karst If I wrote about the third (non fixed) issue yesterday, it's just because Alvaro included a reference to it in his commit message. But I think we should really keep up with this issue on its own, dedicated discussion: https://www.postgresql.org/message-id/flat/CAECtzeWHCA%2B6tTcm2Oh2%2Bg7fURUJpLZb-%3DpRXgeWJ-Pi%2BVU%3D_w%40mail.gmail.com Regards
Jehan-Guillaume de Rorthais <jgdr@dalibo.com> 于2024年9月3日周二 17:26写道:
Hi Tender,
On Tue, 3 Sep 2024 10:16:44 +0800
Tender Wang <tndrwang@gmail.com> wrote:
> Jehan-Guillaume de Rorthais <jgdr@dalibo.com> 于2024年9月3日周二 05:02写道:
[…]
> > * Constraint & trigger catalog cleanup [1] (this thread)
> > * FK broken after DETACH [2]
> > * Maintenance consideration about self referencing FK between partitions
> > [3]
> >
>
> The third issue has been fixed, and codes have been pushed. Because of my
> misunderstanding,
> It should not be here.
I just retried the SQL scenario Guillaume gave on both master and master with
Alvaro's patch. See:
https://www.postgresql.org/message-id/flat/CAECtzeWHCA%2B6tTcm2Oh2%2Bg7fURUJpLZb-%3DpRXgeWJ-Pi%2BVU%3D_w%40mail.gmail.com
It doesn't seem fixed at all. Maybe you are mixing up with another thread/issue?
Sorry, I mixed up the third issue with the Alexander reported issue. Please ignore the above noise.
> > 0. Splitting in two commits
> >
> > […]
> >
> > Unfortunately, this discussion about the first bug slipped to the second
> > one when Tender stumbled on this bug as well and reported it. But, both
> > bugs can be triggered independently, and have distinct fixes.
>
> It's ok that these two issues are fixed together. It is because current
> codes don't handle better when the referenced side is the partition table.
I don't feel the same. Mixing two discussions and fixes together in the same
thread and commit makes life harder.
Hmm, these two issues have a close relationship. Anyway, I think it's ok to fix the two issues together.
Last year, when you found the other bug, I tried to point you to the
right thread to avoid mixing subjects:
https://www.postgresql.org/message-id/20230810170345.26e41b05%40karst
If I wrote about the third (non fixed) issue yesterday, it's just because
Alvaro included a reference to it in his commit message. But I think we should
really keep up with this issue on its own, dedicated discussion:
https://www.postgresql.org/message-id/flat/CAECtzeWHCA%2B6tTcm2Oh2%2Bg7fURUJpLZb-%3DpRXgeWJ-Pi%2BVU%3D_w%40mail.gmail.com
Thanks for the reminder. I didn't take the time to look into the third issue. Please give me some to analyze it.
Thanks,
Tender Wang
Alvaro Herrera <alvherre@alvh.no-ip.org> 于2024年8月8日周四 06:50写道:
On 2024-Jul-26, Tender Wang wrote:
> Junwang Zhao <zhjwpku@gmail.com> 于2024年7月26日周五 14:57写道:
>
> > There is a bug report[0] Tender comments might be the same issue as
> > this one, but I tried Alvaro's and mine patch, neither could solve
> > that problem, I did not tried Tender's earlier patch thought. I post
> > the test script below in case you are interested.
>
> My earlier patch should handle Alexander reported case. But I did not
> do more test. I'm not sure that wether or not has dismatching between
> pg_constraint and pg_trigger.
>
> I aggred with Alvaro said that "requires a much more invasive
> solution".
Here's the patch which, as far as I can tell, fixes all the reported
problems (other than the one in bug 18541, for which I proposed an
unrelated fix in that thread[1]). If you can double-check, I would very
much appreciate that. Also, I think the test cases the patch adds
reflect the provided examples sufficiently, but if we're still failing
to cover some, please let me know.
When I review Jehan-Guillaume v2 patch, I found the below codes that need
a little tweak. In DetachPartitionFinalize()
/*
* If the referenced side is partitioned (which we know because our
* parent's constraint points to a different relation than ours) then
* we must, in addition to the above, create pg_constraint rows that
* point to each partition, each with its own action triggers.
*/
if (parentConForm->conrelid != conform->conrelid)
* If the referenced side is partitioned (which we know because our
* parent's constraint points to a different relation than ours) then
* we must, in addition to the above, create pg_constraint rows that
* point to each partition, each with its own action triggers.
*/
if (parentConForm->conrelid != conform->conrelid)
I found that the above IF was always true, regardless of whether the referenced side is partitioned.
Although find_all_inheritors() can return an empty children list when the referenced side is not partitioned,
we can avoid much useless work.
How about this way:
if (get_rel_relkind(conform->confrelid) == RELKIND_PARTITIONED_TABLE)
--
Thanks,
Tender Wang
Alvaro Herrera <alvherre@alvh.no-ip.org> 于2024年10月18日周五 22:52写道:
On 2024-Sep-26, Jehan-Guillaume de Rorthais wrote:
> REL_14_STABLE backport doesn't seem trivial, so I'll wait for some
> feedback, review & decision before going further down in backpatching.
Hi, thanks for these patches. I have made some edits of my own. In the
end I decided that I quite liked the new structure of the
addFkConstraint() function and friends. I added a bunch of comments and
changed names somewhat. Also, I think the benefit of the refactoring is
more obvious when all patches are squashed together, so I did that.
For branch 14, I opted to make it delete the constraints on detach.
This isn't ideal but unless somebody wants to spend a lot more time on
this, it seems the best we can do. Leaving broken constraints around
seems worse. The patch for 14 doesn't apply to 13 sadly. I didn't have
time to verify why, but it seems there's some rather larger conflict in
one spot.
I hope to be able to get these pushed over the weekend. That would give
us a couple of weeks before the November releases (but my availability
in those two weeks might be spotty.)
I spent some time going through your test additions and ended up with
your functions in this form:
-- displays constraints in schema fkpart12
CREATE OR REPLACE FUNCTION
fkpart12_constraints(OUT conrel regclass, OUT conname name,
OUT confrelid regclass, OUT conparent text)
RETURNS SETOF record AS $$
WITH RECURSIVE arrh AS (
SELECT oid, conrelid, conname, confrelid, NULL::name AS conparent
FROM pg_constraint
WHERE connamespace = 'fkpart12'::regnamespace AND
contype = 'f' AND conparentid = 0
UNION ALL
SELECT c.oid, c.conrelid, c.conname, c.confrelid,
(pg_identify_object('pg_constraint'::regclass, arrh.oid, 0)).identity
FROM pg_constraint c
JOIN arrh ON c.conparentid = arrh.oid
) SELECT conrelid::regclass, conname, confrelid::regclass, conparent
FROM arrh
ORDER BY conrelid::regclass::text, conname;
$$
LANGUAGE SQL;
-- displays triggers in tables in schema fkpart12
CREATE OR REPLACE FUNCTION
fkpart12_triggers(OUT tablename regclass, OUT constr text,
OUT samefunc boolean, OUT parent text)
RETURNS SETOF record AS $$
WITH RECURSIVE arrh AS (
SELECT t.oid, t.tgrelid::regclass as tablename, tgname,
t.tgfoid::regproc as trigfn,
(pg_identify_object('pg_constraint'::regclass, c.oid, 0)).identity as constr,
NULL::bool as samefunc,
NULL::name AS parent
FROM pg_trigger t
LEFT JOIN pg_constraint c ON c.oid = t.tgconstraint
WHERE (SELECT relnamespace FROM pg_class WHERE oid = t.tgrelid) = 'fkpart12'::regnamespace
AND c.contype = 'f' AND t.tgparentid = 0
UNION ALL
SELECT t2.oid, t2.tgrelid::regclass as tablename, t2.tgname,
t2.tgfoid::regproc as trigfn,
(pg_identify_object('pg_constraint'::regclass, c2.oid, 0)).identity,
arrh.trigfn = t2.tgfoid as samefunc,
replace((pg_identify_object('pg_trigger'::regclass, t2.tgparentid, 0)).identity,
t2.tgparentid::text, 'TGOID')
FROM pg_trigger t2
LEFT JOIN pg_constraint c2 ON c2.oid = t2.tgconstraint
JOIN arrh ON t2.tgparentid = arrh.oid
) SELECT tablename, constr, samefunc, parent
FROM arrh
ORDER BY tablename::text, constr;
$$
LANGUAGE SQL;
However, in the end I think this is a very good technique to verify that
the fix works correctly, but it's excessive to include these results in
the tests forevermore. So I've left them out for now. Maybe we should
reconsider on the older branches?
Hi,
I looked at the patch on master. I gave a little comment in [1]
I reconsider the codes. I suspect that we don't need the below if statement anymore.
/*
* If the referenced side is partitioned (which we know because our
* parent's constraint points to a different relation than ours) then
* we must, in addition to the above, create pg_constraint rows that
* point to each partition, each with its own action triggers.
*/
if (parentConForm->conrelid != conform->conrelid)
* If the referenced side is partitioned (which we know because our
* parent's constraint points to a different relation than ours) then
* we must, in addition to the above, create pg_constraint rows that
* point to each partition, each with its own action triggers.
*/
if (parentConForm->conrelid != conform->conrelid)
{
...
}
The above contidion is always true according to my test.
I haven't figured out an anti-case.
Any thoughts?
Thanks,
Tender Wang
Alvaro Herrera <alvherre@alvh.no-ip.org> 于2024年10月22日周二 05:52写道:
On 2024-Oct-21, Tender Wang wrote:
> I suspect that we don't need the below if
> statement anymore.
> /*
> * If the referenced side is partitioned (which we know because our
> * parent's constraint points to a different relation than ours) then
> * we must, in addition to the above, create pg_constraint rows that
> * point to each partition, each with its own action triggers.
> */
> if (parentConForm->conrelid != conform->conrelid)
> {
> ...
> }
>
> The above contidion is always true according to my test.
> I haven't figured out an anti-case.
You're right, this is useless, we can remove the 'if' line. I kept the
block though, to have a place for all those local variable declarations
(otherwise the code looks messier than it needs to).
Agree.
I also noticed that addFkRecurseReferenced() is uselessly taking a List
**wqueue argument but doesn't use it, so I removed it (as fallout, other
routines don't need it either, especially DetachPartitionFinalize). I
added some commentary on the creation of dependencies in
addFkConstraint().
Yeah, I also noticed this before.
I also include a few other cosmetic changes; just comment layout
changes.
This time I attach the patch for master only; the others have changed
identically. 14 is unchanged from before. I figured that the conflict
from 14 to 13 was trivial to resolve -- it was just because of DETACH
CONCURRENTLY, so some code moves around, but that's all that happens.
I don't find any other problems with the newest patch.
Thanks,
Tender Wang
Re: [BUG] Fix DETACH with FK pointing to a partitioned table fails
From
Jehan-Guillaume de Rorthais
Date:
On Fri, 18 Oct 2024 16:50:59 +0200 Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > On 2024-Sep-26, Jehan-Guillaume de Rorthais wrote: > > > REL_14_STABLE backport doesn't seem trivial, so I'll wait for some > > feedback, review & decision before going further down in backpatching. > > Hi, thanks for these patches. I have made some edits of my own. In the > end I decided that I quite liked the new structure of the > addFkConstraint() function and friends. Yes, I especially like the fact that addFkRecurseReferencing() and addFkRecurseReferenced() have now the same logic/behavior. > I added a bunch of comments and changed names somewhat. checked. > Also, I think the benefit of the refactoring is > more obvious when all patches are squashed together, so I did that. OK. > For branch 14, I opted to make it delete the constraints on detach. > This isn't ideal but unless somebody wants to spend a lot more time on > this, it seems the best we can do. Leaving broken constraints around > seems worse. Keep that in mind, and move ahead to the next level: self-FK on partitioned table! ;-) https://www.postgresql.org/message-id/flat/20230707175859.17c91538%40karst#0dc7b8afd8b780899021bbb075598250 > […] > > I spent some time going through your test additions and ended up with > your functions in this form: > […] > > However, in the end I think this is a very good technique to verify that > the fix works correctly, but it's excessive to include these results in > the tests forevermore. So I've left them out for now. Maybe we should > reconsider on the older branches? The point here was to make sure futur work/refactoring don't forget/break anything in the catalog representation of FK on partitions. Regards,
On 2024-Oct-22, Jehan-Guillaume de Rorthais wrote: > On Fri, 18 Oct 2024 16:50:59 +0200 > Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > For branch 14, I opted to make it delete the constraints on detach. > > This isn't ideal but unless somebody wants to spend a lot more time on > > this, it seems the best we can do. Leaving broken constraints around > > seems worse. > > Keep that in mind, and move ahead to the next level: self-FK on partitioned > table! ;-) > > https://www.postgresql.org/message-id/flat/20230707175859.17c91538%40karst#0dc7b8afd8b780899021bbb075598250 Yeah. I pushed these patches finally, thanks! Now we can discuss what to do about self-referencing FKs ... maybe we should consider dropping the FK on detach in problematic cases. > > However, in the end I think this is a very good technique to verify that > > the fix works correctly, but it's excessive to include these results in > > the tests forevermore. So I've left them out for now. Maybe we should > > reconsider on the older branches? > > The point here was to make sure futur work/refactoring don't forget/break > anything in the catalog representation of FK on partitions. There is that ... but I think testing for user-visible symptoms is better. I'm not sure I want to bet against the odds that this will become make-work to keep the output of internal catalog state up to date for whatever other changes we need to do. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ Are you not unsure you want to delete Firefox? [Not unsure] [Not not unsure] [Cancel] http://smylers.hates-software.com/2008/01/03/566e45b2.html
Re: [BUG] Fix DETACH with FK pointing to a partitioned table fails
From
Jehan-Guillaume de Rorthais
Date:
On Mon, 21 Oct 2024 23:52:18 +0200 Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > On 2024-Oct-21, Tender Wang wrote: > > > I suspect that we don't need the below if > > statement anymore. > > /* > > * If the referenced side is partitioned (which we know because our > > * parent's constraint points to a different relation than ours) then > > * we must, in addition to the above, create pg_constraint rows that > > * point to each partition, each with its own action triggers. > > */ > > if (parentConForm->conrelid != conform->conrelid) > > { > > ... > > } > > > > The above contidion is always true according to my test. > > I haven't figured out an anti-case. > > You're right, this is useless, we can remove the 'if' line. I kept the > block though, to have a place for all those local variable declarations > (otherwise the code looks messier than it needs to). I'm confused the original patch was considering the trigger creation on the referenced side ONLY when it was partitioned, which was the point of this conditional block (as the comment said), no matter if the test was always true or not. Maybe this was a leftover of some refactoring… > I also noticed that addFkRecurseReferenced() is uselessly taking a List > **wqueue argument but doesn't use it, so I removed it (as fallout, other > routines don't need it either, especially DetachPartitionFinalize). I > added some commentary on the creation of dependencies in > addFkConstraint(). Good catch. Looking at this, on a side note, I realize now addFkRecurseReferenced() and addFkRecurseReferencing() are not exactly mirroring each other as the later is doing more than the former. Either this is fine, or maybe this is the sign something might need some refactoring as addFkRecurseReferencing() carry more than it should. I'm not sure it deserve more than a comment for futur work/study, if only this is justified. Regards,
Re: [BUG] Fix DETACH with FK pointing to a partitioned table fails
From
Jehan-Guillaume de Rorthais
Date:
On Tue, 22 Oct 2024 16:32:33 +0200 Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > On 2024-Oct-22, Jehan-Guillaume de Rorthais wrote: > > > On Fri, 18 Oct 2024 16:50:59 +0200 > > Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > > > For branch 14, I opted to make it delete the constraints on detach. > > > This isn't ideal but unless somebody wants to spend a lot more time on > > > this, it seems the best we can do. Leaving broken constraints around > > > seems worse. > > > > Keep that in mind, and move ahead to the next level: self-FK on partitioned > > table! ;-) > > > > https://www.postgresql.org/message-id/flat/20230707175859.17c91538%40karst#0dc7b8afd8b780899021bbb075598250 > > Yeah. I pushed these patches finally, thanks! Great! Many thanks Alvaro and Tender for your work on this issue.
Hello Alvaro, 22.10.2024 17:32, Alvaro Herrera wrote: > Yeah. I pushed these patches finally, thanks! Please look at a new anomaly introduced with 53af9491a. When running the following script: CREATE TABLE t (a int, b int, PRIMARY KEY (a, b)); CREATE TABLE pt (a int, b int, FOREIGN KEY (a, b) REFERENCES t(a, b)) PARTITION BY LIST (a); CREATE TABLE tp1 (x int, a int, b int); ALTER TABLE tp1 DROP COLUMN x; ALTER TABLE pt ATTACH PARTITION tp1 FOR VALUES IN (1); ALTER TABLE pt DETACH PARTITION tp1; I get a memory access error detected by Valgrind: 2024-10-24 12:05:04.645 UTC [1079077] LOG: statement: ALTER TABLE pt DETACH PARTITION tp1; ==00:00:00:07.887 1079077== Invalid read of size 2 ==00:00:00:07.887 1079077== at 0x4A61DD: DetachPartitionFinalize (tablecmds.c:19545) ==00:00:00:07.887 1079077== by 0x4A5C11: ATExecDetachPartition (tablecmds.c:19386) ==00:00:00:07.887 1079077== by 0x48561E: ATExecCmd (tablecmds.c:5540) ==00:00:00:07.887 1079077== by 0x4845DE: ATRewriteCatalogs (tablecmds.c:5203) ==00:00:00:07.887 1079077== by 0x4838EC: ATController (tablecmds.c:4758) ==00:00:00:07.887 1079077== by 0x4834F1: AlterTable (tablecmds.c:4404) ==00:00:00:07.887 1079077== by 0x7D6D52: ProcessUtilitySlow (utility.c:1318) ==00:00:00:07.887 1079077== by 0x7D65F7: standard_ProcessUtility (utility.c:1067) ==00:00:00:07.887 1079077== by 0x7D54F7: ProcessUtility (utility.c:523) ==00:00:00:07.887 1079077== by 0x7D3D70: PortalRunUtility (pquery.c:1158) ==00:00:00:07.887 1079077== by 0x7D3FE7: PortalRunMulti (pquery.c:1316) ==00:00:00:07.887 1079077== by 0x7D3431: PortalRun (pquery.c:791) Reproduced on REL_15_STABLE .. master. Best regards, Alexander
Alexander Lakhin <exclusion@gmail.com> 于2024年10月24日周四 22:00写道:
Hello Alvaro,
22.10.2024 17:32, Alvaro Herrera wrote:
> Yeah. I pushed these patches finally, thanks!
Please look at a new anomaly introduced with 53af9491a. When running the
following script:
CREATE TABLE t (a int, b int, PRIMARY KEY (a, b));
CREATE TABLE pt (a int, b int, FOREIGN KEY (a, b) REFERENCES t(a, b))
PARTITION BY LIST (a);
CREATE TABLE tp1 (x int, a int, b int);
ALTER TABLE tp1 DROP COLUMN x;
ALTER TABLE pt ATTACH PARTITION tp1 FOR VALUES IN (1);
ALTER TABLE pt DETACH PARTITION tp1;
I get a memory access error detected by Valgrind:
2024-10-24 12:05:04.645 UTC [1079077] LOG: statement: ALTER TABLE pt DETACH PARTITION tp1;
==00:00:00:07.887 1079077== Invalid read of size 2
==00:00:00:07.887 1079077== at 0x4A61DD: DetachPartitionFinalize (tablecmds.c:19545)
==00:00:00:07.887 1079077== by 0x4A5C11: ATExecDetachPartition (tablecmds.c:19386)
==00:00:00:07.887 1079077== by 0x48561E: ATExecCmd (tablecmds.c:5540)
==00:00:00:07.887 1079077== by 0x4845DE: ATRewriteCatalogs (tablecmds.c:5203)
==00:00:00:07.887 1079077== by 0x4838EC: ATController (tablecmds.c:4758)
==00:00:00:07.887 1079077== by 0x4834F1: AlterTable (tablecmds.c:4404)
==00:00:00:07.887 1079077== by 0x7D6D52: ProcessUtilitySlow (utility.c:1318)
==00:00:00:07.887 1079077== by 0x7D65F7: standard_ProcessUtility (utility.c:1067)
==00:00:00:07.887 1079077== by 0x7D54F7: ProcessUtility (utility.c:523)
==00:00:00:07.887 1079077== by 0x7D3D70: PortalRunUtility (pquery.c:1158)
==00:00:00:07.887 1079077== by 0x7D3FE7: PortalRunMulti (pquery.c:1316)
==00:00:00:07.887 1079077== by 0x7D3431: PortalRun (pquery.c:791)
Reproduced on REL_15_STABLE .. master.
Sorry, I can't reproduce this leak on master.
Thanks,
Tender Wang
On 2024-Oct-25, Tender Wang wrote: > Thanks for reporting. I can reproduce this memory invalid access on HEAD. > After debuging codes, I found the root cause. > In DetachPartitionFinalize(), below code: > att = TupleDescAttr(RelationGetDescr(partRel), > attmap->attnums[conkey[i] - 1] - 1); > > We should use confkey[i] -1 not conkey[i] - 1; Wow, how embarrasing, now that's one _really_ stupid bug and it's totally my own. Thanks for the analysis! I'll get this patched soon. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
Alvaro Herrera <alvherre@alvh.no-ip.org> 于2024年10月25日周五 16:30写道:
On 2024-Oct-25, Tender Wang wrote:
> Thanks for reporting. I can reproduce this memory invalid access on HEAD.
> After debuging codes, I found the root cause.
> In DetachPartitionFinalize(), below code:
> att = TupleDescAttr(RelationGetDescr(partRel),
> attmap->attnums[conkey[i] - 1] - 1);
>
> We should use confkey[i] -1 not conkey[i] - 1;
Wow, how embarrasing, now that's one _really_ stupid bug and it's
totally my own. Thanks for the analysis! I'll get this patched soon.
Hi,
When I debug codes, I find that the way to access AttrMap almost uses "attrmp_ptr->attnums[offset]."
The codes now usually don't check if the offset is out of bounds, which seems unsafe.
Can we wrap an access function? For example:
inline AttrNumber(attrmap_ptr, offset)
{
Assert(offset >= 0 && offset < attrmap_ptr->maplen);
return attrmap_ptr->attnums[offset];
}
Or a Macro.
I'm not sure if it's worth doing this.
Thanks,
Tender Wang
Hello Alvaro and Tender Wang, 24.10.2024 17:00, Alexander Lakhin wrote: > Please look at a new anomaly introduced with 53af9491a. When running the > following script: > CREATE TABLE t (a int, b int, PRIMARY KEY (a, b)); > CREATE TABLE pt (a int, b int, FOREIGN KEY (a, b) REFERENCES t(a, b)) > PARTITION BY LIST (a); > > CREATE TABLE tp1 (x int, a int, b int); > ALTER TABLE tp1 DROP COLUMN x; > > ALTER TABLE pt ATTACH PARTITION tp1 FOR VALUES IN (1); > > ALTER TABLE pt DETACH PARTITION tp1; I've also discovered another anomaly with a similar setup, but it's not related to 53af9491a. CREATE TABLE t (a int, PRIMARY KEY (a)); INSERT INTO t VALUES (1); CREATE TABLE pt (b int, a int) PARTITION BY RANGE (a); ALTER TABLE pt DROP COLUMN b; ALTER TABLE pt ADD FOREIGN KEY (a) REFERENCES t ON DELETE SET DEFAULT (a); CREATE TABLE tp1 PARTITION OF pt FOR VALUES FROM (1) TO (2); ALTER TABLE pt DETACH PARTITION tp1; DELETE FROM t; \d+ t This script ends up with: ERROR: invalid attribute number 2 ERROR: cache lookup failed for attribute 2 of relation 16398 (Perhaps it deserves a separate discussion.) Best regards, Alexander
On 2024-Oct-25, Alexander Lakhin wrote: > I've also discovered another anomaly with a similar setup, but it's not > related to 53af9491a. Hmm, it may well be a preexisting problem, but I do think it involves the same code. As far as I can tell, the value "2" here > This script ends up with: > ERROR: invalid attribute number 2 > ERROR: cache lookup failed for attribute 2 of relation 16398 is coming from riinfo->confdelsetcols which was set up by DetachPartitionFinalize during the last DETACH operation. > (Perhaps it deserves a separate discussion.) No need for that. A backtrace from the errfinish gives me this (apologies for the wide terminal): #0 errfinish (filename=0x5610d93e4510 "../../../../../pgsql/source/master/src/backend/parser/parse_relation.c", lineno=3618, funcname=0x5610d93e5518 <__func__.5> "attnumAttName") at ../../../../../../pgsql/source/master/src/backend/utils/error/elog.c:475 #1 0x00005610d8d68bcc in attnumAttName (rd=rd@entry=0x7f69dd0c1a90, attid=2) at ../../../../../pgsql/source/master/src/backend/parser/parse_relation.c:3618 #2 0x00005610d924a8c5 in ri_set (trigdata=0x7ffe38d924c0, is_set_null=<optimized out>, tgkind=<optimized out>) at ../../../../../../pgsql/source/master/src/backend/utils/adt/ri_triggers.c:1219 #3 0x00005610d8f897ea in ExecCallTriggerFunc (trigdata=trigdata@entry=0x7ffe38d924c0, tgindx=tgindx@entry=2, finfo=0x5611153f5768,finfo@entry=0x5611153f5708, instr=instr@entry=0x0, per_tuple_context=per_tuple_context@entry=0x561115471c30) at ../../../../../pgsql/source/master/src/backend/commands/trigger.c:2362 #4 0x00005610d8f8b646 in AfterTriggerExecute (trig_tuple_slot2=0x0, trig_tuple_slot1=0x0, per_tuple_context=0x561115471c30,instr=0x0, finfo=0x5611153f5708, trigdesc=0x5611153f53e8, dst_relInfo=<optimized out>, src_relInfo=<optimized out>, relInfo=0x5611153f51d8, event=0x56111546fd4c,estate=0x5611153f4d50) at ../../../../../pgsql/source/master/src/backend/commands/trigger.c:4498 #5 afterTriggerInvokeEvents (events=events@entry=0x5611154292a0, firing_id=1, estate=estate@entry=0x5611153f4d50, delete_ok=delete_ok@entry=false) at ../../../../../pgsql/source/master/src/backend/commands/trigger.c:4735 #6 0x00005610d8f8ff10 in AfterTriggerEndQuery (estate=estate@entry=0x5611153f4d50) at ../../../../../pgsql/source/master/src/backend/commands/trigger.c:5090 #7 0x00005610d8fb2457 in standard_ExecutorFinish (queryDesc=0x561115460810) at ../../../../../pgsql/source/master/src/backend/executor/execMain.c:442 #8 0x00005610d917df48 in ProcessQuery (plan=0x56111545ef50, sourceText=0x56111539cfb0 "DELETE FROM t;", params=0x0, queryEnv=0x0,dest=0x56111545f0b0, qc=0x7ffe38d92830) at ../../../../../pgsql/source/master/src/backend/tcop/pquery.c:193 and then (gdb) frame 2 #2 0x00005610d924a8c5 in ri_set (trigdata=0x7ffe38d924c0, is_set_null=<optimized out>, tgkind=<optimized out>) at ../../../../../../pgsql/source/master/src/backend/utils/adt/ri_triggers.c:1219 1219 quoteOneName(attname, RIAttName(fk_rel, set_cols[i])); (gdb) print *riinfo $5 = {constraint_id = 16400, valid = true, constraint_root_id = 16400, oidHashValue = 3001019032, rootHashValue = 3001019032,conname = { data = "pt_a_fkey", '\000' <repeats 54 times>}, pk_relid = 16384, fk_relid = 16397, confupdtype = 97 'a', confdeltype= 100 'd', ndelsetcols = 1, confdelsetcols = {2, 0 <repeats 31 times>}, confmatchtype = 115 's', hasperiod = false, nkeys = 1, pk_attnums = {1, 0 <repeats31 times>}, fk_attnums = {1, 0 <repeats 31 times>}, pf_eq_oprs = {96, 0 <repeats 31 times>}, pp_eq_oprs = {96, 0 <repeats 31 times>}, ff_eq_oprs ={96, 0 <repeats 31 times>}, period_contained_by_oper = 0, agged_period_contained_by_oper = 0, valid_link = {prev = 0x5611153f4c30, next = 0x5610d96b51b0<ri_constraint_cache_valid_list>}} -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Digital and video cameras have this adjustment and film cameras don't for the same reason dogs and cats lick themselves: because they can." (Ken Rockwell)
On 2024-Oct-25, Tender Wang wrote: > When I debug codes, I find that the way to access AttrMap almost uses > "attrmp_ptr->attnums[offset]." > The codes now usually don't check if the offset is out of bounds, which > seems unsafe. > Can we wrap an access function? For example: > inline AttrNumber(attrmap_ptr, offset) > { > Assert(offset >= 0 && offset < attrmap_ptr->maplen); > return attrmap_ptr->attnums[offset]; > } I don't see any reason not to do this, though it's not directly related to the bugs in this thread. I encourage you to submit a patch, opening a new thread. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Alvaro Herrera <alvherre@alvh.no-ip.org> 于2024年10月27日周日 05:47写道:
On 2024-Oct-25, Alvaro Herrera wrote:
> On 2024-Oct-25, Tender Wang wrote:
>
> > Thanks for reporting. I can reproduce this memory invalid access on HEAD.
> > After debuging codes, I found the root cause.
> > In DetachPartitionFinalize(), below code:
> > att = TupleDescAttr(RelationGetDescr(partRel),
> > attmap->attnums[conkey[i] - 1] - 1);
> >
> > We should use confkey[i] -1 not conkey[i] - 1;
>
> Wow, how embarrasing, now that's one _really_ stupid bug and it's
> totally my own. Thanks for the analysis! I'll get this patched soon.
Actually, now that I look at this again, I think this proposed fix is
wrong; conkey/confkey confusion is not what the problem is. Rather, the
problem is that we're applying a tuple conversion map when none should
be applied. So the fix is to remove "attmap" altogether. One thing
that didn't appear correct to me was that the patch was changing one
constraint name so that it appeared that the constrained columns were
"id, p_id" but that didn't make sense: they are "p_id, p_jd" instead.
Yeah, The constrained name change made me think about if the patch is correct.
After your explanation, I have understood it.
Then I realized that you're wrong that it's the referenced side that
we're processing: what we're doing there is generate the fk_attrs list,
which is the list of constrained columns (not the list of referenced
columns, which is pk_attrs).
I also felt like modifying the fkpart12 test rather than adding a
separate fkpart13, so I did that.
So here's a v2 for this. (Commit message needs love still, but it's a
bit late here.)
The v2 LGTM.
BTW, while reviewing the v2 patch, I found the parentConTup in foreach(cell, fks) block
didn't need it anymore. We can remove the related codes.
Thanks,
Tender Wang
Alvaro Herrera <alvherre@alvh.no-ip.org> 于2024年10月25日周五 23:14写道:
On 2024-Oct-25, Alexander Lakhin wrote:
> I've also discovered another anomaly with a similar setup, but it's not
> related to 53af9491a.
Hmm, it may well be a preexisting problem, but I do think it involves
the same code. As far as I can tell, the value "2" here
> This script ends up with:
> ERROR: invalid attribute number 2
> ERROR: cache lookup failed for attribute 2 of relation 16398
is coming from riinfo->confdelsetcols which was set up by
DetachPartitionFinalize during the last DETACH operation.
Hmm, actually, the confdelsetcols before detach and after detach is always {2}, as below:
oid | conname | conrelid | conparentid | confdelsetcols
-------+-----------+----------+-------------+----------------
16400 | pt_a_fkey | 16397 | 16392 | {2}
(1 row)
postgres=# ALTER TABLE pt DETACH PARTITION tp1;
ALTER TABLE
postgres=# select oid, conname, conrelid,conparentid,confdelsetcols from pg_constraint where conrelid = 16397;
oid | conname | conrelid | conparentid | confdelsetcols
-------+-----------+----------+-------------+----------------
16400 | pt_a_fkey | 16397 | 0 | {2}
(1 row)
Even though no detach, the confdelsetcols is {2} . But no error report. Because the rel->rd_att->natts of pt is 2.
It will not go into tp1 because tp1 is a partition of pt. But after detach, the rel->rd_att->natts of tp1 is 1,
so "ERROR: invalid attribute number 2" will report.
CREATE TABLE tp1... will ignore the dropped column of parent, so the natts of tp1 is 1, but its parent is 2.
Thanks,
Tender Wang
Alvaro Herrera <alvherre@alvh.no-ip.org> 于2024年10月28日周一 17:16写道:
On 2024-Oct-27, Tender Wang wrote:
> BTW, while reviewing the v2 patch, I found the parentConTup in
> foreach(cell, fks) block
> didn't need it anymore. We can remove the related codes.
True. Done so in this v3.
I noticed another problem here: we're grabbing the wrong lock type on
the referenced rel (AccessShareLock) during detach. (What's more: we
release it afterwards, which is the wrong thing to do. We need to keep
such locks until end of transaction). I didn't try to construct a case
where this would be a problem, but if I change AccessShare to NoLock,
the assertion that says we don't hold _any_ lock on that relation fires,
which means that we're not taking any locks on those rels before this
point. So this lock strength choice is definitely wrong. I changed it
to ShareRowExclusive, which is what we're suppose to use when adding a
trigger.
In CloneFKReferencing(), the constrForm->confrelid uses the same lock type.
I think you're right. I don't find any other problem.
Thanks,
Tender Wang
I'm trying to write release notes for commits 53af9491a et al, and it seems to me that we need to explain how to get out of the mess that would be left behind by the old DETACH code. There's no hint about that in the commit message :-( Clearly, if you have now-inconsistent data, there's little help for that but to manually fix the inconsistencies. What I am worried about is how to get to a state where you have correct catalog entries for the constraint. Will ALTER TABLE DROP CONSTRAINT on the now stand-alone table work to clean out the old catalog entries for the constraint? I'm worried that it will either fail, or go through but remove triggers on the referenced table that we still need for the original partitioned table. If that doesn't work I think we had better create a recipe for manually removing the detritus. Once the old entries are gone it should be possible to do ALTER TABLE ADD CONSTRAINT (with an updated server), and that would validate your data. It's the DROP CONSTRAINT part that worries me. regards, tom lane
On 2024-Nov-05, Tom Lane wrote: > I'm trying to write release notes for commits 53af9491a et al, > and it seems to me that we need to explain how to get out of > the mess that would be left behind by the old DETACH code. > There's no hint about that in the commit message :-( > Clearly, if you have now-inconsistent data, there's little > help for that but to manually fix the inconsistencies. > What I am worried about is how to get to a state where you > have correct catalog entries for the constraint. > > Will ALTER TABLE DROP CONSTRAINT on the now stand-alone table > work to clean out the old catalog entries for the constraint? Yes -- as far as I can tell, a DROP CONSTRAINT of the offending constraint is successful and leaves no unwanted detritus. Perhaps one more task for me is to figure out a way to get a list of all the constraints that are broken because of this ... let me see if I can figure that out. > I'm worried that it will either fail, or go through but remove > triggers on the referenced table that we still need for the > original partitioned table. If that doesn't work I think we had > better create a recipe for manually removing the detritus. As as far as I can see, it works and no triggers are spuriously removed. > Once the old entries are gone it should be possible to do ALTER TABLE > ADD CONSTRAINT (with an updated server), and that would validate > your data. It's the DROP CONSTRAINT part that worries me. Yeah, that's correct: adding the constraint again after removing its broken self detects that there are values violating the RI. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ Thou shalt check the array bounds of all strings (indeed, all arrays), for surely where thou typest "foo" someone someday shall type "supercalifragilisticexpialidocious" (5th Commandment for C programmers)
On 2024-Nov-06, Alvaro Herrera wrote: > Perhaps one more task for me is to figure out a way to get a list of all > the constraints that are broken because of this ... let me see if I can > figure that out. It's gotta be something like this, SELECT conrelid::regclass AS "constrained table", conname as constraint, confrelid::regclass AS "references" FROM pg_constraint WHERE contype = 'f' and conparentid = 0 AND (SELECT count(*) FROM pg_constraint p2 WHERE conparentid = pg_constraint.oid) <> (SELECT count(*) FROM pg_inherits WHERE inhparent = pg_constraint.conrelid OR inhparent = pg_constraint.confrelid); Essentially, top-level constraints should have as many children constraint as direct partitions each partitioned table has. Ideally here you'd get an empty set, but you won't if the DETACH problem has occurred. I'll test this further later or maybe tomorrow as time allows. A quick test rig for this is: create table pk (a int primary key) partition by list (a); create table pk1 partition of pk for values in (1); create table pk2367 partition of pk for values in (2, 3, 6, 7) partition by list (a); create table pk67 partition of pk2367 for values in (6, 7) partition by list (a); create table pk2 partition of pk2367 for values in (2); create table pk3 partition of pk2367 for values in (3); create table pk6 partition of pk67 for values in (6); create table pk7 partition of pk67 for values in (7); create table pk45 partition of pk for values in (4, 5) partition by list (a); create table pk4 partition of pk45 for values in (4); create table pk5 partition of pk45 for values in (5); create table fk (a int references pk) partition by list (a); create table fk1 partition of fk for values in (1); create table fk2367 partition of fk for values in (2, 3, 6, 7) partition by list (a); create table fk67 partition of fk2367 for values in (6, 7) partition by list (a); create table fk2 partition of fk2367 for values in (2); create table fk3 partition of fk2367 for values in (3); create table fk6 partition of fk67 for values in (6); create table fk7 partition of fk67 for values in (7); create table fk45 partition of fk for values in (4, 5) partition by list (a); create table fk4 partition of fk45 for values in (4); create table fk5 partition of fk45 for values in (5); alter table fk detach partition fk2367; Before the fix, you get constrained table │ constraint │ references ───────────────────┼────────────┼──────────── fk2367 │ fk_a_fkey │ pk (1 fila) which means you need to ALTER TABLE fk2367 DROP CONSTRAINT fk_a_fkey; and then put it back. Maybe it'd be better to have the query emit the commands to drop and reconstruct the FK? -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: >> Perhaps one more task for me is to figure out a way to get a list of all >> the constraints that are broken because of this ... let me see if I can >> figure that out. > It's gotta be something like this, > SELECT conrelid::regclass AS "constrained table", > conname as constraint, confrelid::regclass AS "references" > FROM pg_constraint > WHERE contype = 'f' and conparentid = 0 AND > (SELECT count(*) FROM pg_constraint p2 WHERE conparentid = pg_constraint.oid) <> > (SELECT count(*) > FROM pg_inherits > WHERE inhparent = pg_constraint.conrelid OR inhparent = pg_constraint.confrelid); Hmm ... interestingly, if I run this in HEAD's regression database, I get constrained table | constraint | references -------------------+---------------+------------- clstr_tst | clstr_tst_con | clstr_tst_s (1 row) Digging a bit deeper, the sub-select for conparentid finds no rows, but the sub-select on pg_inherits finds regression=# SELECT inhrelid::regclass, inhparent::regclass, inhseqno,inhdetachpending from pg_inherits WHERE inhparent ='clstr_tst'::regclass or inhparent = 'clstr_tst_s'::regclass; inhrelid | inhparent | inhseqno | inhdetachpending ---------------+-----------+----------+------------------ clstr_tst_inh | clstr_tst | 1 | f (1 row) So it looks like this query needs a guard to make it ignore constraints on traditional-inheritance tables. regards, tom lane
On 2024-Nov-08, Tom Lane wrote: > Hmm ... interestingly, if I run this in HEAD's regression database, > I get > > constrained table | constraint | references > -------------------+---------------+------------- > clstr_tst | clstr_tst_con | clstr_tst_s > (1 row) Eeek. > So it looks like this query needs a guard to make it ignore > constraints on traditional-inheritance tables. Hmm, looks tricky, the only thing I found was to only consider rows in pg_inherit if there's a corresponding one in pg_partitioned_table. This should do it. I added the DROP/ADD commands. I also added some pg_catalog schema quals, though that may be kinda useless. Anyway, this reports empty in the regression database. SELECT conrelid::pg_catalog.regclass AS "constrained table", conname AS constraint, confrelid::pg_catalog.regclass AS "references", pg_catalog.format('ALTER TABLE %s DROP CONSTRAINT %I;', conrelid::regclass, conname), pg_catalog.format('ALTER TABLE %s ADD CONSTRAINT %I %s;', conrelid::regclass, conname, pg_catalog.pg_get_constraintdef(oid)) FROM pg_catalog.pg_constraint WHERE contype = 'f' and conparentid = 0 AND (SELECT count(*) FROM pg_catalog.pg_constraint p2 WHERE conparentid = pg_constraint.oid) <> (SELECT count(*) FROM pg_catalog.pg_inherits WHERE EXISTS (SELECT 1 FROM pg_catalog.pg_partitioned_table WHERE partrelid = inhparent) AND inhparent = pg_constraint.conrelid OR inhparent = pg_constraint.confrelid) ; I would have loved to be able to add the constraint as NOT VALID followed by a separate VALIDATE command, because if there are any RI violations, the constraint would now be in place to prevent future ones. However, =# ALTER TABLE fk2367 ADD CONSTRAINT fk_a_fkey FOREIGN KEY (a) REFERENCES pk(a) NOT VALID; ERROR: cannot add NOT VALID foreign key on partitioned table "fk2367" referencing relation "pk" DETAIL: This feature is not yet supported on partitioned tables. So it looks like we should suggest to save the output of the query, execute each DROP followed by each ADD, and if the latter fails, fix the violations and retry the ADD. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Java is clearly an example of money oriented programming" (A. Stepanov)
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > So it looks like we should suggest to save the output of the query, > execute each DROP followed by each ADD, and if the latter fails, fix the > violations and retry the ADD. Right. I'll make it so -- thanks for doing the legwork on creating the query! regards, tom lane