Thread: [PATCH] Partial foreign key updates in referential integrity triggers
Hey, hackers! I've created a patch to better support referential integrity constraints when using composite primary and foreign keys. This patch allows creating a foreign key using the syntax: FOREIGN KEY (tenant_id, fk_id) REFERENCES fktable ON DELETE SET NULL (fk_id) which means that only the fk_id column will be set to NULL when the referenced row is deleted, rather than both the tenant_id and fk_id columns. In multi-tenant applications, it is common to denormalize a "tenant_id" column across every table, and use composite primary keys of the form (tenant_id, id) and composite foreign keys of the form (tenant_id, fk_id), reusing the tenant_id column in both the primary and foreign key. This is often done initially for performance reasons, but has the added benefit of making it impossible for data from one tenant to reference data from another tenant, also making this a good decision from a security perspective. Unfortunately, one downside of using composite foreign keys in such a matter is that commonly used referential actions, such as ON DELETE SET NULL, no longer work, because Postgres tries to set all of the referencing columns to NULL, including the columns that overlap with the primary key: CREATE TABLE tenants (id serial PRIMARY KEY); CREATE TABLE users ( tenant_id int REFERENCES tenants ON DELETE CASCADE, id serial, PRIMARY KEY (tenant_id, id), ); CREATE TABLE posts ( tenant_id int REFERENCES tenants ON DELETE CASCADE, id serial, author_id int, PRIMARY KEY (tenant_id, id), FOREIGN KEY (tenant_id, author_id) REFERENCES users ON DELETE SET NULL ); INSERT INTO tenants VALUES (1); INSERT INTO users VALUES (1, 101); INSERT INTO posts VALUES (1, 201, 101); DELETE FROM users WHERE id = 101; ERROR: null value in column "tenant_id" violates not-null constraint DETAIL: Failing row contains (null, 201, null). This can be resolved by manually creating triggers on the referenced table, but this is clunky and adds a significant amount of extra work when adding (or removing!) foreign keys. Users shouldn't have to compromise on maintenance overhead when using composite foreign keys. I have implemented a simple extension to the syntax for foreign keys that makes it just as easy to support referential integrity constraints for composite foreign keys as it is for single column foreign keys. The SET NULL and SET DEFAULT referential actions can now be optionally followed by a column list: key_action: | NO ACTION | RESTRICT | CASCADE | SET NULL [ (column_name [, ...] ) ] | SET DEFAULT [ (column_name [, ...] ) ] When a column list is provided, only the specified columns, which must be a subset of the referencing columns, will be updated in the associated trigger. Note that use of SET NULL (col1, ...) on a composite foreign key with MATCH FULL specified will still raise an error. In such a scenario we could raise an error when the user tries to define the foreign key, but we don't raise a similar error when a user tries to use SET NULL on a non-nullable column, so I don't think this is critical. (I haven't added this check in my patch.) While this feature would mostly be used with the default MATCH SIMPLE, I could imagine using SET DEFAULT (col, ...) even for MATCH FULL foreign keys. To store this additional data, I've added two columns to the pg_constraint catalog entry: confupdsetcols int2[] confdelsetcols int2[] These columns store the attribute numbers of the columns to update in the ON UPDATE and ON DELETE triggers respectively. If the arrays are empty, then all of the referencing columns should be updated. I previously proposed this feature about a year ago [1], but I don't feel like the arguments against it were very strong. Wanting to get more familiar with the Postgres codebase I decided to implement the feature over this holiday break, and I've gotten everything working and put together a complete patch including tests and updates to documentation. Hopefully if people find it useful it can make its way into the next commitfest! Visual diff: https://github.com/postgres/postgres/compare/master...PaulJuliusMartinez:on-upd-del-set-cols Here's a rough outline of the changes: src/backend/parser/gram.y | 122 src/include/nodes/parsenodes.h | 3 src/backend/nodes/copyfuncs.c | 2 src/backend/nodes/equalfuncs.c | 2 src/backend/nodes/outfuncs.c | 47 - Modify grammar to add opt_column_list after SET NULL and SET DEFAULT - Add fk_{upd,del}_set_cols fields to Constraint struct - Add proper node support, as well as outfuncs support for AlterTableStmt, which I used while debugging src/include/catalog/pg_constraint.h | 20 src/backend/catalog/pg_constraint.c | 80 src/include/catalog/catversion.h | 2 - Add confupdsetcols and confdelsetcols to pg_constraint catalog entry src/backend/commands/tablecmds.c | 142 - Pass along data from parsed Constraint node to CreateConstraintEntry - Handle propagating constraints for partitioned tables src/backend/utils/adt/ri_triggers.c | 109 - Update construction of trigger query to only update specified columns - Update caching mechanism since ON UPDATE and ON DELETE may modify different sets of columns src/backend/utils/adt/ruleutils.c | 29 - Update pg_get_constraintdef to handle new syntax - This automatically updates psql \d output as well as pg_dump src/backend/catalog/heap.c | 4 src/backend/catalog/index.c | 4 src/backend/commands/trigger.c | 4 src/backend/commands/typecmds.c | 4 src/backend/utils/cache/relcache.c | 2 - Update misc. call sites for updated functions src/test/regress/sql/foreign_key.sql | 63 src/test/regress/expected/foreign_key.out | 88 - Regression tests with checks for error cases and partitioned tables doc/src/sgml/catalogs.sgml | 24 doc/src/sgml/ref/create_table.sgml | 15 - Updated documentation for CREATE TABLE and pg_constraint catalog definition 20 files changed, 700 insertions(+), 66 deletions(-) Thanks, Paul [1] https://www.postgresql.org/message-id/flat/CAF%2B2_SGRXQOtumethpuXhsyU%2B4AYzfKA5fhHCjCjH%2BjQ04WWjA%40mail.gmail.com
Attachment
On 1/5/21 4:40 PM, Paul Martinez wrote: > > I've created a patch to better support referential integrity constraints when > using composite primary and foreign keys. This patch allows creating a foreign > key using the syntax: <snip> > I previously proposed this feature about a year ago [1], but I don't feel like > the arguments against it were very strong. Perhaps not, but Tom certainly didn't seem in favor of this feature, at the least. Since the patch has not attracted any review/comment I propose to move it to the next CF since it doesn't seem a likely candidate for PG14. I'll do that on March 23 unless there are arguments to the contrary. Regards, -- -David david@pgmasters.net
On 3/18/21 9:52 AM, David Steele wrote: > On 1/5/21 4:40 PM, Paul Martinez wrote: >> >> I've created a patch to better support referential integrity >> constraints when >> using composite primary and foreign keys. This patch allows creating a >> foreign >> key using the syntax: > > <snip> > >> I previously proposed this feature about a year ago [1], but I don't >> feel like >> the arguments against it were very strong. > > Perhaps not, but Tom certainly didn't seem in favor of this feature, at > the least. > > Since the patch has not attracted any review/comment I propose to move > it to the next CF since it doesn't seem a likely candidate for PG14. > > I'll do that on March 23 unless there are arguments to the contrary. This entry has been moved to the 2021-07 CF with status Needs Review. Regards, -- -David david@pgmasters.net
Re: [PATCH] Partial foreign key updates in referential integrity triggers
From
Peter Eisentraut
Date:
On 05.01.21 22:40, Paul Martinez wrote: > I've created a patch to better support referential integrity constraints when > using composite primary and foreign keys. This patch allows creating a foreign > key using the syntax: > > FOREIGN KEY (tenant_id, fk_id) REFERENCES fktable ON DELETE SET NULL (fk_id) > > which means that only the fk_id column will be set to NULL when the referenced > row is deleted, rather than both the tenant_id and fk_id columns. I think this is an interesting feature with a legitimate use case. I'm wondering a bit about what the ON UPDATE side of this is supposed to mean. Consider your example: > CREATE TABLE tenants (id serial PRIMARY KEY); > CREATE TABLE users ( > tenant_id int REFERENCES tenants ON DELETE CASCADE, > id serial, > PRIMARY KEY (tenant_id, id), > ); > CREATE TABLE posts ( > tenant_id int REFERENCES tenants ON DELETE CASCADE, > id serial, > author_id int, > PRIMARY KEY (tenant_id, id), > FOREIGN KEY (tenant_id, author_id) REFERENCES users ON DELETE SET NULL > ); > > INSERT INTO tenants VALUES (1); > INSERT INTO users VALUES (1, 101); > INSERT INTO posts VALUES (1, 201, 101); > DELETE FROM users WHERE id = 101; > ERROR: null value in column "tenant_id" violates not-null constraint > DETAIL: Failing row contains (null, 201, null). Consider what should happen when you update users.id. Per SQL standard, for MATCH SIMPLE an ON UPDATE SET NULL should only set to null the referencing column that corresponds to the referenced column actually updated, not all of them. PostgreSQL doesn't do this, but if it did, then this would work just fine. Your feature requires specifying a fixed column or columns to update, so it cannot react differently to what column actually updated. In fact, you might want different referential actions depending on what columns are updated, like what you can do with general triggers. So, unless I'm missing an angle here, I would suggest leaving out the ON UPDATE variant of this feature.
On Wed, Jul 14, 2021 at 6:51 PM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
On 05.01.21 22:40, Paul Martinez wrote:
> I've created a patch to better support referential integrity constraints when
> using composite primary and foreign keys. This patch allows creating a foreign
> key using the syntax:
>
> FOREIGN KEY (tenant_id, fk_id) REFERENCES fktable ON DELETE SET NULL (fk_id)
>
> which means that only the fk_id column will be set to NULL when the referenced
> row is deleted, rather than both the tenant_id and fk_id columns.
I think this is an interesting feature with a legitimate use case.
I'm wondering a bit about what the ON UPDATE side of this is supposed to
mean. Consider your example:
> CREATE TABLE tenants (id serial PRIMARY KEY);
> CREATE TABLE users (
> tenant_id int REFERENCES tenants ON DELETE CASCADE,
> id serial,
> PRIMARY KEY (tenant_id, id),
> );
> CREATE TABLE posts (
> tenant_id int REFERENCES tenants ON DELETE CASCADE,
> id serial,
> author_id int,
> PRIMARY KEY (tenant_id, id),
> FOREIGN KEY (tenant_id, author_id) REFERENCES users ON DELETE SET NULL
> );
>
> INSERT INTO tenants VALUES (1);
> INSERT INTO users VALUES (1, 101);
> INSERT INTO posts VALUES (1, 201, 101);
> DELETE FROM users WHERE id = 101;
> ERROR: null value in column "tenant_id" violates not-null constraint
> DETAIL: Failing row contains (null, 201, null).
Consider what should happen when you update users.id. Per SQL standard,
for MATCH SIMPLE an ON UPDATE SET NULL should only set to null the
referencing column that corresponds to the referenced column actually
updated, not all of them. PostgreSQL doesn't do this, but if it did,
then this would work just fine.
Your feature requires specifying a fixed column or columns to update, so
it cannot react differently to what column actually updated. In fact,
you might want different referential actions depending on what columns
are updated, like what you can do with general triggers.
So, unless I'm missing an angle here, I would suggest leaving out the ON
UPDATE variant of this feature.
On Wed, Jul 14, 2021 at 6:51 AM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote: > I think this is an interesting feature with a legitimate use case. Great! I'm glad to hear that. > Consider what should happen when you update users.id. Per SQL standard, > for MATCH SIMPLE an ON UPDATE SET NULL should only set to null the > referencing column that corresponds to the referenced column actually > updated, not all of them. PostgreSQL doesn't do this, but if it did, > then this would work just fine. > > ... > > So, unless I'm missing an angle here, I would suggest leaving out the ON > UPDATE variant of this feature. I understand what you're saying here, but are you sure that is the behavior specified in the SQL standard? I can't find a copy of a more recent standard, but at least in SQL 1999 [1], it has this to say (page 459 of the linked PDF, page 433 of the standard): > 8) If a non-null value of a referenced column in the referenced table is > updated to a value that is distinct from the current value of that > column, then for every member F of the subtable family of the > referencing table: > > Case: > a) If SIMPLE is specified or implicit, or if FULL is specified, then > > Case: > ii) If the <update rule> specifies SET NULL, then > > Case: > 1) If SIMPLE is specified or implicit, then: > > A) <snipped stuff about triggers> > > B) For every F, in every matching row in F, each referencing > column in F that corresponds with a referenced column is set to > the null value. This phrasing doesn't seem to make any reference to which columns were actually updated. The definitions a few pages earlier do explicitly define the set of updated columns ("Let UMC be the set of referencing columns that correspond with updated referenced columns"), but that defined set is only referenced in the behavior when PARTIAL is specified, implying that the set of updated columns is _not_ relevant in the SIMPLE case. If my understanding of this is correct, then Postgres _isn't_ out of spec, and this extension still works fine for the ON UPDATE triggers. (But, wow, this spec is not easy to read, so I could definitely be wrong.) I'm not sure how MATCH PARTIAL fits into this; I know it's unimplemented, but I don't know what its use cases are, and I don't think I understand how ON DELETE / ON UPDATE should work with MATCH PARTIAL, let alone how they would work combined with this extension. This lack of clarity may be a good-enough reason to leave out the ON UPDATE case. On Wed, Jul 14, 2021 at 12:36 PM Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote: > Patch does not apply on head, I am marking the status "Waiting on author" > http://cfbot.cputube.org/patch_33_2932.log I've rebased my original patch and it should work now. This is referential-actions-set-cols-v2.patch. I've also created a second patch that leaves out the ON UPDATE behavior, if we think that's the safer route. This is referential-actions-on-delete-only-set-cols-v1.patch. [1]: http://web.cecs.pdx.edu/~len/sql1999.pdf On Wed, Jul 14, 2021 at 12:36 PM Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote: > > > > On Wed, Jul 14, 2021 at 6:51 PM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote: >> >> >> On 05.01.21 22:40, Paul Martinez wrote: >> > I've created a patch to better support referential integrity constraints when >> > using composite primary and foreign keys. This patch allows creating a foreign >> > key using the syntax: >> > >> > FOREIGN KEY (tenant_id, fk_id) REFERENCES fktable ON DELETE SET NULL (fk_id) >> > >> > which means that only the fk_id column will be set to NULL when the referenced >> > row is deleted, rather than both the tenant_id and fk_id columns. >> >> I think this is an interesting feature with a legitimate use case. >> >> I'm wondering a bit about what the ON UPDATE side of this is supposed to >> mean. Consider your example: >> >> > CREATE TABLE tenants (id serial PRIMARY KEY); >> > CREATE TABLE users ( >> > tenant_id int REFERENCES tenants ON DELETE CASCADE, >> > id serial, >> > PRIMARY KEY (tenant_id, id), >> > ); >> > CREATE TABLE posts ( >> > tenant_id int REFERENCES tenants ON DELETE CASCADE, >> > id serial, >> > author_id int, >> > PRIMARY KEY (tenant_id, id), >> > FOREIGN KEY (tenant_id, author_id) REFERENCES users ON DELETE SET NULL >> > ); >> > >> > INSERT INTO tenants VALUES (1); >> > INSERT INTO users VALUES (1, 101); >> > INSERT INTO posts VALUES (1, 201, 101); >> > DELETE FROM users WHERE id = 101; >> > ERROR: null value in column "tenant_id" violates not-null constraint >> > DETAIL: Failing row contains (null, 201, null). >> >> Consider what should happen when you update users.id. Per SQL standard, >> for MATCH SIMPLE an ON UPDATE SET NULL should only set to null the >> referencing column that corresponds to the referenced column actually >> updated, not all of them. PostgreSQL doesn't do this, but if it did, >> then this would work just fine. >> >> Your feature requires specifying a fixed column or columns to update, so >> it cannot react differently to what column actually updated. In fact, >> you might want different referential actions depending on what columns >> are updated, like what you can do with general triggers. >> >> So, unless I'm missing an angle here, I would suggest leaving out the ON >> UPDATE variant of this feature. >> >> > > Patch does not apply on head, I am marking the status "Waiting on author" > http://cfbot.cputube.org/patch_33_2932.log > > -- > Ibrar Ahmed
Attachment
Re: [PATCH] Partial foreign key updates in referential integrity triggers
From
Daniel Gustafsson
Date:
> On 4 Aug 2021, at 23:49, Paul Martinez <hellopfm@gmail.com> wrote: > I've rebased my original patch and it should work now. This patch no longer applies, can you please submit a rebased version? It currently fails on catversion.h, to keep that from happening repeatedly you can IMO skip that from the patch submission. -- Daniel Gustafsson https://vmware.com/
On Wed, Sep 1, 2021 at 4:11 AM Daniel Gustafsson <daniel@yesql.se> wrote: > This patch no longer applies, can you please submit a rebased version? It > currently fails on catversion.h, to keep that from happening repeatedly you can > IMO skip that from the patch submission. Ah, understood. Will do that in the future. Attached are rebased patches, not including catversion.h changes, for both the ON UPDATE/DELETE case, and the ON DELETE only case. - Paul
Attachment
On Thu, Sep 2, 2021 at 12:11 PM Paul Martinez <hellopfm@gmail.com> wrote:
On Wed, Sep 1, 2021 at 4:11 AM Daniel Gustafsson <daniel@yesql.se> wrote:
> This patch no longer applies, can you please submit a rebased version? It
> currently fails on catversion.h, to keep that from happening repeatedly you can
> IMO skip that from the patch submission.
Ah, understood. Will do that in the future. Attached are rebased patches, not
including catversion.h changes, for both the ON UPDATE/DELETE case, and the
ON DELETE only case.
- Paul
Hi,
+ case RI_TRIGTYPE_DELETE:+ queryno = is_set_null
+ ? RI_PLAN_ONDELETE_SETNULL_DOUPDATE
+ : RI_PLAN_ONDELETE_SETDEFAULT_DOUPDATE;
Should the new symbols be renamed ?
RI_PLAN_ONDELETE_SETNULL_DOUPDATE -> RI_PLAN_ONDELETE_SETNULL_DODELETE
RI_PLAN_ONDELETE_SETDEFAULT_DOUPDATE -> RI_PLAN_ONDELETE_SETDEFAULT_DODELETE
Cheers
On Thu, Sep 2, 2021 at 1:55 PM Zhihong Yu <zyu@yugabyte.com> wrote: > > Hi, > + case RI_TRIGTYPE_DELETE: > + queryno = is_set_null > + ? RI_PLAN_ONDELETE_SETNULL_DOUPDATE > + : RI_PLAN_ONDELETE_SETDEFAULT_DOUPDATE; > > Should the new symbols be renamed ? > > RI_PLAN_ONDELETE_SETNULL_DOUPDATE -> RI_PLAN_ONDELETE_SETNULL_DODELETE > RI_PLAN_ONDELETE_SETDEFAULT_DOUPDATE -> RI_PLAN_ONDELETE_SETDEFAULT_DODELETE These constants are named correctly -- they follow the format: RI_PLAN_<trigger>_<action>_<what_saved_plan_does> These symbols refer to plans that are used for ON DELETE SET NULL and ON DELETE SET DEFAULT triggers, which update rows in the referencing table ("_DOUPDATE"). These triggers do not perform any deletions. But these names are definitely confusing, and I did have to spend some time confirming that the names were correct. I decided to rename these, as well as the other plan keys, so they all use the same more explicit format: RI_PLAN_<trigger>_<action> RI_PLAN_CASCADE_DEL_DODELETE => RI_PLAN_ONDELETE_CASCADE RI_PLAN_CASCADE_UPD_DOUPDATE => RI_PLAN_ONUPDATE_CASCADE RI_PLAN_RESTRICT_CHECKREF => RI_PLAN_ONTRIGGER_RESTRICT RI_PLAN_SETNULL_DOUPDATE => RI_PLAN_ONDELETE_SETNULL and RI_PLAN_ONUPDATE_SETNULL RI_PLAN_SETDEFAULT_DOUPDATE => RI_PLAN_ONDELETE_SETDEFAULT and RI_PLAN_ONUPDATE_SETDEFAULT The same plan can be used for both ON DELETE RESTRICT and ON UPDATE RESTRICT, so we just use ONTRIGGER there. Previously, the same plan could also be used for both ON DELETE SET NULL and ON UPDATE SET NULL, or both ON DELETE SET DEFAULT and ON UPDATE SET DEFAULT. This is no longer the case, so we need to add separate keys for each case. As an example, a constraint on a table foo could specify: FOREIGN KEY (a, b) REFERENCES bar (a, b) ON UPDATE SET NULL ON DELETE SET NULL (a) In this case for the update trigger we want to do: UPDATE foo SET a = NULL, B = NULL WHERE ... but for the delete trigger we want to do: UPDATE foo SET a = NULL WHERE ... so the plans cannot be shared. (Note that we still need separate plans even if we only support specifying a column subset for the ON DELETE trigger. As in the above example, the ON UPDATE trigger will always set all the columns, while the ON DELETE trigger could only set a subset.) - Paul
Attachment
Re: [PATCH] Partial foreign key updates in referential integrity triggers
From
Peter Eisentraut
Date:
I have reviewed your patch referential-actions-on-delete-only-set-cols-v3.patch. Attached are two patches that go on top of yours that contain my recommended changes. Basically, this all looks pretty good to me. My changes are mostly stylistic. Some notes of substance: - The omission of the referential actions details from the CREATE TABLE reference page surprised me. I have committed that separately (without the column support, of course). So you should rebase your patch on top of that. Note that ALTER TABLE would now also need to be updated by your patch. - I recommend setting pg_constraint.confdelsetcols to null for the default behavior of setting all columns, instead of an empty array the way you have done it. I have noted the places in the code that need to be changed for that. - The outfuncs.c support shouldn't be included in the final patch. There is nothing wrong it, but I don't think it should be part of this patch to add piecemeal support like that. I have included a few changes there anyway for completeness. - In gram.y, I moved the error check around to avoid duplication. - In ri_triggers.c, I follow your renaming of the constants, but RI_PLAN_ONTRIGGER_RESTRICT seems a little weird. Maybe do _ONBOTH_, or else reverse the order, like RI_PLAN_SETNULL_ONDELETE, which would then allow RI_PLAN_RESTRICT. Please look through this and provide an updated patch, and then it should be good to go.
Attachment
On Thu, Nov 11, 2021 at 4:34 AM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote: > > I have reviewed your patch > referential-actions-on-delete-only-set-cols-v3.patch. Attached are two > patches that go on top of yours that contain my recommended changes. > > Basically, this all looks pretty good to me. My changes are mostly > stylistic. Thank you! I really, really appreciate the thorough review and the comments and corrections. > Some notes of substance: > > - The omission of the referential actions details from the CREATE TABLE > reference page surprised me. I have committed that separately (without > the column support, of course). So you should rebase your patch on top > of that. Note that ALTER TABLE would now also need to be updated by > your patch. Done. > - I recommend setting pg_constraint.confdelsetcols to null for the > default behavior of setting all columns, instead of an empty array the > way you have done it. I have noted the places in the code that need to > be changed for that. Done. > - The outfuncs.c support shouldn't be included in the final patch. > There is nothing wrong it, but I don't think it should be part of this > patch to add piecemeal support like that. I have included a few changes > there anyway for completeness. Got it. I've reverted the changes in that file. > - In ri_triggers.c, I follow your renaming of the constants, but > RI_PLAN_ONTRIGGER_RESTRICT seems a little weird. Maybe do _ONBOTH_, or > else reverse the order, like RI_PLAN_SETNULL_ONDELETE, which would then > allow RI_PLAN_RESTRICT. I've reversed the order, so it's now RI_PLAN_<action>_<trigger>, and renamed the RESTRICT one to just RI_PLAN_RESTRICT. I've attached an updated patch, including your changes and the additional changes mentioned above. - Paul
Attachment
On Mon, Nov 22, 2021 at 8:04 PM Paul Martinez <hellopfm@gmail.com> wrote:
On Thu, Nov 11, 2021 at 4:34 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
>
> I have reviewed your patch
> referential-actions-on-delete-only-set-cols-v3.patch. Attached are two
> patches that go on top of yours that contain my recommended changes.
>
> Basically, this all looks pretty good to me. My changes are mostly
> stylistic.
Thank you! I really, really appreciate the thorough review and the
comments and corrections.
> Some notes of substance:
>
> - The omission of the referential actions details from the CREATE TABLE
> reference page surprised me. I have committed that separately (without
> the column support, of course). So you should rebase your patch on top
> of that. Note that ALTER TABLE would now also need to be updated by
> your patch.
Done.
> - I recommend setting pg_constraint.confdelsetcols to null for the
> default behavior of setting all columns, instead of an empty array the
> way you have done it. I have noted the places in the code that need to
> be changed for that.
Done.
> - The outfuncs.c support shouldn't be included in the final patch.
> There is nothing wrong it, but I don't think it should be part of this
> patch to add piecemeal support like that. I have included a few changes
> there anyway for completeness.
Got it. I've reverted the changes in that file.
> - In ri_triggers.c, I follow your renaming of the constants, but
> RI_PLAN_ONTRIGGER_RESTRICT seems a little weird. Maybe do _ONBOTH_, or
> else reverse the order, like RI_PLAN_SETNULL_ONDELETE, which would then
> allow RI_PLAN_RESTRICT.
I've reversed the order, so it's now RI_PLAN_<action>_<trigger>, and
renamed the RESTRICT one to just RI_PLAN_RESTRICT.
I've attached an updated patch, including your changes and the additional
changes mentioned above.
- Paul
Hi,
+ If a foreign key with a <literal>SET NULL</literal> or <literal>SET+ DEFAULT</literal> delete action, which columns should be updated.
which columns should be updated -> the columns that should be updated
+ if (fk_del_set_cols)
+ {
+ int num_delete_cols = 0;
+ {
+ int num_delete_cols = 0;
Since num_delete_cols is only used in the else block, I think it can be moved inside else block.
Or you can store the value inside *num_fk_del_set_cols directly and avoid num_delete_cols.
Cheers
On Mon, Nov 22, 2021 at 10:21 PM Zhihong Yu <zyu@yugabyte.com> wrote: > > Hi, > + If a foreign key with a <literal>SET NULL</literal> or <literal>SET > + DEFAULT</literal> delete action, which columns should be updated. > > which columns should be updated -> the columns that should be updated Done. > + if (fk_del_set_cols) > + { > + int num_delete_cols = 0; > > Since num_delete_cols is only used in the else block, I think it can be moved inside else block. > Or you can store the value inside *num_fk_del_set_cols directly and avoid num_delete_cols. I've moved it inside the else block (and removed the initialization). Updated patch attached. Thanks for taking a look so quickly! - Paul
Attachment
Re: [PATCH] Partial foreign key updates in referential integrity triggers
From
Peter Eisentraut
Date:
On 05.01.21 22:40, Paul Martinez wrote: > CREATE TABLE tenants (id serial PRIMARY KEY); > CREATE TABLE users ( > tenant_id int REFERENCES tenants ON DELETE CASCADE, > id serial, > PRIMARY KEY (tenant_id, id), > ); > CREATE TABLE posts ( > tenant_id int REFERENCES tenants ON DELETE CASCADE, > id serial, > author_id int, > PRIMARY KEY (tenant_id, id), > FOREIGN KEY (tenant_id, author_id) REFERENCES users ON DELETE SET NULL > ); > > INSERT INTO tenants VALUES (1); > INSERT INTO users VALUES (1, 101); > INSERT INTO posts VALUES (1, 201, 101); > DELETE FROM users WHERE id = 101; > ERROR: null value in column "tenant_id" violates not-null constraint > DETAIL: Failing row contains (null, 201, null). I was looking through this example to see if it could be adapted for the documentation. The way the users table is defined, it appears that "id" is actually unique and the primary key ought to be just (id). The DELETE command you show also just uses the id column to find the user, which would be bad if the user id is not unique across tenants. If the id were unique, then the foreign key from posts to users would just use the user id column and the whole problem of the ON DELETE SET NULL action would go away. If the primary key of users is indeed supposed to be (tenant_id, id), then maybe the definition of the users.id column should not use serial, and the DELETE command should also look at the tenant_id column. (The same question applies to posts.id.) Also, you initially wrote that this is a denormalized schema. I think if we keep the keys the way you show, then this isn't denormalized. But if we considered users.id globally unique, then there would be normalization concerns. What do you think?
Re: [PATCH] Partial foreign key updates in referential integrity triggers
From
Peter Eisentraut
Date:
On 23.11.21 05:44, Paul Martinez wrote: > Updated patch attached. Thanks for taking a look so quickly! This patch looks pretty much okay to me. As I wrote in another message in this thread, I'm having some doubts about the proper use case. So I'm going to push this commit fest entry to the next one, so we can continue that discussion.
Re: [PATCH] Partial foreign key updates in referential integrity triggers
From
Matthias van de Meent
Date:
On Wed, 1 Dec 2021 at 11:33, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote: > > On 23.11.21 05:44, Paul Martinez wrote: > > Updated patch attached. Thanks for taking a look so quickly! > > This patch looks pretty much okay to me. As I wrote in another message > in this thread, I'm having some doubts about the proper use case. So > I'm going to push this commit fest entry to the next one, so we can > continue that discussion. The use case of the original mail "foreign keys are guaranteed to not be cross-tenant" seems like a good enough use case to me? The alternative to the discriminator column approach to seperating tenant data even when following referential integrety checks would be maintaining a copy of the table for each tenant, but this won't work as well due to (amongst others) syscache bloat, prepared statements being significantly less effective, and DDL operations now growing linearly with the amount of tenants in the system. Kind regards, Matthias van de Meent
On Wed, Nov 24, 2021 at 10:59 AM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote: > > I was looking through this example to see if it could be adapted for the > documentation. > > The way the users table is defined, it appears that "id" is actually > unique and the primary key ought to be just (id). The DELETE command > you show also just uses the id column to find the user, which would be > bad if the user id is not unique across tenants. If the id were unique, > then the foreign key from posts to users would just use the user id > column and the whole problem of the ON DELETE SET NULL action would go > away. If the primary key of users is indeed supposed to be (tenant_id, > id), then maybe the definition of the users.id column should not use > serial, and the DELETE command should also look at the tenant_id column. > (The same question applies to posts.id.) > > Also, you initially wrote that this is a denormalized schema. I think > if we keep the keys the way you show, then this isn't denormalized. But > if we considered users.id globally unique, then there would be > normalization concerns. > > What do you think? Regarding that specific example, in a production scenario, yes, the DELETE command should reference both columns. And if used for documentation both columns should be referenced for clarity/correctness. I don't think the exact semantics regarding the uniqueness of the id column are critical. Configuring separate auto-incrementing ids per tenant would be fairly complex; practically speaking, a single database with multi-tenant data will use serial to get auto-incrementing ids (or else use UUIDs to prevent conflicts). The possibility of conflicting ids likely won't arise until moving to a distributed environment, at which point queries should only be routed towards a single shard (where uniqueness will still hold), either by some higher level application-level context, or by including the tenant_id as part of the query. I think there are three separate motivating use cases for using (tenant_id, id) as primary keys everywhere in a multi-tenant database: 1) I initially encountered this problem while migrating a database to use Citus, which requires that primary keys (and any other uniqueness constraints) include the shard key, which forces the primary key to be (tenant_id, id). I'm not sure what constraints other sharding solutions enforce, but I don't feel like this feature is over-fitting to Citus' specific implementation -- it seems like a pretty reasonable/generalizable solution when sharding data: prefix all your indexes with the shard key. 2) As I mentioned in my response to Tom in my original proposal thread, and as Matthias alluded to, using composite primary keys grants significantly stronger referential integrity by preventing cross-tenant references. I think this represents a significant leap in the robustness and security of a schema, to the point where you could consider it a design flaw to _not_ use composite keys. https://www.postgresql.org/message-id/flat/CAF%2B2_SFFCjWMpxo0cj3yaqMavcb3Byd0bSG%2B0UPs7RVb8EF99g%40mail.gmail.com#c0e2b37b223bfbf8ece561f02865286c 3) For performance reasons, indexes on foreign keys will often be prefixed by the tenant_id to speed up index scans. (I think algorithmically doing an index lookup on (fk_id) vs. (tenant_id, fk_id) has the same complexity, but repeated index scans, such as when doing a join, should in practice be more efficient when including a tenant_id, because most queries will only reference a single tenant so the looked up values are more likely to be on the same pages.) If a foreign key only references the id column, then ON DELETE CASCADE triggers will only use the id column in their DELETE query. Thus, to ensure that deletes are still fast, you will need to create an index on (fk_id) in addition to the (tenant_id, fk_id) index, which would cause _significant_ database bloat. (In practice, the presence of both indexes will also confuse the query planner and now BOTH indexes will take up precious space in the database's working memory, so it really creates all sorts of problems.) Using a composite foreign key will ensure that ON DELETE CASCADE trigger query will use both columns. - Paul
Re: [PATCH] Partial foreign key updates in referential integrity triggers
From
Peter Eisentraut
Date:
On 02.12.21 01:17, Paul Martinez wrote: > Regarding that specific example, in a production scenario, yes, the > DELETE command should reference both columns. And if used for > documentation both columns should be referenced for clarity/correctness. Ok, thanks. I have updated your example accordingly and included it in the patch, which I have now committed.