Thread: [PATCH] Partial foreign key updates in referential integrity triggers

[PATCH] Partial foreign key updates in referential integrity triggers

From
Paul Martinez
Date:
Hey, hackers!

I've created a patch to better support referential integrity constraints when
using composite primary and foreign keys. This patch allows creating a foreign
key using the syntax:

  FOREIGN KEY (tenant_id, fk_id) REFERENCES fktable ON DELETE SET NULL (fk_id)

which means that only the fk_id column will be set to NULL when the referenced
row is deleted, rather than both the tenant_id and fk_id columns.

In multi-tenant applications, it is common to denormalize a "tenant_id" column
across every table, and use composite primary keys of the form (tenant_id, id)
and composite foreign keys of the form (tenant_id, fk_id), reusing the
tenant_id column in both the primary and foreign key.

This is often done initially for performance reasons, but has the added benefit
of making it impossible for data from one tenant to reference data from another
tenant, also making this a good decision from a security perspective.

Unfortunately, one downside of using composite foreign keys in such a matter
is that commonly used referential actions, such as ON DELETE SET NULL, no
longer work, because Postgres tries to set all of the referencing columns to
NULL, including the columns that overlap with the primary key:


CREATE TABLE tenants (id serial PRIMARY KEY);
CREATE TABLE users (
  tenant_id int REFERENCES tenants ON DELETE CASCADE,
  id serial,
  PRIMARY KEY (tenant_id, id),
);
CREATE TABLE posts (
    tenant_id int REFERENCES tenants ON DELETE CASCADE,
    id serial,
    author_id int,
    PRIMARY KEY (tenant_id, id),
    FOREIGN KEY (tenant_id, author_id) REFERENCES users ON DELETE SET NULL
);

INSERT INTO tenants VALUES (1);
INSERT INTO users VALUES (1, 101);
INSERT INTO posts VALUES (1, 201, 101);
DELETE FROM users WHERE id = 101;
ERROR:  null value in column "tenant_id" violates not-null constraint
DETAIL:  Failing row contains (null, 201, null).


This can be resolved by manually creating triggers on the referenced table, but
this is clunky and adds a significant amount of extra work when adding (or
removing!) foreign keys. Users shouldn't have to compromise on maintenance
overhead when using composite foreign keys.

I have implemented a simple extension to the syntax for foreign keys that
makes it just as easy to support referential integrity constraints for
composite foreign keys as it is for single column foreign keys. The SET NULL
and SET DEFAULT referential actions can now be optionally followed by a column
list:

key_action:
  | NO ACTION
  | RESTRICT
  | CASCADE
  | SET NULL [ (column_name [, ...] ) ]
  | SET DEFAULT [ (column_name [, ...] ) ]

When a column list is provided, only the specified columns, which must be a
subset of the referencing columns, will be updated in the associated trigger.

Note that use of SET NULL (col1, ...) on a composite foreign key with MATCH
FULL specified will still raise an error. In such a scenario we could raise an
error when the user tries to define the foreign key, but we don't raise a
similar error when a user tries to use SET NULL on a non-nullable column, so I
don't think this is critical. (I haven't added this check in my patch.) While
this feature would mostly be used with the default MATCH SIMPLE, I could
imagine using SET DEFAULT (col, ...) even for MATCH FULL foreign keys.

To store this additional data, I've added two columns to the pg_constraint
catalog entry:

confupdsetcols int2[]
confdelsetcols int2[]

These columns store the attribute numbers of the columns to update in the
ON UPDATE and ON DELETE triggers respectively. If the arrays are empty, then
all of the referencing columns should be updated.


I previously proposed this feature about a year ago [1], but I don't feel like
the arguments against it were very strong. Wanting to get more familiar with the
Postgres codebase I decided to implement the feature over this holiday break,
and I've gotten everything working and put together a complete patch including
tests and updates to documentation. Hopefully if people find it useful it can
make its way into the next commitfest!

Visual diff:
https://github.com/postgres/postgres/compare/master...PaulJuliusMartinez:on-upd-del-set-cols

Here's a rough outline of the changes:

src/backend/parser/gram.y                 | 122
src/include/nodes/parsenodes.h            |   3
src/backend/nodes/copyfuncs.c             |   2
src/backend/nodes/equalfuncs.c            |   2
src/backend/nodes/outfuncs.c              |  47
- Modify grammar to add opt_column_list after SET NULL and SET DEFAULT
- Add fk_{upd,del}_set_cols fields to Constraint struct
- Add proper node support, as well as outfuncs support for AlterTableStmt,
  which I used while debugging

src/include/catalog/pg_constraint.h       |  20
src/backend/catalog/pg_constraint.c       |  80
src/include/catalog/catversion.h          |   2
- Add confupdsetcols and confdelsetcols to pg_constraint catalog entry

src/backend/commands/tablecmds.c          | 142
- Pass along data from parsed Constraint node to CreateConstraintEntry
- Handle propagating constraints for partitioned tables

src/backend/utils/adt/ri_triggers.c       | 109
- Update construction of trigger query to only update specified columns
- Update caching mechanism since ON UPDATE and ON DELETE may modify different
  sets of columns

src/backend/utils/adt/ruleutils.c         |  29
- Update pg_get_constraintdef to handle new syntax
- This automatically updates psql \d output as well as pg_dump

src/backend/catalog/heap.c                |   4
src/backend/catalog/index.c               |   4
src/backend/commands/trigger.c            |   4
src/backend/commands/typecmds.c           |   4
src/backend/utils/cache/relcache.c        |   2
- Update misc. call sites for updated functions

src/test/regress/sql/foreign_key.sql      |  63
src/test/regress/expected/foreign_key.out |  88
- Regression tests with checks for error cases and partitioned tables

doc/src/sgml/catalogs.sgml                |  24
doc/src/sgml/ref/create_table.sgml        |  15
- Updated documentation for CREATE TABLE and pg_constraint catalog definition

20 files changed, 700 insertions(+), 66 deletions(-)

Thanks,
Paul


[1]
https://www.postgresql.org/message-id/flat/CAF%2B2_SGRXQOtumethpuXhsyU%2B4AYzfKA5fhHCjCjH%2BjQ04WWjA%40mail.gmail.com

Attachment

Re: [PATCH] Partial foreign key updates in referential integrity triggers

From
David Steele
Date:
On 1/5/21 4:40 PM, Paul Martinez wrote:
> 
> I've created a patch to better support referential integrity constraints when
> using composite primary and foreign keys. This patch allows creating a foreign
> key using the syntax:

<snip>

> I previously proposed this feature about a year ago [1], but I don't feel like
> the arguments against it were very strong.

Perhaps not, but Tom certainly didn't seem in favor of this feature, at 
the least.

Since the patch has not attracted any review/comment I propose to move 
it to the next CF since it doesn't seem a likely candidate for PG14.

I'll do that on March 23 unless there are arguments to the contrary.

Regards,
-- 
-David
david@pgmasters.net



Re: [PATCH] Partial foreign key updates in referential integrity triggers

From
David Steele
Date:
On 3/18/21 9:52 AM, David Steele wrote:
> On 1/5/21 4:40 PM, Paul Martinez wrote:
>>
>> I've created a patch to better support referential integrity 
>> constraints when
>> using composite primary and foreign keys. This patch allows creating a 
>> foreign
>> key using the syntax:
> 
> <snip>
> 
>> I previously proposed this feature about a year ago [1], but I don't 
>> feel like
>> the arguments against it were very strong.
> 
> Perhaps not, but Tom certainly didn't seem in favor of this feature, at 
> the least.
> 
> Since the patch has not attracted any review/comment I propose to move 
> it to the next CF since it doesn't seem a likely candidate for PG14.
> 
> I'll do that on March 23 unless there are arguments to the contrary.

This entry has been moved to the 2021-07 CF with status Needs Review.

Regards,
-- 
-David
david@pgmasters.net



Re: [PATCH] Partial foreign key updates in referential integrity triggers

From
Peter Eisentraut
Date:
On 05.01.21 22:40, Paul Martinez wrote:
> I've created a patch to better support referential integrity constraints when
> using composite primary and foreign keys. This patch allows creating a foreign
> key using the syntax:
> 
>    FOREIGN KEY (tenant_id, fk_id) REFERENCES fktable ON DELETE SET NULL (fk_id)
> 
> which means that only the fk_id column will be set to NULL when the referenced
> row is deleted, rather than both the tenant_id and fk_id columns.

I think this is an interesting feature with a legitimate use case.

I'm wondering a bit about what the ON UPDATE side of this is supposed to 
mean.  Consider your example:

> CREATE TABLE tenants (id serial PRIMARY KEY);
> CREATE TABLE users (
>    tenant_id int REFERENCES tenants ON DELETE CASCADE,
>    id serial,
>    PRIMARY KEY (tenant_id, id),
> );
> CREATE TABLE posts (
>      tenant_id int REFERENCES tenants ON DELETE CASCADE,
>      id serial,
>      author_id int,
>      PRIMARY KEY (tenant_id, id),
>      FOREIGN KEY (tenant_id, author_id) REFERENCES users ON DELETE SET NULL
> );
> 
> INSERT INTO tenants VALUES (1);
> INSERT INTO users VALUES (1, 101);
> INSERT INTO posts VALUES (1, 201, 101);
> DELETE FROM users WHERE id = 101;
> ERROR:  null value in column "tenant_id" violates not-null constraint
> DETAIL:  Failing row contains (null, 201, null).

Consider what should happen when you update users.id.  Per SQL standard, 
for MATCH SIMPLE an ON UPDATE SET NULL should only set to null the 
referencing column that corresponds to the referenced column actually 
updated, not all of them.  PostgreSQL doesn't do this, but if it did, 
then this would work just fine.

Your feature requires specifying a fixed column or columns to update, so 
it cannot react differently to what column actually updated.  In fact, 
you might want different referential actions depending on what columns 
are updated, like what you can do with general triggers.

So, unless I'm missing an angle here, I would suggest leaving out the ON 
UPDATE variant of this feature.



Re: [PATCH] Partial foreign key updates in referential integrity triggers

From
Ibrar Ahmed
Date:


On Wed, Jul 14, 2021 at 6:51 PM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:

On 05.01.21 22:40, Paul Martinez wrote:
> I've created a patch to better support referential integrity constraints when
> using composite primary and foreign keys. This patch allows creating a foreign
> key using the syntax:
>
>    FOREIGN KEY (tenant_id, fk_id) REFERENCES fktable ON DELETE SET NULL (fk_id)
>
> which means that only the fk_id column will be set to NULL when the referenced
> row is deleted, rather than both the tenant_id and fk_id columns.

I think this is an interesting feature with a legitimate use case.

I'm wondering a bit about what the ON UPDATE side of this is supposed to
mean.  Consider your example:

> CREATE TABLE tenants (id serial PRIMARY KEY);
> CREATE TABLE users (
>    tenant_id int REFERENCES tenants ON DELETE CASCADE,
>    id serial,
>    PRIMARY KEY (tenant_id, id),
> );
> CREATE TABLE posts (
>      tenant_id int REFERENCES tenants ON DELETE CASCADE,
>      id serial,
>      author_id int,
>      PRIMARY KEY (tenant_id, id),
>      FOREIGN KEY (tenant_id, author_id) REFERENCES users ON DELETE SET NULL
> );
>
> INSERT INTO tenants VALUES (1);
> INSERT INTO users VALUES (1, 101);
> INSERT INTO posts VALUES (1, 201, 101);
> DELETE FROM users WHERE id = 101;
> ERROR:  null value in column "tenant_id" violates not-null constraint
> DETAIL:  Failing row contains (null, 201, null).

Consider what should happen when you update users.id.  Per SQL standard,
for MATCH SIMPLE an ON UPDATE SET NULL should only set to null the
referencing column that corresponds to the referenced column actually
updated, not all of them.  PostgreSQL doesn't do this, but if it did,
then this would work just fine.

Your feature requires specifying a fixed column or columns to update, so
it cannot react differently to what column actually updated.  In fact,
you might want different referential actions depending on what columns
are updated, like what you can do with general triggers.

So, unless I'm missing an angle here, I would suggest leaving out the ON
UPDATE variant of this feature.


 
Patch does not apply on head, I am marking the status "Waiting on author"

Re: [PATCH] Partial foreign key updates in referential integrity triggers

From
Paul Martinez
Date:
On Wed, Jul 14, 2021 at 6:51 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
> I think this is an interesting feature with a legitimate use case.

Great! I'm glad to hear that.

> Consider what should happen when you update users.id.  Per SQL standard,
> for MATCH SIMPLE an ON UPDATE SET NULL should only set to null the
> referencing column that corresponds to the referenced column actually
> updated, not all of them.  PostgreSQL doesn't do this, but if it did,
> then this would work just fine.
>
> ...
>
> So, unless I'm missing an angle here, I would suggest leaving out the ON
> UPDATE variant of this feature.

I understand what you're saying here, but are you sure that is the
behavior specified in the SQL standard?

I can't find a copy of a more recent standard, but at least in SQL 1999 [1],
it has this to say (page 459 of the linked PDF, page 433 of the standard):

> 8) If a non-null value of a referenced column in the referenced table is
>    updated to a value that is distinct from the current value of that
>    column, then for every member F of the subtable family of the
>    referencing table:
>
>   Case:
>   a) If SIMPLE is specified or implicit, or if FULL is specified, then
>
>     Case:
>     ii) If the <update rule> specifies SET NULL, then
>
>       Case:
>       1) If SIMPLE is specified or implicit, then:
>
>         A) <snipped stuff about triggers>
>
>         B) For every F, in every matching row in F, each referencing
>         column in F that corresponds with a referenced column is set to
>         the null value.

This phrasing doesn't seem to make any reference to which columns were
actually updated.

The definitions a few pages earlier do explicitly define the set of
updated columns ("Let UMC be the set of referencing columns that
correspond with updated referenced columns"), but that defined set is
only referenced in the behavior when PARTIAL is specified, implying that
the set of updated columns is _not_ relevant in the SIMPLE case.


If my understanding of this is correct, then Postgres _isn't_ out of spec,
and this extension still works fine for the ON UPDATE triggers. (But, wow,
this spec is not easy to read, so I could definitely be wrong.)

I'm not sure how MATCH PARTIAL fits into this; I know it's
unimplemented, but I don't know what its use cases are, and I don't
think I understand how ON DELETE / ON UPDATE should work with MATCH
PARTIAL, let alone how they would work combined with this extension.

This lack of clarity may be a good-enough reason to leave out the ON
UPDATE case.

On Wed, Jul 14, 2021 at 12:36 PM Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:
> Patch does not apply on head, I am marking the status "Waiting on author"
> http://cfbot.cputube.org/patch_33_2932.log

I've rebased my original patch and it should work now. This is
referential-actions-set-cols-v2.patch.

I've also created a second patch that leaves out the ON UPDATE behavior, if
we think that's the safer route. This is
referential-actions-on-delete-only-set-cols-v1.patch.

[1]: http://web.cecs.pdx.edu/~len/sql1999.pdf

On Wed, Jul 14, 2021 at 12:36 PM Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:
>
>
>
> On Wed, Jul 14, 2021 at 6:51 PM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
>>
>>
>> On 05.01.21 22:40, Paul Martinez wrote:
>> > I've created a patch to better support referential integrity constraints when
>> > using composite primary and foreign keys. This patch allows creating a foreign
>> > key using the syntax:
>> >
>> >    FOREIGN KEY (tenant_id, fk_id) REFERENCES fktable ON DELETE SET NULL (fk_id)
>> >
>> > which means that only the fk_id column will be set to NULL when the referenced
>> > row is deleted, rather than both the tenant_id and fk_id columns.
>>
>> I think this is an interesting feature with a legitimate use case.
>>
>> I'm wondering a bit about what the ON UPDATE side of this is supposed to
>> mean.  Consider your example:
>>
>> > CREATE TABLE tenants (id serial PRIMARY KEY);
>> > CREATE TABLE users (
>> >    tenant_id int REFERENCES tenants ON DELETE CASCADE,
>> >    id serial,
>> >    PRIMARY KEY (tenant_id, id),
>> > );
>> > CREATE TABLE posts (
>> >      tenant_id int REFERENCES tenants ON DELETE CASCADE,
>> >      id serial,
>> >      author_id int,
>> >      PRIMARY KEY (tenant_id, id),
>> >      FOREIGN KEY (tenant_id, author_id) REFERENCES users ON DELETE SET NULL
>> > );
>> >
>> > INSERT INTO tenants VALUES (1);
>> > INSERT INTO users VALUES (1, 101);
>> > INSERT INTO posts VALUES (1, 201, 101);
>> > DELETE FROM users WHERE id = 101;
>> > ERROR:  null value in column "tenant_id" violates not-null constraint
>> > DETAIL:  Failing row contains (null, 201, null).
>>
>> Consider what should happen when you update users.id.  Per SQL standard,
>> for MATCH SIMPLE an ON UPDATE SET NULL should only set to null the
>> referencing column that corresponds to the referenced column actually
>> updated, not all of them.  PostgreSQL doesn't do this, but if it did,
>> then this would work just fine.
>>
>> Your feature requires specifying a fixed column or columns to update, so
>> it cannot react differently to what column actually updated.  In fact,
>> you might want different referential actions depending on what columns
>> are updated, like what you can do with general triggers.
>>
>> So, unless I'm missing an angle here, I would suggest leaving out the ON
>> UPDATE variant of this feature.
>>
>>
>
> Patch does not apply on head, I am marking the status "Waiting on author"
> http://cfbot.cputube.org/patch_33_2932.log
>
> --
> Ibrar Ahmed

Attachment

Re: [PATCH] Partial foreign key updates in referential integrity triggers

From
Daniel Gustafsson
Date:
> On 4 Aug 2021, at 23:49, Paul Martinez <hellopfm@gmail.com> wrote:

> I've rebased my original patch and it should work now.

This patch no longer applies, can you please submit a rebased version?  It
currently fails on catversion.h, to keep that from happening repeatedly you can
IMO skip that from the patch submission.

--
Daniel Gustafsson        https://vmware.com/




Re: [PATCH] Partial foreign key updates in referential integrity triggers

From
Paul Martinez
Date:
On Wed, Sep 1, 2021 at 4:11 AM Daniel Gustafsson <daniel@yesql.se> wrote:

> This patch no longer applies, can you please submit a rebased version?  It
> currently fails on catversion.h, to keep that from happening repeatedly you can
> IMO skip that from the patch submission.

Ah, understood. Will do that in the future. Attached are rebased patches, not
including catversion.h changes, for both the ON UPDATE/DELETE case, and the
ON DELETE only case.

- Paul

Attachment

Re: [PATCH] Partial foreign key updates in referential integrity triggers

From
Zhihong Yu
Date:


On Thu, Sep 2, 2021 at 12:11 PM Paul Martinez <hellopfm@gmail.com> wrote:
On Wed, Sep 1, 2021 at 4:11 AM Daniel Gustafsson <daniel@yesql.se> wrote:

> This patch no longer applies, can you please submit a rebased version?  It
> currently fails on catversion.h, to keep that from happening repeatedly you can
> IMO skip that from the patch submission.

Ah, understood. Will do that in the future. Attached are rebased patches, not
including catversion.h changes, for both the ON UPDATE/DELETE case, and the
ON DELETE only case.

- Paul
Hi,
+       case RI_TRIGTYPE_DELETE:
+           queryno = is_set_null
+               ? RI_PLAN_ONDELETE_SETNULL_DOUPDATE
+               : RI_PLAN_ONDELETE_SETDEFAULT_DOUPDATE;

Should the new symbols be renamed ?

RI_PLAN_ONDELETE_SETNULL_DOUPDATE -> RI_PLAN_ONDELETE_SETNULL_DODELETE 
RI_PLAN_ONDELETE_SETDEFAULT_DOUPDATE -> RI_PLAN_ONDELETE_SETDEFAULT_DODELETE

Cheers

Re: [PATCH] Partial foreign key updates in referential integrity triggers

From
Paul Martinez
Date:
On Thu, Sep 2, 2021 at 1:55 PM Zhihong Yu <zyu@yugabyte.com> wrote:
>
> Hi,
> +       case RI_TRIGTYPE_DELETE:
> +           queryno = is_set_null
> +               ? RI_PLAN_ONDELETE_SETNULL_DOUPDATE
> +               : RI_PLAN_ONDELETE_SETDEFAULT_DOUPDATE;
>
> Should the new symbols be renamed ?
>
> RI_PLAN_ONDELETE_SETNULL_DOUPDATE -> RI_PLAN_ONDELETE_SETNULL_DODELETE
> RI_PLAN_ONDELETE_SETDEFAULT_DOUPDATE -> RI_PLAN_ONDELETE_SETDEFAULT_DODELETE

These constants are named correctly -- they follow the format:

RI_PLAN_<trigger>_<action>_<what_saved_plan_does>

These symbols refer to plans that are used for ON DELETE SET NULL
and ON DELETE SET DEFAULT triggers, which update rows in the referencing
table ("_DOUPDATE"). These triggers do not perform any deletions.


But these names are definitely confusing, and I did have to spend some time
confirming that the names were correct. I decided to rename these, as well as
the other plan keys, so they all use the same more explicit format:

RI_PLAN_<trigger>_<action>

RI_PLAN_CASCADE_DEL_DODELETE => RI_PLAN_ONDELETE_CASCADE
RI_PLAN_CASCADE_UPD_DOUPDATE => RI_PLAN_ONUPDATE_CASCADE

RI_PLAN_RESTRICT_CHECKREF    => RI_PLAN_ONTRIGGER_RESTRICT

RI_PLAN_SETNULL_DOUPDATE     => RI_PLAN_ONDELETE_SETNULL
                            and RI_PLAN_ONUPDATE_SETNULL

RI_PLAN_SETDEFAULT_DOUPDATE  => RI_PLAN_ONDELETE_SETDEFAULT
                            and RI_PLAN_ONUPDATE_SETDEFAULT

The same plan can be used for both ON DELETE RESTRICT and ON UPDATE RESTRICT,
so we just use ONTRIGGER there. Previously, the same plan could also be
used for both ON DELETE SET NULL and ON UPDATE SET NULL, or both
ON DELETE SET DEFAULT and ON UPDATE SET DEFAULT. This is no longer the case,
so we need to add separate keys for each case. As an example, a constraint on
a table foo could specify:

FOREIGN KEY (a, b) REFERENCES bar (a, b)
  ON UPDATE SET NULL
  ON DELETE SET NULL (a)

In this case for the update trigger we want to do:

UPDATE foo SET a = NULL, B = NULL WHERE ...

but for the delete trigger we want to do:

UPDATE foo SET a = NULL WHERE ...

so the plans cannot be shared.

(Note that we still need separate plans even if we only support specifying
a column subset for the ON DELETE trigger. As in the above example, the
ON UPDATE trigger will always set all the columns, while the ON DELETE trigger
could only set a subset.)


- Paul

Attachment

Re: [PATCH] Partial foreign key updates in referential integrity triggers

From
Peter Eisentraut
Date:
I have reviewed your patch 
referential-actions-on-delete-only-set-cols-v3.patch.  Attached are two 
patches that go on top of yours that contain my recommended changes.

Basically, this all looks pretty good to me.  My changes are mostly 
stylistic.

Some notes of substance:

- The omission of the referential actions details from the CREATE TABLE 
reference page surprised me.  I have committed that separately (without 
the column support, of course).  So you should rebase your patch on top 
of that.  Note that ALTER TABLE would now also need to be updated by 
your patch.

- I recommend setting pg_constraint.confdelsetcols to null for the 
default behavior of setting all columns, instead of an empty array the 
way you have done it.  I have noted the places in the code that need to 
be changed for that.

- The outfuncs.c support shouldn't be included in the final patch. 
There is nothing wrong it, but I don't think it should be part of this 
patch to add piecemeal support like that.  I have included a few changes 
there anyway for completeness.

- In gram.y, I moved the error check around to avoid duplication.

- In ri_triggers.c, I follow your renaming of the constants, but 
RI_PLAN_ONTRIGGER_RESTRICT seems a little weird.  Maybe do _ONBOTH_, or 
else reverse the order, like RI_PLAN_SETNULL_ONDELETE, which would then 
allow RI_PLAN_RESTRICT.

Please look through this and provide an updated patch, and then it 
should be good to go.
Attachment

Re: [PATCH] Partial foreign key updates in referential integrity triggers

From
Paul Martinez
Date:
On Thu, Nov 11, 2021 at 4:34 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
>
> I have reviewed your patch
> referential-actions-on-delete-only-set-cols-v3.patch.  Attached are two
> patches that go on top of yours that contain my recommended changes.
>
> Basically, this all looks pretty good to me.  My changes are mostly
> stylistic.

Thank you! I really, really appreciate the thorough review and the
comments and corrections.

> Some notes of substance:
>
> - The omission of the referential actions details from the CREATE TABLE
> reference page surprised me.  I have committed that separately (without
> the column support, of course).  So you should rebase your patch on top
> of that.  Note that ALTER TABLE would now also need to be updated by
> your patch.

Done.

> - I recommend setting pg_constraint.confdelsetcols to null for the
> default behavior of setting all columns, instead of an empty array the
> way you have done it.  I have noted the places in the code that need to
> be changed for that.

Done.

> - The outfuncs.c support shouldn't be included in the final patch.
> There is nothing wrong it, but I don't think it should be part of this
> patch to add piecemeal support like that.  I have included a few changes
> there anyway for completeness.

Got it. I've reverted the changes in that file.

> - In ri_triggers.c, I follow your renaming of the constants, but
> RI_PLAN_ONTRIGGER_RESTRICT seems a little weird.  Maybe do _ONBOTH_, or
> else reverse the order, like RI_PLAN_SETNULL_ONDELETE, which would then
> allow RI_PLAN_RESTRICT.

I've reversed the order, so it's now RI_PLAN_<action>_<trigger>, and
renamed the RESTRICT one to just RI_PLAN_RESTRICT.


I've attached an updated patch, including your changes and the additional
changes mentioned above.

- Paul

Attachment

Re: [PATCH] Partial foreign key updates in referential integrity triggers

From
Zhihong Yu
Date:


On Mon, Nov 22, 2021 at 8:04 PM Paul Martinez <hellopfm@gmail.com> wrote:
On Thu, Nov 11, 2021 at 4:34 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
>
> I have reviewed your patch
> referential-actions-on-delete-only-set-cols-v3.patch.  Attached are two
> patches that go on top of yours that contain my recommended changes.
>
> Basically, this all looks pretty good to me.  My changes are mostly
> stylistic.

Thank you! I really, really appreciate the thorough review and the
comments and corrections.

> Some notes of substance:
>
> - The omission of the referential actions details from the CREATE TABLE
> reference page surprised me.  I have committed that separately (without
> the column support, of course).  So you should rebase your patch on top
> of that.  Note that ALTER TABLE would now also need to be updated by
> your patch.

Done.

> - I recommend setting pg_constraint.confdelsetcols to null for the
> default behavior of setting all columns, instead of an empty array the
> way you have done it.  I have noted the places in the code that need to
> be changed for that.

Done.

> - The outfuncs.c support shouldn't be included in the final patch.
> There is nothing wrong it, but I don't think it should be part of this
> patch to add piecemeal support like that.  I have included a few changes
> there anyway for completeness.

Got it. I've reverted the changes in that file.

> - In ri_triggers.c, I follow your renaming of the constants, but
> RI_PLAN_ONTRIGGER_RESTRICT seems a little weird.  Maybe do _ONBOTH_, or
> else reverse the order, like RI_PLAN_SETNULL_ONDELETE, which would then
> allow RI_PLAN_RESTRICT.

I've reversed the order, so it's now RI_PLAN_<action>_<trigger>, and
renamed the RESTRICT one to just RI_PLAN_RESTRICT.


I've attached an updated patch, including your changes and the additional
changes mentioned above.

- Paul
Hi,
+       If a foreign key with a <literal>SET NULL</literal> or <literal>SET
+       DEFAULT</literal> delete action, which columns should be updated. 

which columns should be updated -> the columns that should be updated

+   if (fk_del_set_cols)
+   {
+       int num_delete_cols = 0;

Since num_delete_cols is only used in the else block, I think it can be moved inside else block.
Or you can store the value inside *num_fk_del_set_cols directly and avoid num_delete_cols.

Cheers

Re: [PATCH] Partial foreign key updates in referential integrity triggers

From
Paul Martinez
Date:
On Mon, Nov 22, 2021 at 10:21 PM Zhihong Yu <zyu@yugabyte.com> wrote:
>
> Hi,
> +       If a foreign key with a <literal>SET NULL</literal> or <literal>SET
> +       DEFAULT</literal> delete action, which columns should be updated.
>
> which columns should be updated -> the columns that should be updated

Done.

> +   if (fk_del_set_cols)
> +   {
> +       int num_delete_cols = 0;
>
> Since num_delete_cols is only used in the else block, I think it can be moved inside else block.
> Or you can store the value inside *num_fk_del_set_cols directly and avoid num_delete_cols.

I've moved it inside the else block (and removed the initialization).

Updated patch attached. Thanks for taking a look so quickly!

- Paul

Attachment

Re: [PATCH] Partial foreign key updates in referential integrity triggers

From
Peter Eisentraut
Date:
On 05.01.21 22:40, Paul Martinez wrote:
> CREATE TABLE tenants (id serial PRIMARY KEY);
> CREATE TABLE users (
>    tenant_id int REFERENCES tenants ON DELETE CASCADE,
>    id serial,
>    PRIMARY KEY (tenant_id, id),
> );
> CREATE TABLE posts (
>      tenant_id int REFERENCES tenants ON DELETE CASCADE,
>      id serial,
>      author_id int,
>      PRIMARY KEY (tenant_id, id),
>      FOREIGN KEY (tenant_id, author_id) REFERENCES users ON DELETE SET NULL
> );
> 
> INSERT INTO tenants VALUES (1);
> INSERT INTO users VALUES (1, 101);
> INSERT INTO posts VALUES (1, 201, 101);
> DELETE FROM users WHERE id = 101;
> ERROR:  null value in column "tenant_id" violates not-null constraint
> DETAIL:  Failing row contains (null, 201, null).

I was looking through this example to see if it could be adapted for the 
documentation.

The way the users table is defined, it appears that "id" is actually 
unique and the primary key ought to be just (id).  The DELETE command 
you show also just uses the id column to find the user, which would be 
bad if the user id is not unique across tenants.  If the id were unique, 
then the foreign key from posts to users would just use the user id 
column and the whole problem of the ON DELETE SET NULL action would go 
away.  If the primary key of users is indeed supposed to be (tenant_id, 
id), then maybe the definition of the users.id column should not use 
serial, and the DELETE command should also look at the tenant_id column. 
  (The same question applies to posts.id.)

Also, you initially wrote that this is a denormalized schema.  I think 
if we keep the keys the way you show, then this isn't denormalized.  But 
if we considered users.id globally unique, then there would be 
normalization concerns.

What do you think?




Re: [PATCH] Partial foreign key updates in referential integrity triggers

From
Peter Eisentraut
Date:
On 23.11.21 05:44, Paul Martinez wrote:
> Updated patch attached. Thanks for taking a look so quickly!

This patch looks pretty much okay to me.  As I wrote in another message 
in this thread, I'm having some doubts about the proper use case.  So 
I'm going to push this commit fest entry to the next one, so we can 
continue that discussion.




Re: [PATCH] Partial foreign key updates in referential integrity triggers

From
Matthias van de Meent
Date:
On Wed, 1 Dec 2021 at 11:33, Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
>
> On 23.11.21 05:44, Paul Martinez wrote:
> > Updated patch attached. Thanks for taking a look so quickly!
>
> This patch looks pretty much okay to me.  As I wrote in another message
> in this thread, I'm having some doubts about the proper use case.  So
> I'm going to push this commit fest entry to the next one, so we can
> continue that discussion.

The use case of the original mail "foreign keys are guaranteed to not
be cross-tenant" seems like a good enough use case to me?

The alternative to the discriminator column approach to seperating
tenant data even when following referential integrety checks would be
maintaining a copy of the table for each tenant, but this won't work
as well due to (amongst others) syscache bloat, prepared statements
being significantly less effective, and DDL operations now growing
linearly with the amount of tenants in the system.

Kind regards,

Matthias van de Meent



Re: [PATCH] Partial foreign key updates in referential integrity triggers

From
Paul Martinez
Date:
On Wed, Nov 24, 2021 at 10:59 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
>
> I was looking through this example to see if it could be adapted for the
> documentation.
>
> The way the users table is defined, it appears that "id" is actually
> unique and the primary key ought to be just (id).  The DELETE command
> you show also just uses the id column to find the user, which would be
> bad if the user id is not unique across tenants.  If the id were unique,
> then the foreign key from posts to users would just use the user id
> column and the whole problem of the ON DELETE SET NULL action would go
> away.  If the primary key of users is indeed supposed to be (tenant_id,
> id), then maybe the definition of the users.id column should not use
> serial, and the DELETE command should also look at the tenant_id column.
> (The same question applies to posts.id.)
>
> Also, you initially wrote that this is a denormalized schema.  I think
> if we keep the keys the way you show, then this isn't denormalized.  But
> if we considered users.id globally unique, then there would be
> normalization concerns.
>
> What do you think?

Regarding that specific example, in a production scenario, yes, the
DELETE command should reference both columns. And if used for
documentation both columns should be referenced for clarity/correctness.

I don't think the exact semantics regarding the uniqueness of the id
column are critical. Configuring separate auto-incrementing ids per
tenant would be fairly complex; practically speaking, a single
database with multi-tenant data will use serial to get auto-incrementing
ids (or else use UUIDs to prevent conflicts). The possibility of
conflicting ids likely won't arise until moving to a distributed
environment, at which point queries should only be routed towards a
single shard (where uniqueness will still hold), either by some higher
level application-level context, or by including the tenant_id as part
of the query.

I think there are three separate motivating use cases for using
(tenant_id, id) as primary keys everywhere in a multi-tenant database:

1) I initially encountered this problem while migrating a database to use
Citus, which requires that primary keys (and any other uniqueness
constraints) include the shard key, which forces the primary key to be
(tenant_id, id). I'm not sure what constraints other sharding solutions
enforce, but I don't feel like this feature is over-fitting to Citus'
specific implementation -- it seems like a pretty
reasonable/generalizable solution when sharding data: prefix all your
indexes with the shard key.

2) As I mentioned in my response to Tom in my original proposal thread,
and as Matthias alluded to, using composite primary keys grants
significantly stronger referential integrity by preventing cross-tenant
references. I think this represents a significant leap in the robustness
and security of a schema, to the point where you could consider it a
design flaw to _not_ use composite keys.


https://www.postgresql.org/message-id/flat/CAF%2B2_SFFCjWMpxo0cj3yaqMavcb3Byd0bSG%2B0UPs7RVb8EF99g%40mail.gmail.com#c0e2b37b223bfbf8ece561f02865286c

3) For performance reasons, indexes on foreign keys will often be
prefixed by the tenant_id to speed up index scans. (I think
algorithmically doing an index lookup on (fk_id) vs. (tenant_id, fk_id)
has the same complexity, but repeated index scans, such as when doing a
join, should in practice be more efficient when including a tenant_id,
because most queries will only reference a single tenant so the looked
up values are more likely to be on the same pages.) If a foreign key
only references the id column, then ON DELETE CASCADE triggers will only
use the id column in their DELETE query. Thus, to ensure that deletes
are still fast, you will need to create an index on (fk_id) in addition
to the (tenant_id, fk_id) index, which would cause _significant_
database bloat. (In practice, the presence of both indexes will also
confuse the query planner and now BOTH indexes will take up precious
space in the database's working memory, so it really creates all sorts
of problems.) Using a composite foreign key will ensure that ON DELETE
CASCADE trigger query will use both columns.

- Paul



Re: [PATCH] Partial foreign key updates in referential integrity triggers

From
Peter Eisentraut
Date:
On 02.12.21 01:17, Paul Martinez wrote:
> Regarding that specific example, in a production scenario, yes, the
> DELETE command should reference both columns. And if used for
> documentation both columns should be referenced for clarity/correctness.

Ok, thanks.  I have updated your example accordingly and included it in 
the patch, which I have now committed.