[PROPOSAL] ON DELETE SET NULL () for Foreign Key Constraints - Mailing list pgsql-hackers

From Paul Martinez
Subject [PROPOSAL] ON DELETE SET NULL () for Foreign Key Constraints
Date
Msg-id CAF+2_SGRXQOtumethpuXhsyU+4AYzfKA5fhHCjCjH+jQ04WWjA@mail.gmail.com
Whole thread Raw
Responses Re: [PROPOSAL] ON DELETE SET NULL () for Foreign Key Constraints
List pgsql-hackers
Hello!

I have a proposal for a feature to add to Postgres. I believe it is a natural
extension to the current standard SQL ON DELETE SET NULL behavior when using
composite foreign keys. The basic idea is that you can specify which columns to
set to NULL in the DELETE trigger created by a foreign key constraint. (There
could be similar support for UPDATE triggers and SET DEFAULT actions.)


PROBLEM:

Consider the following basic schema:

CREATE TABLE users (id primary key);
CREATE TABLE posts (
    id primary key,
    content text,
    author_id int REFERENCES users(id) ON DELETE SET NULL
);

When a user is deleted all of their posts will have the author_id field set to
NULL.

Consider the same schema in a multi-tenant application. In a multi-tenant
application a tenant_id column will frequently be denormalized across every
table. In certain cases these tables will actually use the tenant_id as part of
a composite primary key. (This is the approach recommended by Citus.)

So in a multi-tenant architecture, our schema may look like this:

CREATE TABLE tenants (id primary key);
CREATE TABLE users (
    tenant_id int,
    id int,
    PRIMARY KEY (tenant_id, id)
);
CREATE TABLE posts (
    tenant_id int,
    id int,
    content text,
    author_id int,
    PRIMARY KEY (tenant_id, id),
    FOREIGN KEY (tenant_id, author_id)
        REFERENCES users(tenant_id, id) ON DELETE SET NULL
);

In this situation we can no longer delete a user that has created a post
because Postgres would try to set both the author_id AND the tenant_id columns
of a corresponding post to NULL, and tenant_id is not nullable.

INSERT INTO tenants VALUES (1);
INSERT INTO users VALUES (1, 101);
INSERT INTO posts VALUES (1, 201, 'content', 101);
DELETE FROM users WHERE id = 101;
ERROR:  null value in column "tenant_id" violates not-null constraint
DETAIL:  Failing row contains (null, 201, content, null).


GENERAL SOLUTION:

My proposal is to allow specifying which columns to set null during the delete
trigger:

CREATE TABLE posts (
    ...,
    FOREIGN KEY (tenant_id, author_id)
        REFERENCES users(tenant_id, id) ON DELETE SET NULL (author_id)
)

If this feature is implemented the above DELETE statement would succeed, and
the record in posts would have the values (1, 201, 'content', NULL).

The grammar will be modified as follows:

table_constraint is:

  FOREIGN KEY ( column_name [, ... ] ) REFERENCES reftable [ (
refcolumn [, ... ] ) ]
      [ MATCH FULL | MATCH PARTIAL | MATCH SIMPLE ] [ ON DELETE action
] [ ON UPDATE action ] }

action is:

  NO ACTION |
  RESTRICT |
  CASCADE |
  SET DEFAULT [ ( defcolumn [, ...] ) ] |
  SET NULL [ ( nullcolumn [, ...] ) ]

This modification to the grammar makes it easy to support similary
functionality for UPDATE triggers and the SET DEFAULT action.

The columns that we want to set to null must be a subset of the columns that
make up the foreign key. From looking through the grammar it seems like this
syntax will also be supported in column constraints, and later validation could
ensure that only that constrained column appears in the column list.

To store this new information, a new field will need to be added to
pg_constraint to save which columns we're supposed to set null. Since a
constraint could have separate ON DELETE and ON UPDATE behavior, we'd need two
additional columns if both were supported.

Name            Type    References               Description
conkey          int2[]    pg_attribute.attnum   List of constrained columns
confkey         int2[]    pg_attribute.attnum   List of referenced columns
condeletecols   int2[]    pg_attribute.attnum   List of constrained
columns to set
                                                on deletion of referenced record
conupdatecols   int2[]    pg_attribute.attnum   List of constrained
columns to set
                                                on update of referenced record


POSSIBLE GENERALIZATIONS/EXTENSIONS:

The above solution solves the specific problem that I have been facing, but I
see two clear possibilities for how this functionality can be extended.

One, the set of columns that we set to null could be any set of columns in the
table with the constraint, rather than just a subset of columns that make up
the foreign key. This would be useful for columns that are only meaningful if
the foreign key is present. Imagine a timestamp column that indicates when the
foreign key was set and should be cleared when the foreign key is cleared. Or
maybe some columns have been denormalized from the referenced table and those
should be cleared.

While this extension is reasonable, it does add another layer of complexity to
the feature: what happens when one of those columns gets dropped from the
table? Should the entire foreign key be dropped? Should it error? Or should
that column simply be removed from the set of columns that are acted upon? If
the set of columns can only be a subset of the foreign key columns, then this
is already a solved problem, so that constraint will be enforced at least
for the initial implementation.

The second possible generalization is also having the same functionality for
ON UPDATE triggers and/or SET DEFAULT triggers:

ON DELETE SET NULL    (col [, ...] )
ON DELETE SET DEFAULT (col [, ...] )
ON UPDATE SET NULL    (col [, ...] )
ON UPDATE SET DEFAULT (col [, ...] )

While the potential use cases of the other three versions is not quite as
clear, from an engineering standpoint it might make more sense to support both
trigger types and both action types; supporting only one would likely result in
messier code.

Supporting both ON DELETE and ON UPDATE would require updating code in multiple
places in ri_triggers.c, but arguably it's better to keep those functions in
sync.



IMPLEMENTATION PLAN:

If the consensus is that this is a reasonable feature to implement, I'd love
to work on it over the next few weeks. I've made one tiny contribution to
Postgres, so I don't know everything that will need to be modified to complete
this feature. I'm also not sure what the preferred granularity of patches is
for something like this, but this is how I'm imagining breaking it down:

Update parse node and catalog type:

src/include/nodes/parsenodes.h
  - Extend Constraint struct
src/backend/nodes/copyfuncs.c
src/backend/nodes/equalfuncs.c
src/include/catalog/pg_constraint.h
# I'm not sure if I'd need to modify readfuncs.c?

Extend Parser and validate columns:

src/backend/parser/gram.y
  - Update key_action rule
src/backend/commands/tablecmds.c
  - Validate selected columns in ATAddForeignKeyConstraint

Update trigger definitions:

src/backend/utils/adt/ri_triggers.c


Other miscellaneous:

src/backend/utils/adt/ruleutils.c
  - Update pg_get_constraint_def

Documentation:

doc/src/sgml/ref/create_table.sgml
  - CREATE TABLE documentation
doc/src/sgml/catalogs.sgml
  - pg_constraint documentation

Other things:

I have no idea what I have to modify to support upgrading from an old version
of Postgres and handle the changes to the pg_constraint. If someone can point
me in the right direction that would be great.

I'm sure there are also a bunch of other things that'll I'll have to change
that I've missed. The above is the stuff I've found after digging through the
source code for a couple hours. Please let me know what else I'll have to do!



Let me know if this seems like a feature that would likely get merged. It's
something I'd really like to build! I'd welcome any feedback about the proposal,
and am happy to answer any questions.

- Paul


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Thread-unsafe coding in ecpg
Next
From: Andrew Gierth
Date:
Subject: Re: Changing SQL Inlining Behaviour (or...?)