Thread: [PROPOSAL] ON DELETE SET NULL () for Foreign Key Constraints
Hello! I have a proposal for a feature to add to Postgres. I believe it is a natural extension to the current standard SQL ON DELETE SET NULL behavior when using composite foreign keys. The basic idea is that you can specify which columns to set to NULL in the DELETE trigger created by a foreign key constraint. (There could be similar support for UPDATE triggers and SET DEFAULT actions.) PROBLEM: Consider the following basic schema: CREATE TABLE users (id primary key); CREATE TABLE posts ( id primary key, content text, author_id int REFERENCES users(id) ON DELETE SET NULL ); When a user is deleted all of their posts will have the author_id field set to NULL. Consider the same schema in a multi-tenant application. In a multi-tenant application a tenant_id column will frequently be denormalized across every table. In certain cases these tables will actually use the tenant_id as part of a composite primary key. (This is the approach recommended by Citus.) So in a multi-tenant architecture, our schema may look like this: CREATE TABLE tenants (id primary key); CREATE TABLE users ( tenant_id int, id int, PRIMARY KEY (tenant_id, id) ); CREATE TABLE posts ( tenant_id int, id int, content text, author_id int, PRIMARY KEY (tenant_id, id), FOREIGN KEY (tenant_id, author_id) REFERENCES users(tenant_id, id) ON DELETE SET NULL ); In this situation we can no longer delete a user that has created a post because Postgres would try to set both the author_id AND the tenant_id columns of a corresponding post to NULL, and tenant_id is not nullable. INSERT INTO tenants VALUES (1); INSERT INTO users VALUES (1, 101); INSERT INTO posts VALUES (1, 201, 'content', 101); DELETE FROM users WHERE id = 101; ERROR: null value in column "tenant_id" violates not-null constraint DETAIL: Failing row contains (null, 201, content, null). GENERAL SOLUTION: My proposal is to allow specifying which columns to set null during the delete trigger: CREATE TABLE posts ( ..., FOREIGN KEY (tenant_id, author_id) REFERENCES users(tenant_id, id) ON DELETE SET NULL (author_id) ) If this feature is implemented the above DELETE statement would succeed, and the record in posts would have the values (1, 201, 'content', NULL). The grammar will be modified as follows: table_constraint is: FOREIGN KEY ( column_name [, ... ] ) REFERENCES reftable [ ( refcolumn [, ... ] ) ] [ MATCH FULL | MATCH PARTIAL | MATCH SIMPLE ] [ ON DELETE action ] [ ON UPDATE action ] } action is: NO ACTION | RESTRICT | CASCADE | SET DEFAULT [ ( defcolumn [, ...] ) ] | SET NULL [ ( nullcolumn [, ...] ) ] This modification to the grammar makes it easy to support similary functionality for UPDATE triggers and the SET DEFAULT action. The columns that we want to set to null must be a subset of the columns that make up the foreign key. From looking through the grammar it seems like this syntax will also be supported in column constraints, and later validation could ensure that only that constrained column appears in the column list. To store this new information, a new field will need to be added to pg_constraint to save which columns we're supposed to set null. Since a constraint could have separate ON DELETE and ON UPDATE behavior, we'd need two additional columns if both were supported. Name Type References Description conkey int2[] pg_attribute.attnum List of constrained columns confkey int2[] pg_attribute.attnum List of referenced columns condeletecols int2[] pg_attribute.attnum List of constrained columns to set on deletion of referenced record conupdatecols int2[] pg_attribute.attnum List of constrained columns to set on update of referenced record POSSIBLE GENERALIZATIONS/EXTENSIONS: The above solution solves the specific problem that I have been facing, but I see two clear possibilities for how this functionality can be extended. One, the set of columns that we set to null could be any set of columns in the table with the constraint, rather than just a subset of columns that make up the foreign key. This would be useful for columns that are only meaningful if the foreign key is present. Imagine a timestamp column that indicates when the foreign key was set and should be cleared when the foreign key is cleared. Or maybe some columns have been denormalized from the referenced table and those should be cleared. While this extension is reasonable, it does add another layer of complexity to the feature: what happens when one of those columns gets dropped from the table? Should the entire foreign key be dropped? Should it error? Or should that column simply be removed from the set of columns that are acted upon? If the set of columns can only be a subset of the foreign key columns, then this is already a solved problem, so that constraint will be enforced at least for the initial implementation. The second possible generalization is also having the same functionality for ON UPDATE triggers and/or SET DEFAULT triggers: ON DELETE SET NULL (col [, ...] ) ON DELETE SET DEFAULT (col [, ...] ) ON UPDATE SET NULL (col [, ...] ) ON UPDATE SET DEFAULT (col [, ...] ) While the potential use cases of the other three versions is not quite as clear, from an engineering standpoint it might make more sense to support both trigger types and both action types; supporting only one would likely result in messier code. Supporting both ON DELETE and ON UPDATE would require updating code in multiple places in ri_triggers.c, but arguably it's better to keep those functions in sync. IMPLEMENTATION PLAN: If the consensus is that this is a reasonable feature to implement, I'd love to work on it over the next few weeks. I've made one tiny contribution to Postgres, so I don't know everything that will need to be modified to complete this feature. I'm also not sure what the preferred granularity of patches is for something like this, but this is how I'm imagining breaking it down: Update parse node and catalog type: src/include/nodes/parsenodes.h - Extend Constraint struct src/backend/nodes/copyfuncs.c src/backend/nodes/equalfuncs.c src/include/catalog/pg_constraint.h # I'm not sure if I'd need to modify readfuncs.c? Extend Parser and validate columns: src/backend/parser/gram.y - Update key_action rule src/backend/commands/tablecmds.c - Validate selected columns in ATAddForeignKeyConstraint Update trigger definitions: src/backend/utils/adt/ri_triggers.c Other miscellaneous: src/backend/utils/adt/ruleutils.c - Update pg_get_constraint_def Documentation: doc/src/sgml/ref/create_table.sgml - CREATE TABLE documentation doc/src/sgml/catalogs.sgml - pg_constraint documentation Other things: I have no idea what I have to modify to support upgrading from an old version of Postgres and handle the changes to the pg_constraint. If someone can point me in the right direction that would be great. I'm sure there are also a bunch of other things that'll I'll have to change that I've missed. The above is the stuff I've found after digging through the source code for a couple hours. Please let me know what else I'll have to do! Let me know if this seems like a feature that would likely get merged. It's something I'd really like to build! I'd welcome any feedback about the proposal, and am happy to answer any questions. - Paul
Paul Martinez <hellopfm@gmail.com> writes: > I have a proposal for a feature to add to Postgres. I believe it is a natural > extension to the current standard SQL ON DELETE SET NULL behavior when using > composite foreign keys. The basic idea is that you can specify which columns to > set to NULL in the DELETE trigger created by a foreign key constraint. This seems like kind of a kluge, because it can only work in MATCH SIMPLE mode, not MATCH FULL or MATCH PARTIAL. (We don't have MATCH PARTIAL atm, but it's in the spec so I imagine somebody will get around to implementing it someday. Anyway MATCH FULL is there now.) In the latter two modes, setting a subset of the referencing columns to null isn't sufficient to make the row pass the constraint. regards, tom lane
On Sat, Jan 19, 2019 at 5:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Paul Martinez <hellopfm@gmail.com> writes: > > I have a proposal for a feature to add to Postgres. I believe it is a natural > > extension to the current standard SQL ON DELETE SET NULL behavior when using > > composite foreign keys. The basic idea is that you can specify which columns to > > set to NULL in the DELETE trigger created by a foreign key constraint. > > This seems like kind of a kluge, because it can only work in MATCH SIMPLE > mode, not MATCH FULL or MATCH PARTIAL. (We don't have MATCH PARTIAL atm, > but it's in the spec so I imagine somebody will get around to implementing > it someday. Anyway MATCH FULL is there now.) In the latter two modes, > setting a subset of the referencing columns to null isn't sufficient to > make the row pass the constraint. I don't see why this is an issue. Currently Postgres allows you to combine the foreign key features in non-sensical ways. For example, you can create a not nullable column that has ON DELETE SET NULL behavior: CREATE TABLE foo (a int PRIMARY KEY); CREATE TABLE bar (b int NOT NULL REFERENCES foo(a) ON DELETE SET NULL); INSERT INTO foo VALUES (1); INSERT INTO bar VALUES (1); DELETE FROM foo WHERE a = 1; ERROR: null value in column "b" violates not-null constraint The incongruence of MATCH FULL with this feature doesn't seem like a problem. We could raise an error when MATCH FULL and a proper subset of the constrained columns are supplied in the SET NULL clause if we were worried about users mis-using the feature, but we don't raise an error for the similar logic error above. I don't entirely understand the use-cases for MATCH PARTIAL, but it doesn't seem like combining MATCH PARTIAL with this feature would be blatantly wrong. This feature provides a real benefit when using MATCH SIMPLE, which is the default behavior. There are good reasons for using denormalizing tenant_ids in a multi-tenant application beyond just performance. It actually improves referential integrity because it makes mixing data between tenants impossible. Consider the following example: -- denormalized tenant_id, but simple primary keys CREATE TABLE tenants (id int PRIMARY KEY); CREATE TABLE users (tenant_id int REFERENCES tenants, id int PRIMARY KEY); CREATE TABLE messages ( tenant_id int REFERENCES tenants, id int PRIMARY KEY, from_id int REFERENCES users, to_id int REFERENCES users, content text ); -- Create three tenants INSERT INTO tenants VALUES (1), (2), (3); -- Create users in tenants 1 and 2 INSERT INTO users VALUES (1, 101), (2, 102); -- Create message in tenant 3 sent from user in tenant 1 to user in tenant 2 INSERT INTO messages VALUES (3, 201, 101, 102, 'poor referential integrity'); If you create the users and messages tables with composite primary keys the last query will fail: -- composite primary keys of the form (tenant_id, id) CREATE TABLE cpk_users ( tenant_id int REFERENCES tenants, id int, PRIMARY KEY (tenant_id, id) ); CREATE TABLE cpk_messages ( tenant_id int REFERENCES tenants, id int, from_id int, to_id int, content text, PRIMARY KEY (tenant_id, id), FOREIGN KEY (tenant_id, from_id) REFERENCES cpk_users, FOREIGN KEY (tenant_id, to_id) REFERENCES cpk_users ); -- Create cpk_users in tenants 1 and 2 INSERT INTO cpk_users VALUES (1, 101), (2, 102); -- Create cpk_message in tenant 3 sent from user in tenant 1 to user in tenant 2 INSERT INTO cpk_messages VALUES (3, 201, 101, 102, 'great referential integrity'); ERROR: insert or update on table "cpk_messages" violates foreign key constraint "cpk_messages_tenant_id_from_id_fkey" DETAIL: Key (tenant_id, from_id)=(3, 101) is not present in table "cpk_users". So there are strong reasons in favor of using composite primary keys. Postgres has great support for composite primary and foreign keys, but SET NULL behavior that would have worked fine in the schema using simple primary keys no longer works in the composite primary key schema. Users could manually implement triggers to get the desired behavior (as I have done in the use-case that led me to think of this feature), but it'd be great if switching to composite primary keys didn't force the user to make compromises elsewhere. - Paul On Sat, Jan 19, 2019 at 5:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Paul Martinez <hellopfm@gmail.com> writes: > > I have a proposal for a feature to add to Postgres. I believe it is a natural > > extension to the current standard SQL ON DELETE SET NULL behavior when using > > composite foreign keys. The basic idea is that you can specify which columns to > > set to NULL in the DELETE trigger created by a foreign key constraint. > > This seems like kind of a kluge, because it can only work in MATCH SIMPLE > mode, not MATCH FULL or MATCH PARTIAL. (We don't have MATCH PARTIAL atm, > but it's in the spec so I imagine somebody will get around to implementing > it someday. Anyway MATCH FULL is there now.) In the latter two modes, > setting a subset of the referencing columns to null isn't sufficient to > make the row pass the constraint. > > regards, tom lane