[PROPOSAL] ON DELETE SET NULL () for Foreign Key Constraints - Mailing list pgsql-hackers
From | Paul Martinez |
---|---|
Subject | [PROPOSAL] ON DELETE SET NULL ( |
Date | |
Msg-id | CAF+2_SGRXQOtumethpuXhsyU+4AYzfKA5fhHCjCjH+jQ04WWjA@mail.gmail.com Whole thread Raw |
Responses |
Re: [PROPOSAL] ON DELETE SET NULL ( |
List | pgsql-hackers |
Hello! I have a proposal for a feature to add to Postgres. I believe it is a natural extension to the current standard SQL ON DELETE SET NULL behavior when using composite foreign keys. The basic idea is that you can specify which columns to set to NULL in the DELETE trigger created by a foreign key constraint. (There could be similar support for UPDATE triggers and SET DEFAULT actions.) PROBLEM: Consider the following basic schema: CREATE TABLE users (id primary key); CREATE TABLE posts ( id primary key, content text, author_id int REFERENCES users(id) ON DELETE SET NULL ); When a user is deleted all of their posts will have the author_id field set to NULL. Consider the same schema in a multi-tenant application. In a multi-tenant application a tenant_id column will frequently be denormalized across every table. In certain cases these tables will actually use the tenant_id as part of a composite primary key. (This is the approach recommended by Citus.) So in a multi-tenant architecture, our schema may look like this: CREATE TABLE tenants (id primary key); CREATE TABLE users ( tenant_id int, id int, PRIMARY KEY (tenant_id, id) ); CREATE TABLE posts ( tenant_id int, id int, content text, author_id int, PRIMARY KEY (tenant_id, id), FOREIGN KEY (tenant_id, author_id) REFERENCES users(tenant_id, id) ON DELETE SET NULL ); In this situation we can no longer delete a user that has created a post because Postgres would try to set both the author_id AND the tenant_id columns of a corresponding post to NULL, and tenant_id is not nullable. INSERT INTO tenants VALUES (1); INSERT INTO users VALUES (1, 101); INSERT INTO posts VALUES (1, 201, 'content', 101); DELETE FROM users WHERE id = 101; ERROR: null value in column "tenant_id" violates not-null constraint DETAIL: Failing row contains (null, 201, content, null). GENERAL SOLUTION: My proposal is to allow specifying which columns to set null during the delete trigger: CREATE TABLE posts ( ..., FOREIGN KEY (tenant_id, author_id) REFERENCES users(tenant_id, id) ON DELETE SET NULL (author_id) ) If this feature is implemented the above DELETE statement would succeed, and the record in posts would have the values (1, 201, 'content', NULL). The grammar will be modified as follows: table_constraint is: FOREIGN KEY ( column_name [, ... ] ) REFERENCES reftable [ ( refcolumn [, ... ] ) ] [ MATCH FULL | MATCH PARTIAL | MATCH SIMPLE ] [ ON DELETE action ] [ ON UPDATE action ] } action is: NO ACTION | RESTRICT | CASCADE | SET DEFAULT [ ( defcolumn [, ...] ) ] | SET NULL [ ( nullcolumn [, ...] ) ] This modification to the grammar makes it easy to support similary functionality for UPDATE triggers and the SET DEFAULT action. The columns that we want to set to null must be a subset of the columns that make up the foreign key. From looking through the grammar it seems like this syntax will also be supported in column constraints, and later validation could ensure that only that constrained column appears in the column list. To store this new information, a new field will need to be added to pg_constraint to save which columns we're supposed to set null. Since a constraint could have separate ON DELETE and ON UPDATE behavior, we'd need two additional columns if both were supported. Name Type References Description conkey int2[] pg_attribute.attnum List of constrained columns confkey int2[] pg_attribute.attnum List of referenced columns condeletecols int2[] pg_attribute.attnum List of constrained columns to set on deletion of referenced record conupdatecols int2[] pg_attribute.attnum List of constrained columns to set on update of referenced record POSSIBLE GENERALIZATIONS/EXTENSIONS: The above solution solves the specific problem that I have been facing, but I see two clear possibilities for how this functionality can be extended. One, the set of columns that we set to null could be any set of columns in the table with the constraint, rather than just a subset of columns that make up the foreign key. This would be useful for columns that are only meaningful if the foreign key is present. Imagine a timestamp column that indicates when the foreign key was set and should be cleared when the foreign key is cleared. Or maybe some columns have been denormalized from the referenced table and those should be cleared. While this extension is reasonable, it does add another layer of complexity to the feature: what happens when one of those columns gets dropped from the table? Should the entire foreign key be dropped? Should it error? Or should that column simply be removed from the set of columns that are acted upon? If the set of columns can only be a subset of the foreign key columns, then this is already a solved problem, so that constraint will be enforced at least for the initial implementation. The second possible generalization is also having the same functionality for ON UPDATE triggers and/or SET DEFAULT triggers: ON DELETE SET NULL (col [, ...] ) ON DELETE SET DEFAULT (col [, ...] ) ON UPDATE SET NULL (col [, ...] ) ON UPDATE SET DEFAULT (col [, ...] ) While the potential use cases of the other three versions is not quite as clear, from an engineering standpoint it might make more sense to support both trigger types and both action types; supporting only one would likely result in messier code. Supporting both ON DELETE and ON UPDATE would require updating code in multiple places in ri_triggers.c, but arguably it's better to keep those functions in sync. IMPLEMENTATION PLAN: If the consensus is that this is a reasonable feature to implement, I'd love to work on it over the next few weeks. I've made one tiny contribution to Postgres, so I don't know everything that will need to be modified to complete this feature. I'm also not sure what the preferred granularity of patches is for something like this, but this is how I'm imagining breaking it down: Update parse node and catalog type: src/include/nodes/parsenodes.h - Extend Constraint struct src/backend/nodes/copyfuncs.c src/backend/nodes/equalfuncs.c src/include/catalog/pg_constraint.h # I'm not sure if I'd need to modify readfuncs.c? Extend Parser and validate columns: src/backend/parser/gram.y - Update key_action rule src/backend/commands/tablecmds.c - Validate selected columns in ATAddForeignKeyConstraint Update trigger definitions: src/backend/utils/adt/ri_triggers.c Other miscellaneous: src/backend/utils/adt/ruleutils.c - Update pg_get_constraint_def Documentation: doc/src/sgml/ref/create_table.sgml - CREATE TABLE documentation doc/src/sgml/catalogs.sgml - pg_constraint documentation Other things: I have no idea what I have to modify to support upgrading from an old version of Postgres and handle the changes to the pg_constraint. If someone can point me in the right direction that would be great. I'm sure there are also a bunch of other things that'll I'll have to change that I've missed. The above is the stuff I've found after digging through the source code for a couple hours. Please let me know what else I'll have to do! Let me know if this seems like a feature that would likely get merged. It's something I'd really like to build! I'd welcome any feedback about the proposal, and am happy to answer any questions. - Paul
pgsql-hackers by date: