Proposed feature: Selective Foreign Keys - Mailing list pgsql-hackers

From Tom Dunstan
Subject Proposed feature: Selective Foreign Keys
Date
Msg-id 85A0355E-E76C-4B2D-8FB6-7353CA537A3E@tomd.cc
Whole thread Raw
Responses Re: Proposed feature: Selective Foreign Keys  (Albe Laurenz <laurenz.albe@wien.gv.at>)
Re: Proposed feature: Selective Foreign Keys  (Craig Ringer <craig@2ndquadrant.com>)
List pgsql-hackers
Hi all!

The Problem
-----------------
One case that traditional SQL doesn't handle very well is when you have a child entity which can be attached to a
numberof different parent entities. Examples might be comments, tags or file attachments - we might have 20 different
entitiesin the system that we would like our users to be able add comments to, but the existing solutions for mapping
thisall have downsides. 

Existing solution 1: Join tables ahoy
If I can have a list of comments on every other object in the system, and I want to have referrential integrity, then
theobvious thing to do is create a join table between each entity and the comments table. 
Pros:
 - Straight forward, traditional object-with-collection-of-child SQL structure
Cons:
 - If a parent object gets deleted here, we can't use foreign keys to delete e.g. a child comment, so we'll have to
eitherexplicitly do it as part of our delete logic or have a cleanup process to catch orphans. Or do a dance with
deletetriggers on the join tables deleting the comment. 
 - For n entities requiring comments in the system, we need n join tables.If we want both comments and e.g. tags and
likeson all of our entities, we now have 3n join tables for what should be some relatively self-contained on-the-side
data- this is could be more tables than the entire rest of the system 
 - It's difficult to create any kind of self-contained component for building applications in this scenario, as it will
needto know about every other entity in the system, or be able to poke around inside whatever ORM or data access system
thatyou have to work out what join tables it needs when running queries. 

Existing solution 2: Enter the matrix
Instead of having n join tables, let's just mash them all together, with a column per parent object, and a check
constraintto force exactly one of those columns to be set. 
Pros:
 - Less bloat in the number of tables
Cons:
 - Doesn't solve orphan problem
 - Addition of a new entity which needs comments and we now have to add another column onto it, potentially rewriting
thewhole thing 
 - Ugly

Existing solution 3: Embed the matrix
Kinda like the dependency matrix table, except that all the columns referencing potential parent objects we put into
thecomment table instead. 
Pros:
 - Everything contained in column table
 - No orphans, since cascaded deletes will now delete the actual comment
Cons:
 - Comment table now has references to every single type that it may be attached to
 - Addition of a new entity and we probably have to rewrite the comment table now

Existing solution 4: Abandon ye all referential integrity
Have a column indicating parent type and another one for the id. In the case of comments this would be directly on the
commenttable itself. In the case of something like tags that we might expect to be shared between entities, it would be
ina single join table.  
Pros:
 - Pretty self-contained
 - Data model which has neither lots of empty columns or lots of tables
 - Can make new entities "commentable" without rewriting anything
 - Because it's self-contained, can build application components that don't need to know much about the rest of your
system.For example this is the approach that the grails taggable and commentable plugins take. 
Cons:
 - No referential integrity, since we can't have a single column pointing to different tables with existing foreign key
infrastructure
 - Since there's no real db support for doing things this way, existing ORMs etc don't really know how use a single
columnto join against multiple different tables based on a discriminator or 'switch' column. 

Existing solution 5: Everything's a thing
Make your entity hierarchy have a base level object which can have comments attached, and then everything that you need
tobe "commentable" has to extend that. You can do that in an ORM, or with table inheritance in the database. 
Pros:
 - Single top-level thing to hang your data on
Cons:
 - You've polluted your object hierarchy just to hang some stuff off of the end of it rather than it being driven by
behaviours
 - You're going to be paying a performance penalty - everything that extends that base level object will now join
againstit incessantly, and you now have a global id sequence or whatever that you may not want. 

Basically none of the above handle the situation very well. The cleanest is solution 4, but lack of RI sucks.

Feature Proposal: Selective foreign keys.
-------------------------------------------------
Allow foreign keys to have where clauses. The above comment example using solution 4 might then look like then
following:

CREATE TABLE comment as (
  id bigserial primary key,
  content text not null,
  parent_entity regclass not null,
  parent_id int8
);
ALTER TABLE comment ADD CONSTRAINT comment_blog_fk FOREIGN KEY (parent_id) REFERENCES blog(id) WHERE (parent_entity =
‘blog');
ALTER TABLE comment ADD CONSTRAINT comment_event_fk FOREIGN KEY (parent_id) REFERENCES event(id) WHERE (parent_entity =
‘event');

At this point, the following things should work:
insert into blog(id, title, content) values (10, 'i hate mondays', 'so sad');
insert into event(id, title, location, date) values (20, 'my birthday party', 'local pub', now());
insert into comment(content, parent_entity, parent_id) values ('me too', 'blog', 10);
insert into comment(content, parent_entity, parent_id) values ('I love that pub', 'event', 20);

But the following would barf:
insert into comment(content, parent_entity, parent_id) values ('bad comment', 'blog', 20);
ERROR: insert or update on table "comment" violates foreign key constraint "comment_blog_fk"
  Detail: Key (parent_id)=(20) is not present in table "blog".:

Patch
-----
I have attached a proof-of-concept patch for this feature.
What it currently does:
 - Allows where clause on FK declaration
 - Validates and stores the expression into pg_constraint a la check constraints, using the same validity checks
 - Implements logic on insertion to FK table, and update / delete on PK tables as expected.
 - Regression tests handling all of the above

Things I know are still to be done, if there is consensus that this feature is worth having:
 - Handle update of columns affecting the selection expression. Currently the RI trigger doesn’t fire if the FK column
isn’tmodified. I’m not sure where the logic for this is implemented. 
 - pg_dump support
 - Update some comments in a few places referring to conbin/consrc columns of pg_constraint as relevant to check
constraintsonly 
 - Documentation
 - There are probably a few scenarios that I’m not testing in the regression tests that I should be.

Implementation Comments
----------------------------
I borrowed the exclusion constraint where clause in the grammar, which results in the where expression needing
parentheses.With a bit more work this could possibly be removed, but I’m not too upset about it tbh. 

Currently I’m creating a new executor for each call of RI_FKey_check when there is a where clause, and doing other bits
ofwork in there. It’s not 100% clear to me how expensive doing that stuff in there is, and whether it’s worth it or
possibleto cache some of it. 

Stuff added in tablecmds.c was copied from similar stuff elsewhere, but honestly without a huge amount of
understanding.Don’t know if I could do without some of it.  


Comments?


Cheers

Tom



PS Thanks to Noah Misch who answered a question on the list a while ago which got me over a hump, and to the authors of
thewiki pages around submitting a patch. 

Attachment

pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: doPickSplit stack buffer overflow in XLogInsert?
Next
From: Peter Geoghegan
Date:
Subject: Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE