Re: Allow foreign keys to reference a superset of unique columns - Mailing list pgsql-hackers

From Wolfgang Walther
Subject Re: Allow foreign keys to reference a superset of unique columns
Date
Msg-id 2ef8dbe1-4e2e-25af-218e-0b5efae8f649@technowledgy.de
Whole thread Raw
In response to Re: Allow foreign keys to reference a superset of unique columns  (James Coleman <jtc331@gmail.com>)
Responses Re: Allow foreign keys to reference a superset of unique columns
List pgsql-hackers
James Coleman:
> If I'm following properly this sounds like an overengineered EAV
> schema, and neither of those things inspires me to think "this is a
> use case I want to support".
> 
> That being said, I know that sometimes examples that have been
> abstracted enough to share aren't always the best, so perhaps there's
> something underlying this that's a more valuable example.

Most my use-cases are slightly denormalized and I was looking for an 
example that didn't require those kind of FKS only because of the 
denormalization. So that's why it might have been a bit artifical or 
abstracted too much.

Take another example: I deal with multi-tenancy systems, for which I 
want to use RLS to separate the data between tenants:

CREATE TABLE tenants (tenant INT PRIMARY KEY);

Each tenant can create multiple users and groups:

CREATE TABLE users (
   "user" INT PRIMARY KEY,
   tenant INT NOT NULL REFERENCES tenants
);

CREATE TABLLE groups (
   "group" INT PRIMARY KEY,
   tenant INT NOT NULL REFERENCES tenants
);

Users can be members of groups. The simple approach would be:

CREATE TABLE members (
   PRIMARY KEY ("user", "group"),
   "user" INT REFERENCES users,
   "group" INT REFERENCES groups
);

But this falls short in two aspects:
- To make RLS policies simpler to write and quicker to execute, I want 
to add "tenant" columns to **all** other tables. A slightly denormalized 
schema for efficiency.
- The schema above does not ensure that users can only be members in 
groups of the same tenant. Our business model requires to separate 
tenants cleanly, but as written above, cross-tenant memberships would be 
allowed.

In comes the "tenant" column which solves both of this:

CREATE TABLE members (
   PRIMARY KEY ("user", "group"),
   tenant INT REFERENCES tenants,
   "user" INT,
   "group" INT,
   FOREIGN KEY ("user", tenant) REFERENCES users ("user", tenant),
   FOREIGN KEY ("group", tenant) REFERENCES groups ("group", tenant)
);

This is not possible to do right now, without adding more UNIQUE 
constraints to the users and groups tables - on a superset of already 
unique columns.

>> bar.y is a little bit like a generated value in that sense, it should
>> always match foo.b. I think it would be great, if we could actually go a
>> step further, too: On an update to bar.x to a new value, if foo.a=bar.x
>> exists, I would like to set bar.y automatically to the new foo.b.
>> Otherwise those kind of updates always have to either query foo before,
>> or add a trigger to do the same.
> 
> Isn't this actually contradictory to the behavior you currently have
> with a multi-column foreign key? In the example above then an update
> to bar.x is going to update the rows in foo that match bar.x = foo.a
> and bar.y = foo.b *using the old values of bar.x and bar.y* to be the
> new values.

No, I think there was a misunderstanding. An update to bar should not 
update rows in foo. An update to bar.x should update bar.y implicitly, 
to match the new value of foo.b.

> You seem to be suggesting that instead it should look for
> other rows that already match the *new value* of only one of the
> columns in the constraint.

Yes. I think basically what I'm suggesting is, that for an FK to a 
superset of unique columns, all the FK-logic should still be done on the 
already unique set of columns only - and then the additional columns 
should be mirrored into the referencing table. The referencing table can 
then put additional constraints on this column. In the members example 
above, this additional constraint is the fact that the tenant column 
can't be filled with two different values for the users and groups FKs. 
But this could also be a CHECK constraint to allow FKs only to a subset 
of rows in the target table:

CREATE TYPE foo_type AS ENUM ('A', 'B', 'C');

CREATE TABLE foo (
   f INT PRIMARY KEY,
   type foo_type
);

CREATE TABLE bar (
   b INT PRIMARY KEY,
   f INT,
   ftype foo_type CHECK (ftype <> 'C'),
   FOREIGN KEY (f, ftype) REFERENCES foo (f, type);
);

In this example, the additional ftype column is just used to enforce 
that bar can only reference rows with type A or B, but not C. Assume:

INSERT INTO foo VALUES (1, 'A'), (2, 'B'), (3, 'C');

In this case, it would be nice to be able to do the following, i.e. 
derive the value for bar.ftype automatically:

INSERT INTO bar (b, f) VALUES (10, 1); -- bar.ftype is then 'A'
UPDATE bar SET f = 2 WHERE b = 10; -- bar.ftype is then 'B'

And it would throw errors in the following cases, because the 
automatically derived value fails the CHECK constraint:

INSERT INTO bar (b, f) VALUES (20, 3);
UPDATE bar SET f = 3 WHERE b = 10;

Note: This "automatically derived columns" extension would be a separate 
feature. Really nice to have, but the above mentioned FKs to supersets 
of unique columns would be very valuable without it already.

Best

Wolfgang



pgsql-hackers by date:

Previous
From: Julien Rouhaud
Date:
Subject: Re: Schema variables - new implementation for Postgres 15
Next
From: Wolfgang Walther
Date:
Subject: Re: has_privs_of_role vs. is_member_of_role, redux