Re: Allow foreign keys to reference a superset of unique columns - Mailing list pgsql-hackers

From James Coleman
Subject Re: Allow foreign keys to reference a superset of unique columns
Date
Msg-id CAAaqYe_55S-suxsG=mHFKZAppAUXDM4o9GF7MHFriPUBLOPcGA@mail.gmail.com
Whole thread Raw
In response to Re: Allow foreign keys to reference a superset of unique columns  (Wolfgang Walther <walther@technowledgy.de>)
Responses Re: Allow foreign keys to reference a superset of unique columns
List pgsql-hackers
On Sun, Sep 25, 2022 at 4:49 AM Wolfgang Walther
<walther@technowledgy.de> wrote:
>
> James Coleman:
> > If I'm following properly this sounds like an overengineered EAV
> > schema, and neither of those things inspires me to think "this is a
> > use case I want to support".
> >
> > That being said, I know that sometimes examples that have been
> > abstracted enough to share aren't always the best, so perhaps there's
> > something underlying this that's a more valuable example.
>
> Most my use-cases are slightly denormalized and I was looking for an
> example that didn't require those kind of FKS only because of the
> denormalization. So that's why it might have been a bit artifical or
> abstracted too much.
>
> Take another example: I deal with multi-tenancy systems, for which I
> want to use RLS to separate the data between tenants:
>
> CREATE TABLE tenants (tenant INT PRIMARY KEY);
>
> Each tenant can create multiple users and groups:
>
> CREATE TABLE users (
>    "user" INT PRIMARY KEY,
>    tenant INT NOT NULL REFERENCES tenants
> );
>
> CREATE TABLLE groups (
>    "group" INT PRIMARY KEY,
>    tenant INT NOT NULL REFERENCES tenants
> );
>
> Users can be members of groups. The simple approach would be:
>
> CREATE TABLE members (
>    PRIMARY KEY ("user", "group"),
>    "user" INT REFERENCES users,
>    "group" INT REFERENCES groups
> );
>
> But this falls short in two aspects:
> - To make RLS policies simpler to write and quicker to execute, I want
> to add "tenant" columns to **all** other tables. A slightly denormalized
> schema for efficiency.
> - The schema above does not ensure that users can only be members in
> groups of the same tenant. Our business model requires to separate
> tenants cleanly, but as written above, cross-tenant memberships would be
> allowed.
>
> In comes the "tenant" column which solves both of this:
>
> CREATE TABLE members (
>    PRIMARY KEY ("user", "group"),
>    tenant INT REFERENCES tenants,
>    "user" INT,
>    "group" INT,
>    FOREIGN KEY ("user", tenant) REFERENCES users ("user", tenant),
>    FOREIGN KEY ("group", tenant) REFERENCES groups ("group", tenant)
> );
>
> This is not possible to do right now, without adding more UNIQUE
> constraints to the users and groups tables - on a superset of already
> unique columns.

Thanks, that's a more interesting use case IMO (and doesn't smell in
the way the other did).

> >> bar.y is a little bit like a generated value in that sense, it should
> >> always match foo.b. I think it would be great, if we could actually go a
> >> step further, too: On an update to bar.x to a new value, if foo.a=bar.x
> >> exists, I would like to set bar.y automatically to the new foo.b.
> >> Otherwise those kind of updates always have to either query foo before,
> >> or add a trigger to do the same.
> >
> > Isn't this actually contradictory to the behavior you currently have
> > with a multi-column foreign key? In the example above then an update
> > to bar.x is going to update the rows in foo that match bar.x = foo.a
> > and bar.y = foo.b *using the old values of bar.x and bar.y* to be the
> > new values.
>
> No, I think there was a misunderstanding. An update to bar should not
> update rows in foo. An update to bar.x should update bar.y implicitly,
> to match the new value of foo.b.
>
> > You seem to be suggesting that instead it should look for
> > other rows that already match the *new value* of only one of the
> > columns in the constraint.
>
> Yes. I think basically what I'm suggesting is, that for an FK to a
> superset of unique columns, all the FK-logic should still be done on the
> already unique set of columns only - and then the additional columns
> should be mirrored into the referencing table. The referencing table can
> then put additional constraints on this column. In the members example
> above, this additional constraint is the fact that the tenant column
> can't be filled with two different values for the users and groups FKs.

If we have a declared constraint on x,y where x is unique based on an
index including on x I do not think we should have that fk constraint
work differently than a constraint on x,y where there is a unique
index on x,y. That would seem to be incredibly confusing behavior
(even if it would be useful for some specific use case).

> But this could also be a CHECK constraint to allow FKs only to a subset
> of rows in the target table:

Are you suggesting a check constraint that queries another table?
Because check constraints are not supposed to do that (I assume this
is technically possible to declare via a function, just like it is
technically possible to do in a functional index, but like in the
index case it's a bad idea since it's not actually guaranteed).

> CREATE TYPE foo_type AS ENUM ('A', 'B', 'C');
>
> CREATE TABLE foo (
>    f INT PRIMARY KEY,
>    type foo_type
> );
>
> CREATE TABLE bar (
>    b INT PRIMARY KEY,
>    f INT,
>    ftype foo_type CHECK (ftype <> 'C'),
>    FOREIGN KEY (f, ftype) REFERENCES foo (f, type);
> );
>
> In this example, the additional ftype column is just used to enforce
> that bar can only reference rows with type A or B, but not C. Assume:
>
> INSERT INTO foo VALUES (1, 'A'), (2, 'B'), (3, 'C');
>
> In this case, it would be nice to be able to do the following, i.e.
> derive the value for bar.ftype automatically:
>
> INSERT INTO bar (b, f) VALUES (10, 1); -- bar.ftype is then 'A'
> UPDATE bar SET f = 2 WHERE b = 10; -- bar.ftype is then 'B'
>
> And it would throw errors in the following cases, because the
> automatically derived value fails the CHECK constraint:
>
> INSERT INTO bar (b, f) VALUES (20, 3);
> UPDATE bar SET f = 3 WHERE b = 10;
>
> Note: This "automatically derived columns" extension would be a separate
> feature. Really nice to have, but the above mentioned FKs to supersets
> of unique columns would be very valuable without it already.

This "derive the value automatically" is not what foreign key
constraints do right now at all, right? And if fact it's contradictory
to existing behavior, no?

This part just seems like a very bad idea. Unless I'm misunderstanding
I think we should reject this part of the proposals on this thread as
something we would not even consider implementing.

James Coleman



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: Add ON CONFLICT DO RETURN clause
Next
From: Andres Freund
Date:
Subject: Re: [RFC] building postgres with meson - v13