Re: Allow foreign keys to reference a superset of unique columns - Mailing list pgsql-hackers

From James Coleman
Subject Re: Allow foreign keys to reference a superset of unique columns
Date
Msg-id CAAaqYe_-rrCq-VOsvrOuTEVayubezbKEMsJTRUOBbqkCpmJ53Q@mail.gmail.com
Whole thread Raw
In response to Re: Allow foreign keys to reference a superset of unique columns  (Wolfgang Walther <walther@technowledgy.de>)
Responses Re: Allow foreign keys to reference a superset of unique columns
List pgsql-hackers
On Fri, Sep 2, 2022 at 5:42 AM Wolfgang Walther <walther@technowledgy.de> wrote:
>
> Kaiting Chen:
> > I'd like to propose a change to PostgreSQL to allow the creation of a foreign
> > key constraint referencing a superset of uniquely constrained columns.
>
> +1
>
> Tom Lane:
> > TBH, I think this is a fundamentally bad idea and should be rejected
> > outright.  It fuzzes the semantics of the FK relationship, and I'm
> > not convinced that there are legitimate use-cases.  Your example
> > schema could easily be dismissed as bad design that should be done
> > some other way.
>
> I had to add quite a few unique constraints on a superset of already
> uniquely constrained columns in the past, just to be able to support FKs
> to those columns. I think those cases most often come up when dealing
> with slightly denormalized schemas, e.g. for efficiency.
>
> One other use-case I had recently, was along the followling lines, in
> abstract terms:
>
> CREATE TABLE classes (class INT PRIMARY KEY, ...);
>
> CREATE TABLE instances (
>    instance INT PRIMARY KEY,
>    class INT REFERENCES classes,
>    ...
> );
>
> Think about classes and instances as in OOP. So the table classes
> contains some definitions for different types of object and the table
> instances realizes them into concrete objects.
>
> Now, assume you have some property of a class than is best modeled as a
> table like this:
>
> CREATE TABLE classes_prop (
>    property INT PRIMARY KEY,
>    class INT REFERNECES classes,
>    ...
> );
>
> Now, assume you need to store data for each of those classes_prop rows
> for each instance. You'd do the following:
>
> CREATE TABLE instances_prop (
>    instance INT REFERENCES instances,
>    property INT REFERENCES classes_prop,
>    ...
> );
>
> However, this does not ensure that the instance and the property you're
> referencing in instances_prop are actually from the same class, so you
> add a class column:
>
> CREATE TABLE instances_prop (
>    instance INT,
>    class INT,
>    property INT,
>    FOREIGN KEY (instance, class) REFERENCES instances,
>    FOREIGN KEY (property, class) REFERENCES classes_prop,
>    ...
> );
>
> But this won't work, without creating some UNIQUE constraints on those
> supersets of the PK column first.

If I'm following properly this sounds like an overengineered EAV
schema, and neither of those things inspires me to think "this is a
use case I want to support".

That being said, I know that sometimes examples that have been
abstracted enough to share aren't always the best, so perhaps there's
something underlying this that's a more valuable example.

> > For one example of where the semantics get fuzzy, it's not
> > very clear how the extra-baggage columns ought to participate in
> > CASCADE updates.  Currently, if we have
> >     CREATE TABLE foo (a integer PRIMARY KEY, b integer);
> > then an update that changes only foo.b doesn't need to update
> > referencing tables, and I think we even have optimizations that
> > assume that if no unique-key columns are touched then RI checks
> > need not be made.  But if you did
> >     CREATE TABLE bar (x integer, y integer,
> >                       FOREIGN KEY (x, y) REFERENCES foo(a, b) ON UPDATE CASCADE);
> > then perhaps you expect bar.y to be updated ... or maybe you don't?
>
> In all use-cases I had so far, I would expect bar.y to be updated, too.
>
> I think it would not even be possible to NOT update bar.y, because the
> FK would then not match anymore. foo.a is the PK, so the value in bar.x
> already forces bar.y to be the same as foo.b at all times.
>
> bar.y is a little bit like a generated value in that sense, it should
> always match foo.b. I think it would be great, if we could actually go a
> step further, too: On an update to bar.x to a new value, if foo.a=bar.x
> exists, I would like to set bar.y automatically to the new foo.b.
> Otherwise those kind of updates always have to either query foo before,
> or add a trigger to do the same.

Isn't this actually contradictory to the behavior you currently have
with a multi-column foreign key? In the example above then an update
to bar.x is going to update the rows in foo that match bar.x = foo.a
and bar.y = foo.b *using the old values of bar.x and bar.y* to be the
new values. You seem to be suggesting that instead it should look for
other rows that already match the *new value* of only one of the
columns in the constraint. If I'm understanding the example correctly,
that seems like a *very* bad idea.

James Coleman



pgsql-hackers by date:

Previous
From: James Coleman
Date:
Subject: Re: Consider parallel for lateral subqueries with limit
Next
From: Thomas Munro
Date:
Subject: Re: WIP: WAL prefetch (another approach)