Re: Allow foreign keys to reference a superset of unique columns - Mailing list pgsql-hackers

From Wolfgang Walther
Subject Re: Allow foreign keys to reference a superset of unique columns
Date
Msg-id 85a79730-5778-a7ed-5ca4-cadf6e45e44d@technowledgy.de
Whole thread Raw
In response to Re: Allow foreign keys to reference a superset of unique columns  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Allow foreign keys to reference a superset of unique columns
List pgsql-hackers
Kaiting Chen:
> I'd like to propose a change to PostgreSQL to allow the creation of a foreign
> key constraint referencing a superset of uniquely constrained columns.

+1

Tom Lane:
> TBH, I think this is a fundamentally bad idea and should be rejected
> outright.  It fuzzes the semantics of the FK relationship, and I'm
> not convinced that there are legitimate use-cases.  Your example
> schema could easily be dismissed as bad design that should be done
> some other way.

I had to add quite a few unique constraints on a superset of already 
uniquely constrained columns in the past, just to be able to support FKs 
to those columns. I think those cases most often come up when dealing 
with slightly denormalized schemas, e.g. for efficiency.

One other use-case I had recently, was along the followling lines, in 
abstract terms:

CREATE TABLE classes (class INT PRIMARY KEY, ...);

CREATE TABLE instances (
   instance INT PRIMARY KEY,
   class INT REFERENCES classes,
   ...
);

Think about classes and instances as in OOP. So the table classes 
contains some definitions for different types of object and the table 
instances realizes them into concrete objects.

Now, assume you have some property of a class than is best modeled as a 
table like this:

CREATE TABLE classes_prop (
   property INT PRIMARY KEY,
   class INT REFERNECES classes,
   ...
);

Now, assume you need to store data for each of those classes_prop rows 
for each instance. You'd do the following:

CREATE TABLE instances_prop (
   instance INT REFERENCES instances,
   property INT REFERENCES classes_prop,
   ...
);

However, this does not ensure that the instance and the property you're 
referencing in instances_prop are actually from the same class, so you 
add a class column:

CREATE TABLE instances_prop (
   instance INT,
   class INT,
   property INT,
   FOREIGN KEY (instance, class) REFERENCES instances,
   FOREIGN KEY (property, class) REFERENCES classes_prop,
   ...
);

But this won't work, without creating some UNIQUE constraints on those 
supersets of the PK column first.

> For one example of where the semantics get fuzzy, it's not
> very clear how the extra-baggage columns ought to participate in
> CASCADE updates.  Currently, if we have
>     CREATE TABLE foo (a integer PRIMARY KEY, b integer);
> then an update that changes only foo.b doesn't need to update
> referencing tables, and I think we even have optimizations that
> assume that if no unique-key columns are touched then RI checks
> need not be made.  But if you did
>     CREATE TABLE bar (x integer, y integer,
>                       FOREIGN KEY (x, y) REFERENCES foo(a, b) ON UPDATE CASCADE);
> then perhaps you expect bar.y to be updated ... or maybe you don't?

In all use-cases I had so far, I would expect bar.y to be updated, too.

I think it would not even be possible to NOT update bar.y, because the 
FK would then not match anymore. foo.a is the PK, so the value in bar.x 
already forces bar.y to be the same as foo.b at all times.

bar.y is a little bit like a generated value in that sense, it should 
always match foo.b. I think it would be great, if we could actually go a 
step further, too: On an update to bar.x to a new value, if foo.a=bar.x 
exists, I would like to set bar.y automatically to the new foo.b. 
Otherwise those kind of updates always have to either query foo before, 
or add a trigger to do the same.

In the classes/instances example above, when updating 
instances_prop.property to a new value, instances_prop.class would be 
updated automatically to match classes_prop.class. This would fail, when 
the class is different than the class required by the FK to instances, 
though, providing exactly the safe-guard that this constraint was 
supposed to provide, without incurring additional overhead in update 
statements.

In the foo/bar example above, which is just a bit of denormalization, 
this automatic update would also be helpful - because rejecting the 
update on the grounds that the columns don't match doesn't make sense here.

> Another example is that I think the idea is only well-defined when
> the subset column(s) are a primary key, or at least all marked NOT NULL.
> Otherwise they're not as unique as you're claiming.

I fail to see why. My understanding is that rows with NULL values in the 
referenced table can't participate in FK matches anyway, because both 
MATCH SIMPLE and MATCH FULL wouldn't require a match when any/all of the 
columns in the referencing table are NULL. MATCH PARTIAL is not 
implemented, so I can't tell whether the semantics would be different there.

I'm not sure whether a FK on a superset of unique columns would be 
useful with MATCH SIMPLE. Maybe it could be forced to be MATCH FULL, if 
MATCH SIMPLE is indeed not well-defined.

> It's also unclear to me how this ought to interact with the
> information_schema views concerning foreign keys.  We generally
> feel that we don't want to present any non-SQL-compatible data
> in information_schema, for fear that it will confuse applications
> that expect to see SQL-spec behavior there.  So do we leave such
> FKs out of the views altogether, or show only the columns involving
> the associated unique constraint?  Neither answer seems pleasant.

Instead of tweaking FKs, maybe it would be possible to define a UNIQUE 
constraint re-using an existing index that guarantees uniqueness on a 
subset of columns already? This would allow to create those FK 
relationships by creating another unique constraint - without the 
overhead of creating yet another index.

So roughly something like this:

ALTER TABLE foo ADD UNIQUE (a, b) USING INDEX foo_pk;

This should give a consistent output for information_schema views?

Best

Wolfgang



pgsql-hackers by date:

Previous
From: Dong Wook Lee
Date:
Subject: Re: add test: pg_rowlocks extension
Next
From: David Rowley
Date:
Subject: Clarify restriction on partitioned tables primary key / unique indexes