Thread: Defining a foreign key with a duplicate column is broken

Defining a foreign key with a duplicate column is broken

From
David Rowley
Date:
In master I've been testing some new code that I'm working on around foreign keys. 
I wasn't quite sure if it was possible to include the same column twice in a foreign key, so I tested....

create table r1 (a int);
create table r2 (b int);
create unique index on r2(b,b);
alter table r1 add constraint r2_b_fkey foreign key (a,a) references r2 (b,b);
ERROR:  cache lookup failed for opclass 0

Which is not quite any of the messages that I would have expected

I've tracked this down to a small bug in transformFkeyCheckAttrs() where opclasses[1] won't get set because if (attnums[0] == indexStruct->indkey.values[1]) will match then break out of the inner loop. Instead opclasses[0] gets set twice.

The attached seems to fix the problem, but the whole thing makes me wonder if this is even meant to be allowed? I was thinking that this might be a good time to disallow this altogether, since it's already broken and looks like it has been for about 11 years

Disallowing it would simplify some code in my semi/anti join removal patch that I posted here http://www.postgresql.org/message-id/CAApHDvpCBEfuc5tD=vniepAv0pU5m=q=fOQZcOdMHeei7OQPgQ@mail.gmail.com

Any thoughts?

Regards

David Rowley
Attachment

Re: Defining a foreign key with a duplicate column is broken

From
Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes:
> I wasn't quite sure if it was possible to include the same column twice in
> a foreign key, so I tested....

> create table r1 (a int);
> create table r2 (b int);
> create unique index on r2(b,b);
> alter table r1 add constraint r2_b_fkey foreign key (a,a) references r2
> (b,b);
> ERROR:  cache lookup failed for opclass 0

Ouch.

> The attached seems to fix the problem, but the whole thing makes me wonder
> if this is even meant to be allowed? I was thinking that this might be a
> good time to disallow this altogether, since it's already broken and looks
> like it has been for about 11 years

We've gone out of our way in the past to allow duplicate index columns
(eg commit cfc5008a51f4), so I'm not sure why we'd not allow such indexes
to be used as foreign key references.  The example you posted above does
look pretty pointless, but something like this is perhaps less so:

regression=# create table r1 (a int, c int);
CREATE TABLE
regression=# create table r2 (b int);
CREATE TABLE
regression=# create unique index on r2(b,b);
CREATE INDEX
regression=# alter table r1 add constraint r2_b_fkey foreign key (a,c) references r2
(b,b);
ERROR:  cache lookup failed for opclass 0

especially when using nondefault FK match rules.
        regards, tom lane



Re: Defining a foreign key with a duplicate column is broken

From
Tom Lane
Date:
I wrote:
> David Rowley <dgrowleyml@gmail.com> writes:
>> The attached seems to fix the problem, but the whole thing makes me wonder
>> if this is even meant to be allowed? I was thinking that this might be a
>> good time to disallow this altogether, since it's already broken and looks
>> like it has been for about 11 years

> We've gone out of our way in the past to allow duplicate index columns
> (eg commit cfc5008a51f4), so I'm not sure why we'd not allow such indexes
> to be used as foreign key references.

I poked at this some more and changed my mind.  Per SQL standard, the
referent of a foreign key has to match some unique or pkey constraint,
and *we do not allow duplicate columns in constraint syntax*:

regression=# create temp table pp (f1 int, unique(f1,f1));
ERROR:  column "f1" appears twice in unique constraint
LINE 1: create temp table pp (f1 int, unique(f1,f1));                                     ^

which restriction I believe is also per spec.  It's true that using
CREATE UNIQUE INDEX syntax, you can make an index with duplicate
columns, but there seems no a-priori reason why we have to allow such
an index to be the referent of a foreign key.  The main reason *not* to
allow it is that such an index might conceivably have different opclasses
for the index columns containing the same attnum, and in that case it
becomes ambiguous which opclass (ie, which definition of equality)
applies to each column of the foreign key.  We could make some arbitrary
definition of which one to select but I think that would be stretching
generality well past the point of usefulness.  Worse, if the different
opclasses do have different definitions of equality and we pick the wrong
one, it's not real clear that it mightn't be possible for a referencing
row to be considered equal to multiple rows in the referenced table (that
are distinct according to the rules used by the index).

So I'm thinking you're right: we should rewrite this code so that only
indexes having all columns distinct can match, thereby ensuring that there
is only one possible interpretation of the equality semantics for the
foreign key.
        regards, tom lane



Re: Defining a foreign key with a duplicate column is broken

From
David Rowley
Date:
On Sat, Aug 9, 2014 at 12:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

So I'm thinking you're right: we should rewrite this code so that only
indexes having all columns distinct can match, thereby ensuring that there
is only one possible interpretation of the equality semantics for the
foreign key.


I've attached a patch which disallows these, though I was not quite sure about the new error message since likely this is something that should be backpacked? I wasn't sure on the policy for new translations in a minor release.

The wording of the comment above the code which disallows the duplicate column references could likely do with some meaningful explanation about the reasons why we disallow these, I thought your words would be a bit better at doing this so I didn't try very hard on that part.

Is this roughly what you had in mind?

Regards

David Rowley 
Attachment

Re: Defining a foreign key with a duplicate column is broken

From
Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes:
> On Sat, Aug 9, 2014 at 12:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So I'm thinking you're right: we should rewrite this code so that only
>> indexes having all columns distinct can match, thereby ensuring that there
>> is only one possible interpretation of the equality semantics for the
>> foreign key.

> I've attached a patch which disallows these, though I was not quite sure
> about the new error message since likely this is something that should be
> backpacked? I wasn't sure on the policy for new translations in a minor
> release.

There's no need for a new error message I think, because we should just
ignore such indexes.  After all, there might be a valid matching index
later on.
        regards, tom lane



Re: Defining a foreign key with a duplicate column is broken

From
David Rowley
Date:
On Sat, Aug 9, 2014 at 3:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <dgrowleyml@gmail.com> writes:
> On Sat, Aug 9, 2014 at 12:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So I'm thinking you're right: we should rewrite this code so that only
>> indexes having all columns distinct can match, thereby ensuring that there
>> is only one possible interpretation of the equality semantics for the
>> foreign key.

> I've attached a patch which disallows these, though I was not quite sure
> about the new error message since likely this is something that should be
> backpacked? I wasn't sure on the policy for new translations in a minor
> release.

There's no need for a new error message I think, because we should just
ignore such indexes.  After all, there might be a valid matching index
later on.


hmm, but if the user attempts to define the foreign key that contains a duplicate column in the REFERENCES part, then we'll never "find" any indexes, so there's no point in looking at all. That's why I did the duplicate checks as a pre-check before the loop over the indexes. Though that's not to say that we couldn't throw the "there is no unique constraint matching given keys for referenced table ..." error there.

I've attached a version of the patch that's a little smarter when it comes to doing the duplicate checks in the attnums array... For what it's worth.

Regards

David Rowley


Attachment

Re: Defining a foreign key with a duplicate column is broken

From
Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes:
> On Sat, Aug 9, 2014 at 3:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> There's no need for a new error message I think, because we should just
>> ignore such indexes.  After all, there might be a valid matching index
>> later on.

> hmm, but if the user attempts to define the foreign key that contains a
> duplicate column in the REFERENCES part, then we'll never "find" any
> indexes, so there's no point in looking at all.

OK, now that I'm a bit more awake, I agree with that.

> I've attached a version of the patch that's a little smarter when it comes
> to doing the duplicate checks in the attnums array...

Applied with some cosmetic adjustments.  I didn't bother with the
regression test either --- this doesn't seem like something that needs
permanent memorialization as a regression test.
        regards, tom lane