Re: [PATCH] Support for Array ELEMENT Foreign Keys - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: [PATCH] Support for Array ELEMENT Foreign Keys |
Date | |
Msg-id | 16787.1351053391@sss.pgh.pa.us Whole thread Raw |
In response to | Re: [PATCH] Support for Array ELEMENT Foreign Keys (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: [PATCH] Support for Array ELEMENT Foreign Keys
|
List | pgsql-hackers |
Now that it seems like we've got consensus on syntax, let's talk about some implementation issues. Ordinarily, the query executed during an insert or update on the referencing table looks like, for example, SELECT 1 FROM ONLY "public"."pk" x WHERE "f1" OPERATOR(pg_catalog.=) $1 FOR SHARE OF x where $1 is a parameter representing the referencing column's value. This will find and lock the referenced row if there is one. (There can't be more than one, because the equality constraint corresponds to the unique index on the referenced column pk.f1.) The proposed patch uses this if the referencing column is an array: SELECT 1 WHERE (SELECT pg_catalog.count(DISTINCT y) FROM pg_catalog.unnest($1) y) OPERATOR(pg_catalog.=) (SELECT pg_catalog.count(*)FROM (SELECT 1 FROM ONLY "public"."pk" x WHERE "f1" OPERATOR(pg_catalog.=) ANY ($1) FOR SHARE OFx) z) In English, the idea is to find and lock all PK rows matching any element of the array referencing value, and count them, and then see if that's equal to the number of distinct non-null elements in the array value. I find this pretty grotty. Quite aside from any aesthetic concerns, it's broken because it presupposes that count(distinct y) has exactly the same notion of equality that the PK unique index has. In reality, count(distinct) will fall back to the default btree opclass for the array element type. There might not be such an opclass, or it might not be compatible with the PK unique index. This is not just an academic concern: for instance, there are distinct values of the numeric type that will compare equal to the same float8 PK value, because of the limited precision of float comparison. In my working copy of the patch, I've dealt with this by inserting a creation-time restriction that the array element type has to have a default btree opclass that is part of the PK index's opfamily. This is not very desirable, because it means that some cases that are allowed in plain FKs are disallowed in array FKs. Example: regression=# create table ff (f1 float8 primary key); CREATE TABLE regression=# create table cc (f1 numeric references ff); CREATE TABLE regression=# create table cc2 (f1 numeric[], foreign key(each element of f1) references ff); ERROR: foreign key constraint "cc2_f1_fkey" cannot be implemented DETAIL: Key column "f1" has element type numeric which does not have a default btree operator class that's compatible withclass "float8_ops". So I'm looking for a better answer. One somewhat brute-force answer is to not try to use = ANY at all in the RI test query, but instead deconstruct the array value within the RI trigger and execute the standard scalar locking query for each array element. One attraction that would have is making it easier to produce a decent error message. Right now, if you insert an array value that has an invalid element, you get something like this: regression=# create table pk (f1 int primary key); CREATE TABLE regression=# create table ref1 (f1 int[], foreign key(each element of f1) references pk); CREATE TABLE regression=# insert into pk values (1),(2); INSERT 0 2 regression=# insert into ref1 values(array[1,2,5]); ERROR: insert or update on table "ref1" violates foreign key constraint "ref1_f1_fkey" DETAIL: Key (f1)=({1,2,5}) is not present in table "pk". I don't find that too helpful even with three elements, and it would be very much not helpful with hundreds. I would rather it told me that "5" is the problem. So that's the direction I was thinking of going in, but I wonder if anyone has a better idea. BTW, there is a second undesirable dependency on default opclass semantics in the patch, which is that it supposes it can use array_eq() to detect whether or not the referencing column has changed. But I think that can be fixed without undue pain by providing a refactored version of array_eq() that can be told which element-comparison function to use. regards, tom lane
pgsql-hackers by date: