Re: [PATCH] Support for Array ELEMENT Foreign Keys - Mailing list pgsql-hackers

From Noah Misch
Subject Re: [PATCH] Support for Array ELEMENT Foreign Keys
Date
Msg-id 20121024151444.GA6462@tornado.leadboat.com
Whole thread Raw
In response to Re: [PATCH] Support for Array ELEMENT Foreign Keys  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [PATCH] Support for Array ELEMENT Foreign Keys
List pgsql-hackers
On Wed, Oct 24, 2012 at 12:36:31AM -0400, Tom Lane wrote:
> The proposed patch uses this if the referencing column is an array:
> 
> SELECT 1 WHERE
>   (SELECT pg_catalog.count(DISTINCT y) FROM pg_catalog.unnest($1) y)
>   OPERATOR(pg_catalog.=)
>   (SELECT pg_catalog.count(*) FROM
>     (SELECT 1 FROM ONLY "public"."pk" x
>      WHERE "f1" OPERATOR(pg_catalog.=) ANY ($1) FOR SHARE OF x) z)
> 
> In English, the idea is to find and lock all PK rows matching any
> element of the array referencing value, and count them, and then see if
> that's equal to the number of distinct non-null elements in the array
> value.  I find this pretty grotty.  Quite aside from any aesthetic
> concerns, it's broken because it presupposes that count(distinct y)
> has exactly the same notion of equality that the PK unique index has.
> In reality, count(distinct) will fall back to the default btree opclass
> for the array element type.  There might not be such an opclass, or it
> might not be compatible with the PK unique index.  This is not just an
> academic concern: for instance, there are distinct values of the numeric
> type that will compare equal to the same float8 PK value, because of the
> limited precision of float comparison.

Good point.

> One somewhat brute-force answer
> is to not try to use = ANY at all in the RI test query, but instead
> deconstruct the array value within the RI trigger and execute the
> standard scalar locking query for each array element.  One attraction
> that would have is making it easier to produce a decent error message.
> Right now, if you insert an array value that has an invalid element,
> you get something like this:
> 
> regression=# create table pk (f1 int primary key);
> CREATE TABLE
> regression=# create table ref1 (f1 int[], foreign key(each element of f1) references pk);
> CREATE TABLE
> regression=# insert into pk values (1),(2);
> INSERT 0 2
> regression=# insert into ref1 values(array[1,2,5]);
> ERROR:  insert or update on table "ref1" violates foreign key constraint "ref1_f1_fkey"
> DETAIL:  Key (f1)=({1,2,5}) is not present in table "pk".
> 
> I don't find that too helpful even with three elements, and it would be
> very much not helpful with hundreds.  I would rather it told me that
> "5" is the problem.
> 
> So that's the direction I was thinking of going in, but I wonder if
> anyone has a better idea.

The error message improvement would be nice.  I'd be wary of the performance
of firing up hundreds or thousands of SPI queries to validate a single long
array, though.  We can always use a slow path to prepare a nice error message.

For FKs, we currently document that "The referenced columns must be the
columns of a non-deferrable unique or primary key constraint in the referenced
table."  Taking that literally, one might imagine that bare UNIQUE indexes do
not qualify.  However, transformFkeyCheckAttrs() does accept them, including
indexes with non-default operator classes:

create table parent (c text);
create unique index on parent(c text_pattern_ops);
create table child (c text references parent(c));

That's a bit suspect already; in particular, given multiple indexes with
different operator classes, I see no good way for the user to choose the one
characterizing his foreign key constraint.  Suppose, for this new feature, we
enforce the letter of the documentation and accept only real UNIQUE/PK
constraints.  (Perhaps also allow UNIQUE indexes eligible to be converted to
UNIQUE constraints?)  With that restriction, we'll know that the PK side of
the constraint uses the default b-tree operator class of the PK-side type.  At
that point, casting the element to the PK type in the count(DISTINCT)
expression should fix the problem, correct?

> BTW, there is a second undesirable dependency on default opclass
> semantics in the patch, which is that it supposes it can use array_eq()
> to detect whether or not the referencing column has changed.  But I
> think that can be fixed without undue pain by providing a refactored
> version of array_eq() that can be told which element-comparison function
> to use.

Does the bit near this comment not cover that?

+         /*
+          * In case of an array ELEMENT FK, make sure TYPECACHE_EQ_OPR exists
+          * for the FK element_type and it is compatible with pfeqop
+          */

Thanks,
nm



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: [WIP] pg_ping utility
Next
From: Noah Misch
Date:
Subject: Re: [WIP] Performance Improvement by reducing WAL for Update Operation