Re: [PATCH] Support for foreign keys with arrays - Mailing list pgsql-hackers

From Noah Misch
Subject Re: [PATCH] Support for foreign keys with arrays
Date
Msg-id 20120317033312.GB19556@tornado.leadboat.com
Whole thread Raw
In response to Re: [PATCH] Support for foreign keys with arrays  (Marco Nenciarini <marco.nenciarini@2ndQuadrant.it>)
Responses Re: [PATCH] Support for foreign keys with arrays
List pgsql-hackers
[used followup EACH-foreign-key.v4b.patch.bz2 for review]

On Wed, Mar 14, 2012 at 07:03:08PM +0100, Marco Nenciarini wrote:
> please find attached v4 of the EACH Foreign Key patch (formerly known
> also as Foreign Key Array).

> On Fri, Feb 24, 2012 at 09:01:35PM -0500, Noah Misch wrote:

> > > We can use multi-dimensional arrays as well as referencing columns. In 
> > > that case though, ON DELETE EACH CASCADE will behave like ON DELETE EACH
> > > SET NULL. This is a safe way of implementing the action. 
> > > We have some ideas on how to implement this, but we feel it is better to
> > > limit the behaviour for now.
> > 
> > This still feels awfully unprincipled to me.  How about just throwing an error
> > when we need to remove an element from a multidimensional array?  Essentially,
> > have ON DELETE EACH CASCADE downgrade to ON DELETE RESTRICT when it encounters
> > a multidimensional array.  That's strictly less code than what you have now,
> > and it keeps our options open.  We can always change from error to set-null
> > later, but we can't so readily change from set-null to anything else.
> 
> That seems reasonable to me. Implemented and documented.

I like the semantics now.  You implemented this by doing two scans per PK
delete, the first to detect multidimensional dependants of the PK row and the
second to fix 1-dimensional dependants.  We don't require an index to support
the scan, so this can mean two seq scans.  Currently, the only benefit to
doing it this way is a change of error message.  Here is the current error
message when we would need to remove a multidimensional array element:
 ERROR:  update or delete on table "pktableforarray" violates foreign key constraint "fktableforarray_ftest1_fkey" on
table"fktableforarray" DETAIL:  Key (ptest1)=(2) is still referenced from table "fktableforarray".
 

If I just rip out the first scan, we get the same outcome with a different
error message:
 ERROR:  removing elements from multidimensional arrays is not supported CONTEXT:  SQL statement "UPDATE ONLY
"public"."fktableforarray"SET "ftest1" = array_remove("ftest1", $1) WHERE $1 OPERATOR(pg_catalog.=) ANY ("ftest1")"
 

That has less polish, but I think it's actually more useful.  The first
message gives no indication that a multidimensional array foiled your ON
DELETE EACH CASCADE.  The second message hints at that cause.

I recommend removing the new block of code in RI_FKey_eachcascade_del() and
letting array_remove() throw the error.  If you find a way to throw a nicer
error without an extra scan, by all means submit that to a future CF as an
improvement.  I don't think it's important enough to justify cycles at this
late hour of the current CF.

> > As I pondered this patch again, I came upon formal hazards around non-default
> > operator classes.  Today, ATAddForeignKeyConstraint() always chooses pfeqop,
> > ffeqop and ppeqop in the same operator family.  If it neglected this, we would
> > get wrong behavior when one of the operators is sensitive to a change to which
> > another is insensitive.  For EACH FK constraints, this patch sets ffeqop =
> > ARRAY_EQ_OP unconditionally.  array_eq() uses the TYPECACHE_EQ_OPR (usually
> > from the default B-tree operator class).  That operator may not exist at all,
> > let alone share an operator family with pfeqop.  Shall we deal with this by
> > retrieving TYPECACHE_EQ_OPR in ATAddForeignKeyConstraint() and rejecting the
> > constraint creation if it does not share an operator family with pfeqop?  The
> > same hazard affects ri_triggers.c use of array_remove() and array_replace(),
> > and the same change mitigates it.
> 
> Check added. As a consequence of this stricter policy, certain
> previously allowed cases are now unacceptable (e.g pk FLOAT and fk
> INT[]).
> Regression tests have been added.

Ah, good.  Not so formal after all.

> > pg_constraint.confeach needs documentation.
> 
> Most of pg_constraint columns, including all the boolean ones, are
> documented only in the "description" column of
> 
> http://www.postgresql.org/docs/9.1/static/catalog-pg-constraint.html#AEN85760
> 
> it seems that confiseach should not be an exception, since it just
> indicates whether the constraint is of a certain kind or not.

Your patch adds two columns to pg_constraint, confiseach and confeach, but it
only mentions confiseach in documentation.  Just add a similar minimal mention
of confeach.

> > > ***************
> > > *** 5736,5741 **** ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel,
> > > --- 5759,5807 ----
> > >                   (errcode(ERRCODE_INVALID_FOREIGN_KEY),
> > >                    errmsg("number of referencing and referenced columns for foreign key disagree")));
> > >   
> > > +     /* Enforce each foreign key restrictions */
> > > +     if (fkconstraint->fk_is_each)
> > > +     {
> > > +         /*
> > > +          * ON UPDATE CASCADE action is not supported on EACH foreign keys
> > > +          */
> > > +         if (fkconstraint->fk_upd_action == FKCONSTR_ACTION_CASCADE)
> > > +             ereport(ERROR,
> > > +                 (errcode(ERRCODE_INVALID_FOREIGN_KEY),
> > > +                 errmsg("ON UPDATE CASCADE action is not supported on "
> > > +                        "EACH foreign keys"),
> > > +                 errhint("Use ON UPDATE EACH CASCADE, instead")));
> > 
> > Note from the Error Message Style Guide that errhint and errdetail messages
> > shall be complete sentences.
> 
> Fixed.

That is to say, they start with a capital letter and end with a period.  Your
old text was fine apart from the lack of a period.  (Your new text also lacks
a period.)

> > This processing should happen in parse_utilcmd.c rather than gram.y.  Magic
> > values "t" and "f" won't do.  Make fk_attrs a list of ForeignKeyColumnElem or
> > else at least use an int list of true/false.  Either way, you would no longer
> > need the decode loop in tablecmds.c.
> 
> Fixed. We have removed the magic values and moved decoding of the
> ForeignKeyColumnElem list inside parse_utilcmd.c.

When I said "int list", I meant a T_IntList (via lappend_int(), lfirst_int(),
etc.).  You used a T_List of T_Integer.

> > > ***************
> > > *** 2678,2695 **** RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
> > >        * The query string built is:
> > >        *    SELECT fk.keycols FROM ONLY relname fk
> > >        *     LEFT OUTER JOIN ONLY pkrelname pk
> > > !      *     ON (pk.pkkeycol1=fk.keycol1 [AND ...])
> > >        *     WHERE pk.pkkeycol1 IS NULL AND
> > >        * For MATCH unspecified:
> > >        *     (fk.keycol1 IS NOT NULL [AND ...])
> > >        * For MATCH FULL:
> > >        *     (fk.keycol1 IS NOT NULL [OR ...])
> > >        *
> > >        * We attach COLLATE clauses to the operators when comparing columns
> > >        * that have different collations.
> > >        *----------
> > >        */
> > >       initStringInfo(&querybuf);
> > >       appendStringInfo(&querybuf, "SELECT ");
> > >       sep = "";
> > >       for (i = 0; i < riinfo.nkeys; i++)
> > > --- 3527,3557 ----
> > >        * The query string built is:
> > >        *    SELECT fk.keycols FROM ONLY relname fk
> > >        *     LEFT OUTER JOIN ONLY pkrelname pk
> > > !      *     ON (pk.pkkeycol1=fk.keycol1 [AND ...]
> > > !      *      [AND NOT EXISTS(<RECHECK_SUBQUERY>)])
> > >        *     WHERE pk.pkkeycol1 IS NULL AND
> > >        * For MATCH unspecified:
> > >        *     (fk.keycol1 IS NOT NULL [AND ...])
> > >        * For MATCH FULL:
> > >        *     (fk.keycol1 IS NOT NULL [OR ...])
> > >        *
> > > +      * In case of an EACH foreign key, a recheck subquery is added to
> > > +      * the join condition in order to check that every combination of keys
> > > +      * is actually referenced.
> > > +      * The RECHECK_SUBQUERY is
> > > +      *   SELECT 1 FROM
> > > +      *     unnest(fk.keycol1) x1(x1) [CROSS JOIN ...]
> > > +      *   LEFT OUTER JOIN ONLY pkrelname pk
> > > +      *   ON (pk.pkkeycol1=x1.x1 [AND ...])
> > > +      *   WHERE pk.pkkeycol1 IS NULL AND
> > > +      *     (fk.keycol1 IS NOT NULL [AND ...])
> > 
> > What is a worst-case query plan for a constraint with n ordinary keys and m
> > EACH keys?
> 
> The subquery complexity can be compared with the one of the main query.
> Anyway, it is a one-off cost, paid at constraint creation, which I
> believe is acceptable. The fact that we now implement only single-column
> EACH foreign keys might suggest that we postpone this discussion at a
> later time.

If the cost doesn't exceed O(F log P), where F is the size of the FK table and
P is the size of the PK table, I'm not worried.  If it can be O(F^2), we would
have a problem to be documented, if not fixed.


Your change to array_replace() made me look at array_remove() again and
realize that it needs the same treatment.  This command yields a segfault: SELECT array_remove('{a,null}'::text[],
null);

This latest version introduces two calls to get_element_type(), both of which
should be get_base_element_type().

Patch "Avoid FK validations for no-rewrite ALTER TABLE ALTER TYPE", committed
between v3b and v4 of this patch, added code to ATAddForeignKeyConstraint()
requiring an update in this patch.  Run this in the regression DB: [local] regression=# alter table fktableforeachfk
alterftest1 type int[]; ERROR:  could not find cast from 1007 to 23
 

RI_PLAN_EACHCASCADE_DEL_DOUPDATE should be RI_PLAN_EACHCASCADE_DEL_DODELETE.


I think the patch has reached the stage where a committer can review it
without wasting much time on things that might change radically.  So, I'm
marking it Ready for Committer.  Please still submit an update correcting the
above; I'm sure your committer will appreciate it.

Thanks,
nm


pgsql-hackers by date:

Previous
From: Greg Stark
Date:
Subject: Re: Storage Manager crash at mdwrite()
Next
From: Greg Stark
Date:
Subject: Re: EquivalenceClasses and subqueries and PlaceHolderVars, oh my