Re: [PATCH] Support for foreign keys with arrays - Mailing list pgsql-hackers

From Marco Nenciarini
Subject Re: [PATCH] Support for foreign keys with arrays
Date
Msg-id 1331748188.6744.15.camel@greygoo.devise-it.lan
Whole thread Raw
In response to Re: [PATCH] Support for foreign keys with arrays  (Noah Misch <noah@leadboat.com>)
Responses Re: [PATCH] Support for foreign keys with arrays
Re: [PATCH] Support for foreign keys with arrays
List pgsql-hackers
Hi,

please find attached v4 of the EACH Foreign Key patch (formerly known
also as Foreign Key Array).

All the open issues have been fixed, including the crucial ones; see
below for details.

Regards,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it

On Fri, Feb 24, 2012 at 09:01:35PM -0500, Noah Misch wrote:
> You support, and have test cases for, constraints with multiple EACH columns.
> The documentation and comments do not explain their semantics.  On reviewing
> the behavior and code, you have implemented it in terms of a Cartesian
> product.  That's logical, but when is such a constraint actually useful?  If
> someone can think of an application, great; let's document his example.
> Otherwise, let's reject such constraints at DDL time.

Now multiple EACH column are rejected.

>
> > We can use multi-dimensional arrays as well as referencing columns. In
> > that case though, ON DELETE EACH CASCADE will behave like ON DELETE EACH
> > SET NULL. This is a safe way of implementing the action.
> > We have some ideas on how to implement this, but we feel it is better to
> > limit the behaviour for now.
>
> This still feels awfully unprincipled to me.  How about just throwing an error
> when we need to remove an element from a multidimensional array?  Essentially,
> have ON DELETE EACH CASCADE downgrade to ON DELETE RESTRICT when it encounters
> a multidimensional array.  That's strictly less code than what you have now,
> and it keeps our options open.  We can always change from error to set-null
> later, but we can't so readily change from set-null to anything else.

That seems reasonable to me. Implemented and documented.

> As I pondered this patch again, I came upon formal hazards around non-default
> operator classes.  Today, ATAddForeignKeyConstraint() always chooses pfeqop,
> ffeqop and ppeqop in the same operator family.  If it neglected this, we would
> get wrong behavior when one of the operators is sensitive to a change to which
> another is insensitive.  For EACH FK constraints, this patch sets ffeqop =
> ARRAY_EQ_OP unconditionally.  array_eq() uses the TYPECACHE_EQ_OPR (usually
> from the default B-tree operator class).  That operator may not exist at all,
> let alone share an operator family with pfeqop.  Shall we deal with this by
> retrieving TYPECACHE_EQ_OPR in ATAddForeignKeyConstraint() and rejecting the
> constraint creation if it does not share an operator family with pfeqop?  The
> same hazard affects ri_triggers.c use of array_remove() and array_replace(),
> and the same change mitigates it.

Check added. As a consequence of this stricter policy, certain
previously allowed cases are now unacceptable (e.g pk FLOAT and fk
INT[]).
Regression tests have been added.

> Let me say again how much I like the test coverage from this patch.  It's
> great to think of something worth verifying and find existing coverage for it
> in the submitted test cases.
>
>
> > *** a/doc/src/sgml/catalogs.sgml
> > --- b/doc/src/sgml/catalogs.sgml
>
> > ***************
> > *** 2102,2107 ****
> > --- 2106,2118 ----
> >         <entry></entry>
> >         <entry>If a check constraint, a human-readable representation of the expression</entry>
> >        </row>
> > +
> > +      <row>
> > +       <entry><structfield>confisarray</structfield></entry>
>
> Name is now confiseach.

Fixed.

> pg_constraint.confeach needs documentation.

Most of pg_constraint columns, including all the boolean ones, are
documented only in the "description" column of

http://www.postgresql.org/docs/9.1/static/catalog-pg-constraint.html#AEN85760

it seems that confiseach should not be an exception, since it just
indicates whether the constraint is of a certain kind or not.

>
> > +       <entry><type>bool</type></entry>
> > +       <entry></entry>
> > +       <entry>If a foreign key, true if a EACH REFERENCE foreign key</entry>
> > +      </row>
> >       </tbody>
> >      </tgroup>
> >     </table>
> > *** a/doc/src/sgml/ddl.sgml
> > --- b/doc/src/sgml/ddl.sgml
>
> > +    <para>
> > +     Another option you have with foreign keys is to use a
> > +     referencing column which is an array of elements with
> > +     the same type (or a compatible one) as the referenced
> > +     column in the related table. This feature, commonly known
> > +     as <firstterm>foreign key arrays</firstterm>, is implemented
>
> Is it indeed "commonly known" by that name?  My web searching did not turn up
> any independent use of the term.
>
> In any event, this should be the only place where we mention multiple names
> for the feature.  Pick one preferred term and use it for all other mentions.
> The documentation, comments and messages currently have a mix of "each foreign
> key", "EACH foreign key" and "foreign key array".

In the end we uniformly adopted "EACH foreign key" as the unique name
of this feature; documentation, comments and messages have been
updated accordingly.

> > +    <para>
> > +     Even though the most common use case for foreign key arrays
> > +     is on a single column key, you can define an <quote>each foreign
> > +     key constraint</quote> on a group of columns. As the following
> > +     contrived example shows, it needs to be written in table constraint form:
> > + <programlisting>
> > + CREATE TABLE t1 (
> > +   a integer PRIMARY KEY,
> > +   b integer,
> > +   c integer[],
> > +   <emphasis>FOREIGN KEY (b, EACH c) REFERENCES other_table (c1, c2)</emphasis>
> > + );
> > + </programlisting>
>
> Rather than merely showing an abstract syntax example, some words about the
> semantics are in order.  You might have a parent table with primary key
> (category, value).  Then your child table(s) have a category column and an
> array of values.  The constraint then ensures that all the entries in the
> array exist in the parent table *and* belong to the proper category.

That example has been replaced by a concrete one.

> > *** a/src/backend/commands/tablecmds.c
> > --- b/src/backend/commands/tablecmds.c
>
> > ***************
> > *** 5736,5741 **** ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel,
> > --- 5759,5807 ----
> >                   (errcode(ERRCODE_INVALID_FOREIGN_KEY),
> >                    errmsg("number of referencing and referenced columns for foreign key disagree")));
> >
> > +     /* Enforce each foreign key restrictions */
> > +     if (fkconstraint->fk_is_each)
> > +     {
> > +         /*
> > +          * ON UPDATE CASCADE action is not supported on EACH foreign keys
> > +          */
> > +         if (fkconstraint->fk_upd_action == FKCONSTR_ACTION_CASCADE)
> > +             ereport(ERROR,
> > +                 (errcode(ERRCODE_INVALID_FOREIGN_KEY),
> > +                 errmsg("ON UPDATE CASCADE action is not supported on "
> > +                        "EACH foreign keys"),
> > +                 errhint("Use ON UPDATE EACH CASCADE, instead")));
>
> Note from the Error Message Style Guide that errhint and errdetail messages
> shall be complete sentences.

Fixed.

> > +
> > +         /*
> > +          * EACH CASCADE and EACH SET NULL actions are only available
> > +          * in single-column EACH foreign keys
> > +          */
> > +         if (numpks > 1 &&
> > +             (fkconstraint->fk_upd_action == FKCONSTR_ACTION_ARRCASCADE
> > +             || fkconstraint->fk_upd_action == FKCONSTR_ACTION_ARRSETNULL
> > +             || fkconstraint->fk_del_action == FKCONSTR_ACTION_ARRCASCADE
> > +             || fkconstraint->fk_del_action == FKCONSTR_ACTION_ARRSETNULL))
> > +             ereport(ERROR,
> > +                 (errcode(ERRCODE_INVALID_FOREIGN_KEY),
> > +                 errmsg("EACH CASCADE and EACH SET NULL actions are only "
> > +                         "available on single-column EACH foreign keys")));
> > +     }
> > +     else
> > +     {
> > +         /*
> > +          * EACH CASCADE and EACH SET NULL actions are only available
> > +          * in EACH foreign keys
> > +          */
> > +         if (fkconstraint->fk_upd_action == FKCONSTR_ACTION_ARRCASCADE
> > +             || fkconstraint->fk_upd_action == FKCONSTR_ACTION_ARRSETNULL
> > +             || fkconstraint->fk_del_action == FKCONSTR_ACTION_ARRCASCADE
> > +             || fkconstraint->fk_del_action == FKCONSTR_ACTION_ARRSETNULL)
> > +             ereport(ERROR,
> > +                 (errcode(ERRCODE_INVALID_FOREIGN_KEY),
> > +                 errmsg("EACH CASCADE and EACH SET NULL actions are only "
> > +                        "available on EACH foreign keys")));
>
> I would just reuse the previous error message.  It will be fine, maybe even
> better, for everyone to see the phrase "single-column".

Fixed.

> > ***************
> > *** 5812,5823 **** ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel,
> >               Oid            target_typeids[2];
> >
> >               input_typeids[0] = pktype;
> > !             input_typeids[1] = fktype;
> >               target_typeids[0] = opcintype;
> >               target_typeids[1] = opcintype;
> >               if (can_coerce_type(2, input_typeids, target_typeids,
> >                                   COERCION_IMPLICIT))
> > !                 pfeqop = ffeqop = ppeqop;
> >           }
> >
> >           if (!(OidIsValid(pfeqop) && OidIsValid(ffeqop)))
> > --- 5905,5925 ----
> >               Oid            target_typeids[2];
> >
> >               input_typeids[0] = pktype;
> > !             input_typeids[1] = fktyped;
>
> By sending the base type instead of the domain type to can_coerce_type(), this
> changes behavior ever so slightly for ordinary foreign keys.  It would take
> something fairly baroque to get a different result.  Perhaps a polymorphic
> opcintype and an FK column of a domain over an enum.  Still, please don't
> change that part of the logic.

Fixed.

> >               target_typeids[0] = opcintype;
> >               target_typeids[1] = opcintype;
> >               if (can_coerce_type(2, input_typeids, target_typeids,
> >                                   COERCION_IMPLICIT))
> > !             {
> > !                 pfeqop = ppeqop;
> > !
> > !                 /*
> > !                  * In caso of an EACH FK the ffeqop must be left untouched
>
> Typo.

Fixed.

>
> > *** a/src/backend/parser/gram.y
> > --- b/src/backend/parser/gram.y
>
> > ***************
> > *** 2900,2917 **** ConstraintElem:
> >                                      yyscanner);
> >                       $$ = (Node *)n;
> >                   }
> > !             | FOREIGN KEY '(' columnList ')' REFERENCES qualified_name
> >                   opt_column_list key_match key_actions ConstraintAttributeSpec
> >                   {
> >                       Constraint *n = makeNode(Constraint);
> >                       n->contype = CONSTR_FOREIGN;
> >                       n->location = @1;
> >                       n->pktable            = $7;
> > !                     n->fk_attrs            = $4;
> >                       n->pk_attrs            = $8;
> >                       n->fk_matchtype        = $9;
> >                       n->fk_upd_action    = (char) ($10 >> 8);
> >                       n->fk_del_action    = (char) ($10 & 0xFF);
> >                       processCASbits($11, @11, "FOREIGN KEY",
> >                                      &n->deferrable, &n->initdeferred,
> >                                      &n->skip_validation,
> > --- 2918,2959 ----
> >                                      yyscanner);
> >                       $$ = (Node *)n;
> >                   }
> > !             | FOREIGN KEY '(' foreignKeyColumnList ')' REFERENCES qualified_name
> >                   opt_column_list key_match key_actions ConstraintAttributeSpec
> >                   {
> >                       Constraint *n = makeNode(Constraint);
> >                       n->contype = CONSTR_FOREIGN;
> >                       n->location = @1;
> >                       n->pktable            = $7;
> > !                     n->fk_attrs         = $4;
> >                       n->pk_attrs            = $8;
> >                       n->fk_matchtype        = $9;
> >                       n->fk_upd_action    = (char) ($10 >> 8);
> >                       n->fk_del_action    = (char) ($10 & 0xFF);
> > +
> > +                     /*
> > +                      * Split the content of foreignKeyColumnList
> > +                      * in two separate list. One lis of fileds
>
> Two typos.

Fixed.

> > +                      * and one list of boolean values.
> > +                      */
> > +                     {
> > +                         ListCell *i;
> > +
> > +                         n->fk_attrs = NIL;
> > +                         n->fk_is_each = false;
> > +                         n->fk_each_attrs = NIL;
> > +                         foreach (i, $4)
> > +                         {
> > +                             ForeignKeyColumnElem *elem =
> > +                                 (ForeignKeyColumnElem *)lfirst(i);
> > +
> > +                             n->fk_attrs = lappend(n->fk_attrs, elem->name);
> > +                             n->fk_is_each |= elem->each;
> > +                             n->fk_each_attrs = lappend(n->fk_each_attrs,
> > +                                                        makeString(elem->each?"t":"f"));
> > +                         }
> > +                     }
>
> This processing should happen in parse_utilcmd.c rather than gram.y.  Magic
> values "t" and "f" won't do.  Make fk_attrs a list of ForeignKeyColumnElem or
> else at least use an int list of true/false.  Either way, you would no longer
> need the decode loop in tablecmds.c.

Fixed. We have removed the magic values and moved decoding of the
ForeignKeyColumnElem list inside parse_utilcmd.c.

> > *** a/src/backend/utils/adt/arrayfuncs.c
> > --- b/src/backend/utils/adt/arrayfuncs.c
>
> > + Datum
> > + array_replace(PG_FUNCTION_ARGS)
> > + {
>
> > +     for (i = 0; i < nitems; i++)
> > +     {
> > +         bool        isNull;
> > +         bool        oprresult;
> > +
> > +         /* Get source element, checking for NULL */
> > +         if (bitmap && (*bitmap & bitmask) == 0)
> > +         {
> > +             if (old_value_isnull)
> > +             {
> > +                 values[i] = new_value;
> > +                 isNull = false;
> > +                 changed = true;
> > +             }
> > +             else
> > +                 isNull = true;
> > +         }
>
> This does the wrong thing if old_value_isnull == new_value_isnull == true:
>
> [local] regression=# select array_replace('{4,null,5}'::int[], null, null);
>  array_replace
> ---------------
>  {4,0,5}
>
> (With text[], that yields a crash.)

Fixed.

> > *** a/src/backend/utils/adt/ri_triggers.c
> > --- b/src/backend/utils/adt/ri_triggers.c
>
> > ! Datum
> > ! RI_FKey_eachcascade_upd(PG_FUNCTION_ARGS)
> >   {
>
> > !                 /* ----------
> > !                  * The query string built is
> > !                  *    UPDATE ONLY <fktable>
> > !                  *        SET    fkatt1 = array_replace(fkatt1, $n, $1) [, ...]
> > !                  *        WHERE $n = fkatt1 [AND ...]
> > !                  * The type id's for the $ parameters are those of the
> > !                  * corresponding PK attributes.
> > !                  * ----------
> > !                  */
> > !                 initStringInfo(&querybuf);
> > !                 initStringInfo(&qualbuf);
> > !                 quoteRelationName(fkrelname, fk_rel);
> > !                 appendStringInfo(&querybuf, "UPDATE ONLY %s SET", fkrelname);
> > !                 querysep = "";
> > !                 qualsep = "WHERE";
>
> An elog(ERROR) when riinfo.nkeys != 1 would make sense here and for other
> functions where we forbid multiple columns at creation time.  If nothing else,
> this needs a comment that the looping is vestigial and the function only
> supports single-column constraints.

Logic implemented and comment added.

> > ***************
> > *** 2678,2695 **** RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
> >        * The query string built is:
> >        *    SELECT fk.keycols FROM ONLY relname fk
> >        *     LEFT OUTER JOIN ONLY pkrelname pk
> > !      *     ON (pk.pkkeycol1=fk.keycol1 [AND ...])
> >        *     WHERE pk.pkkeycol1 IS NULL AND
> >        * For MATCH unspecified:
> >        *     (fk.keycol1 IS NOT NULL [AND ...])
> >        * For MATCH FULL:
> >        *     (fk.keycol1 IS NOT NULL [OR ...])
> >        *
> >        * We attach COLLATE clauses to the operators when comparing columns
> >        * that have different collations.
> >        *----------
> >        */
> >       initStringInfo(&querybuf);
> >       appendStringInfo(&querybuf, "SELECT ");
> >       sep = "";
> >       for (i = 0; i < riinfo.nkeys; i++)
> > --- 3527,3557 ----
> >        * The query string built is:
> >        *    SELECT fk.keycols FROM ONLY relname fk
> >        *     LEFT OUTER JOIN ONLY pkrelname pk
> > !      *     ON (pk.pkkeycol1=fk.keycol1 [AND ...]
> > !      *      [AND NOT EXISTS(<RECHECK_SUBQUERY>)])
> >        *     WHERE pk.pkkeycol1 IS NULL AND
> >        * For MATCH unspecified:
> >        *     (fk.keycol1 IS NOT NULL [AND ...])
> >        * For MATCH FULL:
> >        *     (fk.keycol1 IS NOT NULL [OR ...])
> >        *
> > +      * In case of an EACH foreign key, a recheck subquery is added to
> > +      * the join condition in order to check that every combination of keys
> > +      * is actually referenced.
> > +      * The RECHECK_SUBQUERY is
> > +      *   SELECT 1 FROM
> > +      *     unnest(fk.keycol1) x1(x1) [CROSS JOIN ...]
> > +      *   LEFT OUTER JOIN ONLY pkrelname pk
> > +      *   ON (pk.pkkeycol1=x1.x1 [AND ...])
> > +      *   WHERE pk.pkkeycol1 IS NULL AND
> > +      *     (fk.keycol1 IS NOT NULL [AND ...])
>
> What is a worst-case query plan for a constraint with n ordinary keys and m
> EACH keys?

The subquery complexity can be compared with the one of the main query.
Anyway, it is a one-off cost, paid at constraint creation, which I
believe is acceptable. The fact that we now implement only single-column
EACH foreign keys might suggest that we postpone this discussion at a
later time.

>
> > +      *
> >        * We attach COLLATE clauses to the operators when comparing columns
> >        * that have different collations.
> >        *----------
> >        */
>
> > *** a/src/include/nodes/parsenodes.h
> > --- b/src/include/nodes/parsenodes.h
> > ***************
> > *** 570,575 **** typedef struct DefElem
> > --- 570,589 ----
> >   } DefElem;
> >
> >   /*
> > +  * ForeignKeyColumnElem - foreign key column (used in foreign key constaint)
>
> Typo.

Fixed.

>
> > +  *
> > +  * For a foreign key attribute, 'name' is the name of the table column to
> > +  * index, and each is true if it is an EACH fk.
> > +  */
> > + typedef struct ForeignKeyColumnElem
> > + {
> > +     NodeTag        type;
> > +     Node       *name;            /* name of the column, or NULL */
> > +     bool        each;            /* true if an EACH foreign key */
> > +
> > + } ForeignKeyColumnElem;
> > +
> > + /*
> >    * LockingClause - raw representation of FOR UPDATE/SHARE options
> >    *
> >    * Note: lockedRels == NIL means "all relations in query".    Otherwise it
> > ***************
> > *** 1510,1515 **** typedef enum ConstrType            /* types of constraints */
> > --- 1524,1531 ----
> >   #define FKCONSTR_ACTION_CASCADE        'c'
> >   #define FKCONSTR_ACTION_SETNULL        'n'
> >   #define FKCONSTR_ACTION_SETDEFAULT    'd'
> > + #define FKCONSTR_ACTION_ARRCASCADE    'C'
> > + #define FKCONSTR_ACTION_ARRSETNULL    'N'
>
> These names need update for the array -> each terminology change.

Fixed.

> I consider these the core changes needed to reach Ready for Committer:
>
> - Fix crash in array_replace(arr, null, null).
> - Don't look through the domain before calling can_coerce_type().
> - Compare operator family of pfeqop and TYPECACHE_EQ_OPR at creation.
> - Move post-processing from gram.y and remove "t"/"f" magic values.

All done.



Attachment

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: patch for parallel pg_dump
Next
From: Daniel Farina
Date:
Subject: Faster compression, again