Re: [PATCH] Support for foreign keys with arrays - Mailing list pgsql-hackers
From | Marco Nenciarini |
---|---|
Subject | Re: [PATCH] Support for foreign keys with arrays |
Date | |
Msg-id | 1331748188.6744.15.camel@greygoo.devise-it.lan Whole thread Raw |
In response to | Re: [PATCH] Support for foreign keys with arrays (Noah Misch <noah@leadboat.com>) |
Responses |
Re: [PATCH] Support for foreign keys with arrays
Re: [PATCH] Support for foreign keys with arrays |
List | pgsql-hackers |
Hi, please find attached v4 of the EACH Foreign Key patch (formerly known also as Foreign Key Array). All the open issues have been fixed, including the crucial ones; see below for details. Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it On Fri, Feb 24, 2012 at 09:01:35PM -0500, Noah Misch wrote: > You support, and have test cases for, constraints with multiple EACH columns. > The documentation and comments do not explain their semantics. On reviewing > the behavior and code, you have implemented it in terms of a Cartesian > product. That's logical, but when is such a constraint actually useful? If > someone can think of an application, great; let's document his example. > Otherwise, let's reject such constraints at DDL time. Now multiple EACH column are rejected. > > > We can use multi-dimensional arrays as well as referencing columns. In > > that case though, ON DELETE EACH CASCADE will behave like ON DELETE EACH > > SET NULL. This is a safe way of implementing the action. > > We have some ideas on how to implement this, but we feel it is better to > > limit the behaviour for now. > > This still feels awfully unprincipled to me. How about just throwing an error > when we need to remove an element from a multidimensional array? Essentially, > have ON DELETE EACH CASCADE downgrade to ON DELETE RESTRICT when it encounters > a multidimensional array. That's strictly less code than what you have now, > and it keeps our options open. We can always change from error to set-null > later, but we can't so readily change from set-null to anything else. That seems reasonable to me. Implemented and documented. > As I pondered this patch again, I came upon formal hazards around non-default > operator classes. Today, ATAddForeignKeyConstraint() always chooses pfeqop, > ffeqop and ppeqop in the same operator family. If it neglected this, we would > get wrong behavior when one of the operators is sensitive to a change to which > another is insensitive. For EACH FK constraints, this patch sets ffeqop = > ARRAY_EQ_OP unconditionally. array_eq() uses the TYPECACHE_EQ_OPR (usually > from the default B-tree operator class). That operator may not exist at all, > let alone share an operator family with pfeqop. Shall we deal with this by > retrieving TYPECACHE_EQ_OPR in ATAddForeignKeyConstraint() and rejecting the > constraint creation if it does not share an operator family with pfeqop? The > same hazard affects ri_triggers.c use of array_remove() and array_replace(), > and the same change mitigates it. Check added. As a consequence of this stricter policy, certain previously allowed cases are now unacceptable (e.g pk FLOAT and fk INT[]). Regression tests have been added. > Let me say again how much I like the test coverage from this patch. It's > great to think of something worth verifying and find existing coverage for it > in the submitted test cases. > > > > *** a/doc/src/sgml/catalogs.sgml > > --- b/doc/src/sgml/catalogs.sgml > > > *************** > > *** 2102,2107 **** > > --- 2106,2118 ---- > > <entry></entry> > > <entry>If a check constraint, a human-readable representation of the expression</entry> > > </row> > > + > > + <row> > > + <entry><structfield>confisarray</structfield></entry> > > Name is now confiseach. Fixed. > pg_constraint.confeach needs documentation. Most of pg_constraint columns, including all the boolean ones, are documented only in the "description" column of http://www.postgresql.org/docs/9.1/static/catalog-pg-constraint.html#AEN85760 it seems that confiseach should not be an exception, since it just indicates whether the constraint is of a certain kind or not. > > > + <entry><type>bool</type></entry> > > + <entry></entry> > > + <entry>If a foreign key, true if a EACH REFERENCE foreign key</entry> > > + </row> > > </tbody> > > </tgroup> > > </table> > > *** a/doc/src/sgml/ddl.sgml > > --- b/doc/src/sgml/ddl.sgml > > > + <para> > > + Another option you have with foreign keys is to use a > > + referencing column which is an array of elements with > > + the same type (or a compatible one) as the referenced > > + column in the related table. This feature, commonly known > > + as <firstterm>foreign key arrays</firstterm>, is implemented > > Is it indeed "commonly known" by that name? My web searching did not turn up > any independent use of the term. > > In any event, this should be the only place where we mention multiple names > for the feature. Pick one preferred term and use it for all other mentions. > The documentation, comments and messages currently have a mix of "each foreign > key", "EACH foreign key" and "foreign key array". In the end we uniformly adopted "EACH foreign key" as the unique name of this feature; documentation, comments and messages have been updated accordingly. > > + <para> > > + Even though the most common use case for foreign key arrays > > + is on a single column key, you can define an <quote>each foreign > > + key constraint</quote> on a group of columns. As the following > > + contrived example shows, it needs to be written in table constraint form: > > + <programlisting> > > + CREATE TABLE t1 ( > > + a integer PRIMARY KEY, > > + b integer, > > + c integer[], > > + <emphasis>FOREIGN KEY (b, EACH c) REFERENCES other_table (c1, c2)</emphasis> > > + ); > > + </programlisting> > > Rather than merely showing an abstract syntax example, some words about the > semantics are in order. You might have a parent table with primary key > (category, value). Then your child table(s) have a category column and an > array of values. The constraint then ensures that all the entries in the > array exist in the parent table *and* belong to the proper category. That example has been replaced by a concrete one. > > *** a/src/backend/commands/tablecmds.c > > --- b/src/backend/commands/tablecmds.c > > > *************** > > *** 5736,5741 **** ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel, > > --- 5759,5807 ---- > > (errcode(ERRCODE_INVALID_FOREIGN_KEY), > > errmsg("number of referencing and referenced columns for foreign key disagree"))); > > > > + /* Enforce each foreign key restrictions */ > > + if (fkconstraint->fk_is_each) > > + { > > + /* > > + * ON UPDATE CASCADE action is not supported on EACH foreign keys > > + */ > > + if (fkconstraint->fk_upd_action == FKCONSTR_ACTION_CASCADE) > > + ereport(ERROR, > > + (errcode(ERRCODE_INVALID_FOREIGN_KEY), > > + errmsg("ON UPDATE CASCADE action is not supported on " > > + "EACH foreign keys"), > > + errhint("Use ON UPDATE EACH CASCADE, instead"))); > > Note from the Error Message Style Guide that errhint and errdetail messages > shall be complete sentences. Fixed. > > + > > + /* > > + * EACH CASCADE and EACH SET NULL actions are only available > > + * in single-column EACH foreign keys > > + */ > > + if (numpks > 1 && > > + (fkconstraint->fk_upd_action == FKCONSTR_ACTION_ARRCASCADE > > + || fkconstraint->fk_upd_action == FKCONSTR_ACTION_ARRSETNULL > > + || fkconstraint->fk_del_action == FKCONSTR_ACTION_ARRCASCADE > > + || fkconstraint->fk_del_action == FKCONSTR_ACTION_ARRSETNULL)) > > + ereport(ERROR, > > + (errcode(ERRCODE_INVALID_FOREIGN_KEY), > > + errmsg("EACH CASCADE and EACH SET NULL actions are only " > > + "available on single-column EACH foreign keys"))); > > + } > > + else > > + { > > + /* > > + * EACH CASCADE and EACH SET NULL actions are only available > > + * in EACH foreign keys > > + */ > > + if (fkconstraint->fk_upd_action == FKCONSTR_ACTION_ARRCASCADE > > + || fkconstraint->fk_upd_action == FKCONSTR_ACTION_ARRSETNULL > > + || fkconstraint->fk_del_action == FKCONSTR_ACTION_ARRCASCADE > > + || fkconstraint->fk_del_action == FKCONSTR_ACTION_ARRSETNULL) > > + ereport(ERROR, > > + (errcode(ERRCODE_INVALID_FOREIGN_KEY), > > + errmsg("EACH CASCADE and EACH SET NULL actions are only " > > + "available on EACH foreign keys"))); > > I would just reuse the previous error message. It will be fine, maybe even > better, for everyone to see the phrase "single-column". Fixed. > > *************** > > *** 5812,5823 **** ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel, > > Oid target_typeids[2]; > > > > input_typeids[0] = pktype; > > ! input_typeids[1] = fktype; > > target_typeids[0] = opcintype; > > target_typeids[1] = opcintype; > > if (can_coerce_type(2, input_typeids, target_typeids, > > COERCION_IMPLICIT)) > > ! pfeqop = ffeqop = ppeqop; > > } > > > > if (!(OidIsValid(pfeqop) && OidIsValid(ffeqop))) > > --- 5905,5925 ---- > > Oid target_typeids[2]; > > > > input_typeids[0] = pktype; > > ! input_typeids[1] = fktyped; > > By sending the base type instead of the domain type to can_coerce_type(), this > changes behavior ever so slightly for ordinary foreign keys. It would take > something fairly baroque to get a different result. Perhaps a polymorphic > opcintype and an FK column of a domain over an enum. Still, please don't > change that part of the logic. Fixed. > > target_typeids[0] = opcintype; > > target_typeids[1] = opcintype; > > if (can_coerce_type(2, input_typeids, target_typeids, > > COERCION_IMPLICIT)) > > ! { > > ! pfeqop = ppeqop; > > ! > > ! /* > > ! * In caso of an EACH FK the ffeqop must be left untouched > > Typo. Fixed. > > > *** a/src/backend/parser/gram.y > > --- b/src/backend/parser/gram.y > > > *************** > > *** 2900,2917 **** ConstraintElem: > > yyscanner); > > $$ = (Node *)n; > > } > > ! | FOREIGN KEY '(' columnList ')' REFERENCES qualified_name > > opt_column_list key_match key_actions ConstraintAttributeSpec > > { > > Constraint *n = makeNode(Constraint); > > n->contype = CONSTR_FOREIGN; > > n->location = @1; > > n->pktable = $7; > > ! n->fk_attrs = $4; > > n->pk_attrs = $8; > > n->fk_matchtype = $9; > > n->fk_upd_action = (char) ($10 >> 8); > > n->fk_del_action = (char) ($10 & 0xFF); > > processCASbits($11, @11, "FOREIGN KEY", > > &n->deferrable, &n->initdeferred, > > &n->skip_validation, > > --- 2918,2959 ---- > > yyscanner); > > $$ = (Node *)n; > > } > > ! | FOREIGN KEY '(' foreignKeyColumnList ')' REFERENCES qualified_name > > opt_column_list key_match key_actions ConstraintAttributeSpec > > { > > Constraint *n = makeNode(Constraint); > > n->contype = CONSTR_FOREIGN; > > n->location = @1; > > n->pktable = $7; > > ! n->fk_attrs = $4; > > n->pk_attrs = $8; > > n->fk_matchtype = $9; > > n->fk_upd_action = (char) ($10 >> 8); > > n->fk_del_action = (char) ($10 & 0xFF); > > + > > + /* > > + * Split the content of foreignKeyColumnList > > + * in two separate list. One lis of fileds > > Two typos. Fixed. > > + * and one list of boolean values. > > + */ > > + { > > + ListCell *i; > > + > > + n->fk_attrs = NIL; > > + n->fk_is_each = false; > > + n->fk_each_attrs = NIL; > > + foreach (i, $4) > > + { > > + ForeignKeyColumnElem *elem = > > + (ForeignKeyColumnElem *)lfirst(i); > > + > > + n->fk_attrs = lappend(n->fk_attrs, elem->name); > > + n->fk_is_each |= elem->each; > > + n->fk_each_attrs = lappend(n->fk_each_attrs, > > + makeString(elem->each?"t":"f")); > > + } > > + } > > This processing should happen in parse_utilcmd.c rather than gram.y. Magic > values "t" and "f" won't do. Make fk_attrs a list of ForeignKeyColumnElem or > else at least use an int list of true/false. Either way, you would no longer > need the decode loop in tablecmds.c. Fixed. We have removed the magic values and moved decoding of the ForeignKeyColumnElem list inside parse_utilcmd.c. > > *** a/src/backend/utils/adt/arrayfuncs.c > > --- b/src/backend/utils/adt/arrayfuncs.c > > > + Datum > > + array_replace(PG_FUNCTION_ARGS) > > + { > > > + for (i = 0; i < nitems; i++) > > + { > > + bool isNull; > > + bool oprresult; > > + > > + /* Get source element, checking for NULL */ > > + if (bitmap && (*bitmap & bitmask) == 0) > > + { > > + if (old_value_isnull) > > + { > > + values[i] = new_value; > > + isNull = false; > > + changed = true; > > + } > > + else > > + isNull = true; > > + } > > This does the wrong thing if old_value_isnull == new_value_isnull == true: > > [local] regression=# select array_replace('{4,null,5}'::int[], null, null); > array_replace > --------------- > {4,0,5} > > (With text[], that yields a crash.) Fixed. > > *** a/src/backend/utils/adt/ri_triggers.c > > --- b/src/backend/utils/adt/ri_triggers.c > > > ! Datum > > ! RI_FKey_eachcascade_upd(PG_FUNCTION_ARGS) > > { > > > ! /* ---------- > > ! * The query string built is > > ! * UPDATE ONLY <fktable> > > ! * SET fkatt1 = array_replace(fkatt1, $n, $1) [, ...] > > ! * WHERE $n = fkatt1 [AND ...] > > ! * The type id's for the $ parameters are those of the > > ! * corresponding PK attributes. > > ! * ---------- > > ! */ > > ! initStringInfo(&querybuf); > > ! initStringInfo(&qualbuf); > > ! quoteRelationName(fkrelname, fk_rel); > > ! appendStringInfo(&querybuf, "UPDATE ONLY %s SET", fkrelname); > > ! querysep = ""; > > ! qualsep = "WHERE"; > > An elog(ERROR) when riinfo.nkeys != 1 would make sense here and for other > functions where we forbid multiple columns at creation time. If nothing else, > this needs a comment that the looping is vestigial and the function only > supports single-column constraints. Logic implemented and comment added. > > *************** > > *** 2678,2695 **** RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel) > > * The query string built is: > > * SELECT fk.keycols FROM ONLY relname fk > > * LEFT OUTER JOIN ONLY pkrelname pk > > ! * ON (pk.pkkeycol1=fk.keycol1 [AND ...]) > > * WHERE pk.pkkeycol1 IS NULL AND > > * For MATCH unspecified: > > * (fk.keycol1 IS NOT NULL [AND ...]) > > * For MATCH FULL: > > * (fk.keycol1 IS NOT NULL [OR ...]) > > * > > * We attach COLLATE clauses to the operators when comparing columns > > * that have different collations. > > *---------- > > */ > > initStringInfo(&querybuf); > > appendStringInfo(&querybuf, "SELECT "); > > sep = ""; > > for (i = 0; i < riinfo.nkeys; i++) > > --- 3527,3557 ---- > > * The query string built is: > > * SELECT fk.keycols FROM ONLY relname fk > > * LEFT OUTER JOIN ONLY pkrelname pk > > ! * ON (pk.pkkeycol1=fk.keycol1 [AND ...] > > ! * [AND NOT EXISTS(<RECHECK_SUBQUERY>)]) > > * WHERE pk.pkkeycol1 IS NULL AND > > * For MATCH unspecified: > > * (fk.keycol1 IS NOT NULL [AND ...]) > > * For MATCH FULL: > > * (fk.keycol1 IS NOT NULL [OR ...]) > > * > > + * In case of an EACH foreign key, a recheck subquery is added to > > + * the join condition in order to check that every combination of keys > > + * is actually referenced. > > + * The RECHECK_SUBQUERY is > > + * SELECT 1 FROM > > + * unnest(fk.keycol1) x1(x1) [CROSS JOIN ...] > > + * LEFT OUTER JOIN ONLY pkrelname pk > > + * ON (pk.pkkeycol1=x1.x1 [AND ...]) > > + * WHERE pk.pkkeycol1 IS NULL AND > > + * (fk.keycol1 IS NOT NULL [AND ...]) > > What is a worst-case query plan for a constraint with n ordinary keys and m > EACH keys? The subquery complexity can be compared with the one of the main query. Anyway, it is a one-off cost, paid at constraint creation, which I believe is acceptable. The fact that we now implement only single-column EACH foreign keys might suggest that we postpone this discussion at a later time. > > > + * > > * We attach COLLATE clauses to the operators when comparing columns > > * that have different collations. > > *---------- > > */ > > > *** a/src/include/nodes/parsenodes.h > > --- b/src/include/nodes/parsenodes.h > > *************** > > *** 570,575 **** typedef struct DefElem > > --- 570,589 ---- > > } DefElem; > > > > /* > > + * ForeignKeyColumnElem - foreign key column (used in foreign key constaint) > > Typo. Fixed. > > > + * > > + * For a foreign key attribute, 'name' is the name of the table column to > > + * index, and each is true if it is an EACH fk. > > + */ > > + typedef struct ForeignKeyColumnElem > > + { > > + NodeTag type; > > + Node *name; /* name of the column, or NULL */ > > + bool each; /* true if an EACH foreign key */ > > + > > + } ForeignKeyColumnElem; > > + > > + /* > > * LockingClause - raw representation of FOR UPDATE/SHARE options > > * > > * Note: lockedRels == NIL means "all relations in query". Otherwise it > > *************** > > *** 1510,1515 **** typedef enum ConstrType /* types of constraints */ > > --- 1524,1531 ---- > > #define FKCONSTR_ACTION_CASCADE 'c' > > #define FKCONSTR_ACTION_SETNULL 'n' > > #define FKCONSTR_ACTION_SETDEFAULT 'd' > > + #define FKCONSTR_ACTION_ARRCASCADE 'C' > > + #define FKCONSTR_ACTION_ARRSETNULL 'N' > > These names need update for the array -> each terminology change. Fixed. > I consider these the core changes needed to reach Ready for Committer: > > - Fix crash in array_replace(arr, null, null). > - Don't look through the domain before calling can_coerce_type(). > - Compare operator family of pfeqop and TYPECACHE_EQ_OPR at creation. > - Move post-processing from gram.y and remove "t"/"f" magic values. All done.
Attachment
pgsql-hackers by date: