Re: [PATCH] Support for foreign keys with arrays - Mailing list pgsql-hackers
From | Noah Misch |
---|---|
Subject | Re: [PATCH] Support for foreign keys with arrays |
Date | |
Msg-id | 20120317033312.GB19556@tornado.leadboat.com Whole thread Raw |
In response to | Re: [PATCH] Support for foreign keys with arrays (Marco Nenciarini <marco.nenciarini@2ndQuadrant.it>) |
Responses |
Re: [PATCH] Support for foreign keys with arrays
|
List | pgsql-hackers |
[used followup EACH-foreign-key.v4b.patch.bz2 for review] On Wed, Mar 14, 2012 at 07:03:08PM +0100, Marco Nenciarini wrote: > please find attached v4 of the EACH Foreign Key patch (formerly known > also as Foreign Key Array). > On Fri, Feb 24, 2012 at 09:01:35PM -0500, Noah Misch wrote: > > > We can use multi-dimensional arrays as well as referencing columns. In > > > that case though, ON DELETE EACH CASCADE will behave like ON DELETE EACH > > > SET NULL. This is a safe way of implementing the action. > > > We have some ideas on how to implement this, but we feel it is better to > > > limit the behaviour for now. > > > > This still feels awfully unprincipled to me. How about just throwing an error > > when we need to remove an element from a multidimensional array? Essentially, > > have ON DELETE EACH CASCADE downgrade to ON DELETE RESTRICT when it encounters > > a multidimensional array. That's strictly less code than what you have now, > > and it keeps our options open. We can always change from error to set-null > > later, but we can't so readily change from set-null to anything else. > > That seems reasonable to me. Implemented and documented. I like the semantics now. You implemented this by doing two scans per PK delete, the first to detect multidimensional dependants of the PK row and the second to fix 1-dimensional dependants. We don't require an index to support the scan, so this can mean two seq scans. Currently, the only benefit to doing it this way is a change of error message. Here is the current error message when we would need to remove a multidimensional array element: ERROR: update or delete on table "pktableforarray" violates foreign key constraint "fktableforarray_ftest1_fkey" on table"fktableforarray" DETAIL: Key (ptest1)=(2) is still referenced from table "fktableforarray". If I just rip out the first scan, we get the same outcome with a different error message: ERROR: removing elements from multidimensional arrays is not supported CONTEXT: SQL statement "UPDATE ONLY "public"."fktableforarray"SET "ftest1" = array_remove("ftest1", $1) WHERE $1 OPERATOR(pg_catalog.=) ANY ("ftest1")" That has less polish, but I think it's actually more useful. The first message gives no indication that a multidimensional array foiled your ON DELETE EACH CASCADE. The second message hints at that cause. I recommend removing the new block of code in RI_FKey_eachcascade_del() and letting array_remove() throw the error. If you find a way to throw a nicer error without an extra scan, by all means submit that to a future CF as an improvement. I don't think it's important enough to justify cycles at this late hour of the current CF. > > As I pondered this patch again, I came upon formal hazards around non-default > > operator classes. Today, ATAddForeignKeyConstraint() always chooses pfeqop, > > ffeqop and ppeqop in the same operator family. If it neglected this, we would > > get wrong behavior when one of the operators is sensitive to a change to which > > another is insensitive. For EACH FK constraints, this patch sets ffeqop = > > ARRAY_EQ_OP unconditionally. array_eq() uses the TYPECACHE_EQ_OPR (usually > > from the default B-tree operator class). That operator may not exist at all, > > let alone share an operator family with pfeqop. Shall we deal with this by > > retrieving TYPECACHE_EQ_OPR in ATAddForeignKeyConstraint() and rejecting the > > constraint creation if it does not share an operator family with pfeqop? The > > same hazard affects ri_triggers.c use of array_remove() and array_replace(), > > and the same change mitigates it. > > Check added. As a consequence of this stricter policy, certain > previously allowed cases are now unacceptable (e.g pk FLOAT and fk > INT[]). > Regression tests have been added. Ah, good. Not so formal after all. > > pg_constraint.confeach needs documentation. > > Most of pg_constraint columns, including all the boolean ones, are > documented only in the "description" column of > > http://www.postgresql.org/docs/9.1/static/catalog-pg-constraint.html#AEN85760 > > it seems that confiseach should not be an exception, since it just > indicates whether the constraint is of a certain kind or not. Your patch adds two columns to pg_constraint, confiseach and confeach, but it only mentions confiseach in documentation. Just add a similar minimal mention of confeach. > > > *************** > > > *** 5736,5741 **** ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel, > > > --- 5759,5807 ---- > > > (errcode(ERRCODE_INVALID_FOREIGN_KEY), > > > errmsg("number of referencing and referenced columns for foreign key disagree"))); > > > > > > + /* Enforce each foreign key restrictions */ > > > + if (fkconstraint->fk_is_each) > > > + { > > > + /* > > > + * ON UPDATE CASCADE action is not supported on EACH foreign keys > > > + */ > > > + if (fkconstraint->fk_upd_action == FKCONSTR_ACTION_CASCADE) > > > + ereport(ERROR, > > > + (errcode(ERRCODE_INVALID_FOREIGN_KEY), > > > + errmsg("ON UPDATE CASCADE action is not supported on " > > > + "EACH foreign keys"), > > > + errhint("Use ON UPDATE EACH CASCADE, instead"))); > > > > Note from the Error Message Style Guide that errhint and errdetail messages > > shall be complete sentences. > > Fixed. That is to say, they start with a capital letter and end with a period. Your old text was fine apart from the lack of a period. (Your new text also lacks a period.) > > This processing should happen in parse_utilcmd.c rather than gram.y. Magic > > values "t" and "f" won't do. Make fk_attrs a list of ForeignKeyColumnElem or > > else at least use an int list of true/false. Either way, you would no longer > > need the decode loop in tablecmds.c. > > Fixed. We have removed the magic values and moved decoding of the > ForeignKeyColumnElem list inside parse_utilcmd.c. When I said "int list", I meant a T_IntList (via lappend_int(), lfirst_int(), etc.). You used a T_List of T_Integer. > > > *************** > > > *** 2678,2695 **** RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel) > > > * The query string built is: > > > * SELECT fk.keycols FROM ONLY relname fk > > > * LEFT OUTER JOIN ONLY pkrelname pk > > > ! * ON (pk.pkkeycol1=fk.keycol1 [AND ...]) > > > * WHERE pk.pkkeycol1 IS NULL AND > > > * For MATCH unspecified: > > > * (fk.keycol1 IS NOT NULL [AND ...]) > > > * For MATCH FULL: > > > * (fk.keycol1 IS NOT NULL [OR ...]) > > > * > > > * We attach COLLATE clauses to the operators when comparing columns > > > * that have different collations. > > > *---------- > > > */ > > > initStringInfo(&querybuf); > > > appendStringInfo(&querybuf, "SELECT "); > > > sep = ""; > > > for (i = 0; i < riinfo.nkeys; i++) > > > --- 3527,3557 ---- > > > * The query string built is: > > > * SELECT fk.keycols FROM ONLY relname fk > > > * LEFT OUTER JOIN ONLY pkrelname pk > > > ! * ON (pk.pkkeycol1=fk.keycol1 [AND ...] > > > ! * [AND NOT EXISTS(<RECHECK_SUBQUERY>)]) > > > * WHERE pk.pkkeycol1 IS NULL AND > > > * For MATCH unspecified: > > > * (fk.keycol1 IS NOT NULL [AND ...]) > > > * For MATCH FULL: > > > * (fk.keycol1 IS NOT NULL [OR ...]) > > > * > > > + * In case of an EACH foreign key, a recheck subquery is added to > > > + * the join condition in order to check that every combination of keys > > > + * is actually referenced. > > > + * The RECHECK_SUBQUERY is > > > + * SELECT 1 FROM > > > + * unnest(fk.keycol1) x1(x1) [CROSS JOIN ...] > > > + * LEFT OUTER JOIN ONLY pkrelname pk > > > + * ON (pk.pkkeycol1=x1.x1 [AND ...]) > > > + * WHERE pk.pkkeycol1 IS NULL AND > > > + * (fk.keycol1 IS NOT NULL [AND ...]) > > > > What is a worst-case query plan for a constraint with n ordinary keys and m > > EACH keys? > > The subquery complexity can be compared with the one of the main query. > Anyway, it is a one-off cost, paid at constraint creation, which I > believe is acceptable. The fact that we now implement only single-column > EACH foreign keys might suggest that we postpone this discussion at a > later time. If the cost doesn't exceed O(F log P), where F is the size of the FK table and P is the size of the PK table, I'm not worried. If it can be O(F^2), we would have a problem to be documented, if not fixed. Your change to array_replace() made me look at array_remove() again and realize that it needs the same treatment. This command yields a segfault: SELECT array_remove('{a,null}'::text[], null); This latest version introduces two calls to get_element_type(), both of which should be get_base_element_type(). Patch "Avoid FK validations for no-rewrite ALTER TABLE ALTER TYPE", committed between v3b and v4 of this patch, added code to ATAddForeignKeyConstraint() requiring an update in this patch. Run this in the regression DB: [local] regression=# alter table fktableforeachfk alterftest1 type int[]; ERROR: could not find cast from 1007 to 23 RI_PLAN_EACHCASCADE_DEL_DOUPDATE should be RI_PLAN_EACHCASCADE_DEL_DODELETE. I think the patch has reached the stage where a committer can review it without wasting much time on things that might change radically. So, I'm marking it Ready for Committer. Please still submit an update correcting the above; I'm sure your committer will appreciate it. Thanks, nm
pgsql-hackers by date: