Thread: altering a column's collation leaves an invalid foreign key
Dear hackers, I was looking at how foreign keys deal with collations, and I came across this comment about not re-checking a foreign key if the column type changes in a compatible way: * Since we require that all collations share the same notion of * equality (which they do, because texteq reduces to bitwise * equality), we don't compare collation here. But now that we have nondeterministic collations, isn't that out of date? For instance I could make this foreign key: paul=# create collation itext (provider = 'icu', locale = 'und-u-ks-level1', deterministic = false); CREATE COLLATION paul=# create table t1 (id text collate itext primary key); CREATE TABLE paul=# create table t2 (id text, parent_id text references t1); CREATE TABLE And then: paul=# insert into t1 values ('a'); INSERT 0 1 paul=# insert into t2 values ('.', 'A'); INSERT 0 1 So far that behavior seems correct, because the user told us 'a' and 'A' were equivalent, but now I can change the collation on the referenced table and the FK doesn't complain: paul=# alter table t1 alter column id type text collate "C"; ALTER TABLE The constraint claims to be valid, but I can't drop & add it: paul=# alter table t2 drop constraint t2_parent_id_fkey; ALTER TABLE paul=# alter table t2 add constraint t2_parent_id_fkey foreign key (parent_id) references t1; ERROR: insert or update on table "t2" violates foreign key constraint "t2_parent_id_fkey" DETAIL: Key (parent_id)=(A) is not present in table "t1". Isn't that a problem? Perhaps if the previous collation was nondeterministic we should force a re-check. (Tested on 17devel 697f8d266c and also 16.) Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
On 3/23/24 10:04, Paul Jungwirth wrote: > Perhaps if the previous collation was nondeterministic we should force a re-check. Here is a patch implementing this. It was a bit more fuss than I expected, so maybe someone has a better way. We have had nondeterministic collations since v12, so perhaps it is something to back-patch? Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
On Mon, Mar 25, 2024 at 2:47 PM Paul Jungwirth <pj@illuminatedcomputing.com> wrote: > > On 3/23/24 10:04, Paul Jungwirth wrote: > > Perhaps if the previous collation was nondeterministic we should force a re-check. > > Here is a patch implementing this. It was a bit more fuss than I expected, so maybe someone has a > better way. > + /* test follows the one in ri_FetchConstraintInfo() */ + if (ARR_NDIM(arr) != 1 || + ARR_HASNULL(arr) || + ARR_ELEMTYPE(arr) != INT2OID) + elog(ERROR, "conkey is not a 1-D smallint array"); + attarr = (AttrNumber *) ARR_DATA_PTR(arr); + + /* stash a List of the collation Oids in our Constraint node */ + for (i = 0; i < numkeys; i++) + con->old_collations = lappend_oid(con->old_collations, + list_nth_oid(changedCollationOids, attarr[i] - 1)); I don't understand the "ri_FetchConstraintInfo" comment. +static void +RememberCollationForRebuilding(AttrNumber attnum, AlteredTableInfo *tab) +{ + Oid typid; + int32 typmod; + Oid collid; + ListCell *lc; + + /* Fill in the list with InvalidOid if this is our first visit */ + if (tab->changedCollationOids == NIL) + { + int len = RelationGetNumberOfAttributes(tab->rel); + int i; + + for (i = 0; i < len; i++) + tab->changedCollationOids = lappend_oid(tab->changedCollationOids, + InvalidOid); + } + + get_atttypetypmodcoll(RelationGetRelid(tab->rel), attnum, + &typid, &typmod, &collid); + + lc = list_nth_cell(tab->changedCollationOids, attnum - 1); + lfirst_oid(lc) = collid; +} do we need to check if `collid` is a valid collation? like: if (!OidIsValid(collid)) { lc = list_nth_cell(tab->changedCollationOids, attnum - 1); lfirst_oid(lc) = collid; }
On Tue, Mar 26, 2024 at 1:00 PM jian he <jian.universality@gmail.com> wrote: > > On Mon, Mar 25, 2024 at 2:47 PM Paul Jungwirth > <pj@illuminatedcomputing.com> wrote: > > > > On 3/23/24 10:04, Paul Jungwirth wrote: > > > Perhaps if the previous collation was nondeterministic we should force a re-check. > > > > Here is a patch implementing this. It was a bit more fuss than I expected, so maybe someone has a > > better way. > > > > + /* test follows the one in ri_FetchConstraintInfo() */ > + if (ARR_NDIM(arr) != 1 || > + ARR_HASNULL(arr) || > + ARR_ELEMTYPE(arr) != INT2OID) > + elog(ERROR, "conkey is not a 1-D smallint array"); > + attarr = (AttrNumber *) ARR_DATA_PTR(arr); > + > + /* stash a List of the collation Oids in our Constraint node */ > + for (i = 0; i < numkeys; i++) > + con->old_collations = lappend_oid(con->old_collations, > + list_nth_oid(changedCollationOids, attarr[i] - 1)); > > I don't understand the "ri_FetchConstraintInfo" comment. I still don't understand this comment. > > +static void > +RememberCollationForRebuilding(AttrNumber attnum, AlteredTableInfo *tab) > +{ > + Oid typid; > + int32 typmod; > + Oid collid; > + ListCell *lc; > + > + /* Fill in the list with InvalidOid if this is our first visit */ > + if (tab->changedCollationOids == NIL) > + { > + int len = RelationGetNumberOfAttributes(tab->rel); > + int i; > + > + for (i = 0; i < len; i++) > + tab->changedCollationOids = lappend_oid(tab->changedCollationOids, > + InvalidOid); > + } > + > + get_atttypetypmodcoll(RelationGetRelid(tab->rel), attnum, > + &typid, &typmod, &collid); > + > + lc = list_nth_cell(tab->changedCollationOids, attnum - 1); > + lfirst_oid(lc) = collid; > +} > > do we need to check if `collid` is a valid collation? > like: > > if (!OidIsValid(collid)) > { > lc = list_nth_cell(tab->changedCollationOids, attnum - 1); > lfirst_oid(lc) = collid; > } now I think RememberCollationForRebuilding is fine. no need further change. in ATAddForeignKeyConstraint. if (old_check_ok) { /* * When a pfeqop changes, revalidate the constraint. We could * permit intra-opfamily changes, but that adds subtle complexity * without any concrete benefit for core types. We need not * assess ppeqop or ffeqop, which RI_Initial_Check() does not use. */ old_check_ok = (pfeqop == lfirst_oid(old_pfeqop_item)); old_pfeqop_item = lnext(fkconstraint->old_conpfeqop, old_pfeqop_item); } maybe we can do the logic right below it. like: if (old_check_ok) { * All deterministic collations use bitwise equality to resolve * ties, but if the previous collation was nondeterministic, * we must re-check the foreign key, because some references * that use to be "equal" may not be anymore. If we have * InvalidOid here, either there was no collation or the * attribute didn't change. old_check_ok = (old_collation == InvalidOid || get_collation_isdeterministic(old_collation)); } then we don't need to cram it with the other `if (old_check_ok){}`. do we need to add an entry in https://wiki.postgresql.org/wiki/PostgreSQL_17_Open_Items#Open_Issues ?
On Fri, Apr 12, 2024 at 5:06 PM jian he <jian.universality@gmail.com> wrote: > > On Tue, Mar 26, 2024 at 1:00 PM jian he <jian.universality@gmail.com> wrote: > > > > On Mon, Mar 25, 2024 at 2:47 PM Paul Jungwirth > > <pj@illuminatedcomputing.com> wrote: > > > > > > On 3/23/24 10:04, Paul Jungwirth wrote: > > > > Perhaps if the previous collation was nondeterministic we should force a re-check. > > > > > > Here is a patch implementing this. It was a bit more fuss than I expected, so maybe someone has a > > > better way. I think I found a simple way. the logic is: * ATExecAlterColumnType changes one column once at a time. * one column can only have one collation. so we don't need to store a list of collation oid. * ATExecAlterColumnType we can get the new collation (targetcollid) and original collation info. * RememberAllDependentForRebuilding will check the column dependent, whether this column is referenced by a foreign key or not information is recorded. so AlteredTableInfo->changedConstraintOids have the primary key and foreign key oids. * ATRewriteCatalogs will call ATPostAlterTypeCleanup (see the comments in ATRewriteCatalogs) * for tab->changedConstraintOids (foreign key, primary key) will call ATPostAlterTypeParse, so for foreign key (con->contype == CONSTR_FOREIGN) will call TryReuseForeignKey. * in TryReuseForeignKey, we can pass the information that our primary key old collation is nondeterministic and old collation != new collation to the foreign key constraint. so foreign key can act accordingly at ATAddForeignKeyConstraint (old_check_ok). based on the above logic, I add one bool in struct AlteredTableInfo, one bool in struct Constraint. bool in AlteredTableInfo is for storing it, later passing it to struct Constraint. we need bool in Constraint because ATAddForeignKeyConstraint needs it.
Attachment
On Sat, Apr 13, 2024 at 9:13 PM jian he <jian.universality@gmail.com> wrote: > > > > > Here is a patch implementing this. It was a bit more fuss than I expected, so maybe someone has a > > > > better way. > I think I found a simple way. > > the logic is: > * ATExecAlterColumnType changes one column once at a time. > * one column can only have one collation. so we don't need to store a > list of collation oid. > * ATExecAlterColumnType we can get the new collation (targetcollid) > and original collation info. > * RememberAllDependentForRebuilding will check the column dependent, > whether this column is referenced by a foreign key or not information > is recorded. > so AlteredTableInfo->changedConstraintOids have the primary key and > foreign key oids. > * ATRewriteCatalogs will call ATPostAlterTypeCleanup (see the comments > in ATRewriteCatalogs) > * for tab->changedConstraintOids (foreign key, primary key) will call > ATPostAlterTypeParse, so > for foreign key (con->contype == CONSTR_FOREIGN) will call TryReuseForeignKey. > * in TryReuseForeignKey, we can pass the information that our primary > key old collation is nondeterministic > and old collation != new collation to the foreign key constraint. > so foreign key can act accordingly at ATAddForeignKeyConstraint (old_check_ok). > > > based on the above logic, I add one bool in struct AlteredTableInfo, > one bool in struct Constraint. > bool in AlteredTableInfo is for storing it, later passing it to struct > Constraint. > we need bool in Constraint because ATAddForeignKeyConstraint needs it. I refactored the comments. also added some extra tests hoping to make it more bullet proof, maybe it's redundant.
Attachment
jian he <jian.universality@gmail.com> writes: >> * in TryReuseForeignKey, we can pass the information that our primary >> key old collation is nondeterministic >> and old collation != new collation to the foreign key constraint. I have a basic question about this: why are we allowing FKs to be based on nondeterministic collations at all? ISTM that that breaks the assumption that there is exactly one referenced row for any referencing row. regards, tom lane
On Sat, Jun 8, 2024 at 4:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > jian he <jian.universality@gmail.com> writes: > >> * in TryReuseForeignKey, we can pass the information that our primary > >> key old collation is nondeterministic > >> and old collation != new collation to the foreign key constraint. > > I have a basic question about this: why are we allowing FKs to be > based on nondeterministic collations at all? ISTM that that breaks > the assumption that there is exactly one referenced row for any > referencing row. > for FKs nondeterministic, I think that would require the PRIMARY KEY collation to not be indeterministic also. for example: CREATE COLLATION ignore_accent_case (provider = icu, deterministic = false, locale = 'und-u-ks-level1'); DROP TABLE IF EXISTS fktable, pktable; CREATE TABLE pktable (x text COLLATE ignore_accent_case PRIMARY KEY); CREATE TABLE fktable (x text REFERENCES pktable on update cascade on delete cascade); INSERT INTO pktable VALUES ('A'); INSERT INTO fktable VALUES ('a'); INSERT INTO fktable VALUES ('A'); update pktable set x = 'Å'; table fktable; if FK is nondeterministic, then it looks PK more like FK. the following example, one FK row is referenced by two PK rows. DROP TABLE IF EXISTS fktable, pktable; CREATE TABLE pktable (x text COLLATE "C" PRIMARY KEY); CREATE TABLE fktable (x text COLLATE ignore_accent_case REFERENCES pktable on update cascade on delete cascade); INSERT INTO pktable VALUES ('A'), ('Å'); INSERT INTO fktable VALUES ('A'); begin; delete from pktable where x = 'Å'; TABLE fktable; rollback; begin; delete from pktable where x = 'A'; TABLE fktable; rollback;
On 08.06.24 06:14, jian he wrote: > if FK is nondeterministic, then it looks PK more like FK. > the following example, one FK row is referenced by two PK rows. > > DROP TABLE IF EXISTS fktable, pktable; > CREATE TABLE pktable (x text COLLATE "C" PRIMARY KEY); > CREATE TABLE fktable (x text COLLATE ignore_accent_case REFERENCES > pktable on update cascade on delete cascade); > INSERT INTO pktable VALUES ('A'), ('Å'); > INSERT INTO fktable VALUES ('A'); Yes, this is a problem. The RI checks are done with the collation of the primary key. The comment at ri_GenerateQualCollation() says "the SQL standard specifies that RI comparisons should use the referenced column's collation". But this is not what it says in my current copy. ... [ digs around ISO archives ] ... Yes, this was changed in SQL:2016 to require the collation on the PK side and the FK side to match at constraint creation time. The argument made is exactly the same we have here. This was actually already the rule in SQL:1999 but was then relaxed in SQL:2003 and then changed back because it was a mistake. We probably don't need to enforce this for deterministic collations, which would preserve some backward compatibility. I'll think some more about what steps to take to solve this and what to do about back branches etc.
On Tue, Jun 18, 2024 at 4:50 PM Peter Eisentraut <peter@eisentraut.org> wrote: > > On 08.06.24 06:14, jian he wrote: > > if FK is nondeterministic, then it looks PK more like FK. > > the following example, one FK row is referenced by two PK rows. > > > > DROP TABLE IF EXISTS fktable, pktable; > > CREATE TABLE pktable (x text COLLATE "C" PRIMARY KEY); > > CREATE TABLE fktable (x text COLLATE ignore_accent_case REFERENCES > > pktable on update cascade on delete cascade); > > INSERT INTO pktable VALUES ('A'), ('Å'); > > INSERT INTO fktable VALUES ('A'); > > Yes, this is a problem. The RI checks are done with the collation of > the primary key. > > The comment at ri_GenerateQualCollation() says "the SQL standard > specifies that RI comparisons should use the referenced column's > collation". But this is not what it says in my current copy. > > ... [ digs around ISO archives ] ... > > Yes, this was changed in SQL:2016 to require the collation on the PK > side and the FK side to match at constraint creation time. The argument > made is exactly the same we have here. This was actually already the > rule in SQL:1999 but was then relaxed in SQL:2003 and then changed back > because it was a mistake. > > We probably don't need to enforce this for deterministic collations, > which would preserve some backward compatibility. > > I'll think some more about what steps to take to solve this and what to > do about back branches etc. > I have come up with 3 corner cases. ---case1. not ok. PK indeterministic, FK default DROP TABLE IF EXISTS fktable, pktable; CREATE TABLE pktable (x text COLLATE ignore_accent_case PRIMARY KEY); CREATE TABLE fktable (x text REFERENCES pktable on update cascade on delete cascade); INSERT INTO pktable VALUES ('A'); INSERT INTO fktable VALUES ('a'); INSERT INTO fktable VALUES ('A'); RI_FKey_check (Check foreign key existence ) querybuf.data is SELECT 1 FROM ONLY "public"."pktable" x WHERE "x" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x in here, fktable doesn't have collation. we cannot rewrite to SELECT 1 FROM ONLY "public"."pktable" x WHERE "x" OPERATOR(pg_catalog.=) $1 collate "C" FOR KEY SHARE OF x so assumption (only one referenced row for any referencing row) will break when inserting values to fktable. RI_FKey_check already allows invalidate values to happen, not sure how ri_GenerateQualCollation can help. overall i don't know how to stop invalid value while inserting value to fktable in this case. ---case2. not ok case PK deterministic, FK indeterministic DROP TABLE IF EXISTS fktable, pktable; CREATE TABLE pktable (x text COLLATE "C" PRIMARY KEY); CREATE TABLE fktable (x text COLLATE ignore_accent_case REFERENCES pktable on update cascade on delete cascade); INSERT INTO pktable VALUES ('A'), ('Å'); INSERT INTO fktable VALUES ('A'); begin; update pktable set x = 'B' where x = 'Å'; table fktable; rollback; begin; update pktable set x = 'B' where x = 'A'; table fktable; rollback; when cascade update fktable, in RI_FKey_cascade_upd we can use pktable's collation. but again, a query updating fktable only, using pktable collation seems strange. ---case3. ---not ok case PK indeterministic, FK deterministic DROP TABLE IF EXISTS fktable, pktable; CREATE TABLE pktable (x text COLLATE ignore_accent_case PRIMARY KEY); CREATE TABLE fktable (x text collate "C" REFERENCES pktable on update cascade on delete cascade); INSERT INTO pktable VALUES ('A'); INSERT INTO fktable VALUES ('A'); INSERT INTO fktable VALUES ('a'); begin; update pktable set x = 'Å'; table fktable; rollback; when we "INSERT INTO fktable VALUES ('a');" we can disallow this invalid query in RI_FKey_check by constructing the following stop query SELECT 1 FROM ONLY public.pktable x WHERE x OPERATOR(pg_catalog.=) 'a' collate "C" FOR KEY SHARE OF x; but this query associated PK table with fktable collation seems weird? summary: case1 seems hard to resolve case2 can be resolved in ri_GenerateQualCollation, not 100% sure. case3 can be resolved in RI_FKey_check because case1 is hard to resolve; so overall I feel like erroring out PK indeterministic and FK indeterministic while creating foreign keys is easier. We can mandate foreign keys and primary key columns be deterministic in ATAddForeignKeyConstraint. The attached patch does that. That means src/test/regress/sql/collate.icu.utf8.sql table test10pk, table test11pk will have a big change. so only attach src/backend/commands/tablecmds.c changes.
Attachment
On 07.06.24 08:39, jian he wrote: > On Sat, Apr 13, 2024 at 9:13 PM jian he <jian.universality@gmail.com> wrote: >> >>>>> Here is a patch implementing this. It was a bit more fuss than I expected, so maybe someone has a >>>>> better way. >> I think I found a simple way. >> >> the logic is: >> * ATExecAlterColumnType changes one column once at a time. >> * one column can only have one collation. so we don't need to store a >> list of collation oid. >> * ATExecAlterColumnType we can get the new collation (targetcollid) >> and original collation info. >> * RememberAllDependentForRebuilding will check the column dependent, >> whether this column is referenced by a foreign key or not information >> is recorded. >> so AlteredTableInfo->changedConstraintOids have the primary key and >> foreign key oids. >> * ATRewriteCatalogs will call ATPostAlterTypeCleanup (see the comments >> in ATRewriteCatalogs) >> * for tab->changedConstraintOids (foreign key, primary key) will call >> ATPostAlterTypeParse, so >> for foreign key (con->contype == CONSTR_FOREIGN) will call TryReuseForeignKey. >> * in TryReuseForeignKey, we can pass the information that our primary >> key old collation is nondeterministic >> and old collation != new collation to the foreign key constraint. >> so foreign key can act accordingly at ATAddForeignKeyConstraint (old_check_ok). >> >> >> based on the above logic, I add one bool in struct AlteredTableInfo, >> one bool in struct Constraint. >> bool in AlteredTableInfo is for storing it, later passing it to struct >> Constraint. >> we need bool in Constraint because ATAddForeignKeyConstraint needs it. > > I refactored the comments. > also added some extra tests hoping to make it more bullet proof, maybe > it's redundant. I like this patch version (v4). It's the simplest, probably also easiest to backpatch. It has a flaw: It will also trigger a FK recheck if you alter the collation of the referencing column (foreign key column), even though that is not necessary. (Note that your tests and the examples in this thread only discuss altering the PK column collation, because that is what is actually used during the foreign key checks.) Maybe there is an easy way to avoid that, but I couldn't see one in that patch structure. Maybe that is ok as a compromise. If, in the future, we make it a requirement that the collations on the PK and FK side have to be the same if either collation is nondeterministic, then this case can no longer happen anyway. And so building more infrastructure for this check might be wasted.
> So for the moment this is a master-only patch. I think once we have > tied down the behavior we want for the future /* * ri_GenerateQualCollation --- add a COLLATE spec to a WHERE clause * - * At present, we intentionally do not use this function for RI queries that - * compare a variable to a $n parameter. Since parameter symbols always have - * default collation, the effect will be to use the variable's collation. - * Now that is only strictly correct when testing the referenced column, since - * the SQL standard specifies that RI comparisons should use the referenced - * column's collation. However, so long as all collations have the same - * notion of equality (which they do, because texteq reduces to bitwise - * equality), there's no visible semantic impact from using the referencing - * column's collation when testing it, and this is a good thing to do because - * it lets us use a normal index on the referencing column. However, we do - * have to use this function when directly comparing the referencing and - * referenced columns, if they are of different collations; else the parser - * will fail to resolve the collation to use. + * We only have to use this function when directly comparing the referencing + * and referenced columns, if they are of different collations; else the + * parser will fail to resolve the collation to use. We don't need to use + * this function for RI queries that compare a variable to a $n parameter. + * Since parameter symbols always have default collation, the effect will be + * to use the variable's collation. + * + * Note that we require that the collations of the referencing and the + * referenced column have the some notion of equality: Either they have to + * both be deterministic or else they both have to be the same. */ " some notion of equality" should it be "same notion of equality"? maybe we can add "see ATAddForeignKeyConstraint also" at the end of the whole comment. /* * transformColumnNameList - transform list of column names * * Lookup each name and return its attnum and, optionally, type OID * * Note: the name of this function suggests that it's general-purpose, * but actually it's only used to look up names appearing in foreign-key * clauses. The error messages would need work to use it in other cases, * and perhaps the validity checks as well. */ static int transformColumnNameList(Oid relId, List *colList, int16 *attnums, Oid *atttypids, Oid *attcollids) comments need slight adjustment? comments in transformFkeyGetPrimaryKey also need slight change? ri_CompareWithCast, we can aslo add comments saying the output is collation aware. + /* + * This shouldn't be possible, but better check to make sure we have a + * consistent state for the check below. + */ + if ((pkcoll && !fkcoll) || (!pkcoll && fkcoll)) + elog(ERROR, "key columns are not both collatable"); + + if (pkcoll && fkcoll) cosmetic. pkcoll can change to OidIsValid(pkcoll) fkcoll change to OidIsValid(fkcoll). ereport(ERROR, (errcode(ERRCODE_COLLATION_MISMATCH), errmsg("foreign key constraint \"%s\" cannot be implemented", fkconstraint->conname), errdetail("Key columns \"%s\" and \"%s\" have incompatible collations: \"%s\" and \"%s\"." "If either collation is nondeterministic, then both collations have to be the same.", strVal(list_nth(fkconstraint->fk_attrs, i)), strVal(list_nth(fkconstraint->pk_attrs, i)), get_collation_name(fkcoll), get_collation_name(pkcoll)))); ERROR: foreign key constraint "a_fk_col1_fkey" cannot be implemented DETAIL: Key columns "col1" and "col1" have incompatible collations: "default" and "case_insensitive". If either collation is nondeterministic, then both collations have to be the same. "DETAIL: Key columns "col1" and "col1"" may be confusing. we can be more verbose. like DETAIL: Key columns "col1" in relation "X" and "col1" in relation "Y" have incompatible collations...... (just a idea, don't have much preference) old_check_ok = (new_pathtype == old_pathtype && new_castfunc == old_castfunc && (!IsPolymorphicType(pfeqop_right) || new_fktype == old_fktype) && (new_fkcoll == old_fkcoll || (get_collation_isdeterministic(old_fkcoll) && get_collation_isdeterministic(new_fkcoll)))); I am wondering if one of the new_fkcoll, old_fkcoll is zero, the other is not. turns out that would n't happen. before "if (old_check_ok)" we already did extensionsive check that new_fkcoll is ok for the job. in ATAddForeignKeyConstraint, we can leave the old_check_ok part as is. what do you think of the following: old_check_ok = (new_pathtype == old_pathtype && new_castfunc == old_castfunc && (!IsPolymorphicType(pfeqop_right) || new_fktype == old_fktype)); if (OidIsValid(new_fkcoll) && OidIsValid(old_fkcoll) && new_fkcoll != old_fkcoll) { if(!get_collation_isdeterministic(new_fkcoll)) elog(ERROR,"new_fkcoll should be deterministic"); }
On Fri, Oct 25, 2024 at 2:27 PM jian he <jian.universality@gmail.com> wrote: > > /* > * ri_GenerateQualCollation --- add a COLLATE spec to a WHERE clause > * > - * At present, we intentionally do not use this function for RI queries that > - * compare a variable to a $n parameter. Since parameter symbols always have > - * default collation, the effect will be to use the variable's collation. > - * Now that is only strictly correct when testing the referenced column, since > - * the SQL standard specifies that RI comparisons should use the referenced > - * column's collation. However, so long as all collations have the same > - * notion of equality (which they do, because texteq reduces to bitwise > - * equality), there's no visible semantic impact from using the referencing > - * column's collation when testing it, and this is a good thing to do because > - * it lets us use a normal index on the referencing column. However, we do > - * have to use this function when directly comparing the referencing and > - * referenced columns, if they are of different collations; else the parser > - * will fail to resolve the collation to use. > + * We only have to use this function when directly comparing the referencing > + * and referenced columns, if they are of different collations; else the > + * parser will fail to resolve the collation to use. We don't need to use > + * this function for RI queries that compare a variable to a $n parameter. > + * Since parameter symbols always have default collation, the effect will be > + * to use the variable's collation. > + * > + * Note that we require that the collations of the referencing and the > + * referenced column have the some notion of equality: Either they have to > + * both be deterministic or else they both have to be the same. > */ drop table if exists pktable, fktable; CREATE TABLE pktable (x text COLLATE "POSIX" PRIMARY KEY); CREATE TABLE fktable (x text COLLATE "C" REFERENCES pktable on update cascade on delete cascade); INSERT INTO pktable VALUES ('A'), ('Å'); INSERT INTO fktable VALUES ('A'); update pktable set x = 'a' collate "C" where x = 'A' collate "POSIX"; the cascade update fktable query string is: UPDATE ONLY "public"."fktable" SET "x" = $1 WHERE $2 OPERATOR(pg_catalog.=) "x" ideally it should be UPDATE ONLY "public"."fktable" SET "x" = $1 collate "C" WHERE $2 collate "POSIX" OPERATOR(pg_catalog.=) "x" as we already mentioned in several places: PK-FK tie either they have to both be deterministic or else they both have to be the same collation oid. so the reduction to UPDATE ONLY "public"."fktable" SET "x" = $1 WHERE $2 OPERATOR(pg_catalog.=) "x" is safe. but you look at SPI_execute_snapshot, _SPI_convert_params. then we can see the collation metadata is not keeped. > + We don't need to use > + * this function for RI queries that compare a variable to a $n parameter. > + * Since parameter symbols always have default collation, the effect will be > + * to use the variable's collation. so I think a better description is > + We don't need to use > + * this function for RI queries that compare a variable to a $n parameter. > + * Since parameter symbols don't have collation information, the effect will be > + * to use the variable's collation. you can see related discovery in https://www.postgresql.org/message-id/CACJufxEtPBWAk7nEn69ww2LKi9w1i4dLwd5gnjD1DQ2vaYoi2g%40mail.gmail.com
On 25.10.24 16:26, jian he wrote: > drop table if exists pktable, fktable; > CREATE TABLE pktable (x text COLLATE "POSIX" PRIMARY KEY); > CREATE TABLE fktable (x text COLLATE "C" REFERENCES pktable on update > cascade on delete cascade); > INSERT INTO pktable VALUES ('A'), ('Å'); > INSERT INTO fktable VALUES ('A'); > > update pktable set x = 'a' collate "C" where x = 'A' collate "POSIX"; > > the cascade update fktable query string is: > UPDATE ONLY "public"."fktable" SET "x" = $1 WHERE $2 OPERATOR(pg_catalog.=) "x" > ideally it should be > UPDATE ONLY "public"."fktable" SET "x" = $1 collate "C" WHERE $2 > collate "POSIX" OPERATOR(pg_catalog.=) "x" > > as we already mentioned in several places: PK-FK tie either they have to > both be deterministic or else they both have to be the same collation > oid. > so the reduction to > UPDATE ONLY "public"."fktable" SET "x" = $1 WHERE $2 OPERATOR(pg_catalog.=) "x" > is safe. > but you look at SPI_execute_snapshot, _SPI_convert_params. then we can see > the collation metadata is not keeped. > >> + We don't need to use >> + * this function for RI queries that compare a variable to a $n parameter. >> + * Since parameter symbols always have default collation, the effect will be >> + * to use the variable's collation. > > so I think a better description is >> + We don't need to use >> + * this function for RI queries that compare a variable to a $n parameter. >> + * Since parameter symbols don't have collation information, the effect will be >> + * to use the variable's collation. > > you can see related discovery in > https://www.postgresql.org/message-id/CACJufxEtPBWAk7nEn69ww2LKi9w1i4dLwd5gnjD1DQ2vaYoi2g%40mail.gmail.com I don't know. I don't think there is anything wrong with the existing code in this respect. Notably a parameter gets assigned its type's default collation (see src/backend/parser/parse_param.c), so the change of the comment as you suggest it is not correct. Also, I don't think this is actually related to the patch discussed in this thread.
On 25.10.24 08:23, jian he wrote: > ri_KeysEqual definitely deserves some comments. > for rel_is_pk, the equality is collation agnostic; > for rel_is_pk is false, the equality is collation aware. > > for example: > DROP TABLE IF EXISTS fktable, pktable; > CREATE TABLE pktable (x text COLLATE case_insensitive PRIMARY KEY); > CREATE TABLE fktable (x text collate case_insensitive REFERENCES > pktable on update restrict on delete restrict); > INSERT INTO pktable VALUES ('A'), ('Å'); > INSERT INTO fktable VALUES ('a'); > update pktable set x = 'a' where x = 'A'; > ERROR: update or delete on table "pktable" violates foreign key > constraint "fktable_x_fkey" on table "fktable" > DETAIL: Key (x)=(A) is still referenced from table "fktable". > this should not happen? Apparently this is intentional. It's the difference between RESTRICT and NO ACTION. In ri_restrict(), there is a comment: /* * If another PK row now exists providing the old key values, we should * not do anything. However, this check should only be made in the NO * ACTION case; in RESTRICT cases we don't wish to allow another row to be * substituted. */ In any case, this patch does not change this behavior. It exists in old versions as well.
On 14.11.24 03:21, jian he wrote: > On Wed, Nov 13, 2024 at 8:56 PM jian he <jian.universality@gmail.com> wrote: >> >> https://stackoverflow.com/questions/14921668/difference-between-restrict-and-no-action >> mentioned about the difference between "no action" and "restrict". >> >> RI_FKey_restrict_upd comments also says: >> >> * The SQL standard intends that this referential action occur exactly when >> * the update is performed, rather than after. This appears to be >> * the only difference between "NO ACTION" and "RESTRICT". In Postgres >> * we still implement this as an AFTER trigger, but it's non-deferrable. >> >> DROP TABLE IF EXISTS fktable, pktable; >> CREATE TABLE pktable (x text COLLATE case_insensitive PRIMARY KEY); >> CREATE TABLE fktable (x text collate case_insensitive REFERENCES >> pktable on update restrict on delete restrict); >> INSERT INTO pktable VALUES ('A'), ('Å'); >> INSERT INTO fktable VALUES ('a'); >> update pktable set x = 'a' where x = 'A'; > > my previous email was too verbose. > my point is this: given PK, FK both are case_insensitive. and > PK side values are all being referenced. > > case like `update pktable set x = 'a' where x = 'A';` > Do we consider PK side value changed? > it seems not mentioned anywhere. I think the PostgreSQL documentation is selling the difference between NO ACTION and RESTRICT a bit short. The SQL standard says this: > ON UPDATE RESTRICT: any change to a referenced column in the > referenced table is prohibited if there is a matching row. and > If a non-null value of a referenced column RC in the referenced table > is updated to a value that is distinct from the current value of RC, > then, [...] If UR specifies RESTRICT and there exists some matching > row, then an exception con- dition is raised [...] So this is exactly what is happening here. You can also reproduce this with things that are not strings with collations. You just need to find a type that has values that are "equal" but "distinct", which is not common, but it exists, for example 0.0 and -0.0 in floats. Example: create table parent (val float8 primary key); insert into parent values ('0.0'); create table child (id int, val float8 references parent (val)); insert into child values (1, '0.0'); insert into child values (2, '-0.0'); update parent set val = '-0.0'; -- ok with NO ACTION but create table child (id int, val float8 references parent (val) on update restrict); insert into child values (1, '0.0'); insert into child values (2, '-0.0'); update parent set val = '-0.0'; -- error with RESTRICT So this is a meaningful difference. There is also a bug here in that the update in the case of NO ACTION doesn't actually run, because it thinks the values are the same and the update can be skipped. I think there is room for improvement here, in the documentation, the tests, and maybe in the code. And while these are thematically related to this thread, they are actually separate issues. I propose that I go ahead with committing the v7 patch (with your typo fixes) and then we continue discussing these other issues afterwards in a separate thread. Given this overall picture, I'm also less and less inclined to do any backpatching bug fixing here. The status of this feature combination in the backbranches is just "don't push it too hard". Maybe someone has some feasible mitigation ideas, but I haven't seen any yet.
On Thu, Nov 14, 2024 at 4:04 PM Peter Eisentraut <peter@eisentraut.org> wrote: > I propose that I go ahead with committing the v7 patch (with your typo > fixes) and then we continue discussing these other issues afterwards in > a separate thread. > looks good to me. but in create_table.sgml <para> In addition, when the data in the referenced columns is changed, certain actions are performed on the data in this table's columns. <para/> I actually want to add a sentence like: If the references columns collation is indeterministic, we use bitwise comparison to check if the referenced columns data is being changed. searching the internet found out: — ON UPDATE RESTRICT: any change to a referenced column in the referenced table is prohibited if there is a matching row. — ON UPDATE NO ACTION (the default): there is no referential update action; the referential constraint only specifies a constraint check. NOTE 53 — Even if constraint checking is not deferred, ON UPDATE RESTRICT is a stricter condition than ON UPDATE NO ACTION. ON UPDATE RESTRICT prohibits an update to a particular row if there are any matching rows; ON UPDATE NO ACTION does not perform its constraint check until the entire set of rows to be updated has been processed. looking at NOTE 53, overall i think i get it.
On 14.11.24 12:35, jian he wrote: > On Thu, Nov 14, 2024 at 4:04 PM Peter Eisentraut <peter@eisentraut.org> wrote: > >> I propose that I go ahead with committing the v7 patch (with your typo >> fixes) and then we continue discussing these other issues afterwards in >> a separate thread. >> > looks good to me. done > but in create_table.sgml > <para> > In addition, when the data in the referenced columns is changed, > certain actions are performed on the data in this table's > columns. > <para/> > I actually want to add a sentence like: > > If the references columns collation is indeterministic, we use > bitwise comparison to > check if the referenced columns data is being changed. I think this is also part of the foreign key action behavior that is a separate topic. We can discuss documentation improvements as part of that.