pgsql: Fix collation handling for foreign keys - Mailing list pgsql-committers

From Peter Eisentraut
Subject pgsql: Fix collation handling for foreign keys
Date
Msg-id E1tBwso-001wX1-3B@gemulon.postgresql.org
Whole thread Raw
List pgsql-committers
Fix collation handling for foreign keys

Allowing foreign keys where the referenced and the referencing columns
have collations with different notions of equality is problematic.
This can only happen when using nondeterministic collations, for
example, if the referencing column is case-insensitive and the
referenced column is not, or vice versa.  It does not happen if both
collations are deterministic.

To show one example:

    CREATE COLLATION case_insensitive (provider = icu, deterministic = false, locale = 'und-u-ks-level2');

    CREATE TABLE pktable (x text COLLATE "C" PRIMARY KEY);
    CREATE TABLE fktable (x text COLLATE case_insensitive REFERENCES pktable ON UPDATE CASCADE ON DELETE CASCADE);
    INSERT INTO pktable VALUES ('A'), ('a');
    INSERT INTO fktable VALUES ('A');

    BEGIN; DELETE FROM pktable WHERE x = 'a'; TABLE fktable; ROLLBACK;
    BEGIN; DELETE FROM pktable WHERE x = 'A'; TABLE fktable; ROLLBACK;

Both of these DELETE statements delete the one row from fktable.  So
this means that one row from fktable references two rows in pktable,
which should not happen.  (That's why a primary key or unique
constraint is required on pktable.)

When nondeterministic collations were implemented, the SQL standard
available to yours truly said that referential integrity checks should
be performed with the collation of the referenced column, and so
that's how we implemented it.  But this turned out to be a mistake in
the SQL standard, for the same reasons as above, that was later
(SQL:2016) fixed to require both collations to be the same.  So that's
what we are aiming for here.

We don't have to be quite so strict.  We can allow different
collations if they are both deterministic.  This is also good for
backward compatibility.

So the new rule is that the collations either have to be the same or
both deterministic.  Or in other words, if one of them is
nondeterministic, then both have to be the same.

Users upgrading from before that have affected setups will need to
make changes to their schemas (i.e., change one or both collations in
affected foreign-key relationships) before the upgrade will succeed.

Some of the nice test cases for the previous situation in
collate.icu.utf8.sql are now obsolete.  They are changed to just check
the error checking of the new rule.  Note that collate.sql already
contained a test for foreign keys with different deterministic
collations.

A bunch of code in ri_triggers.c that added a COLLATE clause to
enforce the referenced column's collation can be removed, because both
columns now have to have the same notion of equality, so it doesn't
matter which one to use.

Reported-by: Paul Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/78d824e0-b21e-480d-a252-e4b84bc2c24b@illuminatedcomputing.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/9321d2fdf808e93095c4f86565dc8c8178675841

Modified Files
--------------
doc/src/sgml/ref/create_table.sgml             |   7 ++
src/backend/commands/tablecmds.c               |  84 +++++++++++++++++----
src/backend/utils/adt/ri_triggers.c            |  57 +++++---------
src/test/regress/expected/collate.icu.utf8.out | 100 ++-----------------------
src/test/regress/sql/collate.icu.utf8.sql      |  35 +--------
5 files changed, 105 insertions(+), 178 deletions(-)


pgsql-committers by date:

Previous
From: Álvaro Herrera
Date:
Subject: pgsql: Avoid deleting critical WAL segments during pg_rewind
Next
From: Fujii Masao
Date:
Subject: pgsql: Remove unnecessary backslash from CopyFrom() code.