Hi!
I think I’ve found a bug related to the strength of collations. Attached is a
WIP patch, that address some other issues too.
In this this example, the conflict of implicit collations propagates correctly:
WITH data (c, posix) AS (
values ('a' COLLATE "C", 'b' COLLATE "POSIX")
)
SELECT *
FROM data
ORDER BY ( c || posix ) || posix
ERROR: collation mismatch between implicit collations "C" and "POSIX"
LINE 6: ORDER BY ( c || posix ) || posix;
^
HINT: You can choose the collation by applying the COLLATE clause to one or both expressions.
However, if the conflict happens in a subquery, it doesn’t anymore:
WITH data (c, posix) AS (
values ('a' COLLATE "C", 'b' COLLATE "POSIX")
)
SELECT *
FROM (SELECT *, c || posix AS none FROM data) data
ORDER BY none || posix;
c | posix | none
---+-------+------
a | b | ab
(1 row)
The problem is in parse_collate.c:566: A Var (and some other nodes) without
valid collation gets the strength COLLATE_NONE:
if (OidIsValid(collation))
strength = COLLATE_IMPLICIT;
else
strength = COLLATE_NONE;
However, for a Var it should be COLLATE_CONFLICT, which corresponds to the
standards collation derivation “none”. I guess there could be other similar
cases which I don’t know about.
The patch fixes that, plus necessary consequences (error handling for new
scenarios) as well as some “cosmetic” improvements I found along the way.
Unnecessary to mention that this patch might break existing code.
With the patch, the second example form above fails similarly to the first example:
ERROR: collation mismatch between implicit collation "POSIX" and unknown collation
LINE 6: ORDER BY none || posix;
^
HINT: You can choose the collation by applying the COLLATE clause to one or both expressions.
Other changes in the patch:
** Error Handling
The current parse time error handling of implicit collisions has always both
colliding collations. The patch allows a COLLATE_CONFLICT without knowing which
collations caused the conflict (it’s not stored in the Var node). Thus we may
have know two, one or none of the collations that contributed to the collision.
The new function ereport_implicit_collation_mismatch takes care of that.
** Renaming COLLATE_NONE
I found the name COLLATE_NONE a little bit unfortuante as it can easily be
mistaken for the collation derivation “none” the SQL standard uses. I renamed
it to COLLATE_NA for “not applicable”. The standard doesn’t have a word for
that as non character string types just don’t have collations.
** Removal of location2
The code keeps track of the location (for parser_errposition) of both
collations that collided, even though it reports only one of them. The patch
removes location2.
Due to this, the some errors report the other location as before (previously
the second, now the first). See collate.out in the patch. The patch has TODO
comments for code that would be needed to take the other one.
Is there any policy to report the second location or to strictly keep the
errors where they used to be?
** General cleanup
The patch also removes a little code that I think is not needed (anymore).
** Tests
Besides the errposition, only one existing test breaks. It previously triggered
a runtime error, now it triggers a parse time error:
SELECT a, b FROM collate_test1 UNION ALL SELECT a, b FROM collate_test2 ORDER BY 2; -- fail
-ERROR: could not determine which collation to use for string comparison
-HINT: Use the COLLATE clause to set the collation explicitly.
+ERROR: collation mismatch between implicit collations
+LINE 1: SELECT a, b FROM collate_test1 UNION ALL SELECT a, b FROM co...
+ ^
+HINT: Use the COLLATE clause to set the collation explicitly
The patch also adds a new test to trigger the problem in the first place.
The patch is against REL_12_STABLE but applies to master too.
-markus