Conflict of implicit collations doesn't propagate out of subqueries - Mailing list pgsql-hackers
From | Markus Winand |
---|---|
Subject | Conflict of implicit collations doesn't propagate out of subqueries |
Date | |
Msg-id | BA5EBE75-49A9-4799-87D4-80541F59C812@winand.at Whole thread Raw |
Responses |
Re: Conflict of implicit collations doesn't propagate out of subqueries
|
List | pgsql-hackers |
Hi! I think I’ve found a bug related to the strength of collations. Attached is a WIP patch, that address some other issues too. In this this example, the conflict of implicit collations propagates correctly: WITH data (c, posix) AS ( values ('a' COLLATE "C", 'b' COLLATE "POSIX") ) SELECT * FROM data ORDER BY ( c || posix ) || posix ERROR: collation mismatch between implicit collations "C" and "POSIX" LINE 6: ORDER BY ( c || posix ) || posix; ^ HINT: You can choose the collation by applying the COLLATE clause to one or both expressions. However, if the conflict happens in a subquery, it doesn’t anymore: WITH data (c, posix) AS ( values ('a' COLLATE "C", 'b' COLLATE "POSIX") ) SELECT * FROM (SELECT *, c || posix AS none FROM data) data ORDER BY none || posix; c | posix | none ---+-------+------ a | b | ab (1 row) The problem is in parse_collate.c:566: A Var (and some other nodes) without valid collation gets the strength COLLATE_NONE: if (OidIsValid(collation)) strength = COLLATE_IMPLICIT; else strength = COLLATE_NONE; However, for a Var it should be COLLATE_CONFLICT, which corresponds to the standards collation derivation “none”. I guess there could be other similar cases which I don’t know about. The patch fixes that, plus necessary consequences (error handling for new scenarios) as well as some “cosmetic” improvements I found along the way. Unnecessary to mention that this patch might break existing code. With the patch, the second example form above fails similarly to the first example: ERROR: collation mismatch between implicit collation "POSIX" and unknown collation LINE 6: ORDER BY none || posix; ^ HINT: You can choose the collation by applying the COLLATE clause to one or both expressions. Other changes in the patch: ** Error Handling The current parse time error handling of implicit collisions has always both colliding collations. The patch allows a COLLATE_CONFLICT without knowing which collations caused the conflict (it’s not stored in the Var node). Thus we may have know two, one or none of the collations that contributed to the collision. The new function ereport_implicit_collation_mismatch takes care of that. ** Renaming COLLATE_NONE I found the name COLLATE_NONE a little bit unfortuante as it can easily be mistaken for the collation derivation “none” the SQL standard uses. I renamed it to COLLATE_NA for “not applicable”. The standard doesn’t have a word for that as non character string types just don’t have collations. ** Removal of location2 The code keeps track of the location (for parser_errposition) of both collations that collided, even though it reports only one of them. The patch removes location2. Due to this, the some errors report the other location as before (previously the second, now the first). See collate.out in the patch. The patch has TODO comments for code that would be needed to take the other one. Is there any policy to report the second location or to strictly keep the errors where they used to be? ** General cleanup The patch also removes a little code that I think is not needed (anymore). ** Tests Besides the errposition, only one existing test breaks. It previously triggered a runtime error, now it triggers a parse time error: SELECT a, b FROM collate_test1 UNION ALL SELECT a, b FROM collate_test2 ORDER BY 2; -- fail -ERROR: could not determine which collation to use for string comparison -HINT: Use the COLLATE clause to set the collation explicitly. +ERROR: collation mismatch between implicit collations +LINE 1: SELECT a, b FROM collate_test1 UNION ALL SELECT a, b FROM co... + ^ +HINT: Use the COLLATE clause to set the collation explicitly The patch also adds a new test to trigger the problem in the first place. The patch is against REL_12_STABLE but applies to master too. -markus
Attachment
pgsql-hackers by date: