On 05/06/18 07:10, Melanie Plageman wrote:
> Hi,
> I noticed what seems like a bug in collation.
>
> This query errors out:
> SELECT (SELECT 'c' COLLATE "de_DE") > (SELECT 'ç' COLLATE "es_ES"); --
> error
>
> While this query does not:
> SELECT 'c' COLLATE "de_DE" > (SELECT 'ç' COLLATE "es_ES"); -- no error
>
> It seems like this is in conflict with what the documentation
> <https://www.postgresql.org/docs/devel/static/collation.html> says:
> "If any input expression has an explicit collation derivation, then all
> explicitly derived collations among the input expressions must be the same,
> otherwise an error is raised."
>
> After digging into it a bit, I believe the reason for this is that when we
> are assigning collations in assign_collations_walker, we always set
> collation strength to IMPLICIT for the subquery and always set the
> collation strength to EXPLICIT for the collate node on the other side of
> the OpExpr. So, we don't hit an error later like the one in
> merge_collation_state when the collation of one expression is conflicting
> with that of its parent and the strength of both is EXPLICIT.
> I think this still applies to our case because one of the two arguments to
> OpExpr would have set their parent's collation strength to either IMPLICIT
> or EXPLICIT and then we will process the other argument which would then
> have a different collation strength than the one we just set its parent to.
> So, we end up setting the inputcollid for the OpExpr to that of the
> explicit collation in the collate node.
>
> Basically, it seems like our subquery will always have its collation
> strength set to IMPLICIT, so, if we have explicit collation in the first
> target entry of the subquery's target list, it looks like we will never
> truly treat that as explicit collation.
Right, a subquery's result is considered IMPLICIT, even if there is an
explicit COLLATE inside the subquery.
You could mark the subquery's result with a collation like this:
postgres=# SELECT 'c' COLLATE "de_DE" > (SELECT 'ç') COLLATE "es_ES";
ERROR: collation mismatch between explicit collations "de_DE" and "es_ES"
> I'm not sure if this behavior is considered a bug, but I also can't imagine
> how it would be expected given the current documentation. It seems to me
> one or the other should be updated.
It seems correct to me. It does say "An explicit collation derivation
occurs when a COLLATE clause is used; all other collation derivations
are implicit". A subquery falls under the "all other collation
derivations" category. Perhaps we could make it more clear what the
COLLATE clause binds to, especially with subqueries, but I'm not sure
how exactly to phrase it. Perhaps an additional example with a subquery
would help?
- Heikki