Re: Hash join not finding which collation to use for string hashing - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Hash join not finding which collation to use for string hashing
Date
Msg-id 9705.1580265968@sss.pgh.pa.us
Whole thread Raw
In response to Hash join not finding which collation to use for string hashing  (Mark Dilger <mark.dilger@enterprisedb.com>)
Responses Re: Hash join not finding which collation to use for string hashing  (Mark Dilger <mark.dilger@enterprisedb.com>)
List pgsql-hackers
Mark Dilger <mark.dilger@enterprisedb.com> writes:
> While reviewing the partition-wise join patch, I ran into an issue that exists in master, so rather than responding
tothat patch, I’m starting this new thread. 
> I noticed that this seems similar to the problem that was supposed to have been fixed in the "Re: COLLATE: Hash
partitionvs UPDATE” thread.  As such, I’ve included Tom and Amit in the CC list. 

Hm, I don't see any bug here.  You're asking it to join

>> CREATE TABLE alpha (a TEXT COLLATE "ja_JP", b TEXT COLLATE "sv_SE");
>> CREATE TABLE beta (a TEXT COLLATE "tr_TR", b TEXT COLLATE "en_US");

>> SELECT t1.a, t2.a FROM alpha t1 INNER JOIN beta t2 ON (t1.a = t2.a) WHERE t1.a IN ('äbç', 'ὀδυσσεύς');

so t1.a and t2.a have different collations, and the system can't resolve
which to use for the comparison.

Now, I'd be the first to agree that this error could be reported better.
The parser knows that it couldn't resolve a collation for t1.a = t2.a, but
what it does *not* know is whether the '=' operator cares for collation.
Throwing an error when the operator wouldn't care at runtime isn't going
to make many people happy.  On the other hand, when the operator finally
does run and can't get a collation, all it knows is that it didn't get a
collation, not why.  So we can't produce an error message as specific as
"ja_JP and tr_TR collations conflict".

Now that the collations feature has settled in, it'd be nice to go back
and see if we can't improve that somehow.  Not sure how.

(BTW, before v12 the text '=' operator indeed did not care for collation,
so this example would've worked.  But the change in behavior is a
necessary consequence of having invented nondeterministic collations,
not a bug.)

            regards, tom lane



pgsql-hackers by date:

Previous
From: yuzuko
Date:
Subject: Re: Autovacuum on partitioned table
Next
From: Tom Lane
Date:
Subject: Re: Is custom MemoryContext prohibited?