Re: Wrong results from inner-unique joins caused by collation mismatch - Mailing list pgsql-hackers

From	Richard Guo
Subject	Re: Wrong results from inner-unique joins caused by collation mismatch
Date	April 24 18:44:32
Msg-id	CAMbWs4_pqvDepQapm6+vF=cPAQkKgerKEbO60dbfULx6+heitQ@mail.gmail.com
In response to	Re: Wrong results from inner-unique joins caused by collation mismatch (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Wrong results from inner-unique joins caused by collation mismatch
List	pgsql-hackers

On Fri, Apr 24, 2026 at 11:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Richard Guo <guofenglinux@gmail.com> writes:
> > My first thought was to fix this by:
>
> > +  if (!IndexCollMatchesExprColl(ind->indexcollations[c],
> > +                                exprInputCollation((Node *) rinfo->clause)))
> > +      continue;
>
> > However, this caused an unexpected plan diff in join.out where a
> > left-join removal over (name, text) stopped working, because name and
> > text use different collations.  So this check is too strict: a
> > mismatch between two deterministic collations should be OK for
> > uniqueness proof, as a deterministic collation treats two strings as
> > equal iff they are byte-wise equal (see CREATE COLLATION).

> Yes, we'd be taking a serious performance hit if we insisted on
> exact collation matches for this purpose.  I agree that disallowing
> non-matching non-deterministic collations is the right fix.

Thanks for taking a look!

> > Hence the attached patch.  Thoughts?

> I don't love doing it like this, for two reasons:
>
> 1. I think there are other places in the planner that will need
> substantially this same logic.  I recommend breaking out a
> subroutine defined more or less as "do these collations have
> equivalent notions of equality".

Right.  I just found several other places, in query_is_distinct_for(),
that need this same logic.  Without it, we produce wrong results:

select * from t t1 join
  (select distinct a from t) t2 on t1.a = t2.a COLLATE "ci";
 a | a
---+---
 A | a
 a | a
(2 rows)

select * from t t1 join
  (select a from t group by a) t2 on t1.a = t2.a COLLATE "ci";
 a | a
---+---
 A | a
 a | a
(2 rows)
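To see how an inner-unique join yields exactly these two rows, here is a small C simulation. It is a sketch, not planner or executor code. It assumes, since the message does not show the table, that t contains exactly the values 'A' and 'a'. DISTINCT or GROUP BY under the column's own deterministic collation keeps both values, so the subquery is unique only under that collation. The "ci" collation is approximated by ASCII case folding. ci_equal() and count_join_rows() are hypothetical names: the latter stands in for a nested-loop join that, when inner_unique is set, stops scanning the inner side after the first match.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for equality under a case-insensitive,
 * non-deterministic "ci" collation (ASCII-only simplification). */
static bool
ci_equal(const char *x, const char *y)
{
    while (*x && *y)
    {
        if (tolower((unsigned char) *x) != tolower((unsigned char) *y))
            return false;
        x++;
        y++;
    }
    return *x == *y;            /* both strings must end at the same point */
}

/*
 * Count the rows a nested-loop join of outer[] and inner[] would emit under
 * ci_equal().  With inner_unique set, the inner scan stops after the first
 * match, which is only safe if the inner side is unique under "ci" too.
 */
static int
count_join_rows(const char *const *outer, size_t n_outer,
                const char *const *inner, size_t n_inner,
                bool inner_unique)
{
    int rows = 0;

    for (size_t i = 0; i < n_outer; i++)
        for (size_t j = 0; j < n_inner; j++)
            if (ci_equal(outer[i], inner[j]))
            {
                rows++;
                if (inner_unique)
                    break;      /* assumes at most one match; wrong here */
            }
    return rows;
}
```

Under that assumption, calling count_join_rows() with both sides set to { "A", "a" } finds four matching pairs when scanning fully, but only two when inner_unique is set, which lines up with the two rows returned above.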

> 2. I find the test next to unreadable as written --- for example,
> it's more difficult than it should be to figure out what happens
> if one collation is deterministic and the other not.  Using a
> subroutine would help here by letting you break down the test
> into multiple steps.

Agreed.  I'll wrap the logic in a subroutine.
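Here is a minimal sketch of what such a subroutine could look like, broken into the separate steps suggested above. Everything in it is hypothetical: collations_agree_on_equality() and coll_is_deterministic() are illustrative names, the Oid typedef and the collation table are stand-ins, and the OIDs are made up. In the planner, the determinism check would presumably come from the catalog, for example via lsyscache's get_collation_isdeterministic().

```c
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;       /* stand-in for PostgreSQL's Oid type */

/* Hypothetical catalog stand-in; entries and OIDs are made up. */
typedef struct
{
    Oid  oid;
    bool deterministic;
} CollInfo;

static const CollInfo known_collations[] = {
    {1001, true},               /* some deterministic collation */
    {1002, true},               /* another deterministic collation */
    {2001, false},              /* a non-deterministic one, e.g. "ci" */
};

static bool
coll_is_deterministic(Oid coll)
{
    size_t n = sizeof(known_collations) / sizeof(known_collations[0]);

    for (size_t i = 0; i < n; i++)
        if (known_collations[i].oid == coll)
            return known_collations[i].deterministic;
    return false;               /* unknown collation: answer conservatively */
}

/* Do these two collations have equivalent notions of equality? */
static bool
collations_agree_on_equality(Oid coll1, Oid coll2)
{
    /* Step 1: a collation always agrees with itself. */
    if (coll1 == coll2)
        return true;

    /*
     * Step 2: a non-deterministic collation may treat byte-wise different
     * strings as equal, so conservatively refuse any mismatched pairing
     * that involves one.
     */
    if (!coll_is_deterministic(coll1) || !coll_is_deterministic(coll2))
        return false;

    /*
     * Step 3: two deterministic collations both treat strings as equal
     * iff they are byte-wise equal, so their equality relations coincide.
     */
    return true;
}
```

Treating two different non-deterministic collations as disagreeing is deliberately conservative. It follows the rule stated upthread (mismatched deterministic collations are fine, mismatched non-deterministic ones are not) without trying to prove that two distinct non-deterministic collations happen to agree.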

- Richard
