On Wed, 2026-04-22 at 08:45 +0200, David Geier wrote:
> On 17.09.2024 08:00, Laurenz Albe wrote:
> > Using a trigram index with an non-deterministic collation can
> > lead to wrong query results:
> >
> > CREATE COLLATION faux_cn (PROVIDER = icu, LOCALE = 'und', DETERMINISTIC = FALSE, RULES = '&l = r');
> >
> > CREATE TABLE boom (id integer PRIMARY KEY, t text COLLATE faux_cn);
> >
> > INSERT INTO boom VALUES (1, 'right'), (2, 'light');
> >
> > SELECT * FROM boom WHERE t = 'right';
> >
> > id │ t
> > ════╪═══════
> > 1 │ right
> > 2 │ light
> > (2 rows)
> >
> > CREATE INDEX ON boom USING gin (t gin_trgm_ops);
> >
> > SET enable_seqscan = off;
> >
> > SELECT * FROM boom WHERE t = 'right';
> >
> > id │ t
> > ════╪═══════
> > 1 │ right
> > (1 row)
> >
> > I also see questionable results with the similarity operator (with and
> > without the index):
> >
> > SELECT * FROM boom WHERE t % 'rigor';
> >
> > id │ t
> > ════╪═══════
> > 1 │ right
> > (1 row)
> >
> > But here you could argue that the operator ignores the collation, so
> > the result is correct. With equality, there is no such loophole.
>
> I think we should change that. It's very counter intuitive that a query
> can change behavior when the planner flips from using e.g. a Seq Scan to
> a Bitmap Index Scan or the other way around. There's already a patch for
> that, see [1].
>
> [1]
> https://www.postgresql.org/message-id/flat/db087c3e-230e-4119-8a03-8b5d74956bc2%40gmail.com
That's not only unintuitive, it is a clear bug.
An index is not allowed to change the semantics of a query.
Does your patch fix the bug, that is, will the query with "WHERE t = 'right'"
return both results? That's the case that is mostly in need of fixing.
I am not sure if the behavior for the % operator should also be considered
a bug.
> > I don't know what the correct fix would be. Perhaps just refusing to use
> > the index for equality comparisons with non-deterministic collations.
>
> If we merge [1], then not only = but also LIKE would be incorrect. How
> about disabling CREATE INDEX USING gin on columns with non-deterministic
> collations?
Oh, I see. So your patch won't fix the bug.
I am not sure if refusing to *create* the index is the best solution.
Perhaps a warning will be better:
WARNING: GIN indexes won't be used columns with non-deterministic collations
Even if you refuse to create the index, you'd still have to deal with the
indexes that already exist. An upgrade cannot decide to remove an index.
At the very least, there would have to be a check in pg_upgrade.
> Or is there maybe a way to make these cases work correctly for
> non-deterministic collations by applying the collation when extracting
> the search trigrams? I take a look into that.
Thank you. Making it work would of course be the best option.
Yours,
Laurenz Albe