Thread: Wrong results with equality search using trigram index and non-deterministic collation
Wrong results with equality search using trigram index and non-deterministic collation
From
Laurenz Albe
Date:
Using a trigram index with an non-deterministic collation can lead to wrong query results: CREATE COLLATION faux_cn (PROVIDER = icu, LOCALE = 'und', DETERMINISTIC = FALSE, RULES = '&l = r'); CREATE TABLE boom (id integer PRIMARY KEY, t text COLLATE faux_cn); INSERT INTO boom VALUES (1, 'right'), (2, 'light'); SELECT * FROM boom WHERE t = 'right'; id │ t ════╪═══════ 1 │ right 2 │ light (2 rows) CREATE INDEX ON boom USING gin (t gin_trgm_ops); SET enable_seqscan = off; SELECT * FROM boom WHERE t = 'right'; id │ t ════╪═══════ 1 │ right (1 row) I also see questionable results with the similarity operator (with and without the index): SELECT * FROM boom WHERE t % 'rigor'; id │ t ════╪═══════ 1 │ right (1 row) But here you could argue that the operator ignores the collation, so the result is correct. With equality, there is no such loophole. I don't know what the correct fix would be. Perhaps just refusing to use the index for equality comparisons with non-deterministic collations. Yours, Laurenz Albe
Re: Wrong results with equality search using trigram index and non-deterministic collation
From
Laurenz Albe
Date:
On Tue, 2024-09-17 at 08:00 +0200, Laurenz Albe wrote: > Using a trigram index with an non-deterministic collation can > lead to wrong query results: > [...] > > I don't know what the correct fix would be. Perhaps just refusing to use > the index for equality comparisons with non-deterministic collations. Looking into fixing that, how can you tell the optimizer to consider a certain index only for certain collations? Yours, Laurenz Albe