Wrong results with equality search using trigram index and non-deterministic collation - Mailing list pgsql-hackers

From Laurenz Albe
Subject Wrong results with equality search using trigram index and non-deterministic collation
Date
Msg-id 8ef4899c4acfebca45cc6c042a6dc611d25ffab1.camel@cybertec.at
Whole thread Raw
Responses Re: Wrong results with equality search using trigram index and non-deterministic collation
List pgsql-hackers
Using a trigram index with an non-deterministic collation can
lead to wrong query results:

  CREATE COLLATION faux_cn (PROVIDER = icu, LOCALE = 'und', DETERMINISTIC = FALSE, RULES = '&l = r');

  CREATE TABLE boom (id integer PRIMARY KEY, t text COLLATE faux_cn);

  INSERT INTO boom VALUES (1, 'right'), (2, 'light');

  SELECT * FROM boom WHERE t = 'right';

   id │   t
  ════╪═══════
    1 │ right
    2 │ light
  (2 rows)

  CREATE INDEX ON boom USING gin (t gin_trgm_ops);

  SET enable_seqscan = off;

  SELECT * FROM boom WHERE t = 'right';

   id │   t
  ════╪═══════
    1 │ right
  (1 row)

I also see questionable results with the similarity operator (with and
without the index):

  SELECT * FROM boom WHERE t % 'rigor';

   id │   t
  ════╪═══════
    1 │ right
  (1 row)

But here you could argue that the operator ignores the collation, so
the result is correct.  With equality, there is no such loophole.

I don't know what the correct fix would be.  Perhaps just refusing to use
the index for equality comparisons with non-deterministic collations.

Yours,
Laurenz Albe



pgsql-hackers by date:

Previous
From: Alexander Lakhin
Date:
Subject: Re: [HACKERS] make async slave to wait for lsn to be replayed
Next
From: Tatsuo Ishii
Date:
Subject: Re: Add memory/disk usage for WindowAgg nodes in EXPLAIN