Use correct collation in pg_trgm - Mailing list pgsql-hackers

From David Geier
Subject Use correct collation in pg_trgm
Msg-id db087c3e-230e-4119-8a03-8b5d74956bc2@gmail.com
Responses Re: Use correct collation in pg_trgm
List pgsql-hackers
Hi hackers,

In thread [1] we found that pg_trgm always uses DEFAULT_COLLATION_OID
for converting trigrams to lower-case. Here are some examples where
the collation is currently ignored:

CREATE EXTENSION pg_trgm;
CREATE COLLATION turkish (provider = libc, locale = 'tr_TR.utf8');

postgres=# SELECT show_trgm('ISTANBUL' COLLATE "turkish");
                  show_trgm
---------------------------------------------
 {"  i"," is",anb,bul,ist,nbu,sta,tan,"ul "}

CREATE TABLE test(col TEXT COLLATE "turkish");
INSERT INTO test VALUES ('ISTANBUL');

postgres=# select show_trgm(col) FROM test;
                  show_trgm
---------------------------------------------
 {"  i"," is",anb,bul,ist,nbu,sta,tan,"ul "}

postgres=# SELECT similarity('ıstanbul' COLLATE "turkish", 'ISTANBUL'
COLLATE "turkish");
 similarity
------------
        0.5

If the database is initialized via initdb --locale="tr_TR.utf8", the
output changes:

postgres=# SELECT show_trgm('ISTANBUL');
                       show_trgm
--------------------------------------------------------
 {0xf31e1a,0xfe581d,0x3efd30,anb,bul,nbu,sta,tan,"ul "}

and

postgres=# select show_trgm(col) FROM test;
                       show_trgm
--------------------------------------------------------
 {0xf31e1a,0xfe581d,0x3efd30,anb,bul,nbu,sta,tan,"ul "}

postgres=# SELECT similarity('ıstanbul' COLLATE "turkish", 'ISTANBUL'
COLLATE "turkish");
 similarity
------------
          1

Under tr_TR.utf8, lower-casing capital I yields ı (dotless i), which is
a multibyte character in UTF-8, while my default collation lower-cases
I to i.
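
To make the arithmetic behind the two similarity results concrete, here
is a minimal Python sketch. It is not pg_trgm's code: the function names
are hypothetical, the two lower-casing helpers are hand-written
stand-ins for the locale rules, and it assumes a single-word input
padded with two leading blanks and one trailing blank, with similarity
computed as shared trigrams divided by the size of the union.

```python
def lower_default(s: str) -> str:
    # Stand-in for a non-Turkish locale: 'I' -> 'i', 'ı' stays 'ı'.
    return s.lower()


def lower_turkish(s: str) -> str:
    # Stand-in for Turkish rules: 'I' -> 'ı' and 'İ' -> 'i',
    # applied before the generic lower-casing of the other letters.
    return s.replace("I", "ı").replace("İ", "i").lower()


def trigrams(word: str, lower) -> set:
    # Assumption: one word, padded with two blanks in front and one behind.
    padded = "  " + lower(word) + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a: str, b: str, lower) -> float:
    # Assumption: shared trigrams divided by the size of the union.
    ta, tb = trigrams(a, lower), trigrams(b, lower)
    shared = len(ta & tb)
    return shared / (len(ta) + len(tb) - shared)


print(similarity("ıstanbul", "ISTANBUL", lower_default))  # 0.5
print(similarity("ıstanbul", "ISTANBUL", lower_turkish))  # 1.0
```

With the default rules the two words differ in their first three
trigrams ("  i", " is", ist versus the ı variants), so 6 of 12 distinct
trigrams are shared. With the Turkish rules both words lower-case to the
same string and all 9 trigrams match.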

The attached patch attempts to fix that. I grepped for all occurrences
of DEFAULT_COLLATION_OID in contrib/pg_trgm and now use the function's
collation OID instead of DEFAULT_COLLATION_OID.

The corresponding regression tests pass.

[1]
https://www.postgresql.org/message-id/e5dd01c6-c469-405d-aea2-feca0b2dc34d%40gmail.com

--
David Geier