Re: BUG #16020: ICU Collations querys - Mailing list pgsql-bugs

From Daniel Verite
Subject Re: BUG #16020: ICU Collations querys
Date
Msg-id a0cc6c11-c06f-4275-ac48-890e463d0c4c@manitou-mail.org
Whole thread Raw
In response to Re: BUG #16020: ICU Collations querys  (Marina Garrido Sanchez <garridosanchezmarina@gmail.com>)
List pgsql-bugs
    Marina Garrido Sanchez wrote:

> create collation ca_insensitive (provider = icu, locale =
> 'es-ES-u-ks-level1', deterministic = false;
>
> create table users (
> nombre text collate "ca_insensitive" primary key unique,
> apellidos text collate "ca_insensitive",
> direccion text
> );
>
> insert into user values ("jávier", "gonzález", "BBB")
> insert into user values ("Javier", "Gonzalez", "BBB")

Independently of the syntax problem (literals must be enclosed
in single quotes, not double quotes), the second insert should
fail since 'jávier' = 'Javier' with the above-defined collation and
there is a unique index on users.nombre.

> select * from users where users.apellidos ilike '%Gonz%';
>
> I get the error that *ilike *does not support nondeterministic operation,
> for that, my question is How can I do partial match search with case- and
> accet- insensitive in postgres? or Which operator can I use to do the
> search?

If you're interested only in the case insensitiveness, you
could force a deterministic collation to the ilike argument,
for instance:

select * from users where users.apellidos ilike '%Gonz%' collate "es-x-icu";

If you really need collate-sensitive substring search with non-deterministic
collations, I don't think there is any way in Postgres 12 to get that.
If you can install the icu_ext extension [1], its icu_strpos() function
implements that using the collation-aware string search feature in ICU.

As string like '%foo%' is equivalent to  strpos(string, 'foo')>0
when string is associated to a deterministic collation,
string like '%foo%' with a non-derministic collation
is not supported but it is equivalent to icu_strpos(string, 'foo')>0.

For instance, with the "ca_insensitive" collation you defined,
both 'jáv' and ''Jav'  match 'jav':

insert into users values ('jávier', 'gonzález', 'BBB');
insert into users values ('Javier2', 'Gonzalez', 'BBB');

select * from users where icu_strpos(nombre, 'jav')>0;
 nombre  | apellidos | direccion
---------+-----------+-----------
 jávier  | gonzález  | BBB
 Javier2 | Gonzalez  | BBB


[1] https://github.com/dverite/icu_ext#icu_strpos


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #16027: Invalid output of to_timestamp
Next
From: PG Bug reporting form
Date:
Subject: BUG #16028: JDBC currentSchema connection parameter not working for text search configuration