Thread: BUG #16020: ICU Collations querys

BUG #16020: ICU Collations querys

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      16020
Logged by:          Marina Garrido Sanchez
Email address:      garridosanchezmarina@gmail.com
PostgreSQL version: 12beta4
Operating system:   Windows
Description:

When I use a particular ICU Collation query with locale =
'es-ES-u-ks-level1' and deterministic = false for having case- and accet-
insensitive, I can use similiar, like and ilike as the documentation said,
but how can I do a parcial search without this operators?


Re: BUG #16020: ICU Collations querys

From
Peter Eisentraut
Date:
On 2019-09-25 08:51, PG Bug reporting form wrote:
> The following bug has been logged on the website:
> 
> Bug reference:      16020
> Logged by:          Marina Garrido Sanchez
> Email address:      garridosanchezmarina@gmail.com
> PostgreSQL version: 12beta4
> Operating system:   Windows
> Description:        
> 
> When I use a particular ICU Collation query with locale =
> 'es-ES-u-ks-level1' and deterministic = false for having case- and accet-
> insensitive, I can use similiar, like and ilike as the documentation said,
> but how can I do a parcial search without this operators?

This doesn't sound like a bug, so it's inappropriate for this forum.

Also, please provide some more detail, including exact queries and what
you are hoping for as a result.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: BUG #16020: ICU Collations querys

From
Marina Garrido Sanchez
Date:
Hi, 

Sorry I dont see other way to send my issue or question, and I assumed that as It is something new (ICU Collations) for case and accet insensitive it would be better to do in this forum.

The questions is...that if I do the following script:

create collation ca_insensitive (provider = icu, locale = 'es-ES-u-ks-level1', deterministic = false;

create table users (
nombre text collate "ca_insensitive" primary key unique,
apellidos text collate "ca_insensitive",
direccion text
);

insert into user values ("jávier", "gonzález", "BBB")
insert into user values ("Javier", "Gonzalez", "BBB")

and if I do the query:

select * from users where users.apellidos ilike '%Gonz%';

I get the error that ilike does not support nondeterministic operation, for that, my question is How can I do partial match search with case- and accet- insensitive in postgres? or Which operator can I use to do the search?

Thanks in advance.

El jue., 26 sept. 2019 a las 17:11, Marina Garrido Sanchez (<garridosanchezmarina@gmail.com>) escribió:
Hi, 

Sorry I dont see other way to send my issue or question, and I assumed that as It is something new (ICU Collations) for case and accet insensitive it would be better to do in this forum.

The questions is...that if I do the following script:

create collation ca_insensitive (provider = icu, locale = 'es-ES-u-ks-level1', deterministic = false;

create table users (
nombre text collate "ca_insensitive" primary key unique,
apellidos text collate "ca_insensitive",
direccion text
);

insert into user values ("jávier", "gonzález", "BBB")
insert into user values ("Javier", "Gonzalez", "BBB")

and if I do the query:

select * from users where users.apellidos ilike '%Gonz%';

I get the error that ilike does not support nondeterministic operation, for that, my question is How can I do partial match search with case- and accet- insensitive in postgres? or Which operator can I use to do the search?

Thanks in advance.





El mié., 25 sept. 2019 a las 22:09, Peter Eisentraut (<peter.eisentraut@2ndquadrant.com>) escribió:
On 2019-09-25 08:51, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference:      16020
> Logged by:          Marina Garrido Sanchez
> Email address:      garridosanchezmarina@gmail.com
> PostgreSQL version: 12beta4
> Operating system:   Windows
> Description:       
>
> When I use a particular ICU Collation query with locale =
> 'es-ES-u-ks-level1' and deterministic = false for having case- and accet-
> insensitive, I can use similiar, like and ilike as the documentation said,
> but how can I do a parcial search without this operators?

This doesn't sound like a bug, so it's inappropriate for this forum.

Also, please provide some more detail, including exact queries and what
you are hoping for as a result.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: BUG #16020: ICU Collations querys

From
Peter Eisentraut
Date:
On 2019-09-26 17:14, Marina Garrido Sanchez wrote:
> create collation ca_insensitive (provider = icu, locale =
> 'es-ES-u-ks-level1', deterministic = false;
> 
> create table users (
> nombre text collate "ca_insensitive" primary key unique,
> apellidos text collate "ca_insensitive",
> direccion text
> );
> 
> insert into user values ("jávier", "gonzález", "BBB")
> insert into user values ("Javier", "Gonzalez", "BBB")
> 
> and if I do the query:
> 
> select * from users where users.apellidos ilike '%Gonz%';
> 
> I get the error that *ilike *does not support nondeterministic
> operation, for that, my question is How can I do partial match search
> with case- and accet- insensitive in postgres? or Which operator can I
> use to do the search?

You can run your query with an explicit deterministic collation applied,
for example:

select * from users where users.apellidos ilike '%Gonz%' collate "C";

or "und-x-icu" or whatever.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: BUG #16020: ICU Collations querys

From
"Daniel Verite"
Date:
    Marina Garrido Sanchez wrote:

> create collation ca_insensitive (provider = icu, locale =
> 'es-ES-u-ks-level1', deterministic = false;
>
> create table users (
> nombre text collate "ca_insensitive" primary key unique,
> apellidos text collate "ca_insensitive",
> direccion text
> );
>
> insert into user values ("jávier", "gonzález", "BBB")
> insert into user values ("Javier", "Gonzalez", "BBB")

Independently of the syntax problem (literals must be enclosed
in single quotes, not double quotes), the second insert should
fail since 'jávier' = 'Javier' with the above-defined collation and
there is a unique index on users.nombre.

> select * from users where users.apellidos ilike '%Gonz%';
>
> I get the error that *ilike *does not support nondeterministic operation,
> for that, my question is How can I do partial match search with case- and
> accet- insensitive in postgres? or Which operator can I use to do the
> search?

If you're interested only in the case insensitiveness, you
could force a deterministic collation to the ilike argument,
for instance:

select * from users where users.apellidos ilike '%Gonz%' collate "es-x-icu";

If you really need collate-sensitive substring search with non-deterministic
collations, I don't think there is any way in Postgres 12 to get that.
If you can install the icu_ext extension [1], its icu_strpos() function
implements that using the collation-aware string search feature in ICU.

As string like '%foo%' is equivalent to  strpos(string, 'foo')>0
when string is associated to a deterministic collation,
string like '%foo%' with a non-derministic collation
is not supported but it is equivalent to icu_strpos(string, 'foo')>0.

For instance, with the "ca_insensitive" collation you defined,
both 'jáv' and ''Jav'  match 'jav':

insert into users values ('jávier', 'gonzález', 'BBB');
insert into users values ('Javier2', 'Gonzalez', 'BBB');

select * from users where icu_strpos(nombre, 'jav')>0;
 nombre  | apellidos | direccion
---------+-----------+-----------
 jávier  | gonzález  | BBB
 Javier2 | Gonzalez  | BBB


[1] https://github.com/dverite/icu_ext#icu_strpos


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite