Re: case insensitive collation of Greek's sigma - Mailing list pgsql-general

From	Peter Eisentraut
Subject	Re: case insensitive collation of Greek's sigma
Date	December 1, 2021 19:29:33
Msg-id	9e3220da-d47e-add7-8b97-7c65b12ff6d7@enterprisedb.com
In response to	case insensitive collation of Greek's sigma (Jakub Jedelsky <jakub.jedelsky@gooddata.com>)
Responses	Re: case insensitive collation of Greek's sigma
List	pgsql-general

On 26.11.21 08:37, Jakub Jedelsky wrote:
> postgres=# SELECT
> postgres-# 'ΣΣ' ILIKE 'σσ' COLLATE "en_US",
> postgres-# 'ΣΣ' ILIKE 'σς' COLLATE "en_US"
> postgres-# ;
>   ?column? | ?column?
> ----------+----------
>   t        | f
> (1 row)
> 
> postgres=# SELECT
> postgres-# 'ΣΣ' ILIKE 'σσ' COLLATE "en-US-x-icu",
> postgres-# 'ΣΣ' ILIKE 'σς' COLLATE "en-US-x-icu";
>   ?column? | ?column?
> ----------+----------
>   f        | t
> (1 row)
> 
> If I could start, I think both results are wrong as both should return 
> True. If I got it right, in the background there is a lower() function 
> running to compare strings, which is not enough for such cases (until 
> the left side isn't taken as a standalone word).

The reason for these results is that for multibyte encodings, a ILIKE b 
basically does lower(a) LIKE lower(b), and the two collations lowercase 
the trailing sigma differently:

select lower('ΣΣ' COLLATE "en_US"), lower('ΣΣ' COLLATE "en-US-x-icu");
  lower | lower
-------+-------
  σσ    | σς

Running lower() like this is really the wrong thing to do.  We should be 
doing "case folding" instead, which normalizes these differences for the 
purpose of case-insensitive comparisons.
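
As a rough illustration outside PostgreSQL, the difference between 
lowercasing and case folding can be sketched in Python, whose str.lower() 
applies the final-sigma rule much like the ICU collation above, and whose 
str.casefold() implements Unicode case folding. The helper names here are 
hypothetical, and plain equality stands in for LIKE pattern matching; this 
is not how the server implements ILIKE.

```python
# Sketch only: equality stands in for LIKE, and the helper names are
# made up for this illustration.

def equal_after_lower(a: str, b: str) -> bool:
    """Compare two strings after lowercasing both sides."""
    return a.lower() == b.lower()


def equal_after_casefold(a: str, b: str) -> bool:
    """Compare two strings after Unicode case folding both sides."""
    return a.casefold() == b.casefold()


UPPER = "ΣΣ"         # two capital sigmas
MEDIAL = "σσ"        # two medial small sigmas
WITH_FINAL = "σς"    # medial sigma followed by final sigma

# Python's lower() turns a word-final capital sigma into the final form.
print(UPPER.lower())                            # σς

# Lowercasing makes only one of the two spellings match.
print(equal_after_lower(UPPER, MEDIAL))         # False
print(equal_after_lower(UPPER, WITH_FINAL))     # True

# Case folding maps Σ, σ and ς all to σ, so both spellings match.
print(equal_after_casefold(UPPER, MEDIAL))      # True
print(equal_after_casefold(UPPER, WITH_FINAL))  # True
```

With case folding, both comparisons from the original report come out 
true, which is the result the reporter expected.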
