Re: Possible to create a hidden collation - Mailing list pgsql-bugs

From Daniel Verite
Subject Re: Possible to create a hidden collation
Date
Msg-id 226e3a10-41c6-4150-a31e-e806d7c0efd4@manitou-mail.org
Whole thread Raw
In response to Possible to create a hidden collation  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Possible to create a hidden collation  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-bugs
    Jeff Davis wrote:

> create collation test(provider=icu,
>    deterministic=false, locale='@colStrength=primary');
>  create collation "en_US"(provider=icu,
>    deterministic=false, locale='@colStrength=primary');
>
>  select 'a' = 'A' collate test; -- true
>  select 'a' = 'A' collate "en_US"; -- false
>  drop collation "en_US"; -- drops built-in collation
>  select 'a' = 'A' collate "en_US"; -- true
>
> Explanation:
>
> The second collation named "en_US" is hidden behind the built-in
> collation "en_US" because the former is created with collencoding=-1
> (as all icu collations are), and the latter is a libc collation with
> collencoding equal to the current database encoding (which takes
> precedence).
>
> It's a minor bug, but could be surprising behavior.

ISTM that this behavior is not due to collencoding=-1, but
to the custom "en_US" collation being in the "public" schema
whereas the built-in "en_US" is in "pg_catalog".

Assuming a default search_path,
-  create collation "en_US"(...) means
 create collation "public"."en_US"(...)
- select 'a' = 'A' collate "en_US"   means
 select 'a' = 'A' collate "pg_catalog"."en_US"
- drop collation "en_US" means
 drop collation "pg_catalog"."en_US"

So in practice the new collation en_US is not being seen
until the system collation is dropped, independently of
collencoding.

Also, the proposed patch doesn't appear to change the outcome
of the sequence of statements given as an example, which is both
expected considering the above, and surprising because you imply
that it should improve the user experience.


What I'm seeing after applying it:

Initial state:
postgres=# select * from pg_collation where collname='en_US' \gx
-[ RECORD 1 ]-------+-----------
oid            | 12419
collname        | en_US
collnamespace        | 11
collowner        | 10
collprovider        | c
collisdeterministic | t
collencoding        | 6
collcollate        | en_US.utf8
collctype        | en_US.utf8
colliculocale        |
collicurules        |
collversion        | 2.35


postgres=# create collation "en_US"(provider=icu,
postgres(# deterministic=false, locale='@colStrength=primary');
NOTICE:  using standard form "und-u-ks-level1" for locale
"@colStrength=primary"
CREATE COLLATION


postgres=# select * from pg_collation where collname='en_US' \gx
-[ RECORD 1 ]-------+----------------
oid            | 12419
collname        | en_US
collnamespace        | 11
collowner        | 10
collprovider        | c
collisdeterministic | t
collencoding        | 6
collcollate        | en_US.utf8
collctype        | en_US.utf8
colliculocale        |
collicurules        |
collversion        | 2.35
-[ RECORD 2 ]-------+----------------
oid            | 16388
collname        | en_US
collnamespace        | 2200
collowner        | 10
collprovider        | i
collisdeterministic | f
collencoding        | 6
collcollate        |
collctype        |
colliculocale        | und-u-ks-level1
collicurules        |
collversion        | 153.112

(I notice collencoding=6 for the new collation in the public namespace)


postgres=# select 'a' = 'A' collate "en_US";
 ?column?
----------
 f
(1 row)

(the libc collation is still used, because pg_catalog comes first)

postgres=# drop collation "en_US";
DROP COLLATION

postgres=# select * from pg_collation where collname='en_US' \gx
-[ RECORD 1 ]-------+----------------
oid            | 16388
collname        | en_US
collnamespace        | 2200
collowner        | 10
collprovider        | i
collisdeterministic | f
collencoding        | 6
collcollate        |
collctype        |
colliculocale        | und-u-ks-level1
collicurules        |
collversion        | 153.112

(the collation in pg_catalog has been dropped, so we're left with only
the custom collation).

So maybe it's better to set collencoding to the db encoding as done
by the patch, but it's not clear what concrete problem it solves.


Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite



pgsql-bugs by date:

Previous
From: Nikolas Mauropoulos
Date:
Subject: Re: BUG #17929: LLVMBuildGEP: symbol not found
Next
From: Michael Paquier
Date:
Subject: Re: BUG #17884: gist_page_items() crashes for a non-leaf page of an index with non-key columns