Re: Multi-byte character case-folding - Mailing list pgsql-hackers

From	Daniel Verite
Subject	Re: Multi-byte character case-folding
Date	July 7, 2020 14:33:16
Msg-id	c1c7e094-b07b-4fa8-84e0-2a1bff1ff456@manitou-mail.org
In response to	Re: Multi-byte character case-folding (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Multi-byte character case-folding
List	pgsql-hackers

Tom Lane wrote:

> CREATE TABLE public."myÉclass" (
>    f1 text
> );
>
> If we start to case-fold É, then the only way to access this table will
> be by double-quoting its name, which the application probably is not
> expecting (else it would have double-quoted in the original CREATE TABLE).

This problem already exists when migrating from a mono-byte database
to a multi-byte database, because downcase_identifier() does use
tolower() on non-ASCII characters when the database encoding is
mono-byte, and only then.

db9=# show server_encoding ;
 server_encoding
-----------------
 LATIN9
(1 row)

db9=# create table MYÉCLASS (f1 text);
CREATE TABLE

db9=# \d
      List of relations
 Schema |   Name   | Type  |  Owner
--------+----------+-------+----------
 public | myéclass | table | postgres
(1 row)

db9=# select * from MYÉCLASS;
 f1
----
(0 rows)

pg_dump will dump this as

CREATE TABLE public."myéclass" (
    f1 text
);

So far so good. But after importing this into a UTF-8 database,
the same "select * from MYÉCLASS" that used to work now fails:

u8=# show server_encoding ;
 server_encoding
-----------------
 UTF8
(1 row)

u8=# select * from MYÉCLASS;
ERROR:  relation "myÉclass" does not exist
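
To make the difference between the two sessions concrete, here is a rough
Python model of the folding rule. It is only a sketch, not the actual C
code: the function name and the single_byte flag are made up for
illustration, and Unicode lower() stands in for the locale's tolower().

```python
def downcase_identifier_sketch(ident, encoding, single_byte):
    """Rough model of the folding rule discussed above (not the real C code)."""
    out = bytearray()
    for ch in ident.encode(encoding):
        if 0x41 <= ch <= 0x5A:
            # ASCII A-Z is folded regardless of the database encoding.
            ch += 0x20
        elif single_byte and ch >= 0x80:
            # Assumption: Unicode lower() stands in for the locale's tolower().
            # Keep the result only if it is still one byte in this encoding.
            try:
                lowered = bytes([ch]).decode(encoding).lower().encode(encoding)
            except UnicodeError:
                lowered = b""
            if len(lowered) == 1:
                ch = lowered[0]
        # In a multi-byte encoding, high-bit bytes are copied unchanged.
        out.append(ch)
    return out.decode(encoding)


print(downcase_identifier_sketch("MYÉCLASS", "iso8859_15", True))
print(downcase_identifier_sketch("MYÉCLASS", "utf-8", False))
```

The first call prints myéclass, matching the LATIN9 session. The second
prints myÉclass, which is the name the UTF-8 server reports as missing.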

The compromise described in the comments of downcase_identifier() to
justify this inconsistency is not very convincing, because the
case-folding issues caused by linguistic differences exist in both
mono-byte and multi-byte encodings. For instance, if it's fine to trust
the locale to downcase 'İ' in a LATIN5 db, it should be okay in a UTF-8
db too.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite
