On 2021-Oct-20, Mark Dilger wrote:
> I tried testing how this plays out by handing `createdb` the name é
> (U+00E9 "LATIN SMALL LETTER E WITH ACUTE") and then again the name é
> (U+0065 "LATIN SMALL LETTER E" followed by U+0301 "COMBINING ACUTE
> ACCENT".) That results in two distinct databases, not an error about
> a duplicate database name:
>
> # select oid, datname, datdba, encoding, datcollate, datctype from pg_catalog.pg_database where datname IN ('é', 'é');
>   oid  | datname | datdba | encoding | datcollate  |  datctype
> -------+---------+--------+----------+-------------+-------------
>  37852 | é       |     10 |        6 | en_US.UTF-8 | en_US.UTF-8
>  37855 | é       |     10 |        6 | en_US.UTF-8 | en_US.UTF-8
> (2 rows)
>
> But that doesn't seem to prove much, as other tools in my locale don't
> treat those as equal either. (Testing with perl's "eq" operator, they
> compare as distinct.) I expected to find regression tests providing
> better coverage for this somewhere, but did not. Anybody know more
> about it?

I think it would be appropriate to normalize identifiers that are going to
be stored in catalogs. As presented, this is a bit ridiculous and I see
no reason to continue to support it.
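
To illustrate the idea, here is a minimal Python sketch (not PostgreSQL
code) of the two spellings from the quoted test. They compare as distinct
code point sequences but become identical once both are put into a single
normalization form. The choice of NFC and the helper name
normalize_identifier are assumptions made for this example only; nothing in
this thread settles which form, if any, the catalogs should use.

```python
import unicodedata


def normalize_identifier(name: str) -> str:
    """Hypothetical helper: return the NFC form of an identifier.

    NFC is an assumption for illustration; the point is only that both
    spellings must be mapped to one canonical form before comparison.
    """
    return unicodedata.normalize("NFC", name)


# The two names from the quoted test, written as explicit escapes so the
# difference survives copy and paste.
composed = "\u00e9"      # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # U+0065 followed by U+0301 COMBINING ACUTE ACCENT

# Compared as stored: two different code point sequences.
raw_equal = composed == decomposed

# Compared after normalization: the same string.
normalized_equal = (
    normalize_identifier(composed) == normalize_identifier(decomposed)
)

print(raw_equal, normalized_equal)
```

With normalization applied before the name is stored, the second createdb
in the quoted test would hit the existing entry instead of creating a
visually identical sibling.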
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Ed is the standard text editor."
http://groups.google.com/group/alt.religion.emacs/msg/8d94ddab6a9b0ad3