Thread: BUG #16679: Incorrect encoding of database name

BUG #16679: Incorrect encoding of database name

From

PG Bug reporting form

Date:

20 October 2020, 10:44:22

The following bug has been logged on the website:

Bug reference:      16679
Logged by:          Alexander Kass
Email address:      alexander.kass@jetbrains.com
PostgreSQL version: 13.0
Operating system:   Linux
Description:

Steps to reproduce:
1. Create a database whose name contains non-ASCII characters, with an
encoding in which the name's bytes differ from its utf8-encoded form, e.g.:
> create database Français
    LC_COLLATE 'fr_FR@euro' LC_CTYPE 'fr_FR@euro'
    encoding 'latin9' template template0;
2. Now check pg_database from different databases:
aurora=> \connect français
français=> select * from pg_database;
  datname  | datdba | encoding | datcollate  |  datctype   | datistemplate | datallowconn | datconnlimit | datlastsysoid | datfrozenxid | datminmxid | dattablespace |               datacl
-----------+--------+----------+-------------+-------------+---------------+--------------+--------------+---------------+--------------+------------+---------------+-------------------------------------
 postgres  |  16399 |        6 | en_US.UTF-8 | en_US.UTF-8 | f             | t            |           -1 |         13933 |          549 |          1 |          1663 |
 franÃ§ais |  16399 |       16 | fr_FR@euro  | fr_FR@euro  | f             | t            |           -1 |         13933 |          549 |          1 |          1663 |
..........

français=> \connect postgres
postgres=> select * from pg_database;
  datname  | datdba | encoding | datcollate  |  datctype   | datistemplate | datallowconn | datconnlimit | datlastsysoid | datfrozenxid | datminmxid | dattablespace |               datacl
-----------+--------+----------+-------------+-------------+---------------+--------------+--------------+---------------+--------------+------------+---------------+-------------------------------------
 postgres  |  16399 |        6 | en_US.UTF-8 | en_US.UTF-8 | f             | t            |           -1 |         13933 |          549 |          1 |          1663 |
 français  |  16399 |       16 | fr_FR@euro  | fr_FR@euro  | f             | t            |           -1 |         13933 |          549 |          1 |          1663 |
...........

Note the incorrectly encoded database name (franÃ§ais) when connected to the
français database. The same applies to current_database().
If I run encode(datname::bytea, 'hex') in both databases, the results match.
It looks like an automatic latin9 -> utf8 conversion is applied in the
français database, even though the stored name is already utf8.
Checked on PG13 and AWS Aurora.
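
The conversion described above can be sketched outside the database. This is a minimal illustration, not server code. It assumes the name was stored as UTF-8 bytes (CREATE DATABASE issued from a UTF8 database) and that Python's "iso8859_15" codec corresponds to PostgreSQL's LATIN9.

```python
# Sketch only: Python codecs stand in for PostgreSQL's encodings.
# Assumption: "iso8859_15" corresponds to LATIN9.
name = "français"

# Bytes stored in pg_database, assuming the creating session's
# database used UTF8.
stored = name.encode("utf-8")

# A LATIN9 database treats the stored bytes as LATIN9 text...
as_latin9 = stored.decode("iso8859_15")

# ...and converts that text to UTF8 for a UTF8 client.
sent_to_client = as_latin9.encode("utf-8")

print(stored.hex())                     # identical in every database
print(sent_to_client.decode("utf-8"))   # what the UTF8 client displays
```

Under these assumptions the stored bytes are the same everywhere, which is consistent with the matching hex output, and only the displayed text differs.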

Re: BUG #16679: Incorrect encoding of database name

From

Tom Lane

Date:

20 October 2020, 15:23:07

PG Bug reporting form <noreply@postgresql.org> writes:
> 1. Create database with name & encoding that won't binary match utf8 encoded
> name, e.g:
> create database Français
>     LC_COLLATE 'fr_FR@euro' LC_CTYPE 'fr_FR@euro'
>     encoding 'latin9' template template0;

The short answer is don't do that.  The name will be stored in pg_database
with whatever encoding the source database (the one you were connected to
while issuing CREATE DATABASE) uses, and then it will look funny from any
database using another encoding.  Connecting to the DB will also fail from
any client not using the same encoding, since no encoding conversion is
performed during startup-packet processing.  Really the workable
alternatives are (a) use only ASCII characters in database names, or
(b) use the same encoding in every database of the cluster.
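
A rough sketch of why the connection attempt depends on the client's encoding, and why alternative (a) avoids the problem. It assumes the name is stored as UTF-8 bytes and models the lookup as a plain byte comparison; the function name startup_name_matches is hypothetical, and Python's "iso8859_15" codec stands in for LATIN9.

```python
# Sketch only: models "no encoding conversion during startup-packet
# processing" as a raw byte comparison. Not actual server logic.
STORED = "français".encode("utf-8")  # assumed stored form of the name


def startup_name_matches(name: str, client_encoding: str,
                         stored: bytes = STORED) -> bool:
    """Return True if the bytes the client sends equal the stored bytes."""
    return name.encode(client_encoding) == stored


utf8_client = startup_name_matches("français", "utf-8")
latin9_client = startup_name_matches("français", "iso8859_15")

# An ASCII-only name has the same bytes in both encodings.
ascii_same = "francais".encode("utf-8") == "francais".encode("iso8859_15")

print(utf8_client, latin9_client, ascii_same)
```

In this model only the client using the same encoding as the stored name finds the database, while an ASCII-only name matches from either client.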

Similar remarks apply to other globally-visible names, i.e., roles and
tablespaces.

It'd be nice in the abstract to have a better answer, but the amount
of work required, relative to the practical benefit for people who
aren't satisfied with either (a) or (b), is discouraging.  Notable
problems include what to do when a character in pg_database cannot be
translated to the encoding you'd like to use.  The connection-request
encoding problem in particular seems insoluble without a protocol break,
which would cause a lot more unhappiness than happiness.

            regards, tom lane
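
The translation problem mentioned above (a character in pg_database that has no equivalent in the target encoding) can be illustrated with a small check. This is a sketch under assumptions: the helper name translatable and the sample name "日本語" are hypothetical, and Python's "iso8859_15" codec stands in for LATIN9.

```python
# Sketch only: tests whether a name can be represented in a target
# encoding, using Python codecs in place of PostgreSQL's conversions.
def translatable(name: str, target_encoding: str) -> bool:
    """Return True if every character of name exists in target_encoding."""
    try:
        name.encode(target_encoding)
        return True
    except UnicodeEncodeError:
        return False


french_ok = translatable("français", "iso8859_15")
japanese_ok = translatable("日本語", "iso8859_15")  # hypothetical name

print(french_ok, japanese_ok)
```

A name like the second one has no LATIN9 representation at all, so any scheme that converts names on the fly would need a policy for such cases.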