Thread: BUG #16679: Incorrect encoding of database name
The following bug has been logged on the website:

Bug reference:      16679
Logged by:          Alexander Kass
Email address:      alexander.kass@jetbrains.com
PostgreSQL version: 13.0
Operating system:   Linux
Description:

Steps to reproduce:

1. Create a database whose name, in the database's encoding, does not
binary-match the UTF-8 encoded name, e.g.:
> create database Français
> LC_COLLATE 'fr_FR@euro' LC_CTYPE 'fr_FR@euro'
> encoding 'latin9' template template0;

2. Now check pg_database from different databases:

aurora=> \connect français
français=> select * from pg_database;
  datname  | datdba | encoding | datcollate  |  datctype   | datistemplate | datallowconn | datconnlimit | datlastsysoid | datfrozenxid | datminmxid | dattablespace | datacl
-----------+--------+----------+-------------+-------------+---------------+--------------+--------------+---------------+--------------+------------+---------------+-------------------------------------
 postgres  |  16399 |        6 | en_US.UTF-8 | en_US.UTF-8 | f             | t            |           -1 |         13933 |          549 |          1 |          1663 |
 français  |  16399 |       16 | fr_FR@euro  | fr_FR@euro  | f             | t            |           -1 |         13933 |          549 |          1 |          1663 |
 ..........

français=> \connect postgres
postgres=> select * from pg_database;
  datname  | datdba | encoding | datcollate  |  datctype   | datistemplate | datallowconn | datconnlimit | datlastsysoid | datfrozenxid | datminmxid | dattablespace | datacl
-----------+--------+----------+-------------+-------------+---------------+--------------+--------------+---------------+--------------+------------+---------------+-------------------------------------
 postgres  |  16399 |        6 | en_US.UTF-8 | en_US.UTF-8 | f             | t            |           -1 |         13933 |          549 |          1 |          1663 |
 français  |  16399 |       16 | fr_FR@euro  | fr_FR@euro  | f             | t            |           -1 |         13933 |          549 |          1 |          1663 |
 ...........

See the incorrectly encoded database name. The same applies to
current_database().

If I do encode(datname::bytea, 'hex'), the results match.
It looks like an automatic latin9 -> utf8 conversion is done for the
français database, but the name is already utf8.

Checked on PG13 & AWS Aurora.
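The reporter's hypothesis can be illustrated outside the server. The following is only a sketch, not PostgreSQL code: it assumes the name was written to pg_database as UTF-8 bytes (because CREATE DATABASE was issued from a UTF8 database) and that a latin9 database treats those same bytes as latin9 before converting them for a UTF-8 client. The helper `as_seen_from` is hypothetical and exists only for this illustration.

```python
# Illustration only: mimics the suspected double conversion with Python codecs.
# Assumption: the name was stored from a UTF8 database, so pg_database
# holds the UTF-8 bytes of the name.
name = "français"
stored = name.encode("utf-8")


def as_seen_from(stored_bytes: bytes, db_encoding: str) -> str:
    """Interpret the stored bytes as if they were in db_encoding.

    Hypothetical helper: it stands in for a server that assumes catalog
    text is in the current database's encoding and then converts it to
    a UTF-8 client encoding.
    """
    return stored_bytes.decode(db_encoding)


# The raw bytes are identical no matter which database reads them,
# which matches the observation that the hex-encoded values agree.
stored_hex = stored.hex()

seen_from_utf8_db = as_seen_from(stored, "utf-8")
seen_from_latin9_db = as_seen_from(stored, "iso8859_15")  # latin9

print(stored_hex)
print(seen_from_utf8_db)
print(seen_from_latin9_db)
```

Under these assumptions the bytes are the same everywhere, and only their interpretation differs: the two UTF-8 bytes of "ç" are read as two separate latin9 characters.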
PG Bug reporting form <noreply@postgresql.org> writes:
> 1. Create database with name & encoding that won't binary match utf8 encoded
> name, e.g:
> create database Français
> LC_COLLATE 'fr_FR@euro' LC_CTYPE 'fr_FR@euro'
> encoding 'latin9' template template0;

The short answer is don't do that. The name will be stored in
pg_database with whatever encoding the source database (the one you
were connected to while issuing CREATE DATABASE) uses, and then it
will look funny from any database using another encoding. Connecting
to the DB will also fail from any client not using the same encoding,
since no encoding conversion is performed during startup-packet
processing.

Really the workable alternatives are (a) use only ASCII characters
in database names, or (b) use the same encoding in every database of
the cluster. Similar remarks apply to other globally-visible names,
ie roles and tablespaces.

It'd be nice in the abstract to have a better answer, but the amount
of work required, relative to the practical benefit for people who
aren't satisfied with either (a) or (b), is discouraging. Notable
problems include what to do when a character in pg_database cannot be
translated to the encoding you'd like to use. The connection-request
encoding problem in particular seems insoluble without a protocol
break, which would cause a lot more unhappiness than happiness.

			regards, tom lane
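The points in this reply can be checked mechanically on the client side. The sketch below is an illustration under stated assumptions, not anything PostgreSQL provides: the three function names are hypothetical, Python's codecs stand in for the server's encodings, and "matching" a name is modelled as a plain byte comparison, since the reply says no conversion happens during startup-packet processing.

```python
def is_ascii_name(name: str) -> bool:
    """Alternative (a): the name uses only ASCII characters."""
    return name.isascii()


def is_translatable(name: str, encoding: str) -> bool:
    """True if every character of name exists in the given encoding.

    Models the 'character cannot be translated' problem from the reply.
    """
    try:
        name.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def startup_bytes_match(name: str, stored_encoding: str,
                        client_encoding: str) -> bool:
    """Compare the bytes a client would send with the stored bytes.

    Assumption: with no conversion at connection time, the name is
    found only if both byte sequences are identical.
    """
    return name.encode(client_encoding) == name.encode(stored_encoding)


if __name__ == "__main__":
    for candidate in ("postgres", "français"):
        print(candidate,
              is_ascii_name(candidate),
              startup_bytes_match(candidate, "utf-8", "iso8859_15"))
    # A name with characters outside latin9 has no latin9 form at all.
    print(is_translatable("日本語", "iso8859_15"))
```

ASCII-only names pass every check because ASCII bytes are the same in UTF-8 and latin9, which is why alternative (a) sidesteps both the display and the connection problem.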