On Fri, Nov 29, 2024 at 5:45 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Hmm, yeah maybe that could work. The other consideration here
> (which we've been dancing around in this thread) is "what encoding
> are role and database names in startup packets presented in?"
> But I think your idea addresses that too:
>
> * mode 1: incoming names must be in the One True Encoding
>
> * mode 2: incoming names must be ASCII
>
> * mode 3: same wild-west behavior as always
>
> In modes 1 and 2 we could validate that the string meets our
> expectations (and then truncate it correctly, too).
Perhaps we could have a property shared_catalog_encoding: -1 for
unknown (mode 3), PG_SQL_ASCII (mode 2), or something else (mode 1).
I realise that PG_SQL_ASCII normally means bytes with no validation,
but we don't have an encoding that means ASCII with pg_is_ascii()
validation, and I think it'd be confusing if SQL_ASCII meant wild-west
mode, IDK. To have any hope of being able to change it after initdb,
I think it has to be in the control file, and I suspect you might have
to take AEL (AccessExclusiveLock) on all affected catalogues while
validating and changing it.
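To make the three modes concrete, here is a rough sketch of validating and then truncating an incoming startup-packet name. It is in Python for brevity (the server code would of course be C), and everything in it is hypothetical: the constants stand in for the proposed shared_catalog_encoding values, mode 1 is assumed to be UTF-8, and NAMEDATALEN is taken as PostgreSQL's default of 64, so a name keeps at most 63 bytes.

```python
from typing import Optional, Union

NAMEDATALEN = 64  # PostgreSQL's default; a name keeps at most NAMEDATALEN - 1 bytes

# Hypothetical stand-ins for the proposed shared_catalog_encoding values.
UNKNOWN = -1             # mode 3: wild west, bytes are not validated
SQL_ASCII = "SQL_ASCII"  # mode 2: names must be pure ASCII
UTF8 = "UTF8"            # mode 1: this sketch assumes the One True Encoding is UTF-8


def clip_incoming_name(raw: bytes,
                       shared_catalog_encoding: Union[int, str]) -> Optional[bytes]:
    """Validate a startup-packet name, then truncate it; None means reject."""
    limit = NAMEDATALEN - 1
    if shared_catalog_encoding == UNKNOWN:
        # We don't know what the bytes mean, so clip blindly (may split a character).
        return raw[:limit]
    if shared_catalog_encoding == SQL_ASCII:
        # Same idea as a pg_is_ascii() check on the whole name.
        return raw[:limit] if raw.isascii() else None
    # Any other value is mode 1; only UTF-8 is handled in this sketch.
    try:
        text = raw.decode("utf-8")  # strict decoding rejects malformed input
    except UnicodeDecodeError:
        return None
    out = bytearray()
    for ch in text:
        encoded = ch.encode("utf-8")
        if len(out) + len(encoded) > limit:
            break  # stop before a character that would not fit whole
        out += encoded
    return bytes(out)


if __name__ == "__main__":
    long_name = ("é" * 40).encode("utf-8")  # 80 bytes
    print(clip_incoming_name("lætitia".encode("utf-8"), SQL_ASCII))  # None
    print(len(clip_incoming_name(long_name, UTF8)))     # 62 (31 whole characters)
    print(len(clip_incoming_name(long_name, UNKNOWN)))  # 63 (last character split)
```

The point of the last two lines: only in modes 1 and 2 do we know enough to truncate on a character boundary; mode 3 can only cut bytes.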
Some random UX sketches:
$ CREATE DATABASE foo ... ENCODING latin1;
ERROR: encoding LATIN1 does not match shared catalog encoding UTF8
HINT: To allow databases with different encodings,
  shared_catalog_encoding must be SQL_ASCII (or UNKNOWN, not
  recommended).
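The rule behind that first error could be sketched like this (again hypothetical Python, with made-up names). In modes 2 and 3 any database encoding is accepted, since pure-ASCII names read the same in every server encoding and mode 3 checks nothing anyway; in mode 1 the database encoding has to match.

```python
from typing import Optional, Union

UNKNOWN = -1             # hypothetical mode-3 value of shared_catalog_encoding
SQL_ASCII = "SQL_ASCII"  # hypothetical mode-2 value


def check_create_database(db_encoding: str,
                          shared_catalog_encoding: Union[int, str]) -> Optional[str]:
    """Return an error message, or None if CREATE DATABASE may proceed."""
    if shared_catalog_encoding in (UNKNOWN, SQL_ASCII):
        # ASCII-only shared names are valid in any server encoding, and in
        # mode 3 nobody is checking, so any database encoding is allowed.
        return None
    # Simplification: real code would resolve aliases such as "utf-8" via the
    # encoding table rather than comparing upper-cased spellings.
    requested = db_encoding.upper()
    if requested != shared_catalog_encoding:
        return (f"encoding {requested} does not match "
                f"shared catalog encoding {shared_catalog_encoding}")
    return None
```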
$ ALTER SYSTEM SET shared_catalog_encoding = 'SQL_ASCII';
ERROR: existing role name "frédéric" cannot be represented in SQL_ASCII
HINT: Rename all databases and roles to use only ASCII characters.
(I realise that ALTER SYSTEM is for GUCs, but I mean some command that
sounds a bit like that.)
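The validation step behind that ALTER sketch might amount to walking every existing role and database name (in the server they would come from pg_authid and pg_database) and reporting the first one the target encoding can't represent. Another hypothetical Python sketch: the (kind, name) input format and the function names are made up, mode 1 is assumed to be UTF-8, and the catalogue locking is not modelled.

```python
from typing import Iterable, Optional, Tuple

SQL_ASCII = "SQL_ASCII"  # hypothetical mode-2 value of shared_catalog_encoding
UTF8 = "UTF8"            # hypothetical mode-1 value, assuming UTF-8


def representable(name: bytes, target: str) -> bool:
    """Can this stored name be kept as-is under the target shared catalog encoding?"""
    if target == SQL_ASCII:
        return name.isascii()
    try:
        name.decode("utf-8")  # mode 1: the bytes must already be valid UTF-8
    except UnicodeDecodeError:
        return False
    return True


def check_switch(existing: Iterable[Tuple[str, bytes]], target: str) -> Optional[str]:
    """existing yields (kind, name) pairs such as ("role", b"alice").

    Returns an error message for the first offending name, or None if the
    switch may proceed.
    """
    for kind, name in existing:
        if not representable(name, target):
            # Show undecodable bytes as \xNN escapes rather than failing.
            shown = name.decode("utf-8", errors="backslashreplace")
            return f'existing {kind} name "{shown}" cannot be represented in {target}'
    return None
```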
$ CREATE ROLE lætitia;
ERROR: role name "lætitia" cannot be represented in the shared catalog
  encoding SQL_ASCII
HINT: To allow non-ASCII role names, shared_catalog_encoding must be set
  to an encoding matching all databases (or UNKNOWN, not recommended).