Hi,

On Fri, Nov 29, 2024 at 10:12:08AM +1300, Thomas Munro wrote:
> On Fri, Nov 29, 2024 at 5:45 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Hmm, yeah maybe that could work. The other consideration here
> > (which we've been dancing around in this thread) is "what encoding
> > are role and database names in startup packets presented in?"
> > But I think your idea addresses that too:
> >
> > * mode 1: incoming names must be in the One True Encoding

Yeah, but in practice how would we do that? By just relying on
pg_verify_mbstr(), or by attempting an actual conversion? I'm asking because,
IIUC, pg_verify_mbstr() does not enforce any encoding-specific rules for
single-byte (LATIN*) encodings.
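
To illustrate the concern, here is a minimal sketch. It uses Python's codecs
as a stand-in for the server-side checks, so it is an analogy and not
PostgreSQL code: passes_verification() is a hypothetical helper, and
Python's "latin-1" codec is assumed to behave like a single-byte encoding
in which every byte value is acceptable.

```python
# Illustration only: Python codecs stand in for server-side verification.
name = "frédéric"
utf8_bytes = name.encode("utf-8")
latin1_bytes = name.encode("latin-1")


def passes_verification(raw: bytes, encoding: str) -> bool:
    """Hypothetical helper: True if raw decodes cleanly in the encoding."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# A multibyte encoding has internal structure, so bytes from another
# encoding are usually caught by verification alone.
latin1_rejected_as_utf8 = not passes_verification(latin1_bytes, "utf-8")

# A single-byte encoding accepts any byte value, so UTF-8 bytes pass
# verification and are merely misread.
utf8_accepted_as_latin1 = passes_verification(utf8_bytes, "latin-1")
misread = utf8_bytes.decode("latin-1")
```

So verification alone cannot tell us that a name really is in a given
single-byte encoding; it can only reject input for encodings that have
structure to check.
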
> Perhaps we could have a property shared_catalog_encoding:, -1 for
> unknown (mode 3), PG_SQL_ASCII (mode 2), or something else (mode 1).
> I realise that PG_SQL_ASCII normally means bytes with no validation,
> but we don't have an encoding that means ASCII with pg_is_ascii()
> validation, and I think it'd be confusing if SQL_ASCII meant wild west
> mode, IDK.

Hm, are you saying that by choosing PG_SQL_ASCII to represent mode 2
(ASCII-only, with validation) in the shared_catalog_encoding field, we'd be
giving it a different meaning than it has elsewhere in the system? If so,
maybe we could just document very clearly that its meaning in
shared_catalog_encoding is special?

> To have any hope of being able to change it after initdb,
> I think it has to be in the control file and suspect you might have to
> take AEL on all affected catalogues

Yeah.

> Some random UX sketches:
>
> $ CREATE DATABASE foo ... ENCODING latin1;
> ERROR: encoding LATIN1 does not match shared catalog encoding UTF8
> HINT: To allow databases with different encodings,
> shared_catalog_encoding must be SQL_ASCII (or UNKNOWN, not
> recommended)
>
> $ ALTER SYSTEM SET shared_catalog_encoding = 'SQL_ASCII';
> ERROR: existing role name "frédéric" cannot be represented in SQL_ASCII
> HINT: Rename all databases and roles to use only ASCII characters.
>
> (I realise that ALTER SYSTEM is for GUCs, but something that sounds a
> bit like that.)
>
> $ CREATE ROLE lætitia;
> ERROR: role name "lætitia" cannot be represented in the shared catalog
> encoding SQL_ASCII
> HINT: To allow non-ASCII roles, shared_catalog_encoding must be set to
> an encoding matching all databases (or UNKNOWN, not recommended)

That sounds reasonable to me.
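
For what it's worth, the checks behind those messages could be sketched
roughly as below. This is only a sketch of the three modes as I read them
in this thread, not a proposed implementation: the function names, the use
of None for UNKNOWN, and the PG_TO_CODEC mapping are all assumptions made
for illustration.

```python
from typing import Optional

UNKNOWN = None  # mode 3: no validation (hypothetical representation)

# Hypothetical mapping from PostgreSQL encoding names to Python codecs.
PG_TO_CODEC = {"UTF8": "utf-8", "LATIN1": "latin-1"}


def check_database_encoding(db_encoding: str,
                            shared_catalog_encoding: Optional[str]) -> None:
    """Reject a database encoding that conflicts with mode 1."""
    if shared_catalog_encoding is UNKNOWN or shared_catalog_encoding == "SQL_ASCII":
        return  # modes 3 and 2 allow databases with different encodings
    if db_encoding != shared_catalog_encoding:
        raise ValueError(
            f"encoding {db_encoding} does not match shared catalog "
            f"encoding {shared_catalog_encoding}")


def check_shared_name(name: str,
                      shared_catalog_encoding: Optional[str]) -> None:
    """Reject a role or database name the shared catalogs cannot hold."""
    if shared_catalog_encoding is UNKNOWN:
        return  # mode 3: anything goes
    if shared_catalog_encoding == "SQL_ASCII":
        # mode 2: ASCII only, with validation
        if not name.isascii():
            raise ValueError(
                f'name "{name}" cannot be represented in the shared '
                f"catalog encoding SQL_ASCII")
        return
    # mode 1: the name must be representable in the one shared encoding
    try:
        name.encode(PG_TO_CODEC[shared_catalog_encoding])
    except UnicodeEncodeError:
        raise ValueError(
            f'name "{name}" cannot be represented in the shared '
            f"catalog encoding {shared_catalog_encoding}") from None
```

With this, check_shared_name("lætitia", "SQL_ASCII") raises, while the same
name is accepted when the shared catalog encoding is UTF8 or UNKNOWN.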

Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com