Thomas Munro <thomas.munro@gmail.com> writes:
> Problem #1: You can have two databases with different encodings, and
> they both pretend that pg_database, pg_authid, pg_db_role_setting etc
> are in the local database encoding. That doesn't work too well:
> non-ASCII text can be reinterpreted in the wrong encoding.
> There's no problem if you only use one encoding everywhere (probably
> UTF8). There's also no problem if you use multiple database
> encodings, but put only ASCII in the shared catalogues (because ASCII
> is a subset of every supported server encoding). This patch is about
> formalising and enforcing those two working arrangements, hopefully
> invisibly to most users. There's still an escape hatch mode if you
> need it, e.g. for a non-conforming pg_upgrade'd system.

Over in the discussion of bug #18735, I've come to the realization
that these problems apply equally to the filesystem path names that
the server deals with: not only the data directory path, but the
path to the installation files [1]. Can we apply the same sort of
restrictions to those? I'm envisioning that initdb would check
either encoding-validity or all-ASCII-ness of those path names
depending on which mode it's setting the server up in.
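
For concreteness, here is a minimal sketch of the kind of check I mean.
It assumes path names are examined as the raw byte strings the OS hands
us. All names here (CatalogEncodingMode, path_ok_for_mode, etc.) are
hypothetical. The hand-rolled UTF-8 validator stands in for whatever
encoding-verification routine we'd really use for the chosen encoding,
and it covers only UTF-8:

```c
#include <stdbool.h>

/* Hypothetical modes mirroring the two working arrangements. */
typedef enum
{
    CATALOG_ENCODING_ASCII,     /* shared catalogs (and paths) must be ASCII */
    CATALOG_ENCODING_UTF8       /* one encoding everywhere, assumed UTF-8 here */
} CatalogEncodingMode;

/* True if every byte is 7-bit ASCII, hence valid in any server encoding. */
static bool
path_is_all_ascii(const char *path)
{
    for (const unsigned char *p = (const unsigned char *) path; *p; p++)
        if (*p >= 0x80)
            return false;
    return true;
}

/* Minimal UTF-8 validator: rejects bad sequences, overlongs, surrogates. */
static bool
path_is_valid_utf8(const char *path)
{
    const unsigned char *p = (const unsigned char *) path;

    while (*p)
    {
        unsigned char c = *p;
        unsigned int cp;
        int         len;

        if (c < 0x80)
        {
            p++;
            continue;
        }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else
            return false;

        for (int i = 1; i < len; i++)
        {
            /* a NUL here also fails, so we never read past the string */
            if ((p[i] & 0xC0) != 0x80)
                return false;
            cp = (cp << 6) | (p[i] & 0x3F);
        }

        /* reject overlong encodings */
        if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
            (len == 4 && cp < 0x10000))
            return false;
        /* reject UTF-16 surrogates and code points beyond U+10FFFF */
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        p += len;
    }
    return true;
}

/* Would initdb accept this data-directory or installation path? */
static bool
path_ok_for_mode(const char *path, CatalogEncodingMode mode)
{
    switch (mode)
    {
        case CATALOG_ENCODING_ASCII:
            return path_is_all_ascii(path);
        case CATALOG_ENCODING_UTF8:
            return path_is_valid_utf8(path);
    }
    return false;
}
```

So a path like "/home/jos\xc3\xa9/data" would pass in UTF-8 mode but be
rejected in ASCII mode, while the Latin-1 spelling "/home/jos\xe9/data"
would be rejected in both.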
> The patch invents a new setting CLUSTER CATALOG ENCODING, which can be
> inspected with SHOW and changed with ALTER SYSTEM.

Changing the catalog encoding would also have to re-verify the
suitability of the paths. Of course this isn't 100% bulletproof
since someone could rename those directories later. But I think
that's in "if you break it you get to keep both pieces" territory.

			regards, tom lane

[1] https://www.postgresql.org/message-id/2840430.1733510664%40sss.pgh.pa.us