On Thu, Nov 28, 2024 at 5:04 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> There is nothing
> about our handling of non-ASCII characters in shared system catalogs
> that isn't squishy as heck, and yet there have been darn few field
> complaints over the many years it's been like that. Maybe trying to
> make this truncation issue better in isolation wasn't such a great
> plan.

I guess most people in Unix-land just use UTF-8 in every layer of
their software stack these days, so don't often see confused encodings
anymore? But I don't think that's true in the other place, where they
still routinely juggle multiple encodings and see garbled junk when it
goes wrong[1]. They might still generally prefer UTF-8 for database
encoding though; I don't know.

> (If we recorded the encoding of names in shared catalogs then this
> particular issue would be far easier to solve, but then we have
> other problems to address --- particularly, what to do if a name
> in the catalog fails to convert to the encoding we are using.)

Here is a much dumber coarse-grained way I have wondered about for
making the encoding certain, without having to do any new conversions
at all: (1) single-encoding cluster mode, shared catalogues use same
encoding as all databases, (2) multi-encoding cluster mode with
ASCII-only shared catalogues, and (3) legacy squishy/raw mode you
normally only reach by pg_upgrade. Maybe you could switch between
them with an operation that validates names.

Then I think you could always know the shared catalogue encoding even with
no database context, and when you are connected to a database you
could mostly just carry on assuming it's database encoding (either it
is, or it's the ASCII subset). That can only be wrong in mode 3, where all
bets are off just like today, but that's your own fault for using mode 3.
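
To make the three modes concrete, here is a rough sketch in Python rather
than backend C, just to show the shape of the idea. Every name in it
(ClusterEncodingMode, shared_catalog_encoding, can_switch_to) is made up
for illustration and nothing like it exists in PostgreSQL today. It also
uses Python codec names such as "utf-8" as stand-ins for server encoding
names, which is an assumption of the sketch.

```python
from enum import Enum


class ClusterEncodingMode(Enum):
    """Hypothetical cluster-wide modes; not an existing PostgreSQL setting."""
    SINGLE_ENCODING = 1  # shared catalogues use the one encoding all databases use
    MULTI_ENCODING = 2   # databases may differ; shared catalogues are ASCII-only
    LEGACY_RAW = 3       # today's behaviour: names are bytes of unknown encoding


def shared_catalog_encoding(mode, cluster_encoding=None):
    """Return the encoding of shared catalogue names, or None if unknowable.

    No database context is needed: the answer depends only on the mode and,
    in mode 1, on the single cluster-wide encoding.
    """
    if mode is ClusterEncodingMode.SINGLE_ENCODING:
        if cluster_encoding is None:
            raise ValueError("single-encoding mode needs the cluster encoding")
        return cluster_encoding
    if mode is ClusterEncodingMode.MULTI_ENCODING:
        return "ascii"
    return None  # mode 3: all bets are off


def can_switch_to(mode, names, cluster_encoding=None):
    """Check that every shared catalogue name (raw bytes) is valid in the target mode."""
    target = shared_catalog_encoding(mode, cluster_encoding)
    if target is None:
        return True  # legacy mode accepts any byte string
    for raw in names:
        try:
            raw.decode(target)  # validation only; nothing is converted or stored
        except UnicodeDecodeError:
            return False
    return True
```

The validation step only checks that the existing bytes are legal in the
target encoding, so switching modes never has to convert anything. A
cluster with a Latin-1 role name would be refused entry to mode 2 until
the role is renamed, but could stay in mode 3.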

I guess serious users of multi-encoding clusters already learn to
stick to ASCII-only role names and database names anyway, unless they
like seeing garbage?

[1] https://www.postgresql.org/message-id/flat/00a601db3b20%24b00261e0%24100725a0%24%40gmx.net