Re: Windows UTF8 system locale - Mailing list pgsql-hackers

From Noah Misch
Subject Re: Windows UTF8 system locale
Msg-id 20250102042634.b5.nmisch@google.com
In response to Re: Windows UTF8 system locale  (Vladlen Popolitov <v.popolitov@postgrespro.ru>)
List pgsql-hackers
On Wed, Dec 25, 2024 at 06:55:51PM +0300, Vladlen Popolitov wrote:
> This UTF-8 feature leads to annoying test failure
> (010_dump_connstr).

It's not merely an annoying test failure.  On Windows configured with a
multibyte system locale, anyone with CREATEDB privilege can name a database
such that pg_dumpall can't restore it.
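
To make that concrete, here is a rough sketch. It takes a UTF-8 system locale as the multibyte case and assumes that command-line arguments must survive a round trip through that code page. Under those assumptions, a name stored by a LATIN1 database can contain bytes that are not valid UTF-8. Of the single bytes 0x01 through 0xFF, the whole high half fails on its own. The helper name survives_utf8_command_line is hypothetical.

```python
# Sketch: which names, stored as raw bytes by a single-byte-encoded cluster,
# could survive a command line that must be valid UTF-8?
# Assumption: with a UTF-8 system locale (ANSI code page 65001), arguments are
# converted through UTF-8, so byte strings that are not valid UTF-8 get mangled.


def survives_utf8_command_line(name: bytes) -> bool:
    """Return True if the raw name bytes decode as UTF-8 (hypothetical helper)."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# "café" as a LATIN1 cluster stores it: 0xE9 starts a 3-byte UTF-8 sequence
# that never completes.
latin1_name = "café".encode("latin-1")
print(latin1_name, survives_utf8_command_line(latin1_name))

# Every single byte 0x01 through 0xFF that a single-byte cluster allows in a name.
bad = [b for b in range(0x01, 0x100) if not survives_utf8_command_line(bytes([b]))]
print(len(bad), hex(bad[0]), hex(bad[-1]))  # 128 0x80 0xff
```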

> Option 1
> Skip this test for Windows in UTF-8 mode.
> 
> Option 2
> Exclude all 8-bit characters for Windows in UTF-8 mode. Currently only the
> double quote (") is excluded for Windows.
> 
> Option 3
> Test with a limited list of valid UTF-8 characters, just in case they also
> work. It could be 64 two-byte UTF-8 characters.

Those are ways to suppress the test failure.  But we have that test because
pg_dumpall and pg_upgrade rely on the ability to send all possible rolname and
datname on the command line.  In a cluster that uses a single-byte encoding,
that requires the ability to pass any sequence of bytes from [0x01,0xFF].  It's
not much of a win to make the test stop failing if real use of pg_dump and
pg_upgrade would still fail.  Message
postgr.es/m/20241215023221.4d.nmisch@google.com (original post of this thread)
gave PGSERVICEFILE as a way to make the real usage work.  That works by
removing the requirement to pass arbitrary bytes in command lines.  The
command line would contain an ASCII-only service name, and the arbitrary bytes
would appear inside the service file.
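
A minimal sketch of the PGSERVICEFILE approach follows, written in Python for illustration. Each arbitrary-byte name is written as raw bytes into a libpq service file under an ASCII service name, and only "service=<name>" appears on the command line. The helper write_service_file and the service names are hypothetical. The sketch assumes names contain no CR or LF, which one key per line cannot represent. It also assumes names have no leading or trailing whitespace, which a line-oriented parser may strip.

```python
import os
import tempfile


def write_service_file(entries: dict[str, bytes], directory: str) -> str:
    """Write a libpq-style service file mapping ASCII service names to raw dbname bytes.

    Hypothetical helper. Assumes no name contains CR or LF (one key per line) and
    none has leading or trailing whitespace, which a line-oriented parser may strip.
    """
    path = os.path.join(directory, "pg_service.conf")
    with open(path, "wb") as f:  # binary mode: the name bytes are written untouched
        for service, dbname in entries.items():
            if not service.isascii() or b"\n" in dbname or b"\r" in dbname:
                raise ValueError(f"cannot express {service!r} in a service file")
            f.write(b"[" + service.encode("ascii") + b"]\n")
            f.write(b"dbname=" + dbname + b"\n")
    return path


with tempfile.TemporaryDirectory() as tmp:
    raw_name = bytes(range(0x80, 0x90))  # 16 bytes, not valid UTF-8
    path = write_service_file({"restore_db_1": raw_name}, tmp)
    # Only the ASCII service name goes on the command line; the raw bytes stay
    # in the file that PGSERVICEFILE points to.
    argv = ["psql", "service=restore_db_1"]
    env = dict(os.environ, PGSERVICEFILE=path)
    with open(path, "rb") as f:
        contents = f.read()

print(all(arg.isascii() for arg in argv))
print(contents.startswith(b"[restore_db_1]\ndbname=\x80"))
```

Running the client with that argv and env, for example via subprocess.run(argv, env=env), would then connect by service name. One service entry per database would be needed, and the service file path itself still has to survive the environment.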

Another way might be to create the objects with placeholder ASCII names.  As
the last step of the restore, rename the placeholder ASCII names to the source
cluster's names.
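
The placeholder idea could look roughly like the sketch below. Every database is created under an ASCII placeholder. A final SQL script, delivered through a file or stdin rather than argv, then renames each placeholder to its original byte-exact name. The names quote_ident, placeholder_plan, and the restore_placeholder_N scheme are hypothetical. The sketch assumes that the session's client_encoding matches the source cluster's encoding and that placeholders don't collide with real names. It also assumes the script runs while connected to some other database, since the current database cannot be renamed. Roles would follow the same pattern with ALTER ROLE ... RENAME TO.

```python
# Hypothetical sketch of "create under ASCII placeholders, rename at the end".
# Assumes client_encoding matches the source cluster, placeholder names do not
# collide with real ones, and the script runs from a different database.


def quote_ident(name: bytes) -> bytes:
    """Quote an SQL identifier: wrap in double quotes, double any embedded ones."""
    return b'"' + name.replace(b'"', b'""') + b'"'


def placeholder_plan(names: list[bytes]) -> tuple[dict[bytes, str], bytes]:
    """Map each source name to an ASCII placeholder and build the final rename script."""
    placeholders = {name: f"restore_placeholder_{i}" for i, name in enumerate(names)}
    rename_sql = b"".join(
        b"ALTER DATABASE " + quote_ident(ph.encode("ascii"))
        + b" RENAME TO " + quote_ident(name) + b";\n"
        for name, ph in placeholders.items()
    )
    return placeholders, rename_sql


# Source names include a double quote and a byte that is not valid UTF-8.
names = [b"plain", b'has"quote', bytes([0xE9])]
placeholders, sql = placeholder_plan(names)
# The placeholders are safe on any command line; the rename script carries the
# exact bytes and would be fed to psql via -f or stdin.
print(placeholders)
for stmt in sql.splitlines():
    print(stmt)
```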

Once we can assume Windows 11 or later, another way is
<activeCodePage>en-US</activeCodePage> in a fusion manifest, per
https://learn.microsoft.com/en-us/windows/win32/sbscs/application-manifests#activeCodePage.
Any single-byte encoding choice might suffice.  That makes PostgreSQL
independent of the system locale.


