Re: Windows default locale vs initdb - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Windows default locale vs initdb
Date
Msg-id CA+hUKGJ=ca39Cg=y=S89EaCYvvCF8NrZRO=uog-cnz0VzC6Kfg@mail.gmail.com
In response to Re: Windows default locale vs initdb  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
On Tue, Jul 23, 2024 at 11:19 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Tue, Jul 23, 2024 at 1:44 AM Andrew Dunstan <andrew@dunslane.net> wrote:
> > I have an environment I can use for testing. But what exactly am I
> > testing? :-) Install a few "problem" language/region settings, switch
> > the system and ensure initdb runs ok?

I thought a bit more about what to do with the messy .UTF-8 situation
on Windows, and I think I might see a way forward that harmonises the
code and behaviour with Unix, and deletes a lot of special case code.
But it's only theories + CI so far.

0001, 0002:  As before, teach initdb.exe to choose eg "en-US" by default.

0003:  Force people to choose locales that match the database
encoding, as we do on Unix.  That is, forbid contradictory
combinations like --locale="English_United States.1252"
--encoding=UTF8, which are currently allowed (and the world is full of
such database clusters because that is how the EDB installer GUI makes
them).  The only allowed combinations for American English should now
be: --locale="en-US" --encoding="WIN1252", and --locale="en-US.UTF-8"
--encoding="UTF8".  You can still use the old names if you like, by
explicitly writing --locale="English_United States.1252", but the
encoding then has to be WIN1252.  It's crazy to mix them up, let's ban
that.
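
To make the proposed rule concrete, here is a minimal sketch of the kind of check 0003 implies. It is not the code from the patch: locale_matches_encoding() and its default_ansi_encoding parameter are hypothetical names invented for illustration, and a real implementation would ask the operating system which code page a suffix-less name such as "en-US" implies rather than taking it as an argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not PostgreSQL code: report whether a Windows locale
 * name and a database encoding name agree.
 *
 * default_ansi_encoding is an assumption of this sketch: it stands for the
 * encoding implied by a locale name with no suffix (WIN1252 for "en-US").
 */
bool
locale_matches_encoding(const char *locale, const char *encoding,
                        const char *default_ansi_encoding)
{
    const char *dot;
    char        implied[32];

    assert(locale != NULL && encoding != NULL && default_ansi_encoding != NULL);

    /* No suffix: the locale's default ANSI code page decides. */
    dot = strrchr(locale, '.');
    if (dot == NULL)
        return strcmp(encoding, default_ansi_encoding) == 0;

    /* A ".UTF-8" suffix is only compatible with a UTF8 database. */
    if (strcmp(dot + 1, "UTF-8") == 0 || strcmp(dot + 1, "utf8") == 0)
        return strcmp(encoding, "UTF8") == 0;

    /* A numeric code page suffix such as ".1252" implies "WIN1252". */
    if (dot[1] == '\0' || strspn(dot + 1, "0123456789") != strlen(dot + 1))
        return false;           /* unrecognised suffix: reject */
    snprintf(implied, sizeof(implied), "WIN%s", dot + 1);
    return strcmp(encoding, implied) == 0;
}
```

With this sketch, "English_United States.1252" is accepted with WIN1252 and rejected with UTF8, which is the combination the EDB installer produces today.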

Obviously there is a pg_upgrade case to worry about there.  We'd have
to "fix" the now illegal combinations, and I don't know exactly how
yet.

0004:  Rip out the code that does extra wchar_t conversions for
collations.  If I've understood correctly, we don't need them: if you
have a .UTF-8 locale then your encoding is UTF-8 and we should be able
to use strcoll_l() directly.  Right?
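
For anyone who wants to poke at that assumption outside the tree, here is a rough standalone sketch of comparing two strings in a named locale with no wchar_t round trip. compare_in_locale() is a hypothetical wrapper, not a function from the patch; it uses newlocale()/strcoll_l() on POSIX systems and the CRT's _create_locale()/_strcoll_l() on Windows. Whether the Windows branch sorts UTF-8 input sensibly under a ".UTF-8" locale is exactly the thing that needs testing on a real system.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L     /* for newlocale() and strcoll_l() */
#endif

#include <assert.h>
#include <locale.h>
#include <string.h>
#ifdef __APPLE__
#include <xlocale.h>
#endif

/*
 * Hypothetical wrapper: compare a and b using the collation rules of
 * locale_name, passing the char strings straight to the _l function.
 * Returns 0 and stores the comparison in *result, or -1 if the locale
 * could not be created.
 */
int
compare_in_locale(const char *locale_name, const char *a, const char *b,
                  int *result)
{
    assert(locale_name != NULL && a != NULL && b != NULL && result != NULL);

#ifdef _WIN32
    {
        _locale_t   loc = _create_locale(LC_COLLATE, locale_name);

        if (loc == NULL)
            return -1;
        *result = _strcoll_l(a, b, loc);
        _free_locale(loc);
    }
#else
    {
        locale_t    loc = newlocale(LC_COLLATE_MASK, locale_name, (locale_t) 0);

        if (loc == (locale_t) 0)
            return -1;
        *result = strcoll_l(a, b, loc);
        freelocale(loc);
    }
#endif
    return 0;
}
```

Calling it with "en-US.UTF-8" on Windows (or "en_US.UTF-8" on Unix) and a few non-ASCII strings would show whether the direct call orders them as expected.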

0005:  Something similar was being done for strftime().  And we might
as well use strftime_l() instead while we're here (part of the general
movement to use _l functions and stop splattering setlocale() all over
the place, for the multithreaded future).
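
As an illustration of the _l style (again a sketch, not the patch itself), the hypothetical format_time_in_locale() below formats a struct tm under a named locale without ever calling setlocale(), so the process-wide locale is left alone. It assumes strftime_l() on POSIX systems and _strftime_l() on Windows.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L     /* for newlocale() and strftime_l() */
#endif

#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <time.h>
#ifdef __APPLE__
#include <xlocale.h>
#endif

/*
 * Hypothetical wrapper: format *tm into buf using locale_name for LC_TIME.
 * Returns the number of bytes written (excluding the terminator), or 0 if
 * the locale could not be created or the result did not fit.
 */
size_t
format_time_in_locale(const char *locale_name, const char *fmt,
                      const struct tm *tm, char *buf, size_t buflen)
{
    size_t      n;

    assert(locale_name != NULL && fmt != NULL && tm != NULL && buf != NULL);

#ifdef _WIN32
    {
        _locale_t   loc = _create_locale(LC_TIME, locale_name);

        if (loc == NULL)
            return 0;
        n = _strftime_l(buf, buflen, fmt, tm, loc);
        _free_locale(loc);
    }
#else
    {
        locale_t    loc = newlocale(LC_TIME_MASK, locale_name, (locale_t) 0);

        if (loc == (locale_t) 0)
            return 0;
        n = strftime_l(buf, buflen, fmt, tm, loc);
        freelocale(loc);
    }
#endif
    return n;
}
```

Because the locale object is local to the call, two threads can format times in different locales at once, which is the point of moving away from setlocale().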

These patches pass on CI.  Do they give the expected results when used
on a real Windows system?

There are a few more places where we do wchar_t conversions that could
probably be stripped out too, if my assumptions are correct, and we
could dig further if the basic idea can be validated and people think
this is going in a good direction.
