Re: Collation versioning - Mailing list pgsql-hackers

From David Rowley
Subject Re: Collation versioning
Date
Msg-id CAApHDvpfOzs-tfm8wfiYg1zUZS0PDzNJG=rqnPHTfvks+G9ivg@mail.gmail.com
In response to Re: Collation versioning  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: Collation versioning  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers

On Tue, 3 Nov 2020 at 09:43, Thomas Munro <thomas.munro@gmail.com> wrote:
> Fortunately David Rowley is able to repro this on his Windows box (it
> fails even with strings that are succeeding on the other BF machines),
> so we have something to work with.  The name mangling that is done in
> get_iso_localename() looks pretty interesting...  It does feel a bit
> like there is some other hidden environmental factor or setting here,
> because commit 352f6f2df60 tested OK on Juan Jose's machine too.
> Hopefully more soon.

It seems to boil down to GetNLSVersionEx() not liking the "English_New
Zealand.1252" string.  The space in the name does not seem to be a
factor: if I change it to "English_Australia.1252", I get the same
failure.

Going by the docs in [1] and following the "locale name" link to [2],
there's a description there that mentions: "Generally, the pattern
<language>-<REGION> is used."  So, if I just hack the code in
get_collation_actual_version() to pass "en-NZ" to GetNLSVersionEx(),
that works fine.
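
For anyone who wants to try this outside of PostgreSQL, here is a
minimal sketch of a standalone probe.  It is not the code in
get_collation_actual_version(); the function name probe_nls_version()
and its return convention are made up for illustration, and it assumes
a build targeting Windows Vista or later.  On other platforms it
compiles but only reports that the API is unavailable.

```c
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#endif

/*
 * Hypothetical helper: ask GetNLSVersionEx() whether it accepts a locale
 * name.  Returns 1 if accepted, 0 if rejected (or if the name could not
 * be converted to wide characters), and -1 when not built for Windows.
 */
static int
probe_nls_version(const char *locale_name)
{
#ifdef _WIN32
	NLSVERSIONINFOEX version = {sizeof(NLSVERSIONINFOEX)};
	WCHAR		wide_name[LOCALE_NAME_MAX_LENGTH];

	/* GetNLSVersionEx() takes a wide-character locale name. */
	if (MultiByteToWideChar(CP_ACP, 0, locale_name, -1,
							wide_name, LOCALE_NAME_MAX_LENGTH) == 0)
		return 0;

	if (!GetNLSVersionEx(COMPARE_STRING, wide_name, &version))
	{
		printf("\"%s\": rejected, error code %lu\n",
			   locale_name, (unsigned long) GetLastError());
		return 0;
	}

	printf("\"%s\": accepted, dwNLSVersion = %lu\n",
		   locale_name, (unsigned long) version.dwNLSVersion);
	return 1;
#else
	(void) locale_name;		/* unused outside Windows */
	return -1;
#endif
}
```

Calling probe_nls_version("en-NZ") and
probe_nls_version("English_New Zealand.1252") from a small main() on
the affected machine should show the difference described above.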

In [3], Juan José was passing in en-US rather than one of these odd
Windows-specific locale strings, so the testing that code got when it
went in didn't check whether something like "English_New
Zealand.1252" would be accepted.

The "English_New Zealand.1252" string seems to come from the
setlocales() call in initdb via check_locale_name(LC_COLLATE,
lc_collate, &canonname), and ultimately from setlocale(LC_COLLATE).
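
To illustrate where such a name comes from, here is a simplified
sketch of asking the C runtime for its canonical spelling of a locale.
It is not initdb's check_locale_name(); the function
canonical_locale_name() and its signature are hypothetical, and it only
shows the general save/set/restore idea around setlocale().

```c
#include <locale.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: copy the C runtime's canonical name for "locale"
 * in the given category into canon[].  Passing "" asks for the
 * environment's default.  The caller's current locale is restored before
 * returning.  Returns false if the locale is not accepted or the buffer
 * is too small.
 */
static bool
canonical_locale_name(int category, const char *locale,
					  char *canon, size_t canonlen)
{
	const char *cur;
	const char *res;
	char	   *save;
	bool		ok = false;

	/* Remember the current setting; the returned string may be reused. */
	cur = setlocale(category, NULL);
	if (cur == NULL)
		return false;
	save = malloc(strlen(cur) + 1);
	if (save == NULL)
		return false;
	strcpy(save, cur);

	/* On success, setlocale() returns the runtime's name for the locale. */
	res = setlocale(category, locale);
	if (res != NULL && strlen(res) < canonlen)
	{
		strcpy(canon, res);
		ok = true;
	}

	/* Put things back the way we found them. */
	setlocale(category, save);
	free(save);
	return ok;
}
```

With the Microsoft C runtime, calling this with LC_COLLATE and "" is
presumably what yields a name in the "English_New Zealand.1252" style
rather than "en-NZ".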

I'm still a bit mystified why whelk seems unfazed by this change. You
can see from [4] that it must be passing "German_Germany.1252" to
GetNLSVersionEx().  I've tested on both Windows 8.1 and Windows 10 and
I can't get GetNLSVersionEx() to accept that. So maybe Windows 7
allowed these non-ISO formats?  That theory seems to break down a bit
when you see that walleye is perfectly happy on Windows 10 (MinGW64).
You can see that [5] mentions "The database cluster will be
initialized with locale "English_United States.1252".".

Running low on ideas for now, so I thought I'd post this in case
someone thinks of something else.

David

[1] https://docs.microsoft.com/en-us/windows/win32/api/winnls/nf-winnls-getnlsversionex
[2] https://docs.microsoft.com/en-us/windows/win32/intl/locale-names
[3] https://www.postgresql.org/message-id/CAC+AXB0Eat3aLeTrbDoBB9jX863CU_+RSbgiAjcED5DcXoBoFQ@mail.gmail.com
[4] https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=whelk&dt=2020-11-02%2020%3A41%3A40&stg=check-pg_upgrade
[5] https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=walleye&dt=2020-11-02%2020%3A55%3A31&stg=check-pg_upgrade
