Thread: WIN32 pg_import_system_collations

WIN32 pg_import_system_collations

From
Juan José Santamaría Flecha
Date:
I want to propose an implementation of pg_import_system_collations() for WIN32 using EnumSystemLocalesEx() [1], which is available from Windows Server 2008 onwards.

The patch includes a test emulating that of collate.linux.utf8, but for Windows-1252. The main difference is that it doesn't have the tests for Turkish dotted and undotted 'i', since that locale is WIN1254.


Regards,

Juan José Santamaría Flecha
Attachment

Re: WIN32 pg_import_system_collations

From
Juan José Santamaría Flecha
Date:

On Mon, Dec 13, 2021 at 9:41 AM Juan José Santamaría Flecha <juanjo.santamaria@gmail.com> wrote:

Per path tester.
 
Regards,

Juan José Santamaría Flecha
Attachment

Re: WIN32 pg_import_system_collations

From
Thomas Munro
Date:
On Tue, Dec 14, 2021 at 5:29 AM Juan José Santamaría Flecha
<juanjo.santamaria@gmail.com> wrote:
> On Mon, Dec 13, 2021 at 9:41 AM Juan José Santamaría Flecha <juanjo.santamaria@gmail.com> wrote:
> Per path tester.

Hi Juan José,

I haven't tested yet but +1 for the feature.  I guess the API didn't
exist at the time collation support was added.

+    /*
+     * Windows will use hyphens between language and territory, where ANSI
+     * uses an underscore. Simply make it ANSI looking.
+     */
+    hyphen = strchr(localebuf, '-');
+    if (hyphen)
+        *hyphen = '_';
+

This conversion makes sense, to keep the user experience the same
across platforms.   Nitpick on the comment: why ANSI?  I think we can
call "en_NZ" a POSIX locale identifier[1], and I think we can call
"en-NZ" a BCP 47 language tag.

+/*
+ * This test is for Windows/Visual Studio systems and assumes that a full set
+ * of locales is installed. It must be run in a database with WIN1252 encoding,
+ * because of the locales' encondings. We lose some interesting cases from the
+ * UTF-8 version, like Turkish dotted and undotted 'i' or Greek sigma.
+ */

s/encondings/encodings/

When would the full set of locales not be installed on a Windows
system, and why does this need Visual Studio?  Wondering if this test
will work with some of the frankenstein/cross toolchains tool chains
(not objecting if it doesn't and could be skipped, just trying to
understand the comment).

Slightly related to this, in case you didn't see it, I'd also like to
use BCP 47 tags for the default locale for PostgreSQL 15[2].

[1] https://en.wikipedia.org/wiki/Locale_(computer_software)#POSIX_platforms
[2]
https://www.postgresql.org/message-id/flat/CA%2BhUKGJ%3DXThErgAQRoqfCy1bKPxXVuF0%3D2zDbB%2BSxDs59pv7Fw%40mail.gmail.com



Re: WIN32 pg_import_system_collations

From
Juan José Santamaría Flecha
Date:

On Mon, Dec 13, 2021 at 9:54 PM Thomas Munro <thomas.munro@gmail.com> wrote:

I haven't tested yet but +1 for the feature.  I guess the API didn't
exist at the time collation support was added.

Good to hear.
 
This conversion makes sense, to keep the user experience the same
across platforms.   Nitpick on the comment: why ANSI?  I think we can
call "en_NZ" a POSIX locale identifier[1], and I think we can call
"en-NZ" a BCP 47 language tag.

POSIX also works for me.
 
When would the full set of locales not be installed on a Windows
system, and why does this need Visual Studio?  Wondering if this test
will work with some of the frankenstein/cross toolchains tool chains
(not objecting if it doesn't and could be skipped, just trying to
understand the comment).

What I meant to say is that to run the test, you need a database that has successfully run pg_import_system_collations. This would be also possible in Mingw for _WIN32_WINNT> = 0x0600, but the current value in src\include\port\win32.h is _WIN32_WINNT = 0x0501 when compiling with Mingw.
 
Regards, 
 Juan José Santamaría Flecha

Re: WIN32 pg_import_system_collations

From
Thomas Munro
Date:
On Wed, Dec 15, 2021 at 9:14 AM Juan José Santamaría Flecha
<juanjo.santamaria@gmail.com> wrote:
> What I meant to say is that to run the test, you need a database that has successfully run
pg_import_system_collations.This would be also possible in Mingw for _WIN32_WINNT> = 0x0600, but the current value in
src\include\port\win32.his _WIN32_WINNT = 0x0501 when compiling with Mingw. 

Ah, right.  I hope we can make the leap to 0x0A00 (Win10) soon and
just stop thinking about these old ghosts, as mentioned by various
people in various threads.  Do you happen to know if there are
complications for that, with the non-MSVC tool chains?



Re: WIN32 pg_import_system_collations

From
Michael Paquier
Date:
On Wed, Dec 15, 2021 at 10:45:28AM +1300, Thomas Munro wrote:
> Ah, right.  I hope we can make the leap to 0x0A00 (Win10) soon and
> just stop thinking about these old ghosts, as mentioned by various
> people in various threads.

Seeing your message here..  My apologies for the short digression.
Would that mean that we could use CreateSymbolicLinkA() as a mapper
for pgreadlink() rather than junction points?  I am wondering how much
code in src/port/ such a move could allow us to do.
--
Michael

Attachment

Re: WIN32 pg_import_system_collations

From
Thomas Munro
Date:
On Wed, Dec 15, 2021 at 3:52 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Wed, Dec 15, 2021 at 10:45:28AM +1300, Thomas Munro wrote:
> > Ah, right.  I hope we can make the leap to 0x0A00 (Win10) soon and
> > just stop thinking about these old ghosts, as mentioned by various
> > people in various threads.
>
> Seeing your message here..  My apologies for the short digression.
> Would that mean that we could use CreateSymbolicLinkA() as a mapper
> for pgreadlink() rather than junction points?  I am wondering how much
> code in src/port/ such a move could allow us to do.

Sadly, (1) it wouldn't work unless running with a special privilege or
as admin, and (2) it wouldn't work on non-NTFS filesystems.  I think
it's mostly intended to allow things like unpacking tarballs, checking
out git repos etc etc etc that came from Unix systems, which is why it
works with 'developer mode' enabled[1], though obviously it wouldn't
be totally impossible for us to require that privilege.  Didn't seem
great to me, though, that's why I gave up on it over in
https://commitfest.postgresql.org/36/3090/ where this was recently
discussed.

[1] https://blogs.windows.com/windowsdeveloper/2016/12/02/symlinks-windows-10/



Re: WIN32 pg_import_system_collations

From
Julien Rouhaud
Date:
Hi,

On Mon, Dec 13, 2021 at 05:28:47PM +0100, Juan José Santamaría Flecha wrote:
> On Mon, Dec 13, 2021 at 9:41 AM Juan José Santamaría Flecha <
> juanjo.santamaria@gmail.com> wrote:
> 
> Per path tester.

This version doesn't apply anymore:

http://cfbot.cputube.org/patch_36_3450.log
=== Applying patches on top of PostgreSQL commit ID e0e567a106726f6709601ee7cffe73eb6da8084e ===
=== applying patch ./v2-0001-WIN32-pg_import_system_collations.patch
[...]
patching file src/tools/msvc/vcregress.pl
Hunk #1 succeeded at 153 (offset -1 lines).
Hunk #2 FAILED at 170.
1 out of 2 hunks FAILED -- saving rejects to file src/tools/msvc/vcregress.pl.rej

Could you send a rebased version?  In the meantime I will switch the CF entry
to Waiting on Author.



Re: WIN32 pg_import_system_collations

From
Juan José Santamaría Flecha
Date:

On Wed, Jan 19, 2022 at 10:53 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

This version doesn't apply anymore:

Thanks for the heads up.

Please find attached a rebased patch. I have also rewritten some comments to address previous reviews, code and test remain the same.

Regards,

Juan José Santamaría Flecha
Attachment

Re: WIN32 pg_import_system_collations

From
Dmitry Koval
Date:
Hi Juan José,

I a bit tested this feature and have small doubts about block:

+/*
+ * Windows will use hyphens between language and territory, where POSIX
+ * uses an underscore. Simply make it POSIX looking.
+ */
+ hyphen = strchr(localebuf, '-');
+ if (hyphen)
+    *hyphen = '_';

After this block modified collation name is used in function

GetNLSVersionEx(COMPARE_STRING, wide_collcollate, &version)

(see win32_read_locale() -> CollationFromLocale() -> CollationCreate()
call). Is it correct to use (wide_collcollate = "en_NZ") instead of
(wide_collcollate = "en-NZ") in GetNLSVersionEx() function?

1) Documentation [1], [2], quote:
If it is a neutral locale for which the script is significant,
the pattern is <language>-<Script>.

2) Conversation [3], David Rowley, quote:
Then, since GetNLSVersionEx()
wants yet another variant with a - rather than an _, I've just added a
couple of lines to swap the _ for a -.


On my computer (Windows 10 Pro 21H2 19044.1466, MSVC2019 version
16.11.9) work correctly both variants ("en_NZ", "en-NZ").

But David Rowley (MSVC2010 and MSVC2017) replaced "_" to "-"
for the same function. Maybe he had a problem with "_" on MSVC2010 or 
MSVC2017?

[1] 
https://docs.microsoft.com/en-us/windows/win32/api/winnls/nf-winnls-getnlsversionex
[2] https://docs.microsoft.com/en-us/windows/win32/intl/locale-names
[3] 
https://www.postgresql.org/message-id/flat/CAApHDvq3FXpH268rt-6sD_Uhe7Ekv9RKXHFvpv%3D%3Duh4c9OeHHQ%40mail.gmail.com

With best regards,
Dmitry Koval.



Re: WIN32 pg_import_system_collations

From
Dmitry Koval
Date:
Hi Juan José,

I a bit tested this feature and have small doubts about block:

+/*
+ * Windows will use hyphens between language and territory, where POSIX
+ * uses an underscore. Simply make it POSIX looking.
+ */
+ hyphen = strchr(localebuf, '-');
+ if (hyphen)
+    *hyphen = '_';

After this block modified collation name is used in function

GetNLSVersionEx(COMPARE_STRING, wide_collcollate, &version)

(see win32_read_locale() -> CollationFromLocale() -> CollationCreate()
call). Is it correct to use (wide_collcollate = "en_NZ") instead of
(wide_collcollate = "en-NZ") in GetNLSVersionEx() function?

1) Documentation [1], [2], quote:
If it is a neutral locale for which the script is significant,
the pattern is <language>-<Script>.

2) Conversation [3], David Rowley, quote:
Then, since GetNLSVersionEx()
wants yet another variant with a - rather than an _, I've just added a
couple of lines to swap the _ for a -.


On my computer (Windows 10 Pro 21H2 19044.1466, MSVC2019 version
16.11.9) work correctly both variants ("en_NZ", "en-NZ").

But David Rowley (MSVC2010 and MSVC2017) replaced "_" to "-"
for the same function. Maybe he had a problem with "_" on MSVC2010 or 
MSVC2017?

[1] 
https://docs.microsoft.com/en-us/windows/win32/api/winnls/nf-winnls-getnlsversionex
[2] https://docs.microsoft.com/en-us/windows/win32/intl/locale-names
[3] 
https://www.postgresql.org/message-id/flat/CAApHDvq3FXpH268rt-6sD_Uhe7Ekv9RKXHFvpv%3D%3Duh4c9OeHHQ%40mail.gmail.com

With best regards,
Dmitry Koval.



Re: WIN32 pg_import_system_collations

From
Peter Eisentraut
Date:
On 24.01.22 22:23, Dmitry Koval wrote:
> +/*
> + * Windows will use hyphens between language and territory, where POSIX
> + * uses an underscore. Simply make it POSIX looking.
> + */
> + hyphen = strchr(localebuf, '-');
> + if (hyphen)
> +    *hyphen = '_';
> 
> After this block modified collation name is used in function
> 
> GetNLSVersionEx(COMPARE_STRING, wide_collcollate, &version)
> 
> (see win32_read_locale() -> CollationFromLocale() -> CollationCreate()
> call). Is it correct to use (wide_collcollate = "en_NZ") instead of
> (wide_collcollate = "en-NZ") in GetNLSVersionEx() function?

I don't really know if this is necessary anyway.  Just create the 
collations with the names that the operating system presents.  There is 
no requirement to make the names match POSIX.

If you want to make them match POSIX for some reason, you can also just 
change the object name but leave the collcollate/collctype fields the 
way they came from the OS.



Re: WIN32 pg_import_system_collations

From
Juan José Santamaría Flecha
Date:

On Tue, Jan 25, 2022 at 11:40 AM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
On 24.01.22 22:23, Dmitry Koval wrote:
 
Thanks for looking into this.
 
> +/*
> + * Windows will use hyphens between language and territory, where POSIX
> + * uses an underscore. Simply make it POSIX looking.
> + */
> + hyphen = strchr(localebuf, '-');
> + if (hyphen)
> +    *hyphen = '_';
>
> After this block modified collation name is used in function
>
> GetNLSVersionEx(COMPARE_STRING, wide_collcollate, &version)
>
> (see win32_read_locale() -> CollationFromLocale() -> CollationCreate()
> call). Is it correct to use (wide_collcollate = "en_NZ") instead of
> (wide_collcollate = "en-NZ") in GetNLSVersionEx() function?

The problem that David Rowley addressed was coming from Windows collations in the shape of "English_New Zealand", GetNLSVersionEx() will work with both "en_NZ" and "en-NZ". You can check collversion in pg_collation in the patched version.

I don't really know if this is necessary anyway.  Just create the
collations with the names that the operating system presents.  There is
no requirement to make the names match POSIX.

If you want to make them match POSIX for some reason, you can also just
change the object name but leave the collcollate/collctype fields the
way they came from the OS.

I think there is some value in making collation names consistent across different platforms, e.g. making user scripts more portable. So, I'm doing that in the attached version, just changing the object name.

Regards,

Juan José Santamaría Flecha
 
Attachment

Re: WIN32 pg_import_system_collations

From
Andres Freund
Date:
Hi,

On 2022-01-25 15:49:01 +0100, Juan José Santamaría Flecha wrote:
> So, I'm doing that in the attached version, just changing the object name.

Currently fails to apply, please rebase: http://cfbot.cputube.org/patch_37_3450.log

Marked as waiting-on-author.

- Andres



Re: WIN32 pg_import_system_collations

From
Juan José Santamaría Flecha
Date:

On Tue, Mar 22, 2022 at 2:00 AM Andres Freund <andres@anarazel.de> wrote:

Currently fails to apply, please rebase: http://cfbot.cputube.org/patch_37_3450.log

Marked as waiting-on-author.

Please, find attached a rebased version, no other significant change.

Regards,

Juan José Santamaría Flecha
Attachment

Re: WIN32 pg_import_system_collations

From
Juan José Santamaría Flecha
Date:

Please find attached a rebased version. I have split the patch into two parts trying to make it easier to review, one with the code changes and the other with the test.

Other than that, there are minimal changes from the previous version to the code due to the update of _WIN32_WINNT and enabling the test on cirrus.

Regards,

Juan José Santamaría Flecha
Attachment

Re: WIN32 pg_import_system_collations

From
Peter Eisentraut
Date:
On 12.07.22 21:32, Juan José Santamaría Flecha wrote:
> Please find attached a rebased version. I have split the patch into two 
> parts trying to make it easier to review, one with the code changes and 
> the other with the test.
> 
> Other than that, there are minimal changes from the previous version to 
> the code due to the update of _WIN32_WINNT and enabling the test on cirrus.

I'm not familiar with Windows, so I'm just looking at the overall 
structure of this patch.  I think it pretty much makes sense.  But we 
need to consider that this operates on the confluence of various 
different operating system interfaces that not all people will be 
familiar with, so we need to really get the documentation done well.

Consider this function you are introducing:

+/*
+ * Create a collation if the input locale is valid for so.
+ * Also keeps track of the number of valid locales and collations created.
+ */
+static int
+CollationFromLocale(char *isolocale, char *localebuf, int *nvalid,
+                                       int *ncreated, int nspid)

This declaration is incomprehensible without studying all the callers 
and the surrounding code.

Start with the name: What does "collation from locale" mean?  Does it 
make a collation?  Does it convert one?  Does it find one?  There should 
be a verb in there.

(I think in the context of this file, a lower case name would be more 
appropriate for a static function.)

Then the arguments.  The input arguments should be "const".  All the 
arguments should be documented.  What is "isolocale", what is 
"localebuf", how are they different?  What is being counted by "valid" 
(collatons?, locales?), and what makes a thing valid and invalid?  What 
is being "created"?  What is nspid?  What is the return value?

Please make another pass over this.

Also consider describing in the commit message what you are doing in 
more detail, including some of the things that have been discussed in 
this thread.




Re: WIN32 pg_import_system_collations

From
Juan José Santamaría Flecha
Date:

On Mon, Oct 31, 2022 at 3:09 PM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:

Thanks for taking a look into this patch.

Consider this function you are introducing:

+/*
+ * Create a collation if the input locale is valid for so.
+ * Also keeps track of the number of valid locales and collations created.
+ */
+static int
+CollationFromLocale(char *isolocale, char *localebuf, int *nvalid,
+                                       int *ncreated, int nspid)

This declaration is incomprehensible without studying all the callers
and the surrounding code.

Start with the name: What does "collation from locale" mean?  Does it
make a collation?  Does it convert one?  Does it find one?  There should
be a verb in there.

(I think in the context of this file, a lower case name would be more
appropriate for a static function.)

Then the arguments.  The input arguments should be "const".  All the
arguments should be documented.  What is "isolocale", what is
"localebuf", how are they different?  What is being counted by "valid"
(collatons?, locales?), and what makes a thing valid and invalid?  What
is being "created"?  What is nspid?  What is the return value?

Please make another pass over this.

Ok, I can definitely improve the comments for that function.
 
Also consider describing in the commit message what you are doing in
more detail, including some of the things that have been discussed in
this thread.

Going through the thread for the commit message, I think that maybe the collation naming remarks were not properly addressed. In the current version the collations retain their native name, but an alias is created for those with a shape that we can assume a POSIX equivalent exists.

Please find attached a new version.

Regards,

Juan José Santamaría Flecha
Attachment

Re: WIN32 pg_import_system_collations

From
Peter Eisentraut
Date:
On 04.11.22 23:08, Juan José Santamaría Flecha wrote:
> Ok, I can definitely improve the comments for that function.
> 
>     Also consider describing in the commit message what you are doing in
>     more detail, including some of the things that have been discussed in
>     this thread.
> 
> Going through the thread for the commit message, I think that maybe the 
> collation naming remarks were not properly addressed. In the current 
> version the collations retain their native name, but an alias is created 
> for those with a shape that we can assume a POSIX equivalent exists.

This looks pretty good to me.  The refactoring of the non-Windows parts 
makes sense.  The Windows parts look reasonable on manual inspection, 
but again, I don't have access to Windows here, so someone else should 
also look it over.

A small style issue: Change return (TRUE) to return TRUE.

The code

+   if (strlen(localebuf) == 5 && localebuf[2] == '-')

might be too specific.  At least on some POSIX systems, I have seen 
locales with a three-letter language name.  Maybe you should look with 
strchr() and not be too strict about the exact position.

For the test patch, why is a separate test for non-UTF8 needed on 
Windows.  Does the UTF8 one not work?

+       version() !~ 'Visual C\+\+'

This probably won't work for MinGW.




Re: WIN32 pg_import_system_collations

From
Juan José Santamaría Flecha
Date:

On Mon, Nov 7, 2022 at 4:08 PM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:

This looks pretty good to me.  The refactoring of the non-Windows parts
makes sense.  The Windows parts look reasonable on manual inspection,
but again, I don't have access to Windows here, so someone else should
also look it over.

I was going to say that at least it is getting tested on the CI, but I have found out that meson changes version(). That is fixed in this version.
 
A small style issue: Change return (TRUE) to return TRUE.

Fixed.
 
The code

+   if (strlen(localebuf) == 5 && localebuf[2] == '-')

might be too specific.  At least on some POSIX systems, I have seen
locales with a three-letter language name.  Maybe you should look with
strchr() and not be too strict about the exact position.

Ok, in this version the POSIX alias is created unconditionally.
 
For the test patch, why is a separate test for non-UTF8 needed on
Windows.  Does the UTF8 one not work?

Windows locales will retain their CP_ACP encoding unless you change the OS code page to UFT8, which is still experimental [1].
 
+       version() !~ 'Visual C\+\+'

This probably won't work for MinGW.

When I proposed this patch it wouldn't have worked because of the project's Windows minimum version requirement, now it should work in MinGW. It actually doesn't because most locales are failing with "skipping locale with unrecognized encoding", but checking what's wrong with pg_get_encoding_from_locale() in MiNGW is subject for another thread.


Regards,

Juan José Santamaría Flecha
Attachment

Re: WIN32 pg_import_system_collations

From
Juan José Santamaría Flecha
Date:

On Wed, Nov 9, 2022 at 12:02 AM Juan José Santamaría Flecha <juanjo.santamaria@gmail.com> wrote:
On Mon, Nov 7, 2022 at 4:08 PM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:

This looks pretty good to me.  The refactoring of the non-Windows parts
makes sense.  The Windows parts look reasonable on manual inspection,
but again, I don't have access to Windows here, so someone else should
also look it over.

I was going to say that at least it is getting tested on the CI, but I have found out that meson changes version(). That is fixed in this version.

Now is currently failing due to [1], so maybe we can leave this patch on hold until that's addressed. 


Regards,

Juan José Santamaría Flecha


Re: WIN32 pg_import_system_collations

From
Peter Eisentraut
Date:
On 10.11.22 11:08, Juan José Santamaría Flecha wrote:
>         This looks pretty good to me.  The refactoring of the
>         non-Windows parts
>         makes sense.  The Windows parts look reasonable on manual
>         inspection,
>         but again, I don't have access to Windows here, so someone else
>         should
>         also look it over.
> 
>     I was going to say that at least it is getting tested on the CI, but
>     I have found out that meson changes version(). That is fixed in this
>     version.
> 
> 
> Now is currently failing due to [1], so maybe we can leave this patch on 
> hold until that's addressed.
> 
> [1] 
> https://www.postgresql.org/message-id/CAC%2BAXB1wJEqfKCuVcNpoH%3Dgxd61N%3D7c2fR3Ew6YRPpSfEUA%3DyQ%40mail.gmail.com
<https://www.postgresql.org/message-id/CAC%2BAXB1wJEqfKCuVcNpoH%3Dgxd61N%3D7c2fR3Ew6YRPpSfEUA%3DyQ%40mail.gmail.com>

What is the status of this now?  I think the other issue has been addressed?




Re: WIN32 pg_import_system_collations

From
Juan José Santamaría Flecha
Date:

On Thu, Dec 1, 2022 at 8:46 AM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:

What is the status of this now?  I think the other issue has been addressed?

Yes, that's addressed for MSVC builds. I think there are a couple of pending issues for MinGW, but those should have their own threads.

The patch had rotten, so PFA a rebased version.

Regards,

Juan José Santamaría Flecha  
Attachment

Re: WIN32 pg_import_system_collations

From
Peter Eisentraut
Date:
On 09.12.22 13:48, Juan José Santamaría Flecha wrote:
> On Thu, Dec 1, 2022 at 8:46 AM Peter Eisentraut 
> <peter.eisentraut@enterprisedb.com 
> <mailto:peter.eisentraut@enterprisedb.com>> wrote:
> 
> 
>     What is the status of this now?  I think the other issue has been
>     addressed?
> 
> 
> Yes, that's addressed for MSVC builds. I think there are a couple of 
> pending issues for MinGW, but those should have their own threads.
> 
> The patch had rotten, so PFA a rebased version.

committed




Re: WIN32 pg_import_system_collations

From
Andrew Dunstan
Date:


On 2023-01-03 Tu 08:48, Peter Eisentraut wrote:
On 09.12.22 13:48, Juan José Santamaría Flecha wrote:
On Thu, Dec 1, 2022 at 8:46 AM Peter Eisentraut <peter.eisentraut@enterprisedb.com <mailto:peter.eisentraut@enterprisedb.com>> wrote:


    What is the status of this now?  I think the other issue has been
    addressed?


Yes, that's addressed for MSVC builds. I think there are a couple of pending issues for MinGW, but those should have their own threads.

The patch had rotten, so PFA a rebased version.

committed



Now that I have removed the barrier to testing this in the buildfarm, and added an appropriate locale setting to drongo, we can see that this test fails like this:


diff -w -U3 c:/prog/bf/root/HEAD/pgsql.build/src/test/regress/expected/collate.windows.win1252.out c:/prog/bf/root/HEAD/pgsql.build/src/test/regress/results/collate.windows.win1252.out
--- c:/prog/bf/root/HEAD/pgsql.build/src/test/regress/expected/collate.windows.win1252.out	2023-01-23 04:39:06.755149600 +0000
+++ c:/prog/bf/root/HEAD/pgsql.build/src/test/regress/results/collate.windows.win1252.out	2023-02-26 17:32:54.115515200 +0000
@@ -363,16 +363,17 @@  -- to_char SET lc_time TO 'de_DE';
+ERROR:  invalid value for parameter "lc_time": "de_DE" SELECT to_char(date '2010-03-01', 'DD TMMON YYYY');    to_char -------------
- 01 MRZ 2010
+ 01 MAR 2010 (1 row)  SELECT to_char(date '2010-03-01', 'DD TMMON YYYY' COLLATE "de_DE");    to_char -------------
- 01 MRZ 2010
+ 01 MAR 2010 (1 row)  -- to_date


The last of these is especially an issue, as it doesn't even throw an error.

See <https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2023-02-26%2016%3A56%3A30>


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: WIN32 pg_import_system_collations

From
Andrew Dunstan
Date:


On 2023-02-26 Su 16:02, Andrew Dunstan wrote:


On 2023-01-03 Tu 08:48, Peter Eisentraut wrote:
On 09.12.22 13:48, Juan José Santamaría Flecha wrote:
On Thu, Dec 1, 2022 at 8:46 AM Peter Eisentraut <peter.eisentraut@enterprisedb.com <mailto:peter.eisentraut@enterprisedb.com>> wrote:


    What is the status of this now?  I think the other issue has been
    addressed?


Yes, that's addressed for MSVC builds. I think there are a couple of pending issues for MinGW, but those should have their own threads.

The patch had rotten, so PFA a rebased version.

committed



Now that I have removed the barrier to testing this in the buildfarm, and added an appropriate locale setting to drongo, we can see that this test fails like this:


diff -w -U3 c:/prog/bf/root/HEAD/pgsql.build/src/test/regress/expected/collate.windows.win1252.out c:/prog/bf/root/HEAD/pgsql.build/src/test/regress/results/collate.windows.win1252.out
--- c:/prog/bf/root/HEAD/pgsql.build/src/test/regress/expected/collate.windows.win1252.out	2023-01-23 04:39:06.755149600 +0000
+++ c:/prog/bf/root/HEAD/pgsql.build/src/test/regress/results/collate.windows.win1252.out	2023-02-26 17:32:54.115515200 +0000
@@ -363,16 +363,17 @@  -- to_char SET lc_time TO 'de_DE';
+ERROR:  invalid value for parameter "lc_time": "de_DE" SELECT to_char(date '2010-03-01', 'DD TMMON YYYY');    to_char -------------
- 01 MRZ 2010
+ 01 MAR 2010 (1 row)  SELECT to_char(date '2010-03-01', 'DD TMMON YYYY' COLLATE "de_DE");    to_char -------------
- 01 MRZ 2010
+ 01 MAR 2010 (1 row)  -- to_date


The last of these is especially an issue, as it doesn't even throw an error.

See <https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2023-02-26%2016%3A56%3A30>





Further investigation shows that if we change the two instances of "de_DE" to "de-DE" the tests behave as expected, so it appears that while POSIX style aliases have been created for the BCP 47 style locales, using the POSIX aliases doesn't in fact work. I cant see anything that turns the POSIX locale name back into BCP 47 at the point of use, which seems to be what's needed.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: WIN32 pg_import_system_collations

From
Juan José Santamaría Flecha
Date:

On Mon, Feb 27, 2023 at 1:10 PM Andrew Dunstan <andrew@dunslane.net> wrote:

On 2023-02-26 Su 16:02, Andrew Dunstan wrote:

Now that I have removed the barrier to testing this in the buildfarm, and added an appropriate locale setting to drongo, we can see that this test fails like this:


diff -w -U3 c:/prog/bf/root/HEAD/pgsql.build/src/test/regress/expected/collate.windows.win1252.out c:/prog/bf/root/HEAD/pgsql.build/src/test/regress/results/collate.windows.win1252.out
--- c:/prog/bf/root/HEAD/pgsql.build/src/test/regress/expected/collate.windows.win1252.out	2023-01-23 04:39:06.755149600 +0000
+++ c:/prog/bf/root/HEAD/pgsql.build/src/test/regress/results/collate.windows.win1252.out	2023-02-26 17:32:54.115515200 +0000
@@ -363,16 +363,17 @@  -- to_char SET lc_time TO 'de_DE';
+ERROR:  invalid value for parameter "lc_time": "de_DE" SELECT to_char(date '2010-03-01', 'DD TMMON YYYY');    to_char -------------
- 01 MRZ 2010
+ 01 MAR 2010 (1 row)  SELECT to_char(date '2010-03-01', 'DD TMMON YYYY' COLLATE "de_DE");    to_char -------------
- 01 MRZ 2010
+ 01 MAR 2010 (1 row)  -- to_date


The last of these is especially an issue, as it doesn't even throw an error.

See <https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2023-02-26%2016%3A56%3A30>


Further investigation shows that if we change the two instances of "de_DE" to "de-DE" the tests behave as expected, so it appears that while POSIX style aliases have been created for the BCP 47 style locales, using the POSIX aliases doesn't in fact work. I cant see anything that turns the POSIX locale name back into BCP 47 at the point of use, which seems to be what's needed.


The command that's failing is "SET lc_time TO 'de_DE';", and that area of code is untouched by this patch. As mentioned in [1], the problem seems to come from a Windows bug that the CI images and my development machines have patched out.

I think we should change the locale name to make the test more robust, as the attached. But I don't see a problem with making an alias for the collations.

Regards,

Juan José Santamaría Flecha
Attachment

Re: WIN32 pg_import_system_collations

From
Juan José Santamaría Flecha
Date:


El lun, 27 feb 2023, 23:05, Juan José Santamaría Flecha <juanjo.santamaria@gmail.com> escribió: 

The command that's failing is "SET lc_time TO 'de_DE';", and that area of code is untouched by this patch. As mentioned in [1], the problem seems to come from a Windows bug that the CI images and my development machines have patched out.

What I wanted to post as [1]:



Regards,

Juan José Santamaría Flecha

Re: WIN32 pg_import_system_collations

From
Andrew Dunstan
Date:


On 2023-02-27 Mo 17:20, Juan José Santamaría Flecha wrote:


El lun, 27 feb 2023, 23:05, Juan José Santamaría Flecha <juanjo.santamaria@gmail.com> escribió: 

The command that's failing is "SET lc_time TO 'de_DE';", and that area of code is untouched by this patch. As mentioned in [1], the problem seems to come from a Windows bug that the CI images and my development machines have patched out.

What I wanted to post as [1]:


Hmm, yeah. I'm not sure I understand the point of this test anyway:


SELECT to_char(date '2010-03-01', 'DD TMMON YYYY' COLLATE "de_DE");


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: WIN32 pg_import_system_collations

From
Juan José Santamaría Flecha
Date:

On Tue, Feb 28, 2023 at 12:55 PM Andrew Dunstan <andrew@dunslane.net> wrote:

On 2023-02-27 Mo 17:20, Juan José Santamaría Flecha wrote:


Hmm, yeah. I'm not sure I understand the point of this test anyway:


SELECT to_char(date '2010-03-01', 'DD TMMON YYYY' COLLATE "de_DE");


Uhm, they probably don't make much sense except for "tr_TR", so I'm fine with removing them. PFA a patch for so.

Regards,

Juan José Santamaría Flecha
 
Attachment

Re: WIN32 pg_import_system_collations

From
Andrew Dunstan
Date:


On 2023-02-28 Tu 11:40, Juan José Santamaría Flecha wrote:

On Tue, Feb 28, 2023 at 12:55 PM Andrew Dunstan <andrew@dunslane.net> wrote:

On 2023-02-27 Mo 17:20, Juan José Santamaría Flecha wrote:


Hmm, yeah. I'm not sure I understand the point of this test anyway:


SELECT to_char(date '2010-03-01', 'DD TMMON YYYY' COLLATE "de_DE");


Uhm, they probably don't make much sense except for "tr_TR", so I'm fine with removing them. PFA a patch for so.



I think you missed my point, which was that the COLLATE clause above seemed particularly pointless. But I agree that all these are not much use, so I'll remove them as you suggest.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: WIN32 pg_import_system_collations

From
Juan José Santamaría Flecha
Date:

On Tue, Feb 28, 2023 at 9:26 PM Andrew Dunstan <andrew@dunslane.net> wrote:

I think you missed my point, which was that the COLLATE clause above seemed particularly pointless. But I agree that all these are not much use, so I'll remove them as you suggest.


Maybe there has been some miscommunication, please let me try to explain myself a little better. The whole test is an attempt to mimic collate.linux.utf8, which has that same command, only for collate 'tr_TR', and so does collate.icu.utf8 but commented out.

I've seen that you have committed this and now drongo is green, which is great. Thank you for taking care of it.

Regards,

Juan José Santamaría Flecha