Thread: Can we get rid of GetLocaleInfoEx() yet?

Can we get rid of GetLocaleInfoEx() yet?

From
Tom Lane
Date:
Commit 0fb54de9a ("Support building with Visual Studio 2015")
introduced a hack in chklocale.c's win32_langinfo() to make
it use GetLocaleInfoEx() in place of _create_locale().

There's a problem with this, which is that if I'm reading the
docs correctly, GetLocaleInfoEx() accepts a smaller set of
possible locale strings (only "locale names") than do either
_create_locale() or setlocale().  The _create_locale() docs say

    The locale argument can take a locale name, a language string, a
    language string and country/region code, a code page, or a language
    string, country/region code, and code page.

and they imply (but don't quite manage to say in so many words)
that these are the same strings accepted by setlocale().

The reason this is a problem is that when given a locale string,
in either initdb or CREATE DATABASE, we first validate it by
seeing if setlocale() likes it.  We produce a reasonable error
message if not.  Otherwise we then go on to try to identify the
implied encoding via chklocale.c.  But if GetLocaleInfoEx()
fails, we fall back to trying to parse out the codepage for
ourselves, which can lead to a silly/unhelpful error message,
as recently complained of at [1].

The reason for the hack, per the comments, is that VS2015
omits a codepage field from the result of _create_locale();
and some optimism is expressed therein that Microsoft might
undo that oversight in future.  Has this been fixed in more
recent VS versions?  If not, can we find another, more robust
way to do it?

            regards, tom lane

[1] https://www.postgresql.org/message-id/flat/F4D04849032C4464B8FF17CB0F896F9E%40dell2



Re: Can we get rid of GetLocaleInfoEx() yet?

From
Juan José Santamaría Flecha
Date:

On Sun, Mar 29, 2020 at 3:29 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

The reason for the hack, per the comments, is that VS2015
omits a codepage field from the result of _create_locale();
and some optimism is expressed therein that Microsoft might
undo that oversight in future.  Has this been fixed in more
recent VS versions?  If not, can we find another, more robust
way to do it?

While working on another issue I have seen this issue reproduce in VS2019. So no, it has not been fixed.

Please find attached a patch that provides a better detection of the "uft8" cases.

Regards,

Juan José Santamaría Flecha

Attachment

Re: Can we get rid of GetLocaleInfoEx() yet?

From
Tom Lane
Date:
=?UTF-8?Q?Juan_Jos=C3=A9_Santamar=C3=ADa_Flecha?= <juanjo.santamaria@gmail.com> writes:
> On Sun, Mar 29, 2020 at 3:29 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The reason for the hack, per the comments, is that VS2015
>> omits a codepage field from the result of _create_locale();
>> and some optimism is expressed therein that Microsoft might
>> undo that oversight in future.  Has this been fixed in more
>> recent VS versions?  If not, can we find another, more robust
>> way to do it?

> While working on another issue I have seen this issue reproduce in VS2019.
> So no, it has not been fixed.

Oh well, I figured that was too optimistic :-(

> Please find attached a patch that provides a better detection of the "uft8"
> cases.

In general, I think the problem is that we might be dealing with a
Unix-style locale string, in which the encoding name might be quite
a few other things besides "utf8".  But actually your patch works
for that too, since what's going to happen next is we'll search the
encoding_match_list[] for a match.  I do suggest being a bit more
paranoid about what's a codepage number though, as attached.
(Untested, since I lack a Windows environment, but it's pretty
straightforward code.)

            regards, tom lane

diff --git a/src/port/chklocale.c b/src/port/chklocale.c
index c9c680f..9e3c6db 100644
--- a/src/port/chklocale.c
+++ b/src/port/chklocale.c
@@ -239,25 +239,44 @@ win32_langinfo(const char *ctype)
     {
         r = malloc(16);            /* excess */
         if (r != NULL)
-            sprintf(r, "CP%u", cp);
+        {
+            /*
+             * If the return value is CP_ACP that means no ANSI code page is
+             * available, so only Unicode can be used for the locale.
+             */
+            if (cp == CP_ACP)
+                strcpy(r, "utf8");
+            else
+                sprintf(r, "CP%u", cp);
+        }
     }
     else
 #endif
     {
         /*
-         * Locale format on Win32 is <Language>_<Country>.<CodePage> . For
-         * example, English_United States.1252.
+         * Locale format on Win32 is <Language>_<Country>.<CodePage>.  For
+         * example, English_United States.1252.  If we see digits after the
+         * last dot, assume it's a codepage number.  Otherwise, we might be
+         * dealing with a Unix-style locale string; Windows' setlocale() will
+         * take those even though GetLocaleInfoEx() won't, so we end up here.
+         * In that case, just return what's after the last dot and hope we can
+         * find it in our table.
          */
         codepage = strrchr(ctype, '.');
         if (codepage != NULL)
         {
-            int            ln;
+            size_t        ln;
 
             codepage++;
             ln = strlen(codepage);
             r = malloc(ln + 3);
             if (r != NULL)
-                sprintf(r, "CP%s", codepage);
+            {
+                if (strspn(codepage, "0123456789") == ln)
+                    sprintf(r, "CP%s", codepage);
+                else
+                    strcpy(r, codepage);
+            }
         }
 
     }

Re: Can we get rid of GetLocaleInfoEx() yet?

From
Juan José Santamaría Flecha
Date:

On Sun, Mar 29, 2020 at 7:00 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

In general, I think the problem is that we might be dealing with a
Unix-style locale string, in which the encoding name might be quite
a few other things besides "utf8".  But actually your patch works
for that too, since what's going to happen next is we'll search the
encoding_match_list[] for a match.  I do suggest being a bit more
paranoid about what's a codepage number though, as attached.
(Untested, since I lack a Windows environment, but it's pretty
straightforward code.)

It works for the issue just fine, and more comments make a better a patch, so no objections from me.

Regards,

Juan José Santamaría Flecha

Re: Can we get rid of GetLocaleInfoEx() yet?

From
Michael Paquier
Date:
On Sun, Mar 29, 2020 at 07:06:56PM +0200, Juan José Santamaría Flecha wrote:
> It works for the issue just fine, and more comments make a better a
> patch, so no objections from me.

+1 from me.  And yes, we are still missing lc_codepage in newer
versions of VS.  Locales + Windows != 2, business as usual.
--
Michael

Attachment

Re: Can we get rid of GetLocaleInfoEx() yet?

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> On Sun, Mar 29, 2020 at 07:06:56PM +0200, Juan José Santamaría Flecha wrote:
>> It works for the issue just fine, and more comments make a better a
>> patch, so no objections from me.

> +1 from me.  And yes, we are still missing lc_codepage in newer
> versions of VS.  Locales + Windows != 2, business as usual.

OK, pushed.

            regards, tom lane