Re: Can we get rid of GetLocaleInfoEx() yet? - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Can we get rid of GetLocaleInfoEx() yet?
Date
Msg-id 28339.1585501223@sss.pgh.pa.us
In response to Re: Can we get rid of GetLocaleInfoEx() yet?  (Juan José Santamaría Flecha <juanjo.santamaria@gmail.com>)
Responses Re: Can we get rid of GetLocaleInfoEx() yet?
List pgsql-hackers
Juan José Santamaría Flecha <juanjo.santamaria@gmail.com> writes:
> On Sun, Mar 29, 2020 at 3:29 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The reason for the hack, per the comments, is that VS2015
>> omits a codepage field from the result of _create_locale();
>> and some optimism is expressed therein that Microsoft might
>> undo that oversight in future.  Has this been fixed in more
>> recent VS versions?  If not, can we find another, more robust
>> way to do it?

> While working on another issue I have seen this issue reproduce in VS2019.
> So no, it has not been fixed.

Oh well, I figured that was too optimistic :-(

> Please find attached a patch that provides a better detection of the "utf8"
> cases.

In general, I think the problem is that we might be dealing with a
Unix-style locale string, in which the encoding name might be quite
a few other things besides "utf8".  But actually your patch works
for that too, since what's going to happen next is we'll search the
encoding_match_list[] for a match.  I do suggest being a bit more
paranoid about what's a codepage number though, as attached.
(Untested, since I lack a Windows environment, but it's pretty
straightforward code.)

            regards, tom lane

diff --git a/src/port/chklocale.c b/src/port/chklocale.c
index c9c680f..9e3c6db 100644
--- a/src/port/chklocale.c
+++ b/src/port/chklocale.c
@@ -239,25 +239,44 @@ win32_langinfo(const char *ctype)
     {
         r = malloc(16);            /* excess */
         if (r != NULL)
-            sprintf(r, "CP%u", cp);
+        {
+            /*
+             * If the return value is CP_ACP that means no ANSI code page is
+             * available, so only Unicode can be used for the locale.
+             */
+            if (cp == CP_ACP)
+                strcpy(r, "utf8");
+            else
+                sprintf(r, "CP%u", cp);
+        }
     }
     else
 #endif
     {
         /*
-         * Locale format on Win32 is <Language>_<Country>.<CodePage> . For
-         * example, English_United States.1252.
+         * Locale format on Win32 is <Language>_<Country>.<CodePage>.  For
+         * example, English_United States.1252.  If we see digits after the
+         * last dot, assume it's a codepage number.  Otherwise, we might be
+         * dealing with a Unix-style locale string; Windows' setlocale() will
+         * take those even though GetLocaleInfoEx() won't, so we end up here.
+         * In that case, just return what's after the last dot and hope we can
+         * find it in our table.
          */
         codepage = strrchr(ctype, '.');
         if (codepage != NULL)
         {
-            int            ln;
+            size_t        ln;
 
             codepage++;
             ln = strlen(codepage);
             r = malloc(ln + 3);
             if (r != NULL)
-                sprintf(r, "CP%s", codepage);
+            {
+                if (strspn(codepage, "0123456789") == ln)
+                    sprintf(r, "CP%s", codepage);
+                else
+                    strcpy(r, codepage);
+            }
         }
 
     }
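
For readers without a Windows build at hand, the suffix handling in the second hunk can be tried on its own. What follows is only a minimal sketch under stated assumptions: encoding_from_locale_suffix() is a hypothetical name that does not exist in chklocale.c, and the extra check for an empty suffix is an addition of this sketch, not something the patch does.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of chklocale.c): derive an encoding name
 * from whatever follows the last dot of a locale string.
 *
 * Returns a malloc'd string that the caller must free, or NULL if the
 * string contains no dot or malloc fails.
 */
char *
encoding_from_locale_suffix(const char *ctype)
{
    const char *suffix = strrchr(ctype, '.');
    size_t      ln;
    char       *r;

    if (suffix == NULL)
        return NULL;

    suffix++;                   /* step past the dot */
    ln = strlen(suffix);
    r = malloc(ln + 3);         /* room for "CP" plus the terminating NUL */
    if (r == NULL)
        return NULL;

    /*
     * All digits: treat it as a Windows codepage number.  The ln > 0 test
     * is an assumption of this sketch, so that a trailing dot does not
     * yield a bare "CP".
     */
    if (ln > 0 && strspn(suffix, "0123456789") == ln)
        sprintf(r, "CP%s", suffix);
    else
        strcpy(r, suffix);      /* e.g. "utf8"; left for the table lookup */

    return r;
}
```

With this, "English_United States.1252" gives "CP1252", while a Unix-style string such as "en_US.utf8" gives "utf8" unchanged, which is what a later search of encoding_match_list[] would expect to see.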
