Thread: Can we get rid of GetLocaleInfoEx() yet?
Commit 0fb54de9a ("Support building with Visual Studio 2015") introduced a hack in chklocale.c's win32_langinfo() to make it use GetLocaleInfoEx() in place of _create_locale(). There's a problem with this, which is that if I'm reading the docs correctly, GetLocaleInfoEx() accepts a smaller set of possible locale strings (only "locale names") than do either _create_locale() or setlocale(). The _create_locale() docs say The locale argument can take a locale name, a language string, a language string and country/region code, a code page, or a language string, country/region code, and code page. and they imply (but don't quite manage to say in so many words) that these are the same strings accepted by setlocale(). The reason this is a problem is that when given a locale string, in either initdb or CREATE DATABASE, we first validate it by seeing if setlocale() likes it. We produce a reasonable error message if not. Otherwise we then go on to try to identify the implied encoding via chklocale.c. But if GetLocaleInfoEx() fails, we fall back to trying to parse out the codepage for ourselves, which can lead to a silly/unhelpful error message, as recently complained of at [1]. The reason for the hack, per the comments, is that VS2015 omits a codepage field from the result of _create_locale(); and some optimism is expressed therein that Microsoft might undo that oversight in future. Has this been fixed in more recent VS versions? If not, can we find another, more robust way to do it? regards, tom lane [1] https://www.postgresql.org/message-id/flat/F4D04849032C4464B8FF17CB0F896F9E%40dell2
On Sun, Mar 29, 2020 at 3:29 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
The reason for the hack, per the comments, is that VS2015
omits a codepage field from the result of _create_locale();
and some optimism is expressed therein that Microsoft might
undo that oversight in future. Has this been fixed in more
recent VS versions? If not, can we find another, more robust
way to do it?
While working on another issue I have seen this issue reproduce in VS2019. So no, it has not been fixed.
Please find attached a patch that provides a better detection of the "uft8" cases.
Regards,
Juan José Santamaría Flecha
Attachment
=?UTF-8?Q?Juan_Jos=C3=A9_Santamar=C3=ADa_Flecha?= <juanjo.santamaria@gmail.com> writes: > On Sun, Mar 29, 2020 at 3:29 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The reason for the hack, per the comments, is that VS2015 >> omits a codepage field from the result of _create_locale(); >> and some optimism is expressed therein that Microsoft might >> undo that oversight in future. Has this been fixed in more >> recent VS versions? If not, can we find another, more robust >> way to do it? > While working on another issue I have seen this issue reproduce in VS2019. > So no, it has not been fixed. Oh well, I figured that was too optimistic :-( > Please find attached a patch that provides a better detection of the "uft8" > cases. In general, I think the problem is that we might be dealing with a Unix-style locale string, in which the encoding name might be quite a few other things besides "utf8". But actually your patch works for that too, since what's going to happen next is we'll search the encoding_match_list[] for a match. I do suggest being a bit more paranoid about what's a codepage number though, as attached. (Untested, since I lack a Windows environment, but it's pretty straightforward code.) regards, tom lane diff --git a/src/port/chklocale.c b/src/port/chklocale.c index c9c680f..9e3c6db 100644 --- a/src/port/chklocale.c +++ b/src/port/chklocale.c @@ -239,25 +239,44 @@ win32_langinfo(const char *ctype) { r = malloc(16); /* excess */ if (r != NULL) - sprintf(r, "CP%u", cp); + { + /* + * If the return value is CP_ACP that means no ANSI code page is + * available, so only Unicode can be used for the locale. + */ + if (cp == CP_ACP) + strcpy(r, "utf8"); + else + sprintf(r, "CP%u", cp); + } } else #endif { /* - * Locale format on Win32 is <Language>_<Country>.<CodePage> . For - * example, English_United States.1252. + * Locale format on Win32 is <Language>_<Country>.<CodePage>. For + * example, English_United States.1252. If we see digits after the + * last dot, assume it's a codepage number. Otherwise, we might be + * dealing with a Unix-style locale string; Windows' setlocale() will + * take those even though GetLocaleInfoEx() won't, so we end up here. + * In that case, just return what's after the last dot and hope we can + * find it in our table. */ codepage = strrchr(ctype, '.'); if (codepage != NULL) { - int ln; + size_t ln; codepage++; ln = strlen(codepage); r = malloc(ln + 3); if (r != NULL) - sprintf(r, "CP%s", codepage); + { + if (strspn(codepage, "0123456789") == ln) + sprintf(r, "CP%s", codepage); + else + strcpy(r, codepage); + } } }
On Sun, Mar 29, 2020 at 7:00 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
In general, I think the problem is that we might be dealing with a
Unix-style locale string, in which the encoding name might be quite
a few other things besides "utf8". But actually your patch works
for that too, since what's going to happen next is we'll search the
encoding_match_list[] for a match. I do suggest being a bit more
paranoid about what's a codepage number though, as attached.
(Untested, since I lack a Windows environment, but it's pretty
straightforward code.)
It works for the issue just fine, and more comments make a better a patch, so no objections from me.
Regards,
Juan José Santamaría Flecha
On Sun, Mar 29, 2020 at 07:06:56PM +0200, Juan José Santamaría Flecha wrote: > It works for the issue just fine, and more comments make a better a > patch, so no objections from me. +1 from me. And yes, we are still missing lc_codepage in newer versions of VS. Locales + Windows != 2, business as usual. -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > On Sun, Mar 29, 2020 at 07:06:56PM +0200, Juan José Santamaría Flecha wrote: >> It works for the issue just fine, and more comments make a better a >> patch, so no objections from me. > +1 from me. And yes, we are still missing lc_codepage in newer > versions of VS. Locales + Windows != 2, business as usual. OK, pushed. regards, tom lane