Thread: Re: [COMMITTERS] pgsql: setlocale() on Windows doesn't work correctly if the locale name

(2011/04/16 2:56), Heikki Linnakangas wrote:
> setlocale() on Windows doesn't work correctly if the locale name contains
> apostrophes or dots.

As for apostrophes, isn't the cause that initdb loses the single quote
of locale? ([BUGS] BUG #5818: initdb lose the single quote of locale)

As the bug reporter mentions, initdb does in fact lose the single quote.
Specifically, scanstr(), called from bootscanner.l, drops it.
I'm not sure it's appropriate for the bootstrap code to call scanstr().

regards,
Hiroshi Inoue

> There isn't much hope of Microsoft fixing it any time
> soon, it's been like that for ages, so we better work around it. So, map a
> few common Windows locale names known to cause problems to aliases that work.
>
> Branch
> ------
> master
>
> Details
> -------
> http://git.postgresql.org/pg/commitdiff/d5a7bf8c11c8b66c822bbb1a6c90e1a14425bd6e
>
> Modified Files
> --------------
> src/bin/initdb/initdb.c |   89 +++++++++++++++++++++++++++++++++++++++++++----
> 1 files changed, 82 insertions(+), 7 deletions(-)

Hiroshi Inoue <inoue@tpf.co.jp> writes:
> (2011/04/16 2:56), Heikki Linnakangas wrote:
>> setlocale() on Windows doesn't work correctly if the locale name contains
>> apostrophes or dots.

> As for apostrophes, isn't the cause that initdb loses the single quote
> of locale? ([BUGS] BUG #5818: initdb lose the single quote of locale)

> As the bug reporter mentions, initdb loses the single quote in reality.
> Concretely speaking, scanstr() called from bootscanner.l loses it.
> I'm not sure if it's suitable for the bootstrap code to call scanstr().

Huh?  Bootstrap mode just deals with the data found in
src/include/catalog/*.h.  The locale names found by initdb.c are stuck
in there afterwards, using regular SQL commands.  I don't know where the
problem really comes from, but I doubt the connection you're trying to
make above.

            regards, tom lane

(2011/04/20 9:22), Tom Lane wrote:
> Hiroshi Inoue<inoue@tpf.co.jp>  writes:
>> (2011/04/16 2:56), Heikki Linnakangas wrote:
>>> setlocale() on Windows doesn't work correctly if the locale name contains
>>> apostrophes or dots.
>
>> As for apostrophes, isn't the cause that initdb loses the single quote
>> of locale? ([BUGS] BUG #5818: initdb lose the single quote of locale)
>
>> As the bug reporter mentions, initdb loses the single quote in reality.
>> Concretely speaking, scanstr() called from bootscanner.l loses it.
>> I'm not sure if it's suitable for the bootstrap code to call scanstr().
>
> Huh?  Bootstrap mode just deals with the data found in
> src/include/catalog/*.h.  The locale names found by initdb.c are stuck
> in there afterwards, using regular SQL commands.

bootstrap_template1() in initdb runs the BKI script in bootstrap
mode to create template1. Some symbols (LC_COLLATE, LC_CTYPE in
pg_database etc) in the BKI script are substituted by actual values
using replace_token(). Isn't that correct?
ISTM replace_token() does nothing about single quotes
in its input values, but the comment in scanstr() says
                        /*
                         * Note: if scanner is working right, unescaped quotes can only
                         * appear in pairs, so there should be another character.
                         */
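
To illustrate the point, here is a minimal sketch of doubling single quotes in a value before it is substituted into the BKI script. It assumes that a doubled quote is what scanstr() expects, which is only inferred from the comment quoted above. double_single_quotes() is a hypothetical name, not a function from initdb.c, and backslashes are not handled.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from initdb.c): return a malloc'd copy
 * of src with every single quote doubled, so that a scanner which expects
 * quotes to appear only in pairs does not drop one.  Caller frees the
 * result.  Returns NULL on out of memory.
 */
char *
double_single_quotes(const char *src)
{
    size_t      len = strlen(src);
    /* worst case: every character is a quote, plus the terminator */
    char       *dst = malloc(2 * len + 1);
    char       *p = dst;

    if (dst == NULL)
        return NULL;

    for (; *src; src++)
    {
        if (*src == '\'')
            *p++ = '\'';        /* emit the extra quote first */
        *p++ = *src;
    }
    *p = '\0';
    return dst;
}
```

With this, a value such as People's Republic of China would be passed to replace_token() with its quote doubled.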

regards,
Hiroshi Inoue

> I don't know where the
> problem really comes from, but I doubt the connection you're trying to
> make above.
>
>             regards, tom lane


On 04/19/2011 09:42 PM, Hiroshi Inoue wrote:
>
> bootstrap_template1() in initdb runs the BKI script in bootstrap
> mode to create template1. Some symbols (LC_COLLATE, LC_CTYPE in
> pg_database etc) in the BKI script are substituted by actual values
> using replace_token(). Isn't it correct?
> ISTM replace_token() takes care of nothing about single quotes
> in its input values but the comment in scanstr() says
>                         /*
>                          * Note: if scanner is working right, unescaped quotes can only
>                          * appear in pairs, so there should be another character.
>                          */
>

That's perfectly true, but only one of the replaced locale names
contains a single quote mark. So clearly there's more going on here than
just the bug you're referring to. Heikki's commit message specifically
refers to dots in locale names, which shouldn't cause a problem of that
type, I believe.

cheers

andrew

(2011/04/20 12:25), Andrew Dunstan wrote:
>
> On 04/19/2011 09:42 PM, Hiroshi Inoue wrote:
>>
>> bootstrap_template1() in initdb runs the BKI script in bootstrap
>> mode to create template1. Some symbols (LC_COLLATE, LC_CTYPE in
>> pg_database etc) in the BKI script are substituted by actual values
>> using replace_token(). Isn't it correct?
>> ISTM replace_token() takes care of nothing about single quotes
>> in its input values but the comment in scanstr() says
>>                          /*
>>                           * Note: if scanner is working right, unescaped quotes can only
>>                           * appear in pairs, so there should be another character.
>>                           */
>>
>
> That's perfectly true, but only one of the replaced locale names
> contains a single quote mark. So clearly there's more going on here than
> just the bug you're referring to. Heikki's commit message specifically
> refers to dots in locale names, which shouldn't cause a problem of that
> type, I believe.

Yes, the dots are a completely separate issue.
I can find no concrete reference to problems with locale
names containing dots. Is the following an example?

In my environment (Windows Vista using VC8)

  setlocale(LC_XXXX, "Chinese (Traditional)_MCO.950");
works and
  setlocale(LC_XXXX, NULL);
returns
  Chinese (Traditional)_Macao S.A.R..950
but
  setlocale(LC_XXXX, "Chinese (Traditional)_Macao S.A.R..950");
fails.
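
The experiment above can be written as a small round-trip check. This is only a sketch: locale_round_trips() is a hypothetical name, and it changes the process locale as a side effect.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical test helper: set the locale to "name", read back the name
 * reported by setlocale(category, NULL), and try to set that name again.
 *
 * Returns 1 if the reported name is accepted, 0 if it is rejected, and
 * -1 if "name" itself is rejected or memory runs out.
 * Note: the process locale is left changed.
 */
int
locale_round_trips(int category, const char *name)
{
    const char *current;
    char       *reported;
    int         ok;

    if (setlocale(category, name) == NULL)
        return -1;

    current = setlocale(category, NULL);
    if (current == NULL)
        return -1;

    /* copy it: the returned buffer may be overwritten by the next call */
    reported = malloc(strlen(current) + 1);
    if (reported == NULL)
        return -1;
    strcpy(reported, current);

    ok = (setlocale(category, reported) != NULL);
    free(reported);
    return ok ? 1 : 0;
}
```

Going by the results reported above, locale_round_trips(LC_CTYPE, "Chinese (Traditional)_MCO.950") would be expected to return 0 on that Vista/VC8 setup.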

regards,
Hiroshi Inoue

On 20.04.2011 06:48, Hiroshi Inoue wrote:
> I can find no concrete reference to problems about locale
>   names containing dots. Is the following an example?

Yes.

> In my environment (Windows Vista using VC8)
> 
>    setlocale(LC_XXXX, "Chinese (Traditional)_MCO.950");
> works and
>    setlocale(LC_XXXX, NULL);
> returns
>    Chinese (Traditional)_Macao S.A.R..950

Interesting. According to Microsoft's documentation, the codes are
three-letter country codes specified by ISO-3166
(http://msdn.microsoft.com/en-us/library/cdax410z%28v=VS.100%29.aspx).
However, according to Wikipedia, MCO stands for Monaco, not Macau
(https://secure.wikimedia.org/wikipedia/en/wiki/ISO_3166-1_alpha-3).

So according to bug #5818, the problem with "People's Republic of China"
was different from "Hong Kong S.A.R.", "Macau S.A.R.", and "U.A.E.".
setlocale() handles apostrophe fine, but it's not escaped correctly in
the BKI file. I'll remove the "People's Republic of China" -> "China"
mapping I committed, and fix the escaping instead.
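
For the names with dots, the alias approach could look roughly like the sketch below. It is not the committed code: map_locale_name() and the table layout are made up for illustration, and the only entry is the Macao one reported as working earlier in this thread. Any further entries would need to be tested on Windows first.

```c
#include <stdlib.h>
#include <string.h>

struct locale_alias
{
    const char *problem;        /* substring setlocale() reportedly rejects */
    const char *alias;          /* replacement reported to work */
};

/* Only the entry confirmed in this thread; others are left out on purpose. */
static const struct locale_alias locale_aliases[] = {
    {"Macao S.A.R.", "MCO"},
};

/*
 * Hypothetical helper: return a malloc'd copy of name with the first
 * matching problem substring replaced by its alias, or a plain copy if
 * nothing matches.  Caller frees the result.  Returns NULL on OOM.
 */
char *
map_locale_name(const char *name)
{
    size_t      n = sizeof(locale_aliases) / sizeof(locale_aliases[0]);
    size_t      i;
    char       *out;

    for (i = 0; i < n; i++)
    {
        const char *problem = locale_aliases[i].problem;
        const char *alias = locale_aliases[i].alias;
        const char *hit = strstr(name, problem);

        if (hit != NULL)
        {
            size_t      prefix = (size_t) (hit - name);
            size_t      plen = strlen(problem);
            size_t      alen = strlen(alias);

            out = malloc(strlen(name) - plen + alen + 1);
            if (out == NULL)
                return NULL;
            memcpy(out, name, prefix);                  /* text before the match */
            memcpy(out + prefix, alias, alen);          /* the alias */
            strcpy(out + prefix + alen, hit + plen);    /* the rest, e.g. ".950" */
            return out;
        }
    }

    out = malloc(strlen(name) + 1);
    if (out != NULL)
        strcpy(out, name);
    return out;
}
```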

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Hiroshi Inoue <inoue@tpf.co.jp> writes:
> In my environment (Windows Vista using VC8)

>   setlocale(LC_XXXX, "Chinese (Traditional)_MCO.950");
> works and
>   setlocale(LC_XXXX, NULL);
> returns
>   Chinese (Traditional)_Macao S.A.R..950
> but
>   setlocale(LC_XXXX, "Chinese (Traditional)_Macao S.A.R..950");
> fails.

Interesting.  This example suggests that maybe Windows' setlocale can
only cope with dot as introducing a codepage number.  Are there any
cases where a dot works as part of the basic locale name?
        regards, tom lane
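
If that hypothesis holds, suspect names could be spotted with a check like the sketch below. has_dot_before_codepage() is a hypothetical name, and the check simply assumes that the last dot in a name is the one introducing the codepage, so it cannot flag a name whose only dot is not a codepage separator.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: report whether a locale name contains a dot other
 * than its last one.  Under the hypothesis above, only the final dot
 * (assumed to introduce the codepage) is safe for Windows' setlocale().
 */
bool
has_dot_before_codepage(const char *name)
{
    const char *first = strchr(name, '.');
    const char *last = strrchr(name, '.');

    /* no dot at all, or exactly one dot: nothing suspicious */
    return first != NULL && first != last;
}
```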


(2011/04/20 15:30), Heikki Linnakangas wrote:
> On 20.04.2011 06:48, Hiroshi Inoue wrote:
>> I can find no concrete reference to problems about locale
>>    names containing dots. Is the following an example?
> 
> Yes.
> 
>> In my environment (Windows Vista using VC8)
>>
>>     setlocale(LC_XXXX, "Chinese (Traditional)_MCO.950");
>> works and
>>     setlocale(LC_XXXX, NULL);
>> returns
>>     Chinese (Traditional)_Macao S.A.R..950

but setlocale(LC_XXXX, "Chinese (Traditional)_Macao S.A.R..950");
fails.

I see another issue with this behavior.

For example, the following code in src/backend/utils/adt/pg_locale.c
won't work as expected when the current locale is Hong Kong, Macao or
UAE, because the last setlocale() in the code would fail. I can
find such save & restore operations of locales in several places.

bool
check_locale(int category, const char *value)
{
    char       *save;
    bool        ret;

    save = setlocale(category, NULL);
    if (!save)
        return false;            /* won't happen, we hope */

    /* save may be pointing at a modifiable scratch variable, see above */
    save = pstrdup(save);

    /* set the locale with setlocale, to see if it accepts it. */
    ret = (setlocale(category, value) != NULL);

    setlocale(category, save);    /* assume this won't fail */
    pfree(save);

    return ret;
}
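
To make the failure visible, here is a standalone sketch of the same save & restore pattern that reports whether the restore succeeded instead of assuming it. It is not a proposed patch: check_locale_sketch() and its restored parameter are hypothetical, and malloc()/free() stand in for pstrdup()/pfree() so that it compiles outside the backend.

```c
#include <locale.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical standalone variant of check_locale().  Returns whether
 * "value" is accepted for the category, and sets *restored to whether
 * the previous locale could be set again afterwards.
 */
bool
check_locale_sketch(int category, const char *value, bool *restored)
{
    const char *current;
    char       *save;
    bool        ret;

    *restored = false;

    current = setlocale(category, NULL);
    if (current == NULL)
        return false;

    /* copy it: the returned buffer may be overwritten by later calls */
    save = malloc(strlen(current) + 1);
    if (save == NULL)
        return false;
    strcpy(save, current);

    /* set the locale, to see if setlocale() accepts it */
    ret = (setlocale(category, value) != NULL);

    /* the step that would fail for the names discussed above */
    *restored = (setlocale(category, save) != NULL);
    free(save);

    return ret;
}
```

Detecting the failure does not repair it; the caller would still be left in the wrong locale when *restored is false.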

regards,
Hiroshi Inoue


Hiroshi Inoue <inoue@tpf.co.jp> writes:
> I see another issue for the behavior.

> For example, the following code in src/backend/utils/adt/pg_locale.c
> won't work as expected in case the current locale is Hong Kong, Macao or
> UAE because the last setlocale() in the code would fail. I can
> find such save & restore operations of locales in several places.

Well, if Windows' setlocale is too brain-dead to accept its own output,
there's nothing to be done about it except to file a bug with Microsoft.
There isn't anything in the POSIX API that would let us avoid using
setlocale with a previous result value to restore the previous setting.
        regards, tom lane


(2011/04/20 22:08), Tom Lane wrote:
> Hiroshi Inoue<inoue@tpf.co.jp>  writes:
>> In my environment (Windows Vista using VC8)
> 
>>    setlocale(LC_XXXX, "Chinese (Traditional)_MCO.950");
>> works and
>>    setlocale(LC_XXXX, NULL);
>> returns
>>    Chinese (Traditional)_Macao S.A.R..950
>> but
>>    setlocale(LC_XXXX, "Chinese (Traditional)_Macao S.A.R..950");
>> fails.
> 
> Interesting.  This example suggests that maybe Windows' setlocale can
> only cope with dot as introducing a codepage number.

ACP and OCP seem to be allowed there, as well as a codepage number.

> Are there any
> cases where a dot works as part of the basic locale name?

Unfortunately I don't know of any explanation of where dots are allowed.

regards,
Hiroshi Inoue




(2011/04/20 15:30), Heikki Linnakangas wrote:
> On 20.04.2011 06:48, Hiroshi Inoue wrote:
>> I can find no concrete reference to problems about locale
>>    names containing dots. Is the following an example?
> 
> Yes.
> 
>> In my environment (Windows Vista using VC8)
>>
>>     setlocale(LC_XXXX, "Chinese (Traditional)_MCO.950");
>> works and
>>     setlocale(LC_XXXX, NULL);
>> returns
>>     Chinese (Traditional)_Macao S.A.R..950
> 
> Interesting. According to Microsoft's documentation, the codes are
> three-letter country codes specified by ISO-3166
> (http://msdn.microsoft.com/en-us/library/cdax410z%28v=VS.100%29.aspx).
> However, according to Wikipedia, MCO stands for Monaco, not Macau
> (https://secure.wikimedia.org/wikipedia/en/wiki/ISO_3166-1_alpha-3).

Hmm, the Windows locale system seems to be inconsistent: the same
country code (MCO) corresponds to different countries.
ZHM_MCO corresponds to Chinese (Traditional)_Macao S.A.R..950, whereas
FRM_MCO corresponds to French_Principality of Monaco.

regards,
Hiroshi Inoue