Thread: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.

Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.

From
Magnus Hagander
Date:
Hiroshi Inoue wrote:
> Hi Magnus and all,
> 
> Magnus Hagander wrote:
>> Log Message:
>> -----------
>> Explicitly bind gettext() to the  UTF8 locale when in use.
>> This is required on Windows due to the special locale
>> handling for UTF8 that doesn't change the full environment.
> 
> Thanks to this change UTF-8 case was solved but Japanese users
> are still unhappy with Windows databases with EUC_JP encoding.
> Shift_JIS which is a Japanese encoding under Windows doesn't
> match any server encoding and causes a crash with the use of
> gettext. So Saito-san removed ja message catalog just before
> the 8.3 release.
> 
> Attached is a simple patch to avoid the crash and enable the
> use of Japanese message catalog.
> Please apply the patch if there's no problem.

Hi!

It will clearly also need an update to the comment, but I can take care
of that.

I assume you have tested this? The comment says that it works because we
are handling UTF8 on a special way on Windows, but AFAIK we *don't*
handle EUC_JP in a special way there?

If your database is in EUC_JP, I don't see why gettext() isn't picking
it up properly in the first place..

And why do we need that on Windows only, and not on other platforms?

//Magnus



Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.

From
Hiroshi Inoue
Date:
Magnus Hagander wrote:
> Hiroshi Inoue wrote:
>> Hi Magnus and all,
>>
>> Magnus Hagander wrote:
>>> Log Message:
>>> -----------
>>> Explicitly bind gettext() to the  UTF8 locale when in use.
>>> This is required on Windows due to the special locale
>>> handling for UTF8 that doesn't change the full environment.
>> Thanks to this change UTF-8 case was solved but Japanese users
>> are still unhappy with Windows databases with EUC_JP encoding.
>> Shift_JIS which is a Japanese encoding under Windows doesn't
>> match any server encoding and causes a crash with the use of
>> gettext. So Saito-san removed ja message catalog just before
>> the 8.3 release.
>>
>> Attached is a simple patch to avoid the crash and enable the
>> use of Japanese message catalog.
>> Please apply the patch if there's no problem.
> 
> Hi!
> 
> It will clearly also need an update to the comment, but I can take care
> of that.
> 
> I assume you have tested this?

Though I myself didn't test it, Saito-san tested it.

> The comment says that it works because we
> are handling UTF8 on a special way on Windows,

ISTM UTF-8 isn't a special case.
In fact the comment also mentions the following.
* In future we might want to call bind_textdomain_codeset    * unconditionally, but that....

> If your database is in EUC_JP, I don't see why gettext() isn't picking
> it up properly in the first place..

In Japan 2 encodings (EUC_JP and Shift_JIS) are often used.
EUC_JP is mainly used on *nix and on Windows Shift_JIS is
used. We use EUC_JP not Shift_JIS as the server encoding

> And why do we need that on Windows only, and not on other platforms?

because Shift_JIS isn't allowed as a server encoding. So
the Japanese Windows native message encoding Shift_JIS nevermatches the server encoding EUC_JP and a conversion
between
Shitt_jis and EUC_JP is necessarily needed.

regards,
Hiroshi Inoue



Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.

From
Magnus Hagander
Date:
Hiroshi Inoue wrote:
> Magnus Hagander wrote:
>> Hiroshi Inoue wrote:
>>> Hi Magnus and all,
>>>
>>> Magnus Hagander wrote:
>>>> Log Message:
>>>> -----------
>>>> Explicitly bind gettext() to the  UTF8 locale when in use.
>>>> This is required on Windows due to the special locale
>>>> handling for UTF8 that doesn't change the full environment.
>>> Thanks to this change UTF-8 case was solved but Japanese users
>>> are still unhappy with Windows databases with EUC_JP encoding.
>>> Shift_JIS which is a Japanese encoding under Windows doesn't
>>> match any server encoding and causes a crash with the use of
>>> gettext. So Saito-san removed ja message catalog just before
>>> the 8.3 release.
>>>
>>> Attached is a simple patch to avoid the crash and enable the
>>> use of Japanese message catalog.
>>> Please apply the patch if there's no problem.
>> Hi!
>>
>> It will clearly also need an update to the comment, but I can take care
>> of that.
>>
>> I assume you have tested this?
> 
> Though I myself didn't test it, Saito-san tested it.

Ok, good.


>> The comment says that it works because we
>> are handling UTF8 on a special way on Windows,
> 
> ISTM UTF-8 isn't a special case.
> In fact the comment also mentions the following.
> 
>     * In future we might want to call bind_textdomain_codeset
>      * unconditionally, but that....

I think that's partially unrelated. UTF8 is special in that the
environment that the backend runs in is different from SERVER_ENCODING
in this case. For other encodings, we have setlocale():d to the same
encoding.

>> If your database is in EUC_JP, I don't see why gettext() isn't picking
>> it up properly in the first place..
> 
> In Japan 2 encodings (EUC_JP and Shift_JIS) are often used.
> EUC_JP is mainly used on *nix and on Windows Shift_JIS is
> used. We use EUC_JP not Shift_JIS as the server encoding
> 
>> And why do we need that on Windows only, and not on other platforms?
> 
> because Shift_JIS isn't allowed as a server encoding. So
> the Japanese Windows native message encoding Shift_JIS never
>  matches the server encoding EUC_JP and a conversion between
> Shitt_jis and EUC_JP is necessarily needed.

Ah, so we're basically hardcoding that information? The system will go
up in SJIS, but since we can't deal with it, we switch it to EUC_JP?

Ok, I think I understand. I've made some minor stylistic changes (we
don't normally use if (NULL != <whatever>) in the pg sources), and will
apply with those.

This is for HEAD only, correct? Or is it something we should backpatch?

//Magnus


Magnus Hagander <magnus@hagander.net> writes:
> Hiroshi Inoue wrote:
>> because Shift_JIS isn't allowed as a server encoding. So
>> the Japanese Windows native message encoding Shift_JIS never
>> matches the server encoding EUC_JP and a conversion between
>> Shitt_jis and EUC_JP is necessarily needed.

> Ah, so we're basically hardcoding that information? The system will go
> up in SJIS, but since we can't deal with it, we switch it to EUC_JP?

I'm not following this either.  If the patch is really necessary then it
seems it must be working around a bug in the Windows version of gettext,
ie failure to distinguish CP932 from CP20932.  Is that correct?

> Ok, I think I understand. I've made some minor stylistic changes (we
> don't normally use if (NULL != <whatever>) in the pg sources), and will
> apply with those.

It definitely needs a comment explaining why this is needed.
        regards, tom lane


Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> Hiroshi Inoue wrote:
>>> because Shift_JIS isn't allowed as a server encoding. So
>>> the Japanese Windows native message encoding Shift_JIS never
>>> matches the server encoding EUC_JP and a conversion between
>>> Shitt_jis and EUC_JP is necessarily needed.
> 
>> Ah, so we're basically hardcoding that information? The system will go
>> up in SJIS, but since we can't deal with it, we switch it to EUC_JP?
> 
> I'm not following this either.  If the patch is really necessary then it
> seems it must be working around a bug in the Windows version of gettext,
> ie failure to distinguish CP932 from CP20932.  Is that correct?

I'm afraid I don't understand what you mean exactly.
AFAIK the output encoding of Windows gettext is detemined by the
ANSI system code page which is usualy CP932(Shift_JIS) in Japan andunrelated to the locale settings.
In addition CP20932 is rarely used in Japan IIRC. I've never used it
and don't know what it is correctly (maybe a kind of EUC_JP).

regards,
Hiroshi Inoue



Hiroshi Inoue <inoue@tpf.co.jp> writes:
> Tom Lane wrote:
>> I'm not following this either.  If the patch is really necessary then it
>> seems it must be working around a bug in the Windows version of gettext,
>> ie failure to distinguish CP932 from CP20932.  Is that correct?

> I'm afraid I don't understand what you mean exactly.
> AFAIK the output encoding of Windows gettext is detemined by the
> ANSI system code page which is usualy CP932(Shift_JIS) in Japan and
>  unrelated to the locale settings.

If that's true then this code is presently broken for *every* locale
under Windows, not only Japanese.

To my mind the really correct thing to be doing here would be to call
bind_textdomain_codeset in all cases, rather than trusting gettext to
guess correctly about which encoding we want.  As the comment notes,
we have not attempted that because the codeset names aren't well
standardized.  But it seems to me that we could certainly find out what
codeset names are used on Windows, and apply bind_textdomain_codeset
all the time on Windows.  That would make a lot more sense than ad-hoc
treatment of UTF-8 and EUC-JP if you ask me ...
        regards, tom lane


Tom Lane wrote:
> Hiroshi Inoue <inoue@tpf.co.jp> writes:
>> Tom Lane wrote:
>>> I'm not following this either.  If the patch is really necessary then it
>>> seems it must be working around a bug in the Windows version of gettext,
>>> ie failure to distinguish CP932 from CP20932.  Is that correct?
> 
>> I'm afraid I don't understand what you mean exactly.
>> AFAIK the output encoding of Windows gettext is detemined by the
>> ANSI system code page which is usualy CP932(Shift_JIS) in Japan and
>>  unrelated to the locale settings.
> 
> If that's true then this code is presently broken for *every* locale
> under Windows, not only Japanese.

Maybe there are a few languages/countires where 2 encodings arewidely used.

> To my mind the really correct thing to be doing here would be to call
> bind_textdomain_codeset in all cases, rather than trusting gettext to
> guess correctly about which encoding we want. As the comment notes,
> we have not attempted that because the codeset names aren't well
> standardized.  But it seems to me that we could certainly find out what
> codeset names are used on Windows, and apply bind_textdomain_codeset
> all the time on Windows.  That would make a lot more sense than ad-hoc
> treatment of UTF-8 and EUC-JP if you ask me ...

I fundamentally agree with you.
What we hope is to enable the use of Japanese message catalog which
we gave up in 8.3 Windows-version release.

regards,
Hiroshi Inoue



Hiroshi Inoue <inoue@tpf.co.jp> writes:
> Tom Lane wrote:
>> If that's true then this code is presently broken for *every* locale
>> under Windows, not only Japanese.

> Maybe there are a few languages/countires where 2 encodings are
>  widely used.

UTF8 vs Latin-N?  In any case I think the problem is that gettext is
looking at a setting that is not what we are looking at.  Particularly
with the 8.4 changes to allow per-database locale settings, this has
got to be fixed in a bulletproof way.
        regards, tom lane


Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.

From
Magnus Hagander
Date:
On 25 nov 2008, at 05.00, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Hiroshi Inoue <inoue@tpf.co.jp> writes:
>> Tom Lane wrote:
>>> If that's true then this code is presently broken for *every* locale
>>> under Windows, not only Japanese.
>
>> Maybe there are a few languages/countires where 2 encodings are
>> widely used.
>
> UTF8 vs Latin-N?

We already special-cases utf8...

I think the thing us that as long as the encodings are compatible  
(latin1 with different names for example) it worked  fine.

>  In any case I think the problem is that gettext is
> looking at a setting that is not what we are looking at.  Particularly
> with the 8.4 changes to allow per-database locale settings, this has
> got to be fixed in a bulletproof way.
>

Agreed.

/Magnus



Magnus Hagander wrote:
> On 25 nov 2008, at 05.00, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> Hiroshi Inoue <inoue@tpf.co.jp> writes:
>>> Tom Lane wrote:
>>>> If that's true then this code is presently broken for *every* locale
>>>> under Windows, not only Japanese.
>>
>>> Maybe there are a few languages/countires where 2 encodings are
>>> widely used.
>>
>> UTF8 vs Latin-N?
>
> We already special-cases utf8...
>
> I think the thing us that as long as the encodings are compatible
> (latin1 with different names for example) it worked  fine.
>
>>  In any case I think the problem is that gettext is
>> looking at a setting that is not what we are looking at.  Particularly
>> with the 8.4 changes to allow per-database locale settings, this has
>> got to be fixed in a bulletproof way.

Attached is a new patch to apply bind_textdomain_codeset() to most
server encodings. Exceptions are PG_SQL_ASCII, PG_MULE_INTERNAL
and PG_EUC_JIS_2004. "EUC-JP" may be OK for EUC_JIS_2004.

Unfortunately it's hard for Saito-san and me to check encodings
other than EUC-JP.

regards,
Hiroshi Inoue
*** mbutils.c.orig    Sun Nov 23 08:42:57 2008
--- mbutils.c    Wed Nov 26 12:17:12 2008
***************
*** 822,830 ****
--- 822,870 ----
      return clen;
  }

+ #ifdef    WIN32
+ static const struct codeset_map {
+     int    encoding;
+     const char *codeset;
+ } codeset_map_array[] = {
+         {PG_UTF8, "UTF-8"},
+         {PG_LATIN1, "LATIN1"},
+         {PG_LATIN2, "LATIN2"},
+         {PG_LATIN3, "LATIN3"},
+         {PG_LATIN4, "LATIN4"},
+         {PG_ISO_8859_5, "ISO-8859-5"},
+         {PG_ISO_8859_6, "ISO_8859-6"},
+         {PG_ISO_8859_7, "ISO-8859-7"},
+         {PG_ISO_8859_8, "ISO-8859-8"},
+         {PG_LATIN5, "LATIN5"},
+         {PG_LATIN6, "LATIN6"},
+         {PG_LATIN7, "LATIN7"},
+         {PG_LATIN8, "LATIN8"},
+         {PG_LATIN9, "LATIN-9"},
+         {PG_LATIN10, "LATIN10"},
+         {PG_KOI8R, "KOI8-R"},
+         {PG_WIN1250, "CP1250"},
+         {PG_WIN1251, "CP1251"},
+         {PG_WIN1252, "CP1252"},
+         {PG_WIN1253, "CP1253"},
+         {PG_WIN1254, "CP1254"},
+         {PG_WIN1255, "CP1255"},
+         {PG_WIN1256, "CP1256"},
+         {PG_WIN1257, "CP1257"},
+         {PG_WIN1258, "CP1258"},
+         {PG_WIN866, "CP866"},
+         {PG_WIN874, "CP874"},
+         {PG_EUC_CN, "EUC-CN"},
+         {PG_EUC_JP, "EUC-JP"},
+         {PG_EUC_KR, "EUC-KR"},
+         {PG_EUC_TW, "EUC-TW"}};
+ #endif /* WIN32 */
+
  void
  SetDatabaseEncoding(int encoding)
  {
+     const char *target_codeset = NULL;
+
      if (!PG_VALID_BE_ENCODING(encoding))
          elog(ERROR, "invalid database encoding: %d", encoding);

***************
*** 846,852 ****
       */
  #ifdef ENABLE_NLS
      if (encoding == PG_UTF8)
!         if (bind_textdomain_codeset("postgres", "UTF-8") == NULL)
              elog(LOG, "bind_textdomain_codeset failed");
  #endif
  }
--- 886,907 ----
       */
  #ifdef ENABLE_NLS
      if (encoding == PG_UTF8)
!         target_codeset = "UTF-8";
! #ifdef    WIN32
!     else
!     {
!         int    i;
!
!         for (i = 0; i < sizeof(codeset_map_array) / sizeof(struct codeset_map); i++)
!             if (codeset_map_array[i].encoding == encoding)
!             {
!                 target_codeset = codeset_map_array[i].codeset;
!                 break;
!             }
!     }
! #endif /* WIN32 */
!     if (target_codeset != NULL)
!         if (bind_textdomain_codeset("postgres", target_codeset) == NULL)
              elog(LOG, "bind_textdomain_codeset failed");
  #endif
  }

Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.

From
Magnus Hagander
Date:
Hiroshi Inoue wrote:
>> I think the thing us that as long as the encodings are compatible
>> (latin1 with different names for example) it worked  fine.
>>
>>>  In any case I think the problem is that gettext is
>>> looking at a setting that is not what we are looking at.  Particularly
>>> with the 8.4 changes to allow per-database locale settings, this has
>>> got to be fixed in a bulletproof way.
> 
> Attached is a new patch to apply bind_textdomain_codeset() to most
> server encodings. Exceptions are PG_SQL_ASCII, PG_MULE_INTERNAL
> and PG_EUC_JIS_2004. "EUC-JP" may be OK for EUC_JIS_2004.
> 
> Unfortunately it's hard for Saito-san and me to check encodings
> other than EUC-JP.


In principle this looks good, I think, but I'm a bit worried around the
lack of testing. I can do some testing under LATIN1 which is what we use
in Sweden (just need to get gettext working *at all* in my dev
environment again - I've somehow managed to break it), and perhaps we
can find someone to do a test in an eastern-european locale to get some
more datapoints?

Can you outline the steps one needs to go through to show the problem,
so we can confirm it's fixed in these locales?

//Magnus



Magnus Hagander wrote:
> Hiroshi Inoue wrote:
>>> I think the thing us that as long as the encodings are compatible
>>> (latin1 with different names for example) it worked  fine.
>>>
>>>>  In any case I think the problem is that gettext is
>>>> looking at a setting that is not what we are looking at.  Particularly
>>>> with the 8.4 changes to allow per-database locale settings, this has
>>>> got to be fixed in a bulletproof way.
>> Attached is a new patch to apply bind_textdomain_codeset() to most
>> server encodings. Exceptions are PG_SQL_ASCII, PG_MULE_INTERNAL
>> and PG_EUC_JIS_2004. "EUC-JP" may be OK for EUC_JIS_2004.
>>
>> Unfortunately it's hard for Saito-san and me to check encodings
>> other than EUC-JP.
>
> In principle this looks good, I think,  but I'm a bit worried around the
> lack of testing.

Thanks and I agree with you.

 > I can do some testing under LATIN1 which is what we use
> in Sweden (just need to get gettext working *at all* in my dev
> environment again - I've somehow managed to break it), and perhaps we
> can find someone to do a test in an eastern-european locale to get some
> more datapoints?
>
> Can you outline the steps one needs to go through to show the problem,
> so we can confirm it's fixed in these locales?

Saito-san and I have been working on another related problem about
changing LC_MESSAGES locale properly under Windows and would be able
to provide a patch in a few days. It seems preferable for us to apply
the patch also so as to change the message catalog easily.

Attached is an example in which LC_MESSAGES is cht_twn(zh_TW) and
the server encoding is EUC-TW. You can see it as a UTF-8 text
because the client_encoding is set to UTF-8 first.

BTW you can see another problem at line 4 in the text.
At the point the LC_MESSAGES is still japanese and postgres fails
to convert a Japanese error message to EUC_TW encoding. There's
no wonder but it doesn't seem preferable.

regards,
Hiroshi Inoue
set client_encoding to utf_8;
SET
1;
psql:cmd/euctw.sql:2: ERROR:  character 0xb9e6 of encoding "EUC_TW" has no equivalent in "UTF8"
select current_database();
 current_database
------------------
 euctw
(1 �s)

show server_encoding;
 server_encoding
-----------------
 EUC_TW
(1 �s)

show lc_messages;
    lc_messages
--------------------
 Japanese_Japan.932
(1 �s)

set lc_messages to cht;
SET
select a;
psql:cmd/euctw.sql:7: 錯誤:  欄位"a"不存在
LINE 1: select a;
               ^
1;
psql:cmd/euctw.sql:8: 錯誤:  在"語法錯誤"附近發生 1
LINE 1: 1;
        ^
select * from a;
psql:cmd/euctw.sql:9: 錯誤:  relation "a"不存在
LINE 1: select * from a;
                      ^

Hiroshi, is this patch still needed?

---------------------------------------------------------------------------

Hiroshi Inoue wrote:
> Magnus Hagander wrote:
> > On 25 nov 2008, at 05.00, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > 
> >> Hiroshi Inoue <inoue@tpf.co.jp> writes:
> >>> Tom Lane wrote:
> >>>> If that's true then this code is presently broken for *every* locale
> >>>> under Windows, not only Japanese.
> >>
> >>> Maybe there are a few languages/countires where 2 encodings are
> >>> widely used.
> >>
> >> UTF8 vs Latin-N?
> > 
> > We already special-cases utf8...
> > 
> > I think the thing us that as long as the encodings are compatible 
> > (latin1 with different names for example) it worked  fine.
> > 
> >>  In any case I think the problem is that gettext is
> >> looking at a setting that is not what we are looking at.  Particularly
> >> with the 8.4 changes to allow per-database locale settings, this has
> >> got to be fixed in a bulletproof way.
> 
> Attached is a new patch to apply bind_textdomain_codeset() to most
> server encodings. Exceptions are PG_SQL_ASCII, PG_MULE_INTERNAL
> and PG_EUC_JIS_2004. "EUC-JP" may be OK for EUC_JIS_2004.
> 
> Unfortunately it's hard for Saito-san and me to check encodings
> other than EUC-JP.
> 
> regards,
> Hiroshi Inoue

> *** mbutils.c.orig    Sun Nov 23 08:42:57 2008
> --- mbutils.c    Wed Nov 26 12:17:12 2008
> ***************
> *** 822,830 ****
> --- 822,870 ----
>       return clen;
>   }
>   
> + #ifdef    WIN32
> + static const struct codeset_map {
> +     int    encoding;
> +     const char *codeset;
> + } codeset_map_array[] = {
> +         {PG_UTF8, "UTF-8"},
> +         {PG_LATIN1, "LATIN1"}, 
> +         {PG_LATIN2, "LATIN2"},
> +         {PG_LATIN3, "LATIN3"},
> +         {PG_LATIN4, "LATIN4"},
> +         {PG_ISO_8859_5, "ISO-8859-5"},
> +         {PG_ISO_8859_6, "ISO_8859-6"},
> +         {PG_ISO_8859_7, "ISO-8859-7"},
> +         {PG_ISO_8859_8, "ISO-8859-8"},
> +         {PG_LATIN5, "LATIN5"},
> +         {PG_LATIN6, "LATIN6"},
> +         {PG_LATIN7, "LATIN7"},
> +         {PG_LATIN8, "LATIN8"},
> +         {PG_LATIN9, "LATIN-9"},
> +         {PG_LATIN10, "LATIN10"},
> +         {PG_KOI8R, "KOI8-R"},
> +         {PG_WIN1250, "CP1250"},
> +         {PG_WIN1251, "CP1251"},
> +         {PG_WIN1252, "CP1252"},
> +         {PG_WIN1253, "CP1253"},
> +         {PG_WIN1254, "CP1254"},
> +         {PG_WIN1255, "CP1255"},
> +         {PG_WIN1256, "CP1256"},
> +         {PG_WIN1257, "CP1257"},
> +         {PG_WIN1258, "CP1258"},
> +         {PG_WIN866, "CP866"},
> +         {PG_WIN874, "CP874"},
> +         {PG_EUC_CN, "EUC-CN"},
> +         {PG_EUC_JP, "EUC-JP"},
> +         {PG_EUC_KR, "EUC-KR"},
> +         {PG_EUC_TW, "EUC-TW"}};
> + #endif /* WIN32 */
> + 
>   void
>   SetDatabaseEncoding(int encoding)
>   {
> +     const char *target_codeset = NULL;
> + 
>       if (!PG_VALID_BE_ENCODING(encoding))
>           elog(ERROR, "invalid database encoding: %d", encoding);
>   
> ***************
> *** 846,852 ****
>        */
>   #ifdef ENABLE_NLS
>       if (encoding == PG_UTF8)
> !         if (bind_textdomain_codeset("postgres", "UTF-8") == NULL)
>               elog(LOG, "bind_textdomain_codeset failed");
>   #endif
>   }
> --- 886,907 ----
>        */
>   #ifdef ENABLE_NLS
>       if (encoding == PG_UTF8)
> !         target_codeset = "UTF-8";
> ! #ifdef    WIN32
> !     else
> !     {
> !         int    i;
> ! 
> !         for (i = 0; i < sizeof(codeset_map_array) / sizeof(struct codeset_map); i++)
> !             if (codeset_map_array[i].encoding == encoding)
> !             {
> !                 target_codeset = codeset_map_array[i].codeset;
> !                 break;
> !             }
> !     }
> ! #endif /* WIN32 */
> !     if (target_codeset != NULL)
> !         if (bind_textdomain_codeset("postgres", target_codeset) == NULL)
>               elog(LOG, "bind_textdomain_codeset failed");
>   #endif
>   }

> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Bruce Momjian <bruce@momjian.us> writes:
> Hiroshi, is this patch still needed?

This patch is for a problem that's entirely separate from the LC_TIME
issue, if that's what you were wondering.
        regards, tom lane


Bruce Momjian wrote:
> Hiroshi, is this patch still needed?

Yes though it should be slightly changed now.
*set lc_messages does not work* issue isn't directly related to this issue.

regards,
Hiroshi Inoue


Hi.

My swift attack test was the MinGW environment.
But, Inoue-san suggestion.

1. MinGW+gcc build
HIROSHI=# set LC_TIME=Ja;
SET
HIROSHI=# select to_char(now(),'TMDay');to_char
---------日曜日
(1 行)
HIROSHI=# set LC_TIME='Japan';
SET
HIROSHI=# select to_char(Now(),'TMDay');to_char
---------日曜日
(1 行)
HIROSHI=# set LC_TIME='Japanese';
SET
HIROSHI=# select to_char(Now(),'TMDay');to_char
---------日曜日
(1 行)

However,  A setup of 'Ja' was strange.?_?
http://msdn.microsoft.com/en-us/library/aa246450(VS.60).aspx

2. MSVC build
HIROSHI=# set LC_TIME='Ja';
ERROR:  invalid value for parameter "lc_time": "Ja"
STATEMENT:  set LC_TIME='Ja';
ERROR:  invalid value for parameter "lc_time": "Ja"
HIROSHI=# set LC_TIME='Japan';
ERROR:  invalid value for parameter "lc_time": "Japan"
STATEMENT:  set LC_TIME='Japan';
ERROR:  invalid value for parameter "lc_time": "Japan"
HIROSHI=# set LC_TIME=Japanese;
SET
HIROSHI=# select to_char(Now(),'TMDay');to_char
---------日曜日
(1 行)

Umm, Re-investigation is required for this. :-(
However, If reasonable clear, it will be good for a document at suggestion.

Regards,
Hiroshi Saito 



Hiroshi Inoue wrote:
> Bruce Momjian wrote:
>> Hiroshi, is this patch still needed?
>
> Yes though it should be slightly changed now.
> *set lc_messages does not work* issue isn't directly related to this
>  issue.

Though I'm not sure how we can test it, I can provide test results
like the attached one.

The attached is a test result in case LC_MESSAGES=fr causing
the following 3 errors in databases with various encodings.

   select a; ==> column "a" does not exist
   1; ==> syntax error at or near "1"
   select * from a; ==> relation "a" does not exist

Comments?
Please note the encoding of the attached file is utf-8.

regards,
Hiroshi Inoue
SET
 current_database
------------------
 utf8
(1 row)

 server_encoding
-----------------
 UTF8
(1 row)

SET
 lc_messages
-------------
 fr
(1 row)

ERREUR:  la colonne « a » n'existe pas
LINE 1: select a;
               ^
ERREUR:  erreur de syntaxe sur ou près de « 1 »
LINE 1: 1;
        ^
ERREUR:  la relation « a » n'existe pas
LINE 1: select * from a;
                      ^
SET
 current_database
------------------
 eucjp
(1 row)

 server_encoding
-----------------
 EUC_JP
(1 row)

SET
 lc_messages
-------------
 fr
(1 row)

ERREUR:  la colonne << a >> n'existe pas
LINE 1: select a;
               ^
ERREUR:  erreur de syntaxe sur ou près de << 1 >>
LINE 1: 1;
        ^
ERREUR:  la relation << a >> n'existe pas
LINE 1: select * from a;
                      ^
SET
 current_database
------------------
 euc-jis-2004
(1 row)

 server_encoding
-----------------
 EUC_JIS_2004
(1 row)

SET
 lc_messages
-------------
 fr
(1 row)

ERREUR:  la colonne << a >> n'existe pas
LINE 1: select a;
               ^
ERROR:  character 0x8fabb2 of encoding "EUC_JIS_2004" has no equivalent in "UTF8"
ERREUR:  la relation << a >> n'existe pas
LINE 1: select * from a;
                      ^
SET
 current_database
------------------
 euccn
(1 row)

 server_encoding
-----------------
 EUC_CN
(1 row)

SET
 lc_messages
-------------
 fr
(1 row)

ERREUR:  la colonne << a >> n'existe pas
LINE 1: select a;
               ^
ERREUR:  erreur de syntaxe sur ou près de << 1 >>
LINE 1: 1;
        ^
ERREUR:  la relation << a >> n'existe pas
LINE 1: select * from a;
                      ^
SET
 current_database
------------------
 euctw
(1 row)

 server_encoding
-----------------
 EUC_TW
(1 row)

SET
 lc_messages
-------------
 fr
(1 row)

ERREUR:  la colonne << a >> n'existe pas
LINE 1: select a;
               ^
ERREUR:  erreur de syntaxe sur ou pr`es de << 1 >>
LINE 1: 1;
        ^
ERREUR:  la relation << a >> n'existe pas
LINE 1: select * from a;
                      ^
SET
 current_database
------------------
 euckr
(1 row)

 server_encoding
-----------------
 EUC_KR
(1 row)

SET
 lc_messages
-------------
 fr
(1 row)

ERREUR:  la colonne << a >> n'existe pas
LINE 1: select a;
               ^
ERREUR:  erreur de syntaxe sur ou pr`es de << 1 >>
LINE 1: 1;
        ^
ERREUR:  la relation << a >> n'existe pas
LINE 1: select * from a;
                      ^
SET
 current_database
------------------
 latin1
(1 row)

 server_encoding
-----------------
 LATIN1
(1 row)

SET
 lc_messages
-------------
 fr
(1 row)

ERREUR:  la colonne « a » n'existe pas
LINE 1: select a;
               ^
ERREUR:  erreur de syntaxe sur ou près de « 1 »
LINE 1: 1;
        ^
ERREUR:  la relation « a » n'existe pas
LINE 1: select * from a;
                      ^
SET
 current_database
------------------
 latin2
(1 row)

 server_encoding
-----------------
 LATIN2
(1 row)

SET
 lc_messages
-------------
 fr
(1 row)

ERREUR:  la colonne << a >> n'existe pas
LINE 1: select a;
               ^
ERREUR:  erreur de syntaxe sur ou pr`es de << 1 >>
LINE 1: 1;
        ^
ERREUR:  la relation << a >> n'existe pas
LINE 1: select * from a;
                      ^
SET
 current_database
------------------
 latin3
(1 row)

 server_encoding
-----------------
 LATIN3
(1 row)

SET
 lc_messages
-------------
 fr
(1 row)

ERREUR:  la colonne << a >> n'existe pas
LINE 1: select a;
               ^
ERREUR:  erreur de syntaxe sur ou près de << 1 >>
LINE 1: 1;
        ^
ERREUR:  la relation << a >> n'existe pas
LINE 1: select * from a;
                      ^
SET
 current_database
------------------
 latin4
(1 row)

 server_encoding
-----------------
 LATIN4
(1 row)

SET
 lc_messages
-------------
 fr
(1 row)

ERREUR:  la colonne << a >> n'existe pas
LINE 1: select a;
               ^
ERREUR:  erreur de syntaxe sur ou pr`es de << 1 >>
LINE 1: 1;
        ^
ERREUR:  la relation << a >> n'existe pas
LINE 1: select * from a;
                      ^
SET
 current_database
------------------
 latin5
(1 row)

 server_encoding
-----------------
 LATIN5
(1 row)

SET
 lc_messages
-------------
 fr
(1 row)

ERREUR:  la colonne « a » n'existe pas
LINE 1: select a;
               ^
ERREUR:  erreur de syntaxe sur ou près de « 1 »
LINE 1: 1;
        ^
ERREUR:  la relation « a » n'existe pas
LINE 1: select * from a;
                      ^
SET
 current_database
------------------
 latin6
(1 row)

 server_encoding
-----------------
 LATIN6
(1 row)

SET
 lc_messages
-------------
 fr
(1 row)

ERREUR:  la colonne << a >> n'existe pas
LINE 1: select a;
               ^
ERREUR:  erreur de syntaxe sur ou pr`es de << 1 >>
LINE 1: 1;
        ^
ERREUR:  la relation << a >> n'existe pas
LINE 1: select * from a;
                      ^
SET
 current_database
------------------
 latin7
(1 row)

 server_encoding
-----------------
 LATIN7
(1 row)

SET
 lc_messages
-------------
 fr
(1 row)

ERREUR:  la colonne « a » n'existe pas
LINE 1: select a;
               ^
ERREUR:  erreur de syntaxe sur ou pr`es de « 1 »
LINE 1: 1;
        ^
ERREUR:  la relation « a » n'existe pas
LINE 1: select * from a;
                      ^
SET
 current_database
------------------
 latin8
(1 row)

 server_encoding
-----------------
 LATIN8
(1 row)

SET
 lc_messages
-------------
 fr
(1 row)

ERREUR:  la colonne << a >> n'existe pas
LINE 1: select a;
               ^
ERREUR:  erreur de syntaxe sur ou près de << 1 >>
LINE 1: 1;
        ^
ERREUR:  la relation << a >> n'existe pas
LINE 1: select * from a;
                      ^
SET
 current_database
------------------
 latin9
(1 row)

 server_encoding
-----------------
 LATIN9
(1 row)

SET
 lc_messages
-------------
 fr
(1 row)

ERREUR:  la colonne « a » n'existe pas
LINE 1: select a;
               ^
ERREUR:  erreur de syntaxe sur ou près de « 1 »
LINE 1: 1;
        ^
ERREUR:  la relation « a » n'existe pas
LINE 1: select * from a;
                      ^
SET
 current_database
------------------
 latin10
(1 row)

 server_encoding
-----------------
 LATIN10
(1 row)

SET
 lc_messages
-------------
 fr
(1 row)

ERREUR:  la colonne « a » n'existe pas
LINE 1: select a;
               ^
ERREUR:  erreur de syntaxe sur ou près de « 1 »
LINE 1: 1;
        ^
ERREUR:  la relation « a » n'existe pas
LINE 1: select * from a;
                      ^
SET
 current_database
------------------
 iso-8859-5
(1 row)

 server_encoding
-----------------
 ISO_8859_5
(1 row)

SET
 lc_messages
-------------
 fr
(1 row)

ERREUR:  la colonne << a >> n'existe pas
LINE 1: select a;
               ^
ERREUR:  erreur de syntaxe sur ou pr`es de << 1 >>
LINE 1: 1;
        ^
ERREUR:  la relation << a >> n'existe pas
LINE 1: select * from a;
                      ^
SET
 current_database
------------------
 iso-8859-6
(1 row)

 server_encoding
-----------------
 ISO_8859_6
(1 row)

SET
 lc_messages
-------------
 fr
(1 row)

ERREUR:  la colonne << a >> n'existe pas
LINE 1: select a;
               ^
ERREUR:  erreur de syntaxe sur ou pr`es de << 1 >>
LINE 1: 1;
        ^
ERREUR:  la relation << a >> n'existe pas
LINE 1: select * from a;
                      ^
SET
 current_database
------------------
 iso-8859-7
(1 row)

 server_encoding
-----------------
 ISO_8859_7
(1 row)

SET
 lc_messages
-------------
 fr
(1 row)

ERREUR:  la colonne « a » n'existe pas
LINE 1: select a;
               ^
ERREUR:  erreur de syntaxe sur ou pr`es de « 1 »
LINE 1: 1;
        ^
ERREUR:  la relation « a » n'existe pas
LINE 1: select * from a;
                      ^
SET
 current_database
------------------
 iso-8859-8
(1 row)

 server_encoding
-----------------
 ISO_8859_8
(1 row)

SET
 lc_messages
-------------
 fr
(1 row)

ERREUR:  la colonne « a » n'existe pas
LINE 1: select a;
               ^
ERREUR:  erreur de syntaxe sur ou pr`es de « 1 »
LINE 1: 1;
        ^
ERREUR:  la relation « a » n'existe pas
LINE 1: select * from a;
                      ^
SET
 current_database
------------------
 koi8-r
(1 row)

 server_encoding
-----------------
 KOI8
(1 row)

SET
 lc_messages
-------------
 fr
(1 row)

ERREUR:  la colonne << a >> n'existe pas
LINE 1: select a;
               ^
ERREUR:  erreur de syntaxe sur ou pr`es de << 1 >>
LINE 1: 1;
        ^
ERREUR:  la relation << a >> n'existe pas
LINE 1: select * from a;
                      ^
SET
 current_database
------------------
 win1250
(1 row)

 server_encoding
-----------------
 WIN1250
(1 row)

SET
 lc_messages
-------------
 fr
(1 row)

ERREUR:  la colonne « a » n'existe pas
LINE 1: select a;
               ^
ERREUR:  erreur de syntaxe sur ou pr`es de « 1 »
LINE 1: 1;
        ^
ERREUR:  la relation « a » n'existe pas
LINE 1: select * from a;
                      ^
SET
 current_database
------------------
 win1251
(1 row)

 server_encoding
-----------------
 WIN1251
(1 row)

SET
 lc_messages
-------------
 fr
(1 row)

ERREUR:  la colonne « a » n'existe pas
LINE 1: select a;
               ^
ERREUR:  erreur de syntaxe sur ou pr`es de « 1 »
LINE 1: 1;
        ^
ERREUR:  la relation « a » n'existe pas
LINE 1: select * from a;
                      ^
SET
 current_database
------------------
 win1252
(1 row)

 server_encoding
-----------------
 WIN1252
(1 row)

SET
 lc_messages
-------------
 fr
(1 row)

ERREUR:  la colonne « a » n'existe pas
LINE 1: select a;
               ^
ERREUR:  erreur de syntaxe sur ou près de « 1 »
LINE 1: 1;
        ^
ERREUR:  la relation « a » n'existe pas
LINE 1: select * from a;
                      ^
SET
 current_database
------------------
 win1253
(1 row)

 server_encoding
-----------------
 WIN1253
(1 row)

SET
 lc_messages
-------------
 fr
(1 row)

ERREUR:  la colonne « a » n'existe pas
LINE 1: select a;
               ^
ERREUR:  erreur de syntaxe sur ou pr`es de « 1 »
LINE 1: 1;
        ^
ERREUR:  la relation « a » n'existe pas
LINE 1: select * from a;
                      ^
SET
 current_database
------------------
 win1254
(1 row)

 server_encoding
-----------------
 WIN1254
(1 row)

SET
 lc_messages
-------------
 fr
(1 row)

ERREUR:  la colonne « a » n'existe pas
LINE 1: select a;
               ^
ERREUR:  erreur de syntaxe sur ou près de « 1 »
LINE 1: 1;
        ^
ERREUR:  la relation « a » n'existe pas
LINE 1: select * from a;
                      ^
SET
 current_database
------------------
 win1255
(1 row)

 server_encoding
-----------------
 WIN1255
(1 row)

SET
 lc_messages
-------------
 fr
(1 row)

ERREUR:  la colonne « a » n'existe pas
LINE 1: select a;
               ^
ERREUR:  erreur de syntaxe sur ou pr`es de « 1 »
LINE 1: 1;
        ^
ERREUR:  la relation « a » n'existe pas
LINE 1: select * from a;
                      ^
SET
 current_database
------------------
 win1256
(1 row)

 server_encoding
-----------------
 WIN1256
(1 row)

SET
 lc_messages
-------------
 fr
(1 row)

ERREUR:  la colonne « a » n'existe pas
LINE 1: select a;
               ^
ERREUR:  erreur de syntaxe sur ou près de « 1 »
LINE 1: 1;
        ^
ERREUR:  la relation « a » n'existe pas
LINE 1: select * from a;
                      ^
SET
 current_database
------------------
 win1257
(1 row)

 server_encoding
-----------------
 WIN1257
(1 row)

SET
 lc_messages
-------------
 fr
(1 row)

ERREUR:  la colonne « a » n'existe pas
LINE 1: select a;
               ^
ERREUR:  erreur de syntaxe sur ou pr`es de « 1 »
LINE 1: 1;
        ^
ERREUR:  la relation « a » n'existe pas
LINE 1: select * from a;
                      ^
SET
 current_database
------------------
 win1258
(1 row)

 server_encoding
-----------------
 WIN1258
(1 row)

SET
 lc_messages
-------------
 fr
(1 row)

ERREUR:  la colonne « a » n'existe pas
LINE 1: select a;
               ^
ERREUR:  erreur de syntaxe sur ou près de « 1 »
LINE 1: 1;
        ^
ERREUR:  la relation « a » n'existe pas
LINE 1: select * from a;
                      ^
SET
 current_database
------------------
 win866
(1 row)

 server_encoding
-----------------
 WIN866
(1 row)

SET
 lc_messages
-------------
 fr
(1 row)

ERREUR:  la colonne << a >> n'existe pas
LINE 1: select a;
               ^
ERREUR:  erreur de syntaxe sur ou pr`es de << 1 >>
LINE 1: 1;
        ^
ERREUR:  la relation << a >> n'existe pas
LINE 1: select * from a;
                      ^
SET
 current_database
------------------
 win874
(1 row)

 server_encoding
-----------------
 WIN874
(1 row)

SET
 lc_messages
-------------
 fr
(1 row)

ERREUR:  la colonne << a >> n'existe pas
LINE 1: select a;
               ^
ERREUR:  erreur de syntaxe sur ou pr`es de << 1 >>
LINE 1: 1;
        ^
ERREUR:  la relation << a >> n'existe pas
LINE 1: select * from a;
                      ^

Hiroshi Inoue wrote:
> Hiroshi Inoue wrote:
>> Bruce Momjian wrote:
>>> Hiroshi, is this patch still needed?
>>
>> Yes though it should be slightly changed now.

In what way should it be changed?

//Magnus



Magnus Hagander wrote:
> Hiroshi Inoue wrote:
>> Hiroshi Inoue wrote:
>>> Bruce Momjian wrote:
>>>> Hiroshi, is this patch still needed?
>>> Yes though it should be slightly changed now.
> 
> In what way should it be changed?

One is already committed by you. [COMMITTERS] pgsql: Use the new text domain names

Another is to bind the codeset "EUC-JP" for PG_EUC_JIS_2004 server encoding.
Though EUC_JP and EUC_JIS_2004 aren't completely compatible, it seems OK in most cases.

regards,
Hiroshi Inoue



Hiroshi Inoue wrote:
> Magnus Hagander wrote:
>> Hiroshi Inoue wrote:
>>> Hiroshi Inoue wrote:
>>>> Bruce Momjian wrote:
>>>>> Hiroshi, is this patch still needed?
>>>> Yes though it should be slightly changed now.
>>
>> In what way should it be changed?
>
> One is already committed by you.
>  [COMMITTERS] pgsql: Use the new text domain names
>
> Another is to bind the codeset "EUC-JP" for
>  PG_EUC_JIS_2004 server encoding.

The attached is an updated patch.

regards,
Hiroshi Inoue

Index: mbutils.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/utils/mb/mbutils.c,v
retrieving revision 1.77
diff -c -c -r1.77 mbutils.c
*** mbutils.c    19 Jan 2009 15:34:23 -0000    1.77
--- mbutils.c    20 Jan 2009 12:54:33 -0000
***************
*** 837,842 ****
--- 837,881 ----
      return clen;
  }

+ #ifdef    WIN32
+ static const struct codeset_map {
+     int    encoding;
+     const char *codeset;
+ } codeset_map_array[] = {
+         {PG_UTF8, "UTF-8"},
+         {PG_LATIN1, "LATIN1"},
+         {PG_LATIN2, "LATIN2"},
+         {PG_LATIN3, "LATIN3"},
+         {PG_LATIN4, "LATIN4"},
+         {PG_ISO_8859_5, "ISO-8859-5"},
+         {PG_ISO_8859_6, "ISO_8859-6"},
+         {PG_ISO_8859_7, "ISO-8859-7"},
+         {PG_ISO_8859_8, "ISO-8859-8"},
+         {PG_LATIN5, "LATIN5"},
+         {PG_LATIN6, "LATIN6"},
+         {PG_LATIN7, "LATIN7"},
+         {PG_LATIN8, "LATIN8"},
+         {PG_LATIN9, "LATIN-9"},
+         {PG_LATIN10, "LATIN10"},
+         {PG_KOI8R, "KOI8-R"},
+         {PG_WIN1250, "CP1250"},
+         {PG_WIN1251, "CP1251"},
+         {PG_WIN1252, "CP1252"},
+         {PG_WIN1253, "CP1253"},
+         {PG_WIN1254, "CP1254"},
+         {PG_WIN1255, "CP1255"},
+         {PG_WIN1256, "CP1256"},
+         {PG_WIN1257, "CP1257"},
+         {PG_WIN1258, "CP1258"},
+         {PG_WIN866, "CP866"},
+         {PG_WIN874, "CP874"},
+         {PG_EUC_CN, "EUC-CN"},
+         {PG_EUC_JP, "EUC-JP"},
+         {PG_EUC_KR, "EUC-KR"},
+         {PG_EUC_TW, "EUC-TW"},
+         {PG_EUC_JIS_2004, "EUC-JP"}};
+ #endif /* WIN32 */
+
  /* mbcliplen for any single-byte encoding */
  static int
  cliplen(const char *str, int len, int limit)
***************
*** 852,857 ****
--- 891,898 ----
  void
  SetDatabaseEncoding(int encoding)
  {
+     const char *target_codeset = NULL;
+
      if (!PG_VALID_BE_ENCODING(encoding))
          elog(ERROR, "invalid database encoding: %d", encoding);

***************
*** 873,879 ****
       */
  #ifdef ENABLE_NLS
      if (encoding == PG_UTF8)
!         if (bind_textdomain_codeset(textdomain(NULL), "UTF-8") == NULL)
              elog(LOG, "bind_textdomain_codeset failed");
  #endif
  }
--- 914,935 ----
       */
  #ifdef ENABLE_NLS
      if (encoding == PG_UTF8)
!         target_codeset = "UTF-8";
! #ifdef    WIN32
!     else
!     {
!         int    i;
!
!         for (i = 0; i < sizeof(codeset_map_array) / sizeof(struct codeset_map); i++)
!             if (codeset_map_array[i].encoding == encoding)
!             {
!                 target_codeset = codeset_map_array[i].codeset;
!                 break;
!             }
!     }
! #endif /* WIN32 */
!     if (target_codeset != NULL)
!         if (bind_textdomain_codeset(textdomain(NULL), target_codeset) == NULL)
              elog(LOG, "bind_textdomain_codeset failed");
  #endif
  }

Hiroshi Inoue wrote:
> Hiroshi Inoue wrote:
>> Magnus Hagander wrote:
>>> Hiroshi Inoue wrote:
>>>> Hiroshi Inoue wrote:
>>>>> Bruce Momjian wrote:
>>>>>> Hiroshi, is this patch still needed?
>>>>> Yes though it should be slightly changed now.
>>>
>>> In what way should it be changed?
>>
>> One is already committed by you.
>>  [COMMITTERS] pgsql: Use the new text domain names
>>
>> Another is to bind the codeset "EUC-JP" for
>>  PG_EUC_JIS_2004 server encoding.
> 
> The attached is an updated patch.

Thanks.

Looking at it, the comment clearly needs updating - I'll do that.

However, one question: The comment currently says it's harmless to do
this on non-windows platforms. Does this still hold true? In that case,
this whole thing shouldn't be #ifdef:ed to WIN32 and can be simplified.
Or does the "middle part" of the comment come into play, in that the
codeset names can be different on different platforms?

Peter, can you comment on that?

If we do keep the thing win32 only, I think we should just wrap the
whole thing in #ifdef WIN32 and no longer do the codeset stuff at all on
Unix - that'll make for cleaner code.

//Magnus



Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.

From
Peter Eisentraut
Date:
Magnus Hagander wrote:
> However, one question: The comment currently says it's harmless to do
> this on non-windows platforms. Does this still hold true?

Yes, the non-WIN32 code path appears to be the same, still.  But the 
ifdef WIN32 part we don't want, because that presumes something about 
the spelling of encoding names in the local iconv library.

> If we do keep the thing win32 only, I think we should just wrap the
> whole thing in #ifdef WIN32 and no longer do the codeset stuff at all on
> Unix - that'll make for cleaner code.

Yes, that would be much better.


Peter Eisentraut wrote:
> Magnus Hagander wrote:
>> However, one question: The comment currently says it's harmless to do
>> this on non-windows platforms. Does this still hold true?
>
> Yes, the non-WIN32 code path appears to be the same, still.  But the
> ifdef WIN32 part we don't want, because that presumes something about
> the spelling of encoding names in the local iconv library.
>
>> If we do keep the thing win32 only, I think we should just wrap the
>> whole thing in #ifdef WIN32 and no longer do the codeset stuff at all on
>> Unix - that'll make for cleaner code.
>
> Yes, that would be much better.

Something like this then?

//Magnus

*** a/src/backend/utils/mb/mbutils.c
--- b/src/backend/utils/mb/mbutils.c
***************
*** 849,854 **** cliplen(const char *str, int len, int limit)
--- 849,894 ----
      return l;
  }

+ #if defined(ENABLE_NLS) && defined(WIN32)
+ static const struct codeset_map {
+     int    encoding;
+     const char *codeset;
+ } codeset_map_array[] = {
+     {PG_UTF8, "UTF-8"},
+     {PG_LATIN1, "LATIN1"},
+     {PG_LATIN2, "LATIN2"},
+     {PG_LATIN3, "LATIN3"},
+     {PG_LATIN4, "LATIN4"},
+     {PG_ISO_8859_5, "ISO-8859-5"},
+     {PG_ISO_8859_6, "ISO_8859-6"},
+     {PG_ISO_8859_7, "ISO-8859-7"},
+     {PG_ISO_8859_8, "ISO-8859-8"},
+     {PG_LATIN5, "LATIN5"},
+     {PG_LATIN6, "LATIN6"},
+     {PG_LATIN7, "LATIN7"},
+     {PG_LATIN8, "LATIN8"},
+     {PG_LATIN9, "LATIN-9"},
+     {PG_LATIN10, "LATIN10"},
+     {PG_KOI8R, "KOI8-R"},
+     {PG_WIN1250, "CP1250"},
+     {PG_WIN1251, "CP1251"},
+     {PG_WIN1252, "CP1252"},
+     {PG_WIN1253, "CP1253"},
+     {PG_WIN1254, "CP1254"},
+     {PG_WIN1255, "CP1255"},
+     {PG_WIN1256, "CP1256"},
+     {PG_WIN1257, "CP1257"},
+     {PG_WIN1258, "CP1258"},
+     {PG_WIN866, "CP866"},
+     {PG_WIN874, "CP874"},
+     {PG_EUC_CN, "EUC-CN"},
+     {PG_EUC_JP, "EUC-JP"},
+     {PG_EUC_KR, "EUC-KR"},
+     {PG_EUC_TW, "EUC-TW"},
+     {PG_EUC_JIS_2004, "EUC-JP"}
+ };
+ #endif /* WIN32 */
+
  void
  SetDatabaseEncoding(int encoding)
  {
***************
*** 859,880 **** SetDatabaseEncoding(int encoding)
      Assert(DatabaseEncoding->encoding == encoding);

      /*
!      * On Windows, we allow UTF-8 database encoding to be used with any
!      * locale setting, because UTF-8 requires special handling anyway.
!      * But this means that gettext() might be misled about what output
!      * encoding it should use, so we have to tell it explicitly.
!      *
!      * In future we might want to call bind_textdomain_codeset
!      * unconditionally, but that requires knowing how to spell the codeset
!      * name properly for all encodings on all platforms, which might be
!      * problematic.
!      *
!      * This is presently unnecessary, but harmless, on non-Windows platforms.
       */
! #ifdef ENABLE_NLS
!     if (encoding == PG_UTF8)
!         if (bind_textdomain_codeset(textdomain(NULL), "UTF-8") == NULL)
!             elog(LOG, "bind_textdomain_codeset failed");
  #endif
  }

--- 899,921 ----
      Assert(DatabaseEncoding->encoding == encoding);

      /*
!      * On Windows, we need to explicitly bind gettext to the correct
!      * encoding, because gettext() tends to get confused.
       */
! #if defined(ENABLE_NLS) && defined(WIN32)
!     {
!         int    i;
!
!         for (i = 0; i < sizeof(codeset_map_array) / sizeof(codeset_map_array[0]); i++)
!         {
!             if (codeset_map_array[i].encoding == encoding)
!             {
!                 if (bind_textdomain_codeset(textdomain(NULL), codeset_map_array[i].codeset) == NULL)
!                     elog(LOG, "bind_textdomain_codeset failed");
!                 break;
!             }
!         }
!     }
  #endif
  }


Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.

From
Peter Eisentraut
Date:
Magnus Hagander wrote:
> Peter Eisentraut wrote:
>> Magnus Hagander wrote:
>>> However, one question: The comment currently says it's harmless to do
>>> this on non-windows platforms. Does this still hold true?
>> Yes, the non-WIN32 code path appears to be the same, still.  But the
>> ifdef WIN32 part we don't want, because that presumes something about
>> the spelling of encoding names in the local iconv library.
>>
>>> If we do keep the thing win32 only, I think we should just wrap the
>>> whole thing in #ifdef WIN32 and no longer do the codeset stuff at all on
>>> Unix - that'll make for cleaner code.
>> Yes, that would be much better.
> 
> Something like this then?

Looks OK to me.


Magnus Hagander wrote:
> Peter Eisentraut wrote:
>> Magnus Hagander wrote:
>>> However, one question: The comment currently says it's harmless to do
>>> this on non-windows platforms. Does this still hold true?
>> Yes, the non-WIN32 code path appears to be the same, still.  But the
>> ifdef WIN32 part we don't want, because that presumes something about
>> the spelling of encoding names in the local iconv library.
>>
>>> If we do keep the thing win32 only, I think we should just wrap the
>>> whole thing in #ifdef WIN32 and no longer do the codeset stuff at all on
>>> Unix - that'll make for cleaner code.
>> Yes, that would be much better.
> 
> Something like this then?

It seems OK to me.

regards,
Hiroshi Inoue




Hiroshi Inoue wrote:
> Magnus Hagander wrote:
>> Peter Eisentraut wrote:
>>> Magnus Hagander wrote:
>>>> However, one question: The comment currently says it's harmless to do
>>>> this on non-windows platforms. Does this still hold true?
>>> Yes, the non-WIN32 code path appears to be the same, still.  But the
>>> ifdef WIN32 part we don't want, because that presumes something about
>>> the spelling of encoding names in the local iconv library.
>>>
>>>> If we do keep the thing win32 only, I think we should just wrap the
>>>> whole thing in #ifdef WIN32 and no longer do the codeset stuff at
>>>> all on
>>>> Unix - that'll make for cleaner code.
>>> Yes, that would be much better.
>>
>> Something like this then?
> 
> It seems OK to me.

Applied.

//Magnus