Thread: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.
Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.
From
Magnus Hagander
Date:
Hiroshi Inoue wrote:
> Hi Magnus and all,
>
> Magnus Hagander wrote:
>> Log Message:
>> -----------
>> Explicitly bind gettext() to the UTF8 locale when in use.
>> This is required on Windows due to the special locale
>> handling for UTF8 that doesn't change the full environment.
>
> Thanks to this change UTF-8 case was solved but Japanese users
> are still unhappy with Windows databases with EUC_JP encoding.
> Shift_JIS which is a Japanese encoding under Windows doesn't
> match any server encoding and causes a crash with the use of
> gettext. So Saito-san removed ja message catalog just before
> the 8.3 release.
>
> Attached is a simple patch to avoid the crash and enable the
> use of Japanese message catalog.
> Please apply the patch if there's no problem.

Hi!

It will clearly also need an update to the comment, but I can take care
of that.

I assume you have tested this? The comment says that it works because we
are handling UTF8 in a special way on Windows, but AFAIK we *don't*
handle EUC_JP in a special way there?

If your database is in EUC_JP, I don't see why gettext() isn't picking
it up properly in the first place.

And why do we need that on Windows only, and not on other platforms?

//Magnus
Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.
From
Hiroshi Inoue
Date:
Magnus Hagander wrote:
> Hiroshi Inoue wrote:
>> Hi Magnus and all,
>>
>> Magnus Hagander wrote:
>>> Log Message:
>>> -----------
>>> Explicitly bind gettext() to the UTF8 locale when in use.
>>> This is required on Windows due to the special locale
>>> handling for UTF8 that doesn't change the full environment.
>> Thanks to this change UTF-8 case was solved but Japanese users
>> are still unhappy with Windows databases with EUC_JP encoding.
>> Shift_JIS which is a Japanese encoding under Windows doesn't
>> match any server encoding and causes a crash with the use of
>> gettext. So Saito-san removed ja message catalog just before
>> the 8.3 release.
>>
>> Attached is a simple patch to avoid the crash and enable the
>> use of Japanese message catalog.
>> Please apply the patch if there's no problem.
>
> Hi!
>
> It will clearly also need an update to the comment, but I can take care
> of that.
>
> I assume you have tested this?

Though I myself didn't test it, Saito-san tested it.

> The comment says that it works because we
> are handling UTF8 in a special way on Windows,

ISTM UTF-8 isn't a special case.
In fact the comment also mentions the following.

 * In future we might want to call bind_textdomain_codeset
 * unconditionally, but that....

> If your database is in EUC_JP, I don't see why gettext() isn't picking
> it up properly in the first place.

In Japan 2 encodings (EUC_JP and Shift_JIS) are often used.
EUC_JP is mainly used on *nix, and Shift_JIS on Windows.
We use EUC_JP, not Shift_JIS, as the server encoding

> And why do we need that on Windows only, and not on other platforms?

because Shift_JIS isn't allowed as a server encoding. So
the Japanese Windows native message encoding Shift_JIS never
matches the server encoding EUC_JP, and a conversion between
Shift_JIS and EUC_JP is always needed.

regards,
Hiroshi Inoue
Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.
From
Magnus Hagander
Date:
Hiroshi Inoue wrote:
> Magnus Hagander wrote:
>> Hiroshi Inoue wrote:
>>> Hi Magnus and all,
>>>
>>> Magnus Hagander wrote:
>>>> Log Message:
>>>> -----------
>>>> Explicitly bind gettext() to the UTF8 locale when in use.
>>>> This is required on Windows due to the special locale
>>>> handling for UTF8 that doesn't change the full environment.
>>> Thanks to this change UTF-8 case was solved but Japanese users
>>> are still unhappy with Windows databases with EUC_JP encoding.
>>> Shift_JIS which is a Japanese encoding under Windows doesn't
>>> match any server encoding and causes a crash with the use of
>>> gettext. So Saito-san removed ja message catalog just before
>>> the 8.3 release.
>>>
>>> Attached is a simple patch to avoid the crash and enable the
>>> use of Japanese message catalog.
>>> Please apply the patch if there's no problem.
>> Hi!
>>
>> It will clearly also need an update to the comment, but I can take care
>> of that.
>>
>> I assume you have tested this?
>
> Though I myself didn't test it, Saito-san tested it.

Ok, good.

>> The comment says that it works because we
>> are handling UTF8 in a special way on Windows,
>
> ISTM UTF-8 isn't a special case.
> In fact the comment also mentions the following.
>
> * In future we might want to call bind_textdomain_codeset
> * unconditionally, but that....

I think that's partially unrelated. UTF8 is special in that the
environment that the backend runs in is different from SERVER_ENCODING
in this case. For other encodings, we have already called setlocale()
with the same encoding.

>> If your database is in EUC_JP, I don't see why gettext() isn't picking
>> it up properly in the first place.
>
> In Japan 2 encodings (EUC_JP and Shift_JIS) are often used.
> EUC_JP is mainly used on *nix and on Windows Shift_JIS is
> used. We use EUC_JP not Shift_JIS as the server encoding
>
>> And why do we need that on Windows only, and not on other platforms?
>
> because Shift_JIS isn't allowed as a server encoding. So
> the Japanese Windows native message encoding Shift_JIS never
> matches the server encoding EUC_JP and a conversion between
> Shift_JIS and EUC_JP is necessarily needed.

Ah, so we're basically hardcoding that information? The system will go
up in SJIS, but since we can't deal with it, we switch it to EUC_JP?

Ok, I think I understand. I've made some minor stylistic changes (we
don't normally use if (NULL != <whatever>) in the pg sources), and will
apply with those.

This is for HEAD only, correct? Or is it something we should backpatch?

//Magnus
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.
From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> Hiroshi Inoue wrote:
>> because Shift_JIS isn't allowed as a server encoding. So
>> the Japanese Windows native message encoding Shift_JIS never
>> matches the server encoding EUC_JP and a conversion between
>> Shift_JIS and EUC_JP is necessarily needed.

> Ah, so we're basically hardcoding that information? The system will go
> up in SJIS, but since we can't deal with it, we switch it to EUC_JP?

I'm not following this either. If the patch is really necessary then it
seems it must be working around a bug in the Windows version of gettext,
ie failure to distinguish CP932 from CP20932. Is that correct?

> Ok, I think I understand. I've made some minor stylistic changes (we
> don't normally use if (NULL != <whatever>) in the pg sources), and will
> apply with those.

It definitely needs a comment explaining why this is needed.

			regards, tom lane
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.
From
Hiroshi Inoue
Date:
Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> Hiroshi Inoue wrote:
>>> because Shift_JIS isn't allowed as a server encoding. So
>>> the Japanese Windows native message encoding Shift_JIS never
>>> matches the server encoding EUC_JP and a conversion between
>>> Shift_JIS and EUC_JP is necessarily needed.
>
>> Ah, so we're basically hardcoding that information? The system will go
>> up in SJIS, but since we can't deal with it, we switch it to EUC_JP?
>
> I'm not following this either. If the patch is really necessary then it
> seems it must be working around a bug in the Windows version of gettext,
> ie failure to distinguish CP932 from CP20932. Is that correct?

I'm afraid I don't understand what you mean exactly.
AFAIK the output encoding of Windows gettext is determined by the
ANSI system code page, which is usually CP932 (Shift_JIS) in Japan and
unrelated to the locale settings.
In addition, CP20932 is rarely used in Japan IIRC. I've never used
it and don't know exactly what it is (maybe a kind of EUC_JP).

regards,
Hiroshi Inoue
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.
From
Tom Lane
Date:
Hiroshi Inoue <inoue@tpf.co.jp> writes:
> Tom Lane wrote:
>> I'm not following this either. If the patch is really necessary then it
>> seems it must be working around a bug in the Windows version of gettext,
>> ie failure to distinguish CP932 from CP20932. Is that correct?

> I'm afraid I don't understand what you mean exactly.
> AFAIK the output encoding of Windows gettext is determined by the
> ANSI system code page which is usually CP932(Shift_JIS) in Japan and
> unrelated to the locale settings.

If that's true then this code is presently broken for *every* locale
under Windows, not only Japanese.

To my mind the really correct thing to be doing here would be to call
bind_textdomain_codeset in all cases, rather than trusting gettext to
guess correctly about which encoding we want. As the comment notes,
we have not attempted that because the codeset names aren't well
standardized. But it seems to me that we could certainly find out what
codeset names are used on Windows, and apply bind_textdomain_codeset
all the time on Windows. That would make a lot more sense than ad-hoc
treatment of UTF-8 and EUC-JP if you ask me ...

			regards, tom lane
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.
From
Hiroshi Inoue
Date:
Tom Lane wrote:
> Hiroshi Inoue <inoue@tpf.co.jp> writes:
>> Tom Lane wrote:
>>> I'm not following this either. If the patch is really necessary then it
>>> seems it must be working around a bug in the Windows version of gettext,
>>> ie failure to distinguish CP932 from CP20932. Is that correct?
>
>> I'm afraid I don't understand what you mean exactly.
>> AFAIK the output encoding of Windows gettext is determined by the
>> ANSI system code page which is usually CP932(Shift_JIS) in Japan and
>> unrelated to the locale settings.
>
> If that's true then this code is presently broken for *every* locale
> under Windows, not only Japanese.

Maybe there are a few languages/countries where 2 encodings are
widely used.

> To my mind the really correct thing to be doing here would be to call
> bind_textdomain_codeset in all cases, rather than trusting gettext to
> guess correctly about which encoding we want. As the comment notes,
> we have not attempted that because the codeset names aren't well
> standardized. But it seems to me that we could certainly find out what
> codeset names are used on Windows, and apply bind_textdomain_codeset
> all the time on Windows. That would make a lot more sense than ad-hoc
> treatment of UTF-8 and EUC-JP if you ask me ...

I fundamentally agree with you.
What we hope for is to enable the Japanese message catalog, which we
had to give up in the 8.3 Windows release.

regards,
Hiroshi Inoue
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.
From
Tom Lane
Date:
Hiroshi Inoue <inoue@tpf.co.jp> writes:
> Tom Lane wrote:
>> If that's true then this code is presently broken for *every* locale
>> under Windows, not only Japanese.

> Maybe there are a few languages/countries where 2 encodings are
> widely used.

UTF8 vs Latin-N?  In any case I think the problem is that gettext is
looking at a setting that is not what we are looking at.  Particularly
with the 8.4 changes to allow per-database locale settings, this has
got to be fixed in a bulletproof way.

			regards, tom lane
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.
From
Magnus Hagander
Date:
On 25 nov 2008, at 05.00, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Hiroshi Inoue <inoue@tpf.co.jp> writes:
>> Tom Lane wrote:
>>> If that's true then this code is presently broken for *every* locale
>>> under Windows, not only Japanese.
>
>> Maybe there are a few languages/countries where 2 encodings are
>> widely used.
>
> UTF8 vs Latin-N?

We already special-case utf8...

I think the thing is that as long as the encodings are compatible
(latin1 with different names for example) it worked fine.

> In any case I think the problem is that gettext is
> looking at a setting that is not what we are looking at. Particularly
> with the 8.4 changes to allow per-database locale settings, this has
> got to be fixed in a bulletproof way.
>

Agreed.

/Magnus
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.
From
Hiroshi Inoue
Date:
Magnus Hagander wrote:
> On 25 nov 2008, at 05.00, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> Hiroshi Inoue <inoue@tpf.co.jp> writes:
>>> Tom Lane wrote:
>>>> If that's true then this code is presently broken for *every* locale
>>>> under Windows, not only Japanese.
>>
>>> Maybe there are a few languages/countries where 2 encodings are
>>> widely used.
>>
>> UTF8 vs Latin-N?
>
> We already special-case utf8...
>
> I think the thing is that as long as the encodings are compatible
> (latin1 with different names for example) it worked fine.
>
>> In any case I think the problem is that gettext is
>> looking at a setting that is not what we are looking at. Particularly
>> with the 8.4 changes to allow per-database locale settings, this has
>> got to be fixed in a bulletproof way.

Attached is a new patch to apply bind_textdomain_codeset() to most
server encodings. Exceptions are PG_SQL_ASCII, PG_MULE_INTERNAL
and PG_EUC_JIS_2004. "EUC-JP" may be OK for EUC_JIS_2004.

Unfortunately it's hard for Saito-san and me to check encodings
other than EUC-JP.
regards,
Hiroshi Inoue

*** mbutils.c.orig	Sun Nov 23 08:42:57 2008
--- mbutils.c	Wed Nov 26 12:17:12 2008
***************
*** 822,830 ****
--- 822,870 ----
  	return clen;
  }
  
+ #ifdef WIN32
+ static const struct codeset_map {
+ 	int	encoding;
+ 	const char *codeset;
+ } codeset_map_array[] = {
+ 	{PG_UTF8, "UTF-8"},
+ 	{PG_LATIN1, "LATIN1"},
+ 	{PG_LATIN2, "LATIN2"},
+ 	{PG_LATIN3, "LATIN3"},
+ 	{PG_LATIN4, "LATIN4"},
+ 	{PG_ISO_8859_5, "ISO-8859-5"},
+ 	{PG_ISO_8859_6, "ISO_8859-6"},
+ 	{PG_ISO_8859_7, "ISO-8859-7"},
+ 	{PG_ISO_8859_8, "ISO-8859-8"},
+ 	{PG_LATIN5, "LATIN5"},
+ 	{PG_LATIN6, "LATIN6"},
+ 	{PG_LATIN7, "LATIN7"},
+ 	{PG_LATIN8, "LATIN8"},
+ 	{PG_LATIN9, "LATIN-9"},
+ 	{PG_LATIN10, "LATIN10"},
+ 	{PG_KOI8R, "KOI8-R"},
+ 	{PG_WIN1250, "CP1250"},
+ 	{PG_WIN1251, "CP1251"},
+ 	{PG_WIN1252, "CP1252"},
+ 	{PG_WIN1253, "CP1253"},
+ 	{PG_WIN1254, "CP1254"},
+ 	{PG_WIN1255, "CP1255"},
+ 	{PG_WIN1256, "CP1256"},
+ 	{PG_WIN1257, "CP1257"},
+ 	{PG_WIN1258, "CP1258"},
+ 	{PG_WIN866, "CP866"},
+ 	{PG_WIN874, "CP874"},
+ 	{PG_EUC_CN, "EUC-CN"},
+ 	{PG_EUC_JP, "EUC-JP"},
+ 	{PG_EUC_KR, "EUC-KR"},
+ 	{PG_EUC_TW, "EUC-TW"}};
+ #endif /* WIN32 */
+ 
  void
  SetDatabaseEncoding(int encoding)
  {
+ 	const char *target_codeset = NULL;
+ 
  	if (!PG_VALID_BE_ENCODING(encoding))
  		elog(ERROR, "invalid database encoding: %d", encoding);
  
***************
*** 846,852 ****
  	 */
  #ifdef ENABLE_NLS
  	if (encoding == PG_UTF8)
! 		if (bind_textdomain_codeset("postgres", "UTF-8") == NULL)
  			elog(LOG, "bind_textdomain_codeset failed");
  #endif
  }
--- 886,907 ----
  	 */
  #ifdef ENABLE_NLS
  	if (encoding == PG_UTF8)
! 		target_codeset = "UTF-8";
! #ifdef WIN32
! 	else
! 	{
! 		int	i;
! 
! 		for (i = 0; i < sizeof(codeset_map_array) / sizeof(struct codeset_map); i++)
! 			if (codeset_map_array[i].encoding == encoding)
! 			{
! 				target_codeset = codeset_map_array[i].codeset;
! 				break;
! 			}
! 	}
! #endif /* WIN32 */
! 	if (target_codeset != NULL)
! 		if (bind_textdomain_codeset("postgres", target_codeset) == NULL)
  			elog(LOG, "bind_textdomain_codeset failed");
  #endif
  }
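The lookup added to SetDatabaseEncoding() is essentially a pure function from encoding ID to codeset name, which can be tested apart from gettext. A minimal sketch, assuming a cut-down stand-in enum (the real PG_* values live in pg_wchar.h and are not reproduced here) and a hypothetical function name `encoding_to_codeset`; only a few of the patch's table entries are shown. It uses `size_t` for the index, which sidesteps the signed/unsigned comparison between `int i` and the `sizeof` expression, and a `lengthof()` macro (PostgreSQL's c.h provides a similar one).

```c
#include <stddef.h>

/* Stand-in for a few PostgreSQL encoding IDs; the values are illustrative. */
typedef enum
{
    PG_SQL_ASCII,
    PG_EUC_JP,
    PG_UTF8,
    PG_LATIN1,
    PG_WIN1252,
    PG_EUC_JIS_2004
} pg_enc_stub;

struct codeset_map
{
    pg_enc_stub encoding;
    const char *codeset;
};

/* A subset of the patch's table. */
static const struct codeset_map codeset_map_array[] = {
    {PG_UTF8, "UTF-8"},
    {PG_LATIN1, "LATIN1"},
    {PG_WIN1252, "CP1252"},
    {PG_EUC_JP, "EUC-JP"},
};

#define lengthof(array) (sizeof(array) / sizeof((array)[0]))

/*
 * Hypothetical helper: return the gettext codeset name for an encoding,
 * or NULL if there is none (e.g. SQL_ASCII, EUC_JIS_2004), in which case
 * the caller would skip bind_textdomain_codeset() entirely.
 */
static const char *
encoding_to_codeset(pg_enc_stub encoding)
{
    size_t      i;

    for (i = 0; i < lengthof(codeset_map_array); i++)
        if (codeset_map_array[i].encoding == encoding)
            return codeset_map_array[i].codeset;
    return NULL;
}
```

A caller would then only call bind_textdomain_codeset("postgres", codeset) when the result is non-NULL, as the patch does.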
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.
From
Magnus Hagander
Date:
Hiroshi Inoue wrote:
>> I think the thing is that as long as the encodings are compatible
>> (latin1 with different names for example) it worked fine.
>>
>>> In any case I think the problem is that gettext is
>>> looking at a setting that is not what we are looking at. Particularly
>>> with the 8.4 changes to allow per-database locale settings, this has
>>> got to be fixed in a bulletproof way.
>
> Attached is a new patch to apply bind_textdomain_codeset() to most
> server encodings. Exceptions are PG_SQL_ASCII, PG_MULE_INTERNAL
> and PG_EUC_JIS_2004. "EUC-JP" may be OK for EUC_JIS_2004.
>
> Unfortunately it's hard for Saito-san and me to check encodings
> other than EUC-JP.

In principle this looks good, I think, but I'm a bit worried about the
lack of testing. I can do some testing under LATIN1 which is what we use
in Sweden (just need to get gettext working *at all* in my dev
environment again - I've somehow managed to break it), and perhaps we
can find someone to do a test in an eastern-european locale to get some
more datapoints?

Can you outline the steps one needs to go through to show the problem,
so we can confirm it's fixed in these locales?

//Magnus
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.
From
Hiroshi Inoue
Date:
Magnus Hagander wrote:
> Hiroshi Inoue wrote:
>>> I think the thing is that as long as the encodings are compatible
>>> (latin1 with different names for example) it worked fine.
>>>
>>>> In any case I think the problem is that gettext is
>>>> looking at a setting that is not what we are looking at. Particularly
>>>> with the 8.4 changes to allow per-database locale settings, this has
>>>> got to be fixed in a bulletproof way.
>> Attached is a new patch to apply bind_textdomain_codeset() to most
>> server encodings. Exceptions are PG_SQL_ASCII, PG_MULE_INTERNAL
>> and PG_EUC_JIS_2004. "EUC-JP" may be OK for EUC_JIS_2004.
>>
>> Unfortunately it's hard for Saito-san and me to check encodings
>> other than EUC-JP.
>
> In principle this looks good, I think, but I'm a bit worried about the
> lack of testing.

Thanks, and I agree with you.

> I can do some testing under LATIN1 which is what we use
> in Sweden (just need to get gettext working *at all* in my dev
> environment again - I've somehow managed to break it), and perhaps we
> can find someone to do a test in an eastern-european locale to get some
> more datapoints?
>
> Can you outline the steps one needs to go through to show the problem,
> so we can confirm it's fixed in these locales?

Saito-san and I have been working on another related problem, changing
the LC_MESSAGES locale properly under Windows, and should be able to
provide a patch in a few days. It would be preferable to apply that
patch as well, so that the message catalog can be changed easily.

Attached is an example in which LC_MESSAGES is cht_twn (zh_TW) and the
server encoding is EUC-TW. You can read it as UTF-8 text because
client_encoding is set to UTF-8 first.

BTW you can see another problem at line 4 of the text. At that point
LC_MESSAGES is still Japanese, and postgres fails to convert a Japanese
error message to the EUC_TW encoding. That is not surprising, but it
doesn't seem desirable.
regards,
Hiroshi Inoue

set client_encoding to utf_8;
SET
1;
psql:cmd/euctw.sql:2: ERROR:  character 0xb9e6 of encoding "EUC_TW" has no equivalent in "UTF8"
select current_database();
 current_database
------------------
 euctw
(1 �s)

show server_encoding;
 server_encoding
-----------------
 EUC_TW
(1 �s)

show lc_messages;
    lc_messages
--------------------
 Japanese_Japan.932
(1 �s)

set lc_messages to cht;
SET
select a;
psql:cmd/euctw.sql:7: 錯誤:  欄位"a"不存在
LINE 1: select a;
               ^
1;
psql:cmd/euctw.sql:8: 錯誤:  在"語法錯誤"附近發生 1
LINE 1: 1;
        ^
select * from a;
psql:cmd/euctw.sql:9: 錯誤:  relation "a"不存在
LINE 1: select * from a;
                      ^
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.
From
Bruce Momjian
Date:
Hiroshi, is this patch still needed?

---------------------------------------------------------------------------

Hiroshi Inoue wrote:
> Magnus Hagander wrote:
> > On 25 nov 2008, at 05.00, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> >> Hiroshi Inoue <inoue@tpf.co.jp> writes:
> >>> Tom Lane wrote:
> >>>> If that's true then this code is presently broken for *every* locale
> >>>> under Windows, not only Japanese.
> >>
> >>> Maybe there are a few languages/countries where 2 encodings are
> >>> widely used.
> >>
> >> UTF8 vs Latin-N?
> >
> > We already special-case utf8...
> >
> > I think the thing is that as long as the encodings are compatible
> > (latin1 with different names for example) it worked fine.
> >
> >> In any case I think the problem is that gettext is
> >> looking at a setting that is not what we are looking at. Particularly
> >> with the 8.4 changes to allow per-database locale settings, this has
> >> got to be fixed in a bulletproof way.
>
> Attached is a new patch to apply bind_textdomain_codeset() to most
> server encodings. Exceptions are PG_SQL_ASCII, PG_MULE_INTERNAL
> and PG_EUC_JIS_2004. "EUC-JP" may be OK for EUC_JIS_2004.
>
> Unfortunately it's hard for Saito-san and me to check encodings
> other than EUC-JP.
>
> regards,
> Hiroshi Inoue

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.
From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> Hiroshi, is this patch still needed?

This patch is for a problem that's entirely separate from the LC_TIME
issue, if that's what you were wondering.

			regards, tom lane
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.
From
Hiroshi Inoue
Date:
Bruce Momjian wrote:
> Hiroshi, is this patch still needed?

Yes, though it should be slightly changed now.
The *set lc_messages does not work* issue isn't directly related to
this issue.

regards,
Hiroshi Inoue
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.
From
"Hiroshi Saito"
Date:
Hi.

My quick test was done in the MinGW environment; following Inoue-san's
suggestion, here are the results.

1. MinGW+gcc build

HIROSHI=# set LC_TIME=Ja;
SET
HIROSHI=# select to_char(now(),'TMDay');
 to_char
---------
 日曜日
(1 行)

HIROSHI=# set LC_TIME='Japan';
SET
HIROSHI=# select to_char(Now(),'TMDay');
 to_char
---------
 日曜日
(1 行)

HIROSHI=# set LC_TIME='Japanese';
SET
HIROSHI=# select to_char(Now(),'TMDay');
 to_char
---------
 日曜日
(1 行)

However, it is strange that 'Ja' was accepted as a setting. ?_?
http://msdn.microsoft.com/en-us/library/aa246450(VS.60).aspx

2. MSVC build

HIROSHI=# set LC_TIME='Ja';
ERROR:  invalid value for parameter "lc_time": "Ja"
STATEMENT:  set LC_TIME='Ja';
ERROR:  invalid value for parameter "lc_time": "Ja"
HIROSHI=# set LC_TIME='Japan';
ERROR:  invalid value for parameter "lc_time": "Japan"
STATEMENT:  set LC_TIME='Japan';
ERROR:  invalid value for parameter "lc_time": "Japan"
HIROSHI=# set LC_TIME=Japanese;
SET
HIROSHI=# select to_char(Now(),'TMDay');
 to_char
---------
 日曜日
(1 行)

Hmm, this needs to be re-investigated. :-(
However, once it is reasonably clear, it would be good to document it,
as suggested.

Regards,
Hiroshi Saito
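Whether a Windows locale name such as 'Ja', 'Japan' or 'Japanese' is accepted is ultimately decided by the C runtime's setlocale(). One plausible explanation for the MinGW/MSVC difference, not established in this thread, is that the two builds link against different Microsoft C runtimes. A small probe like the following sketch (the helper name `lc_time_name_accepted` is hypothetical) can check candidate names outside PostgreSQL before they are documented.

```c
#include <locale.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical probe: report whether the C runtime accepts `name` for
 * LC_TIME, then restore the previous LC_TIME setting.  On success, the
 * locale name setlocale() actually selected is copied into `resolved`
 * (if non-NULL).
 */
static bool
lc_time_name_accepted(const char *name, char *resolved, size_t resolved_len)
{
    const char *cur = setlocale(LC_TIME, NULL);
    const char *result;
    char        saved[256];

    /* setlocale() may reuse its static buffer, so copy the old value. */
    if (cur == NULL || strlen(cur) >= sizeof(saved))
        return false;
    strcpy(saved, cur);

    result = setlocale(LC_TIME, name);
    if (result != NULL && resolved != NULL && resolved_len > 0)
    {
        strncpy(resolved, result, resolved_len - 1);
        resolved[resolved_len - 1] = '\0';
    }

    /* Restore the original LC_TIME setting. */
    setlocale(LC_TIME, saved);
    return result != NULL;
}
```

Running it with "Ja", "Japan" and "Japanese" under each build would help show whether the difference comes from the runtime itself or from PostgreSQL's own parameter validation.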
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.
From
Hiroshi Inoue
Date:
Hiroshi Inoue wrote:
> Bruce Momjian wrote:
>> Hiroshi, is this patch still needed?
>
> Yes though it should be slightly changed now.
> *set lc_messages does not work* issue isn't directly related to this
> issue.

Though I'm not sure how best to test it, I can provide test results
like the attached one. It shows, with LC_MESSAGES=fr, the result of
triggering the following 3 errors in databases with various encodings.

  select a;        ==> column "a" does not exist
  1;               ==> syntax error at or near "1"
  select * from a; ==> relation "a" does not exist

Comments?

Please note the encoding of the attached file is utf-8.

regards,
Hiroshi Inoue

SET current_database ------------------ utf8 (1 row) server_encoding ----------------- UTF8 (1 row) SET lc_messages ------------- fr (1 row) ERREUR: la colonne « a » n'existe pas LINE 1: select a; ^ ERREUR: erreur de syntaxe sur ou près de « 1 » LINE 1: 1; ^ ERREUR: la relation « a » n'existe pas LINE 1: select * from a; ^ SET current_database ------------------ eucjp (1 row) server_encoding ----------------- EUC_JP (1 row) SET lc_messages ------------- fr (1 row) ERREUR: la colonne << a >> n'existe pas LINE 1: select a; ^ ERREUR: erreur de syntaxe sur ou près de << 1 >> LINE 1: 1; ^ ERREUR: la relation << a >> n'existe pas LINE 1: select * from a; ^ SET current_database ------------------ euc-jis-2004 (1 row) server_encoding ----------------- EUC_JIS_2004 (1 row) SET lc_messages ------------- fr (1 row) ERREUR: la colonne << a >> n'existe pas LINE 1: select a; ^ ERROR: character 0x8fabb2 of encoding "EUC_JIS_2004" has no equivalent in "UTF8" ERREUR: la relation << a >> n'existe pas LINE 1: select * from a; ^ SET current_database ------------------ euccn (1 row) server_encoding ----------------- EUC_CN (1 row) SET lc_messages ------------- fr (1 row) ERREUR: la colonne << a >> n'existe pas LINE 1: select a; ^ ERREUR: erreur de syntaxe sur ou près de << 1 >> LINE 1: 1; ^ ERREUR: la relation << a >> n'existe pas LINE 1: select * from a; ^
SET current_database ------------------ euctw (1 row) server_encoding ----------------- EUC_TW (1 row) SET lc_messages ------------- fr (1 row) ERREUR: la colonne << a >> n'existe pas LINE 1: select a; ^ ERREUR: erreur de syntaxe sur ou pr`es de << 1 >> LINE 1: 1; ^ ERREUR: la relation << a >> n'existe pas LINE 1: select * from a; ^ SET current_database ------------------ euckr (1 row) server_encoding ----------------- EUC_KR (1 row) SET lc_messages ------------- fr (1 row) ERREUR: la colonne << a >> n'existe pas LINE 1: select a; ^ ERREUR: erreur de syntaxe sur ou pr`es de << 1 >> LINE 1: 1; ^ ERREUR: la relation << a >> n'existe pas LINE 1: select * from a; ^ SET current_database ------------------ latin1 (1 row) server_encoding ----------------- LATIN1 (1 row) SET lc_messages ------------- fr (1 row) ERREUR: la colonne « a » n'existe pas LINE 1: select a; ^ ERREUR: erreur de syntaxe sur ou près de « 1 » LINE 1: 1; ^ ERREUR: la relation « a » n'existe pas LINE 1: select * from a; ^ SET current_database ------------------ latin2 (1 row) server_encoding ----------------- LATIN2 (1 row) SET lc_messages ------------- fr (1 row) ERREUR: la colonne << a >> n'existe pas LINE 1: select a; ^ ERREUR: erreur de syntaxe sur ou pr`es de << 1 >> LINE 1: 1; ^ ERREUR: la relation << a >> n'existe pas LINE 1: select * from a; ^ SET current_database ------------------ latin3 (1 row) server_encoding ----------------- LATIN3 (1 row) SET lc_messages ------------- fr (1 row) ERREUR: la colonne << a >> n'existe pas LINE 1: select a; ^ ERREUR: erreur de syntaxe sur ou près de << 1 >> LINE 1: 1; ^ ERREUR: la relation << a >> n'existe pas LINE 1: select * from a; ^ SET current_database ------------------ latin4 (1 row) server_encoding ----------------- LATIN4 (1 row) SET lc_messages ------------- fr (1 row) ERREUR: la colonne << a >> n'existe pas LINE 1: select a; ^ ERREUR: erreur de syntaxe sur ou pr`es de << 1 >> LINE 1: 1; ^ ERREUR: la relation << a >> n'existe pas LINE 1: select * 
from a; ^ SET current_database ------------------ latin5 (1 row) server_encoding ----------------- LATIN5 (1 row) SET lc_messages ------------- fr (1 row) ERREUR: la colonne « a » n'existe pas LINE 1: select a; ^ ERREUR: erreur de syntaxe sur ou près de « 1 » LINE 1: 1; ^ ERREUR: la relation « a » n'existe pas LINE 1: select * from a; ^ SET current_database ------------------ latin6 (1 row) server_encoding ----------------- LATIN6 (1 row) SET lc_messages ------------- fr (1 row) ERREUR: la colonne << a >> n'existe pas LINE 1: select a; ^ ERREUR: erreur de syntaxe sur ou pr`es de << 1 >> LINE 1: 1; ^ ERREUR: la relation << a >> n'existe pas LINE 1: select * from a; ^ SET current_database ------------------ latin7 (1 row) server_encoding ----------------- LATIN7 (1 row) SET lc_messages ------------- fr (1 row) ERREUR: la colonne « a » n'existe pas LINE 1: select a; ^ ERREUR: erreur de syntaxe sur ou pr`es de « 1 » LINE 1: 1; ^ ERREUR: la relation « a » n'existe pas LINE 1: select * from a; ^ SET current_database ------------------ latin8 (1 row) server_encoding ----------------- LATIN8 (1 row) SET lc_messages ------------- fr (1 row) ERREUR: la colonne << a >> n'existe pas LINE 1: select a; ^ ERREUR: erreur de syntaxe sur ou près de << 1 >> LINE 1: 1; ^ ERREUR: la relation << a >> n'existe pas LINE 1: select * from a; ^ SET current_database ------------------ latin9 (1 row) server_encoding ----------------- LATIN9 (1 row) SET lc_messages ------------- fr (1 row) ERREUR: la colonne « a » n'existe pas LINE 1: select a; ^ ERREUR: erreur de syntaxe sur ou près de « 1 » LINE 1: 1; ^ ERREUR: la relation « a » n'existe pas LINE 1: select * from a; ^ SET current_database ------------------ latin10 (1 row) server_encoding ----------------- LATIN10 (1 row) SET lc_messages ------------- fr (1 row) ERREUR: la colonne « a » n'existe pas LINE 1: select a; ^ ERREUR: erreur de syntaxe sur ou près de « 1 » LINE 1: 1; ^ ERREUR: la relation « a » n'existe pas LINE 1: select * from a; ^ 
SET
 current_database
------------------
 iso-8859-5
(1 row)
 server_encoding
-----------------
 ISO_8859_5
(1 row)
SET
 lc_messages
-------------
 fr
(1 row)
ERREUR: la colonne << a >> n'existe pas
LINE 1: select a;
               ^
ERREUR: erreur de syntaxe sur ou pr`es de << 1 >>
LINE 1: 1;
        ^
ERREUR: la relation << a >> n'existe pas
LINE 1: select * from a;
                      ^

SET
 current_database
------------------
 iso-8859-6
(1 row)
 server_encoding
-----------------
 ISO_8859_6
(1 row)
SET
 lc_messages
-------------
 fr
(1 row)
ERREUR: la colonne << a >> n'existe pas
LINE 1: select a;
               ^
ERREUR: erreur de syntaxe sur ou pr`es de << 1 >>
LINE 1: 1;
        ^
ERREUR: la relation << a >> n'existe pas
LINE 1: select * from a;
                      ^

SET
 current_database
------------------
 iso-8859-7
(1 row)
 server_encoding
-----------------
 ISO_8859_7
(1 row)
SET
 lc_messages
-------------
 fr
(1 row)
ERREUR: la colonne « a » n'existe pas
LINE 1: select a;
               ^
ERREUR: erreur de syntaxe sur ou pr`es de « 1 »
LINE 1: 1;
        ^
ERREUR: la relation « a » n'existe pas
LINE 1: select * from a;
                      ^

SET
 current_database
------------------
 iso-8859-8
(1 row)
 server_encoding
-----------------
 ISO_8859_8
(1 row)
SET
 lc_messages
-------------
 fr
(1 row)
ERREUR: la colonne « a » n'existe pas
LINE 1: select a;
               ^
ERREUR: erreur de syntaxe sur ou pr`es de « 1 »
LINE 1: 1;
        ^
ERREUR: la relation « a » n'existe pas
LINE 1: select * from a;
                      ^

SET
 current_database
------------------
 koi8-r
(1 row)
 server_encoding
-----------------
 KOI8
(1 row)
SET
 lc_messages
-------------
 fr
(1 row)
ERREUR: la colonne << a >> n'existe pas
LINE 1: select a;
               ^
ERREUR: erreur de syntaxe sur ou pr`es de << 1 >>
LINE 1: 1;
        ^
ERREUR: la relation << a >> n'existe pas
LINE 1: select * from a;
                      ^

SET
 current_database
------------------
 win1250
(1 row)
 server_encoding
-----------------
 WIN1250
(1 row)
SET
 lc_messages
-------------
 fr
(1 row)
ERREUR: la colonne « a » n'existe pas
LINE 1: select a;
               ^
ERREUR: erreur de syntaxe sur ou pr`es de « 1 »
LINE 1: 1;
        ^
ERREUR: la relation « a » n'existe pas
LINE 1: select * from a;
                      ^

SET
 current_database
------------------
 win1251
(1 row)
 server_encoding
-----------------
 WIN1251
(1 row)
SET
 lc_messages
-------------
 fr
(1 row)
ERREUR: la colonne « a » n'existe pas
LINE 1: select a;
               ^
ERREUR: erreur de syntaxe sur ou pr`es de « 1 »
LINE 1: 1;
        ^
ERREUR: la relation « a » n'existe pas
LINE 1: select * from a;
                      ^

SET
 current_database
------------------
 win1252
(1 row)
 server_encoding
-----------------
 WIN1252
(1 row)
SET
 lc_messages
-------------
 fr
(1 row)
ERREUR: la colonne « a » n'existe pas
LINE 1: select a;
               ^
ERREUR: erreur de syntaxe sur ou près de « 1 »
LINE 1: 1;
        ^
ERREUR: la relation « a » n'existe pas
LINE 1: select * from a;
                      ^

SET
 current_database
------------------
 win1253
(1 row)
 server_encoding
-----------------
 WIN1253
(1 row)
SET
 lc_messages
-------------
 fr
(1 row)
ERREUR: la colonne « a » n'existe pas
LINE 1: select a;
               ^
ERREUR: erreur de syntaxe sur ou pr`es de « 1 »
LINE 1: 1;
        ^
ERREUR: la relation « a » n'existe pas
LINE 1: select * from a;
                      ^

SET
 current_database
------------------
 win1254
(1 row)
 server_encoding
-----------------
 WIN1254
(1 row)
SET
 lc_messages
-------------
 fr
(1 row)
ERREUR: la colonne « a » n'existe pas
LINE 1: select a;
               ^
ERREUR: erreur de syntaxe sur ou près de « 1 »
LINE 1: 1;
        ^
ERREUR: la relation « a » n'existe pas
LINE 1: select * from a;
                      ^

SET
 current_database
------------------
 win1255
(1 row)
 server_encoding
-----------------
 WIN1255
(1 row)
SET
 lc_messages
-------------
 fr
(1 row)
ERREUR: la colonne « a » n'existe pas
LINE 1: select a;
               ^
ERREUR: erreur de syntaxe sur ou pr`es de « 1 »
LINE 1: 1;
        ^
ERREUR: la relation « a » n'existe pas
LINE 1: select * from a;
                      ^

SET
 current_database
------------------
 win1256
(1 row)
 server_encoding
-----------------
 WIN1256
(1 row)
SET
 lc_messages
-------------
 fr
(1 row)
ERREUR: la colonne « a » n'existe pas
LINE 1: select a;
               ^
ERREUR: erreur de syntaxe sur ou près de « 1 »
LINE 1: 1;
        ^
ERREUR: la relation « a » n'existe pas
LINE 1: select * from a;
                      ^

SET
 current_database
------------------
 win1257
(1 row)
 server_encoding
-----------------
 WIN1257
(1 row)
SET
 lc_messages
-------------
 fr
(1 row)
ERREUR: la colonne « a » n'existe pas
LINE 1: select a;
               ^
ERREUR: erreur de syntaxe sur ou pr`es de « 1 »
LINE 1: 1;
        ^
ERREUR: la relation « a » n'existe pas
LINE 1: select * from a;
                      ^

SET
 current_database
------------------
 win1258
(1 row)
 server_encoding
-----------------
 WIN1258
(1 row)
SET
 lc_messages
-------------
 fr
(1 row)
ERREUR: la colonne « a » n'existe pas
LINE 1: select a;
               ^
ERREUR: erreur de syntaxe sur ou près de « 1 »
LINE 1: 1;
        ^
ERREUR: la relation « a » n'existe pas
LINE 1: select * from a;
                      ^

SET
 current_database
------------------
 win866
(1 row)
 server_encoding
-----------------
 WIN866
(1 row)
SET
 lc_messages
-------------
 fr
(1 row)
ERREUR: la colonne << a >> n'existe pas
LINE 1: select a;
               ^
ERREUR: erreur de syntaxe sur ou pr`es de << 1 >>
LINE 1: 1;
        ^
ERREUR: la relation << a >> n'existe pas
LINE 1: select * from a;
                      ^

SET
 current_database
------------------
 win874
(1 row)
 server_encoding
-----------------
 WIN874
(1 row)
SET
 lc_messages
-------------
 fr
(1 row)
ERREUR: la colonne << a >> n'existe pas
LINE 1: select a;
               ^
ERREUR: erreur de syntaxe sur ou pr`es de << 1 >>
LINE 1: 1;
        ^
ERREUR: la relation << a >> n'existe pas
LINE 1: select * from a;
                      ^
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.
From
Magnus Hagander
Date:
Hiroshi Inoue wrote:
> Hiroshi Inoue wrote:
>> Bruce Momjian wrote:
>>> Hiroshi, is this patch still needed?
>>
>> Yes though it should be slightly changed now.

In what way should it be changed?

//Magnus
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.
From
Hiroshi Inoue
Date:
Magnus Hagander wrote:
> Hiroshi Inoue wrote:
>> Hiroshi Inoue wrote:
>>> Bruce Momjian wrote:
>>>> Hiroshi, is this patch still needed?
>>> Yes though it should be slightly changed now.
>
> In what way should it be changed?

One is already committed by you.
  [COMMITTERS] pgsql: Use the new text domain names

Another is to bind the codeset "EUC-JP" for
PG_EUC_JIS_2004 server encoding.
Though EUC_JP and EUC_JIS_2004 aren't completely compatible,
it seems OK in most cases.

regards,
Hiroshi Inoue
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.
From
Hiroshi Inoue
Date:
Hiroshi Inoue wrote:
> Magnus Hagander wrote:
>> Hiroshi Inoue wrote:
>>> Hiroshi Inoue wrote:
>>>> Bruce Momjian wrote:
>>>>> Hiroshi, is this patch still needed?
>>>> Yes though it should be slightly changed now.
>>
>> In what way should it be changed?
>
> One is already committed by you.
>   [COMMITTERS] pgsql: Use the new text domain names
>
> Another is to bind the codeset "EUC-JP" for
> PG_EUC_JIS_2004 server encoding.

The attached is an updated patch.

regards,
Hiroshi Inoue

Index: mbutils.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/utils/mb/mbutils.c,v
retrieving revision 1.77
diff -c -c -r1.77 mbutils.c
*** mbutils.c	19 Jan 2009 15:34:23 -0000	1.77
--- mbutils.c	20 Jan 2009 12:54:33 -0000
***************
*** 837,842 ****
--- 837,881 ----
  	return clen;
  }
  
+ #ifdef WIN32
+ static const struct codeset_map {
+ 	int encoding;
+ 	const char *codeset;
+ } codeset_map_array[] = {
+ 	{PG_UTF8, "UTF-8"},
+ 	{PG_LATIN1, "LATIN1"},
+ 	{PG_LATIN2, "LATIN2"},
+ 	{PG_LATIN3, "LATIN3"},
+ 	{PG_LATIN4, "LATIN4"},
+ 	{PG_ISO_8859_5, "ISO-8859-5"},
+ 	{PG_ISO_8859_6, "ISO_8859-6"},
+ 	{PG_ISO_8859_7, "ISO-8859-7"},
+ 	{PG_ISO_8859_8, "ISO-8859-8"},
+ 	{PG_LATIN5, "LATIN5"},
+ 	{PG_LATIN6, "LATIN6"},
+ 	{PG_LATIN7, "LATIN7"},
+ 	{PG_LATIN8, "LATIN8"},
+ 	{PG_LATIN9, "LATIN-9"},
+ 	{PG_LATIN10, "LATIN10"},
+ 	{PG_KOI8R, "KOI8-R"},
+ 	{PG_WIN1250, "CP1250"},
+ 	{PG_WIN1251, "CP1251"},
+ 	{PG_WIN1252, "CP1252"},
+ 	{PG_WIN1253, "CP1253"},
+ 	{PG_WIN1254, "CP1254"},
+ 	{PG_WIN1255, "CP1255"},
+ 	{PG_WIN1256, "CP1256"},
+ 	{PG_WIN1257, "CP1257"},
+ 	{PG_WIN1258, "CP1258"},
+ 	{PG_WIN866, "CP866"},
+ 	{PG_WIN874, "CP874"},
+ 	{PG_EUC_CN, "EUC-CN"},
+ 	{PG_EUC_JP, "EUC-JP"},
+ 	{PG_EUC_KR, "EUC-KR"},
+ 	{PG_EUC_TW, "EUC-TW"},
+ 	{PG_EUC_JIS_2004, "EUC-JP"}};
+ #endif /* WIN32 */
+ 
  /* mbcliplen for any single-byte encoding */
  static int
  cliplen(const char *str, int len, int limit)
***************
*** 852,857 ****
--- 891,898 ----
  void
  SetDatabaseEncoding(int encoding)
  {
+ 	const char *target_codeset = NULL;
+ 
  	if (!PG_VALID_BE_ENCODING(encoding))
  		elog(ERROR, "invalid database encoding: %d", encoding);
  
***************
*** 873,879 ****
  	 */
  #ifdef ENABLE_NLS
  	if (encoding == PG_UTF8)
! 		if (bind_textdomain_codeset(textdomain(NULL), "UTF-8") == NULL)
  			elog(LOG, "bind_textdomain_codeset failed");
  #endif
  }
--- 914,935 ----
  	 */
  #ifdef ENABLE_NLS
  	if (encoding == PG_UTF8)
! 		target_codeset = "UTF-8";
! #ifdef WIN32
! 	else
! 	{
! 		int i;
! 
! 		for (i = 0; i < sizeof(codeset_map_array) / sizeof(struct codeset_map); i++)
! 			if (codeset_map_array[i].encoding == encoding)
! 			{
! 				target_codeset = codeset_map_array[i].codeset;
! 				break;
! 			}
! 	}
! #endif /* WIN32 */
! 	if (target_codeset != NULL)
! 		if (bind_textdomain_codeset(textdomain(NULL), target_codeset) == NULL)
  			elog(LOG, "bind_textdomain_codeset failed");
  #endif
  }
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.
From
Magnus Hagander
Date:
Hiroshi Inoue wrote:
> Hiroshi Inoue wrote:
>> Magnus Hagander wrote:
>>> Hiroshi Inoue wrote:
>>>> Hiroshi Inoue wrote:
>>>>> Bruce Momjian wrote:
>>>>>> Hiroshi, is this patch still needed?
>>>>> Yes though it should be slightly changed now.
>>>
>>> In what way should it be changed?
>>
>> One is already committed by you.
>>   [COMMITTERS] pgsql: Use the new text domain names
>>
>> Another is to bind the codeset "EUC-JP" for
>> PG_EUC_JIS_2004 server encoding.
>
> The attached is an updated patch.

Thanks.

Looking at it, the comment clearly needs updating - I'll do that.

However, one question: The comment currently says it's harmless to do
this on non-windows platforms. Does this still hold true? In that case,
this whole thing shouldn't be #ifdef:ed to WIN32 and can be simplified.
Or does the "middle part" of the comment come into play, in that the
codeset names can be different on different platforms?

Peter, can you comment on that?

If we do keep the thing win32 only, I think we should just wrap the
whole thing in #ifdef WIN32 and no longer do the codeset stuff at all on
Unix - that'll make for cleaner code.

//Magnus
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.
From
Peter Eisentraut
Date:
Magnus Hagander wrote:
> However, one question: The comment currently says it's harmless to do
> this on non-windows platforms. Does this still hold true?

Yes, the non-WIN32 code path appears to be the same, still. But the
ifdef WIN32 part we don't want, because that presumes something about
the spelling of encoding names in the local iconv library.

> If we do keep the thing win32 only, I think we should just wrap the
> whole thing in #ifdef WIN32 and no longer do the codeset stuff at all on
> Unix - that'll make for cleaner code.

Yes, that would be much better.
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.
From
Magnus Hagander
Date:
Peter Eisentraut wrote:
> Magnus Hagander wrote:
>> However, one question: The comment currently says it's harmless to do
>> this on non-windows platforms. Does this still hold true?
>
> Yes, the non-WIN32 code path appears to be the same, still. But the
> ifdef WIN32 part we don't want, because that presumes something about
> the spelling of encoding names in the local iconv library.
>
>> If we do keep the thing win32 only, I think we should just wrap the
>> whole thing in #ifdef WIN32 and no longer do the codeset stuff at all on
>> Unix - that'll make for cleaner code.
>
> Yes, that would be much better.

Something like this then?

//Magnus

*** a/src/backend/utils/mb/mbutils.c
--- b/src/backend/utils/mb/mbutils.c
***************
*** 849,854 **** cliplen(const char *str, int len, int limit)
--- 849,894 ----
  	return l;
  }
  
+ #if defined(ENABLE_NLS) && defined(WIN32)
+ static const struct codeset_map {
+ 	int encoding;
+ 	const char *codeset;
+ } codeset_map_array[] = {
+ 	{PG_UTF8, "UTF-8"},
+ 	{PG_LATIN1, "LATIN1"},
+ 	{PG_LATIN2, "LATIN2"},
+ 	{PG_LATIN3, "LATIN3"},
+ 	{PG_LATIN4, "LATIN4"},
+ 	{PG_ISO_8859_5, "ISO-8859-5"},
+ 	{PG_ISO_8859_6, "ISO_8859-6"},
+ 	{PG_ISO_8859_7, "ISO-8859-7"},
+ 	{PG_ISO_8859_8, "ISO-8859-8"},
+ 	{PG_LATIN5, "LATIN5"},
+ 	{PG_LATIN6, "LATIN6"},
+ 	{PG_LATIN7, "LATIN7"},
+ 	{PG_LATIN8, "LATIN8"},
+ 	{PG_LATIN9, "LATIN-9"},
+ 	{PG_LATIN10, "LATIN10"},
+ 	{PG_KOI8R, "KOI8-R"},
+ 	{PG_WIN1250, "CP1250"},
+ 	{PG_WIN1251, "CP1251"},
+ 	{PG_WIN1252, "CP1252"},
+ 	{PG_WIN1253, "CP1253"},
+ 	{PG_WIN1254, "CP1254"},
+ 	{PG_WIN1255, "CP1255"},
+ 	{PG_WIN1256, "CP1256"},
+ 	{PG_WIN1257, "CP1257"},
+ 	{PG_WIN1258, "CP1258"},
+ 	{PG_WIN866, "CP866"},
+ 	{PG_WIN874, "CP874"},
+ 	{PG_EUC_CN, "EUC-CN"},
+ 	{PG_EUC_JP, "EUC-JP"},
+ 	{PG_EUC_KR, "EUC-KR"},
+ 	{PG_EUC_TW, "EUC-TW"},
+ 	{PG_EUC_JIS_2004, "EUC-JP"}
+ };
+ #endif /* WIN32 */
+ 
  void
  SetDatabaseEncoding(int encoding)
  {
***************
*** 859,880 **** SetDatabaseEncoding(int encoding)
  	Assert(DatabaseEncoding->encoding == encoding);
  
  	/*
! 	 * On Windows, we allow UTF-8 database encoding to be used with any
! 	 * locale setting, because UTF-8 requires special handling anyway.
! 	 * But this means that gettext() might be misled about what output
! 	 * encoding it should use, so we have to tell it explicitly.
! 	 *
! 	 * In future we might want to call bind_textdomain_codeset
! 	 * unconditionally, but that requires knowing how to spell the codeset
! 	 * name properly for all encodings on all platforms, which might be
! 	 * problematic.
! 	 *
! 	 * This is presently unnecessary, but harmless, on non-Windows platforms.
  	 */
! #ifdef ENABLE_NLS
! 	if (encoding == PG_UTF8)
! 		if (bind_textdomain_codeset(textdomain(NULL), "UTF-8") == NULL)
! 			elog(LOG, "bind_textdomain_codeset failed");
  #endif
  }
  
--- 899,921 ----
  	Assert(DatabaseEncoding->encoding == encoding);
  
  	/*
! 	 * On Windows, we need to explicitly bind gettext to the correct
! 	 * encoding, because gettext() tends to get confused.
  	 */
! #if defined(ENABLE_NLS) && defined(WIN32)
! 	{
! 		int i;
! 
! 		for (i = 0; i < sizeof(codeset_map_array) / sizeof(codeset_map_array[0]); i++)
! 		{
! 			if (codeset_map_array[i].encoding == encoding)
! 			{
! 				if (bind_textdomain_codeset(textdomain(NULL), codeset_map_array[i].codeset) == NULL)
! 					elog(LOG, "bind_textdomain_codeset failed");
! 				break;
! 			}
! 		}
! 	}
  #endif
  }
  
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.
From
Peter Eisentraut
Date:
Magnus Hagander wrote:
> Peter Eisentraut wrote:
>> Magnus Hagander wrote:
>>> However, one question: The comment currently says it's harmless to do
>>> this on non-windows platforms. Does this still hold true?
>> Yes, the non-WIN32 code path appears to be the same, still. But the
>> ifdef WIN32 part we don't want, because that presumes something about
>> the spelling of encoding names in the local iconv library.
>>
>>> If we do keep the thing win32 only, I think we should just wrap the
>>> whole thing in #ifdef WIN32 and no longer do the codeset stuff at all on
>>> Unix - that'll make for cleaner code.
>> Yes, that would be much better.
>
> Something like this then?

Looks OK to me.
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.
From
Hiroshi Inoue
Date:
Magnus Hagander wrote:
> Peter Eisentraut wrote:
>> Magnus Hagander wrote:
>>> However, one question: The comment currently says it's harmless to do
>>> this on non-windows platforms. Does this still hold true?
>> Yes, the non-WIN32 code path appears to be the same, still. But the
>> ifdef WIN32 part we don't want, because that presumes something about
>> the spelling of encoding names in the local iconv library.
>>
>>> If we do keep the thing win32 only, I think we should just wrap the
>>> whole thing in #ifdef WIN32 and no longer do the codeset stuff at all on
>>> Unix - that'll make for cleaner code.
>> Yes, that would be much better.
>
> Something like this then?

It seems OK to me.

regards,
Hiroshi Inoue
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.
From
Magnus Hagander
Date:
Hiroshi Inoue wrote:
> Magnus Hagander wrote:
>> Peter Eisentraut wrote:
>>> Magnus Hagander wrote:
>>>> However, one question: The comment currently says it's harmless to do
>>>> this on non-windows platforms. Does this still hold true?
>>> Yes, the non-WIN32 code path appears to be the same, still. But the
>>> ifdef WIN32 part we don't want, because that presumes something about
>>> the spelling of encoding names in the local iconv library.
>>>
>>>> If we do keep the thing win32 only, I think we should just wrap the
>>>> whole thing in #ifdef WIN32 and no longer do the codeset stuff at
>>>> all on Unix - that'll make for cleaner code.
>>> Yes, that would be much better.
>>
>> Something like this then?
>
> It seems OK to me.

Applied.

//Magnus