Thread: initdb of regression test failed.
Hi Tom-san.

initdb fails because of a locale mismatch.

-
Running in noclean mode. Mistakes will not be cleaned up.
The files belonging to this database system will be owned by user "hiroshi".
This user must also own the server process.

The database cluster will be initialized with locale Japanese_Japan.932.
initdb: could not find suitable encoding for locale "Japanese_Japan.932"
Rerun initdb with the -E option.
Try "initdb --help" for more information.
Running in noclean mode. Mistakes will not be cleaned up.
-

I think this is required.... Did I miss something?

Regards,
Hiroshi Saito
Attachment
"Hiroshi Saito" <z-saito@guitar.ocn.ne.jp> writes:
> The database cluster will be initialized with locale Japanese_Japan.932.
> initdb: could not find suitable encoding for locale "Japanese_Japan.932"

So, what encoding *should* we use for that locale?

> I think this is required....

We are certainly not going to disable pg_regress's ability to test in non-C locales. ISTM a proper fix is an addition to the table in src/port/chklocale.c. This example actually suggests that we need a boatload more table entries to handle Windows locale names :-( (count on Microsoft to ignore standards...)

regards, tom lane
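For readers who have not looked at that file: the fix Tom describes amounts to teaching a "codeset name to encoding" table about the names Windows produces. The following is a minimal standalone sketch of that kind of lookup, not the actual contents of src/port/chklocale.c; the enum, struct, and entries are hypothetical and only illustrate why a Windows code page such as 932 has to be listed explicitly before a locale like Japanese_Japan.932 can be matched.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical encoding IDs; the real code uses PostgreSQL's enum pg_enc. */
typedef enum { ENC_UNKNOWN = -1, ENC_EUC_JP, ENC_UTF8, ENC_SJIS } enc_id;

/* One row: a codeset name as reported by the OS, and its encoding. */
typedef struct {
    enc_id      enc;
    const char *codeset;
} encoding_match;

/*
 * Illustrative entries only.  Unix systems typically report names such as
 * "EUC-JP" or "UTF-8"; for Windows the codeset has to be derived from the
 * code page number, e.g. "CP932" for Japanese_Japan.932.
 */
static const encoding_match match_table[] = {
    {ENC_EUC_JP, "EUC-JP"},
    {ENC_EUC_JP, "CP20932"},
    {ENC_UTF8, "UTF-8"},
    {ENC_SJIS, "SJIS"},
    {ENC_SJIS, "CP932"},
    {ENC_UNKNOWN, NULL}         /* end marker */
};

/* ASCII case-insensitive string equality. */
static int name_eq_nocase(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Return the encoding for a codeset name, or ENC_UNKNOWN if not listed. */
static enc_id lookup_codeset(const char *codeset)
{
    for (const encoding_match *m = match_table; m->codeset != NULL; m++)
        if (name_eq_nocase(m->codeset, codeset))
            return m->enc;
    return ENC_UNKNOWN;
}
```

In this sketch a code page with no table row falls through to ENC_UNKNOWN, which is the analogue of the "could not find suitable encoding" failure in the report.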
Hi.

From: "Tom Lane" <tgl@sss.pgh.pa.us>
> We are certainly not going to disable pg_regress's ability to test in
> non-C locales. ISTM a proper fix is an addition to the table in
> src/port/chklocale.c. This example suggests actually that we need
> a boatload more table entries to handle Windows locale names :-(
> (count on Microsoft to ignore standards...)

Ah, OK. Please check it. However, there is still this problem:

-
Running in noclean mode. Mistakes will not be cleaned up.
The files belonging to this database system will be owned by user "hiroshi".
This user must also own the server process.

The database cluster will be initialized with locale Japanese_Japan.932.
initdb: locale Japanese_Japan.932 requires unsupported encoding SJIS
Encoding SJIS is not allowed as a server-side encoding.
Rerun initdb with a different locale selection.
Running in noclean mode. Mistakes will not be cleaned up.
-

I think that this server-side check is the right action! I would like further suggestions....

Regards,
Hiroshi Saito
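The second error comes from a separate check: once the codeset is mapped to SJIS, initdb still refuses it because SJIS is a client-only encoding in PostgreSQL. A minimal sketch of that distinction, with a hypothetical helper; the client-only names listed (SJIS, BIG5, GBK, UHC, GB18030, JOHAB) follow PostgreSQL's character set documentation, but the list may not be exhaustive and the function is illustrative only.

```c
#include <string.h>

/*
 * Hypothetical list of encodings PostgreSQL accepts only on the client
 * side.  Names follow PostgreSQL's encoding names; may be incomplete.
 */
static const char *const client_only_encodings[] = {
    "SJIS", "BIG5", "GBK", "UHC", "GB18030", "JOHAB", NULL
};

/*
 * Returns 1 if `enc` may be used as a server-side encoding, 0 otherwise.
 * In this sketch any name not in the client-only list is assumed valid.
 */
static int valid_server_encoding(const char *enc)
{
    for (int i = 0; client_only_encodings[i] != NULL; i++)
        if (strcmp(enc, client_only_encodings[i]) == 0)
            return 0;
    return 1;
}
```

This is why adding CP932 to the codeset table alone cannot make Japanese_Japan.932 usable as a server locale: the mapping succeeds, but the resulting encoding is then rejected.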
Attachment
"Hiroshi Saito" <z-saito@guitar.ocn.ne.jp> wrote:
> The database cluster will be initialized with locale Japanese_Japan.932.
> initdb: locale Japanese_Japan.932 requires unsupported encoding SJIS
> Encoding SJIS is not allowed as a server-side encoding.
> -
> I think that this server-side check is the right action!
> I would like further suggestions....

How about changing initdb to use encoding=UTF-8 and no-locale when the encoding of the default locale is not supported by the server? I think that is the most frequently used combination when we cannot use the default encoding in the server.

The present initdb without options always fails in such environments. Using UTF-8 with no-locale is better than an error. (An error is better than using the wrong locale, though.)

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
Hi.

From: "ITAGAKI Takahiro" <itagaki.takahiro@oss.ntt.co.jp>
> How about changing initdb to use encoding=UTF-8 and no-locale when the
> encoding of the default locale is not supported by the server? I think that
> is the most frequently used combination when we cannot use the default
> encoding in the server.

Yes, at least for Japanese, I think your suggestion is right. But what about other countries? I worry about that...

> The present initdb without options always fails in such environments.
> Using UTF-8 with no-locale is better than an error.
> (An error is better than using the wrong locale, though.)

Rather than handling this ad hoc, shouldn't we document how to specify the options so that users can avoid the problem?

Regards,
Hiroshi Saito
"Hiroshi Saito" <z-saito@guitar.ocn.ne.jp> wrote:
> Ah, OK. Please check it.

Your patch looks useful for preventing a mismatch of encoding and locale on Windows, but I found a limitation: the user will not be able to specify a locale. I added an alternative to nl_langinfo(CODESET) for Win32.

Please check the following commands:

  initdb --encoding=EUC_jp --locale=Japanese_Japan.932
vs.
  initdb --encoding=EUC_jp --locale=Japanese_Japan.20932

One problem is that users need to know code page numbers. It might be possible to replace the default code page with one matching the server encoding automatically if we had a mapping table from encoding to code page.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
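Windows has no nl_langinfo(CODESET), but the code page is visible in the locale name itself: it is the part after the dot in Japanese_Japan.932. A minimal sketch of deriving a codeset name from such a string; the function name and the "CP" prefix convention are assumptions for illustration, not necessarily what the patch does.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for nl_langinfo(CODESET) on Win32: given a locale
 * name such as "Japanese_Japan.932", write "CP932" into buf.
 * Returns 0 on success, -1 if no numeric code page follows the last '.'
 * or the result does not fit in buf.
 */
static int win32_codeset_from_locale(const char *locale, char *buf,
                                     size_t buflen)
{
    const char *dot = strrchr(locale, '.');
    if (dot == NULL || dot[1] == '\0')
        return -1;

    /* Require the suffix to be all digits, e.g. "932" or "20932". */
    for (const char *p = dot + 1; *p; p++)
        if (!isdigit((unsigned char) *p))
            return -1;

    int n = snprintf(buf, buflen, "CP%s", dot + 1);
    if (n < 0 || (size_t) n >= buflen)
        return -1;              /* buffer too small */
    return 0;
}
```

With this, the two commands above differ only in the derived codeset (CP932 vs. CP20932), which a table like the one in chklocale.c can then map to SJIS or EUC_JP.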
Attachment
Hi.

----- Original Message -----
From: "ITAGAKI Takahiro" <itagaki.takahiro@oss.ntt.co.jp>
> One problem is that users need to know code page numbers. It might be
> possible to replace the default code page with one matching the server
> encoding automatically if we had a mapping table from encoding to code page.

Yes, I think your approach looks very good. Still, it seems we need to reconsider the original problem of the default (initial) values. I am considering either documenting it or handling it in some way. Anyway, thanks.

Regards,
Hiroshi Saito
ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
> Your patch looks useful for preventing a mismatch of encoding and locale on
> Windows, but I found a limitation: the user will not be able to specify a
> locale. I added an alternative to nl_langinfo(CODESET) for Win32.

Applied with a small correction --- it looked like you'd put in the wrong PG_ENC code for GBK and BIG5. Not terribly important since we'd reject them anyway, but we might as well reject with the correct error message.

This still leaves the policy decision of whether we want to have initdb assume "-E UTF8 --no-locale" if it sees the current locale has an unusable encoding. I'm not really happy with that idea because it would disable localization of messages. I think what we want, at least on Windows, is to switch to the "corresponding" locale that uses UTF8. Is there a simple way to do that? Or at least some simple recipe we can put into the documentation? "If you get this sort of error, use this --locale setting..."

regards, tom lane
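For Unix-style locale names of the form language_territory.codeset, "switch to the corresponding UTF-8 locale" can be sketched as: replace the codeset suffix with UTF-8 and ask setlocale() whether the result exists. This is a minimal sketch under that naming assumption; it does not address Windows locale names, and both helper names are hypothetical.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build "<language_territory>.UTF-8" from a locale
 * name like "ja_JP.eucJP".  Any "@modifier" suffix is dropped for
 * simplicity.  Returns 0 on success, -1 if the result does not fit.
 */
static int utf8_variant(const char *locale, char *buf, size_t buflen)
{
    size_t base = strcspn(locale, ".@");    /* length of language_territory */
    int n = snprintf(buf, buflen, "%.*s.UTF-8", (int) base, locale);
    return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}

/*
 * Returns 1 if setlocale() accepts `name` for LC_CTYPE, 0 otherwise.
 * The previous LC_CTYPE setting is restored before returning.
 */
static int locale_exists(const char *name)
{
    char saved[256];
    const char *cur = setlocale(LC_CTYPE, NULL);
    if (cur == NULL || strlen(cur) >= sizeof saved)
        return 0;
    strcpy(saved, cur);

    int ok = setlocale(LC_CTYPE, name) != NULL;
    setlocale(LC_CTYPE, saved);             /* put the previous locale back */
    return ok;
}
```

An initdb-side policy could try utf8_variant() on the current locale and fall back to an error only if locale_exists() rejects the candidate, which keeps message localization intact.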
Hi.

The regression test definitely fails!

hedule --multibyte=SQL_ASCII --load-language=plpgsql
============== creating temporary installation ==============
============== initializing database system ==============

pg_regress: initdb failed
Examine ./log/initdb.log for the reason.
Command was: ""C:/MinGW/home/hiroshi/pgsql/src/test/regress/./tmp_check/install//usr/local/pgsql/bin/initdb" -D "C:/MinGW/home/hiroshi/pgsql/src/test/regress/./tmp_check/data" -L "C:/MinGW/home/hiroshi/pgsql/src/test/regress/./tmp_check/install//usr/local/pgsql/share" --noclean > "./log/initdb.log" 2>&1"
make[2]: *** [check] Error 2
make[2]: Leaving directory `/home/hiroshi/pgsql/src/test/regress'
make[1]: *** [check] Error 2
make[1]: Leaving directory `/home/hiroshi/pgsql/src/test'
make: *** [check] Error 2

-initdb.log-
Running in noclean mode. Mistakes will not be cleaned up.
The files belonging to this database system will be owned by user "hiroshi".
This user must also own the server process.

The database cluster will be initialized with locale Japanese_Japan.932.
initdb: locale Japanese_Japan.932 requires unsupported encoding SJIS
Encoding SJIS is not allowed as a server-side encoding.
Rerun initdb with a different locale selection.
Running in noclean mode. Mistakes will not be cleaned up.
-

After the patch:

============== shutting down postmaster ==============
server stopped

=======================
 All 112 tests passed.
=======================

Anyway, it definitely fails at the moment. :-(

Regards,
Hiroshi Saito
Attachment
Oops, the patch to pg_regress.c should be disregarded, sorry. I think this one is preferable.
Attachment
"Hiroshi Saito" <z-saito@guitar.ocn.ne.jp> wrote:
> The regression test definitely fails!

This fix does nothing about the regression failure.

It is probably reasonable to choose UTF-8 as the server encoding when we cannot support the encoding of the current locale. The remaining issue is what to do in that case: use no locale, use a locale with another encoding, or report an error.

At least on Windows, a locale with another encoding works correctly because we already have some Windows-specific hacks (try grep MultiByteToWideChar). In fact, we can accept options like:

  initdb -E UTF8 --locale=Japanese_Japan.932    -- CP932 is SJIS in nature

I suggest using UTF8 if the encoding is UTF-8 or NOT specified and we don't support the locale's encoding on Windows, i.e. the locale is always enabled in regression tests.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
I wrote:
> I suggest using UTF8 if the encoding is UTF-8 or NOT specified and
> we don't support the locale's encoding on Windows, i.e. the locale is
> always enabled in regression tests.

Here is a patch to do it on Windows:
1. Use UTF-8 if the locale's encoding is not available for the server.
2. Allow a mismatch between server and locale encodings if the server encoding is UTF-8.

I succeeded in running the regression tests on the Japanese version of Windows with the patch, but please test it on other language versions.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
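The two rules in the patch can be summarized as one small decision function. This is a minimal sketch of the logic as described in the mail, not the patch itself; the function and parameter names are hypothetical, and encodings are represented by PostgreSQL-style name strings.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical summary of the proposed Windows rules.
 *  requested:     encoding given with -E, or NULL if none was given
 *  locale_enc:    encoding implied by the locale (e.g. "SJIS" for CP932)
 *  locale_enc_ok: whether locale_enc is allowed as a server encoding
 * Returns the server encoding to use, or NULL meaning "report an error".
 */
static const char *choose_server_encoding(const char *requested,
                                          const char *locale_enc,
                                          bool locale_enc_ok)
{
    if (requested == NULL)
        /* Rule 1: fall back to UTF-8 when the locale's encoding is unusable. */
        return locale_enc_ok ? locale_enc : "UTF8";

    if (strcmp(requested, locale_enc) == 0)
        /* Encodings agree; still reject a client-only encoding. */
        return locale_enc_ok ? requested : NULL;

    /* Rule 2: a mismatch is tolerated only when the server encoding is UTF-8. */
    if (strcmp(requested, "UTF8") == 0)
        return requested;

    return NULL;
}
```

Rule 2 is what makes a combination like "-E UTF8 --locale=Japanese_Japan.932" acceptable while still rejecting, say, EUC_JP under a CP932 locale.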
Attachment
Hi.

Um, I think this is material to examine for 8.4, since it changes the behavior. Of course, your proposal is one possible solution, but it needs more discussion. I feel it is dangerous for 8.3....

Regards,
Hiroshi Saito
ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
> In fact, we can accept options like:
>   initdb -E UTF8 --locale=Japanese_Japan.932    -- CP932 is SJIS in nature

Hmm, but does that really work safely? I think varstr_cmp() does work, because it forces our data into wchar format and then calls wcscoll(). The thing that scares me is that various random other operating-system calls might deliver strings in an unexpected encoding. We've been through similar problems with timezone names reported by strftime, for example.

regards, tom lane
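The reason varstr_cmp() is considered safe here is that it compares wide-character strings rather than raw bytes. A minimal portable sketch of that pattern, using the standard mbstowcs() and wcscoll() rather than the Windows-specific MultiByteToWideChar() path mentioned in the thread. Note that mbstowcs() decodes its input according to the current LC_CTYPE, which is exactly the kind of implicit encoding assumption Tom is worried about. The helper name is hypothetical.

```c
#include <stdlib.h>
#include <wchar.h>

/*
 * Hypothetical helper: compare two multibyte strings under the current
 * LC_COLLATE by converting them to wchar_t and calling wcscoll().
 * On success stores the <0 / 0 / >0 result in *cmp and returns 0;
 * returns -1 on an invalid multibyte sequence or allocation failure.
 */
static int wide_collate(const char *a, const char *b, int *cmp)
{
    size_t la = mbstowcs(NULL, a, 0);
    size_t lb = mbstowcs(NULL, b, 0);
    if (la == (size_t) -1 || lb == (size_t) -1)
        return -1;              /* input not valid in the current LC_CTYPE */

    wchar_t *wa = malloc((la + 1) * sizeof(wchar_t));
    wchar_t *wb = malloc((lb + 1) * sizeof(wchar_t));
    int rc = -1;
    if (wa != NULL && wb != NULL) {
        mbstowcs(wa, a, la + 1);
        mbstowcs(wb, b, lb + 1);
        *cmp = wcscoll(wa, wb);
        rc = 0;
    }
    free(wa);
    free(wb);
    return rc;
}
```

The comparison itself is encoding-independent once the data is in wchar_t form; the risk lies in every other library call that hands back plain char strings without such a conversion.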
Hi Tom-san.

This may be just for information... In 8.3, if each database is to have a different encoding, the locale must be C. That is why I want C for the regression test.

--
in>initdb -E EUC_JP -D../data --locale=Japanese_Japan.20932
The files belonging to this database system will be owned by user "hiroshi".
This user must also own the server process.

The database cluster will be initialized with locale Japanese_Japan.20932.
initdb: could not find suitable text search configuration for locale "Japanese_Japan.20932"
The default text search configuration will be set to "simple".

creating directory ../data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers/max_fsm_pages ... 32MB/204800
creating configuration files ... ok
creating template1 database in ../data/base/1 ... ok
initializing pg_authid ... ok
initializing dependencies ... ok
creating system views ... ok
loading system objects' descriptions ... ok
creating conversions ... ok
creating dictionaries ... ok
setting privileges on built-in objects ... ok
creating information schema ... ok
vacuuming database template1 ... ok
copying template1 to template0 ... ok
copying template1 to postgres ... ok

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the -A option the
next time you run initdb.

Success. You can now start the database server using:

--
in>psql template1
Welcome to psql 8.3devel, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit

template1=# \l
        List of databases
   Name    |  Owner  | Encoding
-----------+---------+----------
 postgres  | hiroshi | EUC_JP
 template0 | hiroshi | EUC_JP
 template1 | hiroshi | EUC_JP
(3 rows)

template1=# create database hiroshi;
CREATE DATABASE
template1=# \l
        List of databases
   Name    |  Owner  | Encoding
-----------+---------+----------
 hiroshi   | hiroshi | EUC_JP
 postgres  | hiroshi | EUC_JP
 template0 | hiroshi | EUC_JP
 template1 | hiroshi | EUC_JP
(4 rows)

template1=# show LC_CTYPE;
       lc_ctype
----------------------
 Japanese_Japan.20932
(1 row)

template1=# create database utfdb encoding='UTF8';
ERROR:  encoding UTF8 does not match server's locale Japanese_Japan.20932
DETAIL:  The server's LC_CTYPE setting requires encoding EUC_JP.
template1=#
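The behavior in this session (CREATE DATABASE ... encoding='UTF8' rejected under Japanese_Japan.20932) can be summarized by a simplified rule: with a C or POSIX LC_CTYPE any database encoding is accepted, otherwise the encoding must be the one the locale implies. A minimal sketch of that rule as described in this message; the real server check may have additional cases, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Simplified rule: a C/POSIX LC_CTYPE accepts any database encoding;
 * otherwise the requested encoding must equal the locale's encoding.
 */
static bool encoding_fits_locale(const char *lc_ctype,
                                 const char *locale_enc,
                                 const char *db_enc)
{
    if (strcmp(lc_ctype, "C") == 0 || strcmp(lc_ctype, "POSIX") == 0)
        return true;
    return strcmp(locale_enc, db_enc) == 0;
}
```

Under this rule, per-database encodings only become possible once the cluster's LC_CTYPE is C, which is the argument for running the regression test in the C locale.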
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > initdb -E UTF8 --locale=Japanese_Japan.932    -- CP932 is SJIS in nature
>
> Hmm, but does that really work safely? I think varstr_cmp() does work,
> because it forces our data into wchar format and then calls wcscoll().
> The thing that scares me is that various random other operating-system
> calls might deliver strings in an unexpected encoding. We've been
> through similar problems with timezone names reported by strftime, for
> example.

Hmm, I see. We might need to replace all locale-aware functions with their wchar_t versions, for example wcsftime instead of strftime. That requires more testing, and it should be saved for 8.4.

Attached is a second plan. It uses UTF-8 and locale=C when the default locale's encoding is not supported and neither an encoding nor a locale is passed to initdb. It would help users who use the default settings (including the regression test).

At the moment it resets all of the lc_* variables, but it might be possible to use the default locale for lc_messages, lc_monetary, lc_numeric and lc_time even if lc_collate and lc_ctype are reset to C.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
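The second plan only changes behavior when the user passed neither an encoding nor a locale. A minimal sketch of that condition; the struct and function names are hypothetical, and, as in the description, the locale is reset to C wholesale rather than per lc_* category.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of the initdb options relevant to the decision. */
typedef struct {
    const char *encoding;   /* value of -E, or NULL if not given */
    const char *locale;     /* value of --locale, or NULL if not given */
} initdb_opts;

typedef struct {
    const char *encoding;
    const char *locale;
} initdb_choice;

/*
 * Apply the UTF-8 + C fallback only when nothing was specified and the
 * default locale's encoding cannot be used on the server.  Otherwise the
 * values are passed through unchanged for the usual checks.
 */
static initdb_choice pick_defaults(initdb_opts opts,
                                   const char *default_locale,
                                   const char *default_enc,
                                   bool default_enc_ok)
{
    initdb_choice c = { opts.encoding ? opts.encoding : default_enc,
                        opts.locale ? opts.locale : default_locale };

    if (opts.encoding == NULL && opts.locale == NULL && !default_enc_ok) {
        c.encoding = "UTF8";
        c.locale = "C";         /* the plan resets every lc_* setting */
    }
    return c;
}
```

Because an explicit -E or --locale bypasses the fallback, users who ask for a specific combination still get the existing error instead of a silent substitution.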
Attachment
ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
> Attached is a second plan. It uses UTF-8 and locale=C when the default
> locale's encoding is not supported and neither an encoding nor a locale is
> passed to initdb. It would help users who use the default settings
> (including the regression test).

I'm not very happy with this proposal, because for people who don't actually care about non-ASCII data (which is still a lot of people), forcing UTF-8 as the default encoding will impose pretty substantial overhead compared to SQL_ASCII --- it turns on all those multibyte-encoding checks. Implicitly selecting --no-locale doesn't seem like a big step forward either, since then you've just given up whatever you might have learned from the locale setting. Besides, if that's the behavior the user wants, he can specify it.

I still think that what we should try to do in the default case is find a locale that is the same language but UTF-8 encoding.

> At the moment it resets all of the lc_* variables, but it might be possible
> to use the default locale for lc_messages, lc_monetary, lc_numeric and
> lc_time even if lc_collate and lc_ctype are reset to C.

Well, that just leaves me wondering what encoding the localized messages would be presented in ...

regards, tom lane
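To make the overhead point concrete: with SQL_ASCII the server stores bytes without inspecting them, while with UTF8 every incoming string has to be validated. A minimal sketch of such a validator (rejecting malformed, truncated, overlong, surrogate and out-of-range sequences); it illustrates the kind of per-byte work involved and is not PostgreSQL's own verification code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if s[0..len) is well-formed UTF-8. */
static bool utf8_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t n;               /* number of continuation bytes */
        unsigned long cp;       /* decoded code point */

        if (c < 0x80) {         /* ASCII: nothing more to check */
            i++;
            continue;
        }
        if ((c & 0xE0) == 0xC0)      { n = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; }
        else
            return false;       /* invalid lead byte */

        if (len - i <= n)
            return false;       /* truncated sequence */
        for (size_t k = 1; k <= n; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return false;   /* not a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        /* Reject overlong forms, UTF-16 surrogates, values above U+10FFFF. */
        if ((n == 1 && cp < 0x80) || (n == 2 && cp < 0x800) ||
            (n == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;

        i += n + 1;
    }
    return true;
}
```

Pure-ASCII input takes the cheap branch, but every byte is still examined; under SQL_ASCII even that loop is skipped, and SJIS bytes such as 0x93 0xFA would be stored unchecked.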