Thread: initdb of regression test failed.

initdb of regression test failed.

From
"Hiroshi Saito"
Date:
Hi Tom-san.

initdb fails because of a locale mismatch.

-
Running in noclean mode.  Mistakes will not be cleaned up.
The files belonging to this database system will be owned by user "hiroshi".
This user must also own the server process.

The database cluster will be initialized with locale Japanese_Japan.932.
initdb: could not find suitable encoding for locale "Japanese_Japan.932"
Rerun initdb with the -E option.
Try "initdb --help" for more information.
Running in noclean mode.  Mistakes will not be cleaned up.
-

I think this is required....
Did I miss something?

Regards,
Hiroshi Saito

Re: initdb of regression test failed.

From
Tom Lane
Date:
"Hiroshi Saito" <z-saito@guitar.ocn.ne.jp> writes:
> The database cluster will be initialized with locale Japanese_Japan.932.
> initdb: could not find suitable encoding for locale "Japanese_Japan.932"

So, what encoding *should* we use for that locale?

> I think this is required....

We are certainly not going to disable pg_regress's ability to test in
non-C locales.  ISTM a proper fix is an addition to the table in
src/port/chklocale.c.  This example suggests actually that we need
a boatload more table entries to handle Windows locale names :-(
(count on Microsoft to ignore standards...)

            regards, tom lane
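
The kind of table lookup described above can be sketched as follows. This is a minimal illustration in Python, not the C code in src/port/chklocale.c; the table name, the function name, and the choice of entries are assumptions made for the example. It assumes Windows locale names of the form "Language_Country.codepage".

```python
# Hypothetical table: Windows codepage number -> PostgreSQL encoding name.
# The entries are illustrative; the real table in src/port/chklocale.c
# may be organized differently.
CODEPAGE_TO_ENCODING = {
    "932": "SJIS",
    "936": "GBK",
    "949": "UHC",
    "950": "BIG5",
    "1252": "WIN1252",
    "20932": "EUC_JP",
}


def encoding_for_locale(locale_name):
    """Return the encoding implied by a locale name, or None if unknown."""
    # Split off the codepage after the last dot, if there is one.
    _, dot, codepage = locale_name.rpartition(".")
    if not dot:
        return None  # no codepage suffix, e.g. "C"
    return CODEPAGE_TO_ENCODING.get(codepage)


print(encoding_for_locale("Japanese_Japan.932"))
```

A locale whose codepage is missing from the table yields None, which corresponds to the "could not find suitable encoding" case in the log.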

Re: initdb of regression test failed.

From
"Hiroshi Saito"
Date:
Hi.

From: "Tom Lane" <tgl@sss.pgh.pa.us>


> "Hiroshi Saito" <z-saito@guitar.ocn.ne.jp> writes:
>> The database cluster will be initialized with locale Japanese_Japan.932.
>> initdb: could not find suitable encoding for locale "Japanese_Japan.932"
>
> So, what encoding *should* we use for that locale?
>
>> I think this is required....
>
> We are certainly not going to disable pg_regress's ability to test in
> non-C locales.  ISTM a proper fix is an addition to the table in
> src/port/chklocale.c.  This example suggests actually that we need
> a boatload more table entries to handle Windows locale names :-(
> (count on Microsoft to ignore standards...)

Ah, OK. Please check the attached patch.

However, it leads to this problem:
-
Running in noclean mode.  Mistakes will not be cleaned up.
The files belonging to this database system will be owned by user "hiroshi".
This user must also own the server process.

The database cluster will be initialized with locale Japanese_Japan.932.
initdb: locale Japanese_Japan.932 requires unsupported encoding SJIS
Encoding SJIS is not allowed as a server-side encoding.
Rerun initdb with a different locale selection.
Running in noclean mode.  Mistakes will not be cleaned up.
-
I think this server-side check is the right behavior.
I would welcome further suggestions.

Regards,
Hiroshi Saito

Re: initdb of regression test failed.

From
ITAGAKI Takahiro
Date:
"Hiroshi Saito" <z-saito@guitar.ocn.ne.jp> wrote:

> The database cluster will be initialized with locale Japanese_Japan.932.
> initdb: locale Japanese_Japan.932 requires unsupported encoding SJIS
> Encoding SJIS is not allowed as a server-side encoding.
> -
> I think that the check of this server side is the right action.!
> I desire the further suggestion....

How about changing initdb to use encoding=UTF-8 and no-locale when the
encoding of the default locale is not supported by the server? I think it is
the most frequently used combination when we cannot use the default
encoding in the server.

The present initdb without options always fails in such environments.
Using UTF-8 with no-locale is better than an error.
(An error is better than using the wrong locale, though.)

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



Re: initdb of regression test failed.

From
"Hiroshi Saito"
Date:
Hi.

From: "ITAGAKI Takahiro" <itagaki.takahiro@oss.ntt.co.jp>

>
> "Hiroshi Saito" <z-saito@guitar.ocn.ne.jp> wrote:
>
>> The database cluster will be initialized with locale Japanese_Japan.932.
>> initdb: locale Japanese_Japan.932 requires unsupported encoding SJIS
>> Encoding SJIS is not allowed as a server-side encoding.
>> -
>> I think that the check of this server side is the right action.!
>> I desire the further suggestion....
>
> How about changing initdb to use encoding=UTF-8 and no-locale when the
> encoding of default locale is not supported in the server? I think it is
> the most frequently used combination when we cannot use the default
> encoding in server.

Yes, for Japanese at least, I think your suggestion is right.
However, how would it work in other countries? That worries me.

>
> The present initdb without options always fails in such environments.
> Using UTF-8 with no-locale is better than error.
> (Error is better than using wrong locale, though.)

Wouldn't it be better to specify a method and avoid the problem through the
documentation, rather than handle it ad hoc?

Regards,
Hiroshi Saito

Re: initdb of regression test failed.

From
ITAGAKI Takahiro
Date:
"Hiroshi Saito" <z-saito@guitar.ocn.ne.jp> wrote:

> Ah Ok, Please check it.

Your patch looks useful for preventing a mismatch of encoding and locale on Windows,
but I found a limitation: users would not be able to specify a locale.
I added a Win32 substitute for nl_langinfo(CODESET).

Please check the following commands:
 initdb --encoding=EUC_jp --locale=Japanese_Japan.932
   vs.
 initdb --encoding=EUC_jp --locale=Japanese_Japan.20932


One problem is that users need to know codepage numbers. It might
be possible to replace the default codepage automatically with the one
for the server encoding if we had a mapping table from encoding to codepage.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
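
The automatic codepage replacement suggested above might look roughly like this. It is a sketch in Python under stated assumptions: the reverse table, its entries, and the function name are hypothetical and are not taken from the patch. Only the EUC_JP/20932 pairing comes from the commands in this message.

```python
# Hypothetical reverse table: server encoding -> Windows codepage.
# Only the EUC_JP entry is taken from the thread; the others are
# illustrative additions.
ENCODING_TO_CODEPAGE = {
    "EUC_JP": "20932",
    "WIN1251": "1251",
    "WIN1252": "1252",
}


def locale_for_encoding(locale_name, encoding):
    """Swap the codepage of a locale name for the one matching encoding."""
    base, dot, _ = locale_name.partition(".")
    codepage = ENCODING_TO_CODEPAGE.get(encoding.upper())
    if not dot or codepage is None:
        # No codepage suffix, or no known codepage: leave the name alone.
        return locale_name
    return f"{base}.{codepage}"


print(locale_for_encoding("Japanese_Japan.932", "EUC_jp"))
```

With such a table the user could keep writing --locale=Japanese_Japan.932 and let the codepage follow the requested encoding, at the cost of maintaining the table.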


Re: initdb of regression test failed.

From
"Hiroshi Saito"
Date:
Hi.

----- Original Message -----
From: "ITAGAKI Takahiro" <itagaki.takahiro@oss.ntt.co.jp>
>
> "Hiroshi Saito" <z-saito@guitar.ocn.ne.jp> wrote:
>
>> Ah Ok, Please check it.
>
> Your patch looks useful to prevent mismatch of encoding and locale on Windows,
> but I found there is a limitation that user will not able to specify locale.
> I added an alternative of nl_langinfo(CODESET) for Win32.
>
> Please check following commands:
> initdb --encoding=EUC_jp --locale=Japanese_Japan.932
>   vs.
> initdb --encoding=EUC_jp --locale=Japanese_Japan.20932
>
>
> One problem is that user need to know codepage numbers. It might
> be possible to replace the default codepage to server encodings
> automatically if we have a mapping table from encoding to codepage.

Yes, I think your approach looks very good. It then seems necessary
to reconsider the original problem of the default value. I am thinking about
handling it through the documentation. Anyway, thanks.

Regards,
Hiroshi Saito

Re: initdb of regression test failed.

From
Tom Lane
Date:
ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
> Your patch looks useful to prevent mismatch of encoding and locale on Windows,
> but I found there is a limitation that user will not able to specify locale.
> I added an alternative of nl_langinfo(CODESET) for Win32.

Applied with small correction --- it looked like you'd put in the wrong
PG_ENC code for GBK and BIG5.  Not terribly important since we'd reject
them anyway, but we might as well reject with the correct error message.

This still leaves the policy decision of whether we want to have
initdb assume "-E UTF8 --no-locale" if it sees the current locale
has an unusable encoding.  I'm not really happy with that idea
because it would disable localization of messages.  I think what we
want, at least on Windows, is to switch to the "corresponding" locale
that uses UTF8.  Is there a simple way to do that?  Or at least some
simple recipe we can put into the documentation?  "If you get this
sort of error, use this --locale setting..."

            regards, tom lane

Re: initdb of regression test failed.

From
"Hiroshi Saito"
Date:
Hi.

The regression test definitely fails!

hedule --multibyte=SQL_ASCII --load-language=plpgsql
============== creating temporary installation        ==============
============== initializing database system           ==============

pg_regress: initdb failed
Examine ./log/initdb.log for the reason.
Command was:
""C:/MinGW/home/hiroshi/pgsql/src/test/regress/./tmp_check/install//usr/local/pgsql/bin/initdb"
 -D "C:/MinGW/home/hiroshi/pgsql/src/test/regress/./tmp_check/data" -L
"C:/MinGW/home/hiroshi/pgsql/src/test/regress/./tmp_check/install//usr/local/pgsql/share" --noclean
 > "./log/initdb.log" 2>&1"
make[2]: *** [check] Error 2
make[2]: Leaving directory `/home/hiroshi/pgsql/src/test/regress'
make[1]: *** [check] Error 2
make[1]: Leaving directory `/home/hiroshi/pgsql/src/test'
make: *** [check] Error 2

-initdb.log-
Running in noclean mode.  Mistakes will not be cleaned up.
The files belonging to this database system will be owned by user "hiroshi".
This user must also own the server process.

The database cluster will be initialized with locale Japanese_Japan.932.
initdb: locale Japanese_Japan.932 requires unsupported encoding SJIS
Encoding SJIS is not allowed as a server-side encoding.
Rerun initdb with a different locale selection.
Running in noclean mode.  Mistakes will not be cleaned up.
-

after the patch..

============== shutting down postmaster               ==============
server stopped

=======================
 All 112 tests passed.
=======================

Anyway, it definitely fails at present. :-(

Regards,
Hiroshi Saito

Re: initdb of regression test failed.

From
"Hiroshi Saito"
Date:
Oops, please disregard the pg_regress.c patch.
Sorry, I think this one is preferable.

> Hi.
>
> regression test surely goes wrong.!
>
> hedule --multibyte=SQL_ASCII --load-language=plpgsql
> ============== creating temporary installation        ==============
> ============== initializing database system           ==============
>
> pg_regress: initdb failed
> Examine ./log/initdb.log for the reason.
> Command was:
> ""C:/MinGW/home/hiroshi/pgsql/src/test/regress/./tmp_check/install//usr/local/pgsql/bin/initdb"
> -D "C:/MinGW/home/hiroshi/pgsql/src/test/regress/./tmp_check/data" -L
> "C:/MinGW/home/hiroshi/pgsql/src/test/regress/./tmp_check/install//usr/local/pgsql/share"
> --noclean
> > "./log/initdb.log" 2>&1"
> make[2]: *** [check] Error 2
> make[2]: Leaving directory `/home/hiroshi/pgsql/src/test/regress'
> make[1]: *** [check] Error 2
> make[1]: Leaving directory `/home/hiroshi/pgsql/src/test'
> make: *** [check] Error 2
>
> -initdb.log-
> Running in noclean mode.  Mistakes will not be cleaned up.
> The files belonging to this database system will be owned by user "hiroshi".
> This user must also own the server process.
>
> The database cluster will be initialized with locale Japanese_Japan.932.
> initdb: locale Japanese_Japan.932 requires unsupported encoding SJIS
> Encoding SJIS is not allowed as a server-side encoding.
> Rerun initdb with a different locale selection.
> Running in noclean mode.  Mistakes will not be cleaned up.
> -
>
> after the patch..
>
> ============== shutting down postmaster               ==============
> server stopped
>
> =======================
> All 112 tests passed.
> =======================
>
> Anyway, It surely fails now.:-(
>
> Regards,
> Hiroshi Saito
>

Re: initdb of regression test failed.

From
ITAGAKI Takahiro
Date:
"Hiroshi Saito" <z-saito@guitar.ocn.ne.jp> wrote:

> regression test surely goes wrong.!

This fix does nothing about the regression failure.

It is probably reasonable to choose UTF-8 as the server encoding when we cannot
support the encoding of the current locale. The remaining issue is what to do
with the locale in that case: use no-locale, use a locale of another encoding,
or report an error.

At least on Windows, a locale of another encoding works correctly because
we already have some Windows-specific hacks. (try grep MultiByteToWideChar)
In fact, we can accept options like:
  initdb -E UTF8 --locale=Japanese_Japan.932 -- CP932 is SJIS in nature

I suggest using UTF8 on Windows if the encoding is UTF-8 or NOT specified and
we don't support the locale's encoding, i.e. the locale is always
enabled in the regression tests.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



Re: initdb of regression test failed.

From
ITAGAKI Takahiro
Date:
I wrote:
> I'll suggest to use UTF8 if the encoding is UTF-8 or NOT specified and
> we don't support the locale encoding on Windows, i.e. locale is always
> enabled on regression tests.

Here is a patch to do it on Windows.
  1. Use UTF-8 if the locale encoding is not available for server.
  2. Allow mismatch between server and locale encodings if the server
     encoding is UTF-8.

With the patch, I was able to run the regression tests on the Japanese
version of Windows, but please test it on other language versions.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
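
The two rules listed above can be sketched as a small decision function. This is a minimal Python illustration, not the patch itself: the function name and the set of rejected encodings are assumptions, and the set contains only the encodings this thread mentions as being rejected on the server side.

```python
# Encodings the thread shows initdb rejecting as server-side encodings.
# This is an illustrative subset, not PostgreSQL's full list.
CLIENT_ONLY_ENCODINGS = {"SJIS", "BIG5", "GBK"}


def choose_server_encoding(locale_encoding, requested=None):
    """Return the server encoding under the two rules, or raise ValueError."""
    if requested is None:
        # Rule 1: fall back to UTF8 when the locale's own encoding
        # is not available as a server encoding.
        if locale_encoding in CLIENT_ONLY_ENCODINGS:
            return "UTF8"
        return locale_encoding
    if requested in CLIENT_ONLY_ENCODINGS:
        raise ValueError(f"{requested} is not allowed as a server-side encoding")
    # Rule 2: a UTF8 server may run under a locale of another encoding.
    if requested == "UTF8" or requested == locale_encoding:
        return requested
    raise ValueError(
        f"encoding {requested} does not match locale encoding {locale_encoding}"
    )


print(choose_server_encoding("SJIS"))
```

Under this sketch, a plain initdb in a Japanese_Japan.932 environment would end up with UTF8 while keeping the locale, whereas an explicit non-UTF8 encoding that disagrees with the locale is still an error.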


Re: initdb of regression test failed.

From
"Hiroshi Saito"
Date:
Hi.

Hmm, I think this is something to examine for 8.4, because it changes
a feature. Of course, your proposal can be considered one possible
solution, but more discussion is required.
I feel that it is too risky for 8.3.

Regards,
Hiroshi Saito

>
> I wrote:
>> I'll suggest to use UTF8 if the encoding is UTF-8 or NOT specified and
>> we don't support the locale encoding on Windows, i.e. locale is always
>> enabled on regression tests.
>
> Here is a patch to do it on Windows.
>  1. Use UTF-8 if the locale encoding is not available for server.
>  2. Allow mismatch between server and locale encodings if the server
>     encoding is UTF-8.
>
> I succeeded to run regression test on Japanese version of Windows
> with the patch, but please test it on other language versions.
>
> Regards,
> ---
> ITAGAKI Takahiro
> NTT Open Source Software Center
>


Re: initdb of regression test failed.

From
Tom Lane
Date:
ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
> In fact, we can accept options like:
>   initdb -E UTF8 --locale=Japanese_Japan.932 -- CP932 is SJIS in nature

Hmm, but does that really work safely?  I think varstr_cmp() does work,
because it forces our data into wchar format and then calls wcscoll().
The thing that scares me is that various random other operating-system
calls might deliver strings in an unexpected encoding.  We've been
through similar problems with timezone names reported by strftime, for
example.

            regards, tom lane
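
The risk described above, that a system call hands back bytes in the locale's encoding while the server expects UTF-8, can be illustrated with a short Python sketch. The helper name and the sample string are assumptions chosen for the example; no PostgreSQL or Windows API is involved.

```python
def decodes_as(data, encoding):
    """Return True if the byte string is valid in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# "\u6771\u4eac" is the Japanese word for Tokyo. Encoding it as CP932
# stands in for a string delivered by the OS under Japanese_Japan.932.
os_string = "\u6771\u4eac".encode("cp932")

print(decodes_as(os_string, "utf-8"))
print(decodes_as(os_string, "cp932"))

# Converting through the locale's encoding gives bytes a UTF8 server can use.
converted = os_string.decode("cp932").encode("utf-8")
print(decodes_as(converted, "utf-8"))
```

Every place where such a string enters the server would need a conversion step of this kind, which is why the mismatch is easy to get wrong.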

Re: initdb of regression test failed.

From
"Hiroshi Saito"
Date:
Hi Tom-san.

This may be just for your information.

In 8.3, having a different encoding for each database requires the C locale.
That is why I want the C locale for the regression tests.

--
in>initdb -E EUC_JP -D../data --locale=Japanese_Japan.20932

The files belonging to this database system will be owned by user "hiroshi".
This user must also own the server process.

The database cluster will be initialized with locale Japanese_Japan.20932.
initdb: could not find suitable text search configuration for locale "Japanese_Japan.20932"
The default text search configuration will be set to "simple".

creating directory ../data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers/max_fsm_pages ... 32MB/204800
creating configuration files ... ok
creating template1 database in ../data/base/1 ... ok
initializing pg_authid ... ok
initializing dependencies ... ok
creating system views ... ok
loading system objects' descriptions ... ok
creating conversions ... ok
creating dictionaries ... ok
setting privileges on built-in objects ... ok
creating information schema ... ok
vacuuming database template1 ... ok
copying template1 to template0 ... ok
copying template1 to postgres ... ok

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the -A option the
next time you run initdb.

Success. You can now start the database server using:

--
in>psql template1
Welcome to psql 8.3devel, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit

template1=# \l
       List of databases
   Name    |  Owner  | Encoding
-----------+---------+----------
 postgres  | hiroshi | EUC_JP
 template0 | hiroshi | EUC_JP
 template1 | hiroshi | EUC_JP
(3 rows)

template1=# create database hiroshi;
CREATE DATABASE
template1=# \l
       List of databases
   Name    |  Owner  | Encoding
-----------+---------+----------
 hiroshi   | hiroshi | EUC_JP
 postgres  | hiroshi | EUC_JP
 template0 | hiroshi | EUC_JP
 template1 | hiroshi | EUC_JP
(4 rows)

template1=# show LC_CTYPE;
       lc_ctype
----------------------
 Japanese_Japan.20932
(1 row)

template1=# create database utfdb encoding='UTF8';
ERROR:  encoding UTF8 does not match server's locale Japanese_Japan.20932
DETAIL:  The server's LC_CTYPE setting requires encoding EUC_JP.
template1=#

Re: initdb of regression test failed.

From
ITAGAKI Takahiro
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> >   initdb -E UTF8 --locale=Japanese_Japan.932 -- CP932 is SJIS in nature
>
> Hmm, but does that really work safely?  I think varstr_cmp() does work,
> because it forces our data into wchar format and then calls wcscoll().
> The thing that scares me is that various random other operating-system
> calls might deliver strings in an unexpected encoding.  We've been
> through similar problems with timezone names reported by strftime, for
> example.

Hmm, I see we might need to replace all locale-aware functions with their
wchar_t versions, for example, wcsftime instead of strftime.
That requires more testing. It should be saved for 8.4.

Attached is the second plan. It uses UTF-8 and locale=C when
the default locale's encoding is not supported and neither an encoding nor
a locale is passed to initdb. It would help users who use the default
settings (including the regression tests).

At the moment, it resets all of the lc_* variables, but it might be possible
to use the default locale for lc_messages, lc_monetary, lc_numeric and lc_time
even if lc_collate and lc_ctype are reset to C.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
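
The second plan, including the possible variation for lc_messages and the other presentation categories, can be sketched like this. It is a hedged Python illustration, not the attached patch: the function name, its parameters, and the returned dictionary are assumptions made for the example.

```python
LC_CATEGORIES = (
    "lc_collate", "lc_ctype", "lc_messages",
    "lc_monetary", "lc_numeric", "lc_time",
)
# Categories that might keep the default locale in the variation.
PRESENTATION_CATEGORIES = ("lc_messages", "lc_monetary", "lc_numeric", "lc_time")


def fallback_settings(default_locale, encoding_supported,
                      user_encoding=None, user_locale=None,
                      keep_presentation=False):
    """Return fallback settings, or None when the fallback does not apply."""
    # The fallback only applies to a plain initdb with no -E and no --locale,
    # and only when the default locale's encoding is unusable.
    if encoding_supported or user_encoding is not None or user_locale is not None:
        return None
    settings = {"encoding": "UTF8"}
    for category in LC_CATEGORIES:
        settings[category] = "C"
    if keep_presentation:
        # Variation: only lc_collate and lc_ctype are reset to C.
        for category in PRESENTATION_CATEGORIES:
            settings[category] = default_locale
    return settings


print(fallback_settings("Japanese_Japan.932", encoding_supported=False))
```

Calling it with keep_presentation=True shows the variation: lc_collate and lc_ctype become C while the other four categories keep Japanese_Japan.932.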


Re: initdb of regression test failed.

From
Tom Lane
Date:
ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
> The attached is the second plan. It uses UTF-8 and locale=C when
> the default locale encoding is not supported and none of encoding and
> locale are passed to initdb. It would help users who use the default
> settings (including regression test).

I'm not very happy with this proposal, because for people who don't
actually care about non-ASCII data (which is still a lot of people),
forcing UTF-8 as the default encoding will impose pretty substantial
overhead compared to SQL_ASCII --- it turns on all those
multibyte-encoding checks.

Implicitly selecting --no-locale doesn't seem like a big step forward
either, since then you've just given up whatever you might have learned
from the locale setting.  Besides, if that's the behavior the user
wants, he can specify it.

I still think that what we should try to do in the default case is find
a locale that is the same language but UTF-8 encoding.

> At the moment, it reset all of lc_* variables, but it might be possible
> use the default locale at lc_messages, lc_monetary, lc_numeric and lc_time
> even if lc_collate and lc_ctype are reset to C.

Well, that just leaves me wondering what encoding the localized messages
would be presented in ...

            regards, tom lane