Thread: Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

From
Tom Lane
Date:
mha@postgresql.org (Magnus Hagander) writes:
> Re-allow UTF8 encodings on win32. Since UTF8 is converted to 
> UTF16 before being used, all (valid) locales will work for this.

So where do we stand on the Windows locale/encoding business --- are
we happy with the behavior now, or does it still need work?
        regards, tom lane


Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

From
Magnus Hagander
Date:
Tom Lane wrote:
> mha@postgresql.org (Magnus Hagander) writes:
>> Re-allow UTF8 encodings on win32. Since UTF8 is converted to 
>> UTF16 before being used, all (valid) locales will work for this.
> 
> So where do we stand on the Windows locale/encoding business --- are
> we happy with the behavior now, or does it still need work?

I think we're good. But I'd like to hear some verification from somebody
else. Specifically, I'd like to hear a signoff from someone who can
actually do "real tests" on a locale that's not US and not Swedish.
Also, I'd like to hear from the Japanese people (Hiroshi? Can you do
this?) that we didn't break it for them. I don't think we did, but I
want to be sure :)

Hiroshi, and whomever else can help to test, this is only testing the
backend, not the installer. The installer may need a few minor tweaks
still once the backend is considered fixed. And what needs to be tested
is CVS HEAD as of today.

//Magnus


Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

From
"Hiroshi Saito"
Date:
Hi.

Um, it seems that it only passed the strict check in chklocale.c. It may
allow a mistaken selection... However, I will clarify the problem by testing.

Regards,
Hiroshi Saito

From: "Magnus Hagander" <magnus@hagander.net>


> Tom Lane wrote:
>> mha@postgresql.org (Magnus Hagander) writes:
>>> Re-allow UTF8 encodings on win32. Since UTF8 is converted to 
>>> UTF16 before being used, all (valid) locales will work for this.
>> 
>> So where do we stand on the Windows locale/encoding business --- are
>> we happy with the behavior now, or does it still need work?
> 
> I think we're good. But I'd like to hear some verification from somebody
> else. Specifically, I'd like to hear a signoff from someone who can
> actually do "real tests" on a locale that's not US and not Swedish.
> Also, I'd like to hear from the Japanese people (Hiroshi? Can you do
> this?) that we didn't break it for them. I don't think we did, but I
> want to be sure :)
> 
> Hiroshi, and whomever else can help to test, this is only testing the
> backend, not the installer. The installer may need a few minor tweaks
> still once the backend is considered fixed. And what needs to be tested
> is CVS HEAD as of today.
> 
> //Magnus


Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

From
"Pavel Stehule"
Date:
2007/10/16, Magnus Hagander <magnus@hagander.net>:
> Tom Lane wrote:
> > mha@postgresql.org (Magnus Hagander) writes:
> >> Re-allow UTF8 encodings on win32. Since UTF8 is converted to
> >> UTF16 before being used, all (valid) locales will work for this.
> >
> > So where do we stand on the Windows locale/encoding business --- are
> > we happy with the behavior now, or does it still need work?
>
> I think we're good. But I'd like to hear some verification from somebody
> else. Specifically, I'd like to hear a signoff from someone who can
> actually do "real tests" on a locale that's not US and not Swedish.
> Also, I'd like to hear from the Japanese people (Hiroshi? Can you do
> this?) that we didn't break it for them. I don't think we did, but I
> want to be sure :)
>
> Hiroshi, and whomever else can help to test, this is only testing the
> backend, not the installer. The installer may need a few minor tweaks
> still once the backend is considered fixed. And what needs to be tested
> is CVS HEAD as of today.
>
> //Magnus

I can test it with czech locale. Can I download binaries anywhere?

Pavel


Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

From
"Hiroshi Saito"
Date:
Hi.

> I can test it with czech locale. Can I download binaries anywhere?
http://winpg.jp/~saito/pg83/postgresql-8.3beta-cvs.tgz
This build has passed the regression tests (MinGW + gcc).

Regards,
Hiroshi Saito


Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

From
"Hiroshi Saito"
Date:
Hi.

> Um, it seems that it only passed the strict check in chklocale.c. It may
> allow a mistaken selection... However, I will clarify the problem by testing.

First, it is one problem....
http://winpg.jp/~saito/pg83/pg83b1-err.txt

And a test continues....


Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

From
"Hiroshi Saito"
Date:
Hi.

Second, it is big problem....
http://winpg.jp/~saito/pg83/pg83b1-err2.txt
It is a text search config error.
However, initdb succeeds (locale=Japanese_Japan.932, which is the Shift-JIS locale).

And a test continues....

Regards,
Hiroshi Saito


Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

From
Magnus Hagander
Date:
Hiroshi Saito wrote:
> Hi.
> 
>> Um, it seems that it only passed the strict check in chklocale.c.
>> It may allow a mistaken selection... However, I will clarify the
>> problem by testing.
> 
> First, it is one problem....
> http://winpg.jp/~saito/pg83/pg83b1-err.txt
> 
> And a test continues....

But SJIS isn't supposed to work, no?

//Magnus


Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

From
Magnus Hagander
Date:
Hiroshi Saito wrote:
> Hi.
> 
> Second, it is big problem....
> http://winpg.jp/~saito/pg83/pg83b1-err2.txt
> It is a text search config error.
> However, initdb succeeds (locale=Japanese_Japan.932, which is the
> Shift-JIS locale).
> 
> And a test continues....

What text search config would you expect?

//Magnus


Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

From
"Hiroshi Saito"
Date:
Hi.

> Hiroshi Saito wrote:
>> Hi.
>> 
>> Second, it is big problem....
>> http://winpg.jp/~saito/pg83/pg83b1-err2.txt
>> It is a text search config error.
>> However, initdb succeeds (locale=Japanese_Japan.932, which is the
>> Shift-JIS locale).
>> 
>> And a test continues....
> 
> What text search config would you expect?

The problem here is that initdb accepts the locale Japanese_Japan.932.

Regards,
Hiroshi Saito


Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

From
Dave Page
Date:
Hiroshi Saito wrote:
> Hi.
> 
> Second, it is big problem....
> http://winpg.jp/~saito/pg83/pg83b1-err2.txt
> It is a text search config error.
> However, initdb succeeds (locale=Japanese_Japan.932, which is the
> Shift-JIS locale).
> 
> And a test continues....

The changes that were made were only to re-enable UTF-8.

SJIS wasn't ever supported as a server encoding
(http://www.postgresql.org/docs/8.2/interactive/multibyte.html). The
fact that initdb continues if you use Japanese_Japan.932 is an
inconsistency I reported previously but has yet to be fixed.

/D


Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

From
"Hiroshi Saito"
Date:
From: "Dave Page" <dpage@postgresql.org>

> Hiroshi Saito wrote:
>> Hi.
>> 
>> Second, it is big problem....
>> http://winpg.jp/~saito/pg83/pg83b1-err2.txt
>> It is a text search config error.
>> However, initdb succeeds (locale=Japanese_Japan.932, which is the
>> Shift-JIS locale).
>> 
>> And a test continues....
> 
> The changes that were made were only to re-enable UTF-8.

Yes, Please see,
http://winpg.jp/~saito/pg83/pg83b1-err2.txt
Is it a problem that initdb succeeds here?

> 
> SJIS wasn't ever supported as a server encoding
> (http://www.postgresql.org/docs/8.2/interactive/multibyte.html). The
> fact that initdb continues if you use Japanese_Japan.932 is an
> inconsistency I reported previously but has yet to be fixed.

Yes. However, encoding and locale are not equivalent.

Regards,
Hiroshi Saito


Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

From
Dave Page
Date:
Hiroshi Saito wrote:
> From: "Dave Page" <dpage@postgresql.org>
> 
>> Hiroshi Saito wrote:
>>> Hi.
>>>
>>> Second, it is big problem....
>>> http://winpg.jp/~saito/pg83/pg83b1-err2.txt
>>> It is a text search config error.
>>> However, initdb succeeds (locale=Japanese_Japan.932, which is the
>>> Shift-JIS locale).
>>>
>>> And a test continues....
>>
>> The changes that were made were only to re-enable UTF-8.
> 
> Yes, Please see,
> http://winpg.jp/~saito/pg83/pg83b1-err2.txt
> Is it a problem that initdb succeeds here?

Oh, sorry - misread that. I chatted with Magnus about that. It is
correct, but misleading. pg_control will say Japanese_Japan.932 as well
iirc, even though it is really Japanese_Japan.65001.

Regards, Dave



Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

From
Magnus Hagander
Date:
Dave Page wrote:
> Hiroshi Saito wrote:
>> Hi.
>>
>> Second, it is big problem....
>> http://winpg.jp/~saito/pg83/pg83b1-err2.txt
>> It is a text search config error.
>> However, initdb succeeds (locale=Japanese_Japan.932, which is the
>> Shift-JIS locale).
>>
>> And a test continues....
> 
> The changes that were made were only to re-enable UTF-8.
> 
> SJIS wasn't ever supported as a server encoding
> (http://www.postgresql.org/docs/8.2/interactive/multibyte.html). The
> fact that initdb continues if you use Japanese_Japan.932 is an
> inconsistency I reported previously but has yet to be fixed.

That is a good point, if unrelated to this very discussion. Do we want
to change that thing to an exit instead of complain-and-continue? I
think yes?

//Magnus


Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

From
Magnus Hagander
Date:
Dave Page wrote:
> Hiroshi Saito wrote:
>> From: "Dave Page" <dpage@postgresql.org>
>>
>>> Hiroshi Saito wrote:
>>>> Hi.
>>>>
>>>> Second, it is big problem....
>>>> http://winpg.jp/~saito/pg83/pg83b1-err2.txt
>>>> It is a text search config error.
>>>> However, initdb succeeds (locale=Japanese_Japan.932, which is the
>>>> Shift-JIS locale).
>>>>
>>>> And a test continues....
>>> The changes that were made were only to re-enable UTF-8.
>> Yes, Please see,
>> http://winpg.jp/~saito/pg83/pg83b1-err2.txt
>> Is it a problem that initdb succeeds here?
> 
> Oh, sorry - misread that. I chatted with Magnus about that. It is
> correct, but misleading. pg_control will say Japanese_Japan.932 as well
> iirc, even though it is really Japanese_Japan.65001.

Not so. The locale is Japanese_Japan, really. That's the only part
that's relevant for UTF16 encodings, which is what we use to do UTF8. We
specifically *don't* try to use Japanese_Japan.65001.

//Magnus
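
A minimal sketch of the UTF8-to-UTF16 step mentioned above, written in Python rather than the backend's C. It only illustrates the idea that UTF-8 input is decoded and re-expressed as UTF-16 code units before any locale-aware routine sees it. The function name utf8_to_utf16_units is hypothetical, and the thread does not say which Windows call the backend actually uses (something like MultiByteToWideChar would be the usual choice, but that is an assumption).

```python
from typing import List


def utf8_to_utf16_units(data: bytes) -> List[int]:
    """Decode UTF-8 bytes and return the equivalent UTF-16 code units.

    Hypothetical helper for illustration only; not PostgreSQL code.
    """
    # Invalid UTF-8 raises UnicodeDecodeError here, so only valid input
    # ever reaches the UTF-16 stage.
    text = data.decode("utf-8")
    # Little-endian without a BOM, which is the wide-character layout
    # used on Windows.
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


# Three Japanese characters: 9 bytes in UTF-8, 3 code units in UTF-16.
sample = "\u65e5\u672c\u8a9e".encode("utf-8")
units = utf8_to_utf16_units(sample)
print(len(sample), [hex(u) for u in units])
```

Because the wide-character form does not depend on a codepage, only the language and country part of the locale name matters once the data is in UTF-16, which matches the point made in the message above.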


Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

From
"Hiroshi Saito"
Date:
Hi.

From: "Dave Page" <dpage@postgresql.org>
>> Yes, Please see,
>> http://winpg.jp/~saito/pg83/pg83b1-err2.txt
>> Is it a problem that initdb succeeds here?
> 
> Oh, sorry - misread that. I chatted with Magnus about that. It is
> correct, but misleading. pg_control will say Japanese_Japan.932 as well
> iirc, even though it is really Japanese_Japan.65001.

But, Please see.
http://winpg.jp/~saito/pg83/pg83b1-err3.txt
Japanese_Japan.65001 is error...
Japanese_Japan is true.

Regards,
Hiroshi Saito


Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

From
Dave Page
Date:
Magnus Hagander wrote:
> Not so. The locale is Japanese_Japan, really. That's the only part
> that's relevant for UTF16 encodings, which is what we use to do UTF8. We
> specifically *don't* try to use Japanese_Japan.65001.

That's not what I mean. From a *usability* perspective, Hiroshi should
see Japanese_Japan.65001 because he's selected UTF-8 in Japanese_Japan.
He shouldn't see Japanese_Japan.932 because that definitely isn't what
he selected.

/D


Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

From
Magnus Hagander
Date:
Hiroshi Saito wrote:
> Hi.
> 
> From: "Dave Page" <dpage@postgresql.org>
>>> Yes, Please see,
>>> http://winpg.jp/~saito/pg83/pg83b1-err2.txt
>>> Is it a problem that initdb succeeds here?
>>
>> Oh, sorry - misread that. I chatted with Magnus about that. It is
>> correct, but misleading. pg_control will say Japanese_Japan.932 as well
>> iirc, even though it is really Japanese_Japan.65001.
> 
> But, Please see.
> http://winpg.jp/~saito/pg83/pg83b1-err3.txt
> Japanese_Japan.65001 is error...
> Japanese_Japan is true.

Yes, that is expected. If you explicitly ask for the .65001 locale it
will try the one that doesn't have the proper NLS files, and that
shouldn't work. If you just put in Japanese_Japan, it will use the UTF16
locale.

//Magnus


Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

From
"Hiroshi Saito"
Date:
> But, Please see.
> http://winpg.jp/~saito/pg83/pg83b1-err3.txt
> Japanese_Japan.65001 is error...
> Japanese_Japan is true.

However, I will continue testing in this state.
But sorry, I am off to bed for now...

Regards,
Hiroshi Saito


Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

From
Magnus Hagander
Date:
Dave Page wrote:
> Magnus Hagander wrote:
>> Not so. The locale is Japanese_Japan, really. That's the only part
>> that's relevant for UTF16 encodings, which is what we use to do UTF8. We
>> specifically *don't* try to use Japanese_Japan.65001.
> 
> That's not what I mean. From a *usability* perspective, Hiroshi should
> see Japanese_Japan.65001 because he's selected UTF-8 in Japanese_Japan.
> He shouldn't see Japanese_Japan.932 because that definitely isn't what
> he selected.

I'll grant you that from a usability perspective, he should see
Japanese_Japan. Not the .65001 part, though.

//Magnus


Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

From
Dave Page
Date:
Hiroshi Saito wrote:
> Hi.
> 
> From: "Dave Page" <dpage@postgresql.org>
>>> Yes, Please see,
>>> http://winpg.jp/~saito/pg83/pg83b1-err2.txt
>>> Is it a problem that initdb succeeds here?
>>
>> Oh, sorry - misread that. I chatted with Magnus about that. It is
>> correct, but misleading. pg_control will say Japanese_Japan.932 as well
>> iirc, even though it is really Japanese_Japan.65001.
> 
> But, Please see.
> http://winpg.jp/~saito/pg83/pg83b1-err3.txt
> Japanese_Japan.65001 is error...
> Japanese_Japan is true.

Yes, we're faking utf-8 support using utf-16. Specifying it as you have
there bypasses the workaround and tries to use the 65001 codepage which
then fails because LC_CTYPE cannot be set to .65001 in any locale.

/D
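
The behaviour discussed in this part of the thread can be sketched as follows, assuming a Windows locale name of the form Language_Country.codepage. This is an illustration in Python, not the initdb or backend code: the names split_locale and utf8_locale_plan are hypothetical, and the returned strings only paraphrase what the messages above report.

```python
from typing import Optional, Tuple

UTF8_CODEPAGE = "65001"  # Windows codepage number for UTF-8


def split_locale(name: str) -> Tuple[str, Optional[str]]:
    """Split 'Language_Country.codepage' into (base, codepage or None)."""
    base, sep, codepage = name.rpartition(".")
    if sep and codepage.isdigit():
        return base, codepage
    return name, None


def utf8_locale_plan(name: str) -> str:
    """Describe what happens for a UTF8 cluster (hypothetical helper)."""
    base, codepage = split_locale(name)
    if codepage == UTF8_CODEPAGE:
        # Asking for .65001 explicitly bypasses the workaround, and
        # LC_CTYPE cannot be set to that codepage in any locale.
        return "fails: LC_CTYPE cannot be set to .65001"
    # Otherwise only the language/country part is relevant, because the
    # data is converted to UTF-16 before use.
    return "ok: use %s via UTF-16 conversion" % base


for candidate in ("Japanese_Japan", "Japanese_Japan.932", "Japanese_Japan.65001"):
    print(candidate, "->", utf8_locale_plan(candidate))
```

Under these assumptions, Japanese_Japan and Japanese_Japan.932 both resolve to the same base locale, which is why the reported name can be misleading even though the cluster is in UTF-8.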



Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

From
Dave Page
Date:
Magnus Hagander wrote:
> Dave Page wrote:
>> Magnus Hagander wrote:
>>> Not so. The locale is Japanese_Japan, really. That's the only part
>>> that's relevant for UTF16 encodings, which is what we use to do UTF8. We
>>> specifically *don't* try to use Japanese_Japan.65001.
>> That's not what I mean. From a *usability* perspective, Hiroshi should
>> see Japanese_Japan.65001 because he's selected UTF-8 in Japanese_Japan.
>> He shouldn't see Japanese_Japan.932 because that definitely isn't what
>> he selected.
> 
> I'll grant you that from a usability perspective, he should see
> Japanese_Japan. Not the .65001 part, though.

Well, that depends on whether we care that we're actually faking the
utf-8 support and/or we want to keep the message consistent with what
you'd see in other locales.

/D


Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

From
"Pavel Stehule"
Date:
2007/10/16, Hiroshi Saito <z-saito@guitar.ocn.ne.jp>:
> Hi.
>
> > I can test it with czech locale. Can I download binaries anywhere?
> http://winpg.jp/~saito/pg83/postgresql-8.3beta-cvs.tgz
> This build has passed the regression tests (MinGW + gcc).
>

I have problem, there isn't libintl-2.dll

Pavel


Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> Dave Page wrote:
>> SJIS wasn't ever supported as a server encoding
>> (http://www.postgresql.org/docs/8.2/interactive/multibyte.html). The
>> fact that initdb continues if you use Japanese_Japan.932 is an
>> inconsistency I reported previously but has yet to be fixed.

> That is a good point, if unrelated to this very discussion. Do we want
> to change that thing to an exit instead of complain-and-continue? I
> think yes?

Yeah, I thought we'd agreed to that a few days ago.
        regards, tom lane


Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

From
"Hiroshi Saito"
Date:
Hi.

From: "Pavel Stehule" <pavel.stehule@gmail.com>
>> > I can test it with czech locale. Can I download binaries anywhere?
>> http://winpg.jp/~saito/pg83/postgresql-8.3beta-cvs.tgz
>> This build has passed the regression tests (MinGW + gcc).
>>
> 
> I have problem, there isn't libintl-2.dll

Oops, sorry, that was a full build.
Please use this minimal build instead:
http://winpg.jp/~saito/pg83/postgresql-8.3beta-cvs-minbin.tgz
Thanks.

Regards,
Hiroshi Saito



Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

From
"Hiroshi Saito"
Date:
Hi.

From: "Magnus Hagander" <magnus@hagander.net>
>> But, Please see.
>> http://winpg.jp/~saito/pg83/pg83b1-err3.txt
>> Japanese_Japan.65001 is error...
>> Japanese_Japan is true.
> 
> Yes, that is expected. If you explicitly ask for the .65001 locale it
> will try the one that doesn't have the proper NLS files, and that
> shouldn't work. If you just put in Japanese_Japan, it will use the UTF16
> locale.

Umm, As for result ... 
initdb -E UTF8 --locale=Japanese_Japan -D../data
http://winpg.jp/~saito/pg83/pg83b1-err4.txt
It seems that it is only complemented.

Regards,
Hiroshi Saito


Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

From
Dave Page
Date:
Hiroshi Saito wrote:
> Hi.
> 
> From: "Magnus Hagander" <magnus@hagander.net>
> 
>>> But, Please see.
>>> http://winpg.jp/~saito/pg83/pg83b1-err3.txt
>>> Japanese_Japan.65001 is error...
>>> Japanese_Japan is true.
>>
>> Yes, that is expected. If you explicitly ask for the .65001 locale it
>> will try the one that doesn't have the proper NLS files, and that
>> shouldn't work. If you just put in Japanese_Japan, it will use the UTF16
>> locale.
> 
> Umm, As for result ... initdb -E UTF8 --locale=Japanese_Japan -D../data
> http://winpg.jp/~saito/pg83/pg83b1-err4.txt
> It seems that it is only complemented.

Yes, that is expected, though not entirely to my tastes. The cluster
should still actually be in utf-8 however.

/D


Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

From
"Pavel Stehule"
Date:
I did some tests, but without success.

Pavel

I have Windows Server 2003 with Czech locale support.

I:\PGSQL\BIN>initdb -D ../data -L i:\pgsql\share
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale Czech_Czech Republic.1250.
could not determine encoding for locale "Czech_Czech Republic.1250": codeset is "CP1250"
INITDB: could not find suitable encoding for locale Czech_Czech Republic.1250
Rerun INITDB with the -E option.
Try "INITDB --help" for more information.

I:\PGSQL\BIN>

I:\PGSQL\BIN>initdb -E UTF-8 -D ../data -L i:\pgsql\share
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale Czech_Czech Republic.1250.
could not determine encoding for locale "Czech_Czech Republic.1250": codeset is "CP1250"
INITDB: could not find suitable text search configuration for locale Czech_Czech Republic.1250
The default text search configuration will be set to "simple".

fixing permissions on existing directory ../data ... ok
creating subdirectories ... ok
selecting default max_connections ... 10
selecting default shared_buffers/max_fsm_pages ... 400kB/20000
creating configuration files ... ok
creating template1 database in ../data/base/1 ... FATAL:  could not select a suitable default timezone
DETAIL:  It appears that your GMT time zone uses leap seconds. PostgreSQL does not support leap seconds.
child process exited with exit code 1
INITDB: removing contents of data directory "../data"

I:\PGSQL\BIN>initdb -E win1250 --locale="Czech_Czech Republic.1250" -D ../data -L i:\pgsql\share
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale Czech_Czech Republic.1250.
could not determine encoding for locale "Czech_Czech Republic.1250": codeset is "CP1250"
INITDB: could not find suitable text search configuration for locale Czech_Czech Republic.1250
The default text search configuration will be set to "simple".

fixing permissions on existing directory ../data ... ok
creating subdirectories ... ok
selecting default max_connections ... 10
selecting default shared_buffers/max_fsm_pages ... 400kB/20000
creating configuration files ... ok
creating template1 database in ../data/base/1 ... FATAL:  could not select a suitable default timezone
DETAIL:  It appears that your GMT time zone uses leap seconds. PostgreSQL does not support leap seconds.
child process exited with exit code 1
INITDB: removing contents of data directory "../data"


Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

From
Tom Lane
Date:
"Pavel Stehule" <pavel.stehule@gmail.com> writes:
> could not determine encoding for locale "Czech_Czech Republic.1250": codeset is "CP1250"

Hm, we seem to have missed an entry for PG_WIN1250.  Fixed.
        regards, tom lane
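
To illustrate the kind of omission described above, here is a small sketch of a codeset-to-encoding lookup, assuming the mapping is a simple table keyed by the codeset string. The table contents and the function name encoding_for_codeset are hypothetical and are not copied from chklocale.c; only the CP1250 codeset and the WIN1250 encoding come from this thread.

```python
from typing import Dict, Optional


def encoding_for_codeset(codeset: str, table: Dict[str, str]) -> Optional[str]:
    """Return the server encoding for a codeset, or None if it is unknown."""
    return table.get(codeset.upper())


# Hypothetical table without the Central European codepage.
before = {"CP1251": "WIN1251", "CP1252": "WIN1252"}
# The same table with the missing entry added.
after = dict(before, CP1250="WIN1250")

for label, table in (("before", before), ("after", after)):
    encoding = encoding_for_codeset("CP1250", table)
    if encoding is None:
        print(label + ': could not determine encoding: codeset is "CP1250"')
    else:
        print(label + ": encoding is " + encoding)
```

With the entry absent, the lookup has nothing to return, which corresponds to the "could not determine encoding" message in the initdb output quoted above.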