Thread: Question for Postgres 8.3

Question for Postgres 8.3

From
Clemens Schwaighofer
Date:
hi,

on the page: <http://www.postgresql.org/docs/8.3/static/release-8-3.html>

in point E.1.2.3:

* Disallow database encodings that are inconsistent with the server's
locale setting (Tom)

does this mean, if my server LOCALE is for example UTF-8.en_US, and I want
to create a EUC_JP database it gets rejected? do I missunderstand that?

Normaly my servers have default locale set to UTF-8.en_US but also have the
locales for UTF-8.ja_JP and EUC_JP there, 99.9% of my databases are utf-8,
but I have some clients that created EUC_JP databases, will the upgrade
affect this?


[ Clemens Schwaighofer                      -----=====:::::~ ]
[ IT Engineer/Manager, TEQUILA\ Japan IT Group               ]
[                6-17-2 Ginza Chuo-ku, Tokyo 104-8167, JAPAN ]
[ Tel: +81-(0)3-3545-7703            Fax: +81-(0)3-3545-7343 ]
[ http://www.tequila.co.jp                                   ]

Re: Question for Postgres 8.3

From
Tom Lane
Date:
Clemens Schwaighofer <cs@tequila.co.jp> writes:
> * Disallow database encodings that are inconsistent with the server's
> locale setting (Tom)

> does this mean, if my server LOCALE is for example UTF-8.en_US, and I want
> to create a EUC_JP database it gets rejected? do I missunderstand that?

Nope, you have it correctly.

> Normaly my servers have default locale set to UTF-8.en_US but also have the
> locales for UTF-8.ja_JP and EUC_JP there, 99.9% of my databases are utf-8,
> but I have some clients that created EUC_JP databases, will the upgrade
> affect this?

I'm surprised your clients haven't been screaming about bogus sorting
and upper/lowercasing behavior.

If you want to support multiple encodings, the only safe locale choice
is (and always has been) C.  If you doubt this, troll the archives for
awhile --- for example, searching for locale+encoding in pgsql-bugs
should provide plenty of amusing reading matter.  8.3 is just refusing
to do things that are known to be unsafe in previous releases.

            regards, tom lane

Re: Question for Postgres 8.3

From
Clemens Schwaighofer
Date:
On 02/05/2008 11:38 AM, Tom Lane wrote:
> Clemens Schwaighofer <cs@tequila.co.jp> writes:
>> * Disallow database encodings that are inconsistent with the server's
>> locale setting (Tom)
>
>> does this mean, if my server LOCALE is for example UTF-8.en_US, and I want
>> to create a EUC_JP database it gets rejected? do I missunderstand that?
>
> Nope, you have it correctly.

okay, good, as I finally have a reason to change this then on those view
databases.

>> Normaly my servers have default locale set to UTF-8.en_US but also have the
>> locales for UTF-8.ja_JP and EUC_JP there, 99.9% of my databases are utf-8,
>> but I have some clients that created EUC_JP databases, will the upgrade
>> affect this?
>
> I'm surprised your clients haven't been screaming about bogus sorting
> and upper/lowercasing behavior.

well, originally they were on a server that is also in EUC_JP, but they
need to move away to one of my servers that is pure UTF-8, and I cannot
change this setting because of my other databases, so probably better to
convert them over to UTF-8 and adept the scripts to convert to the
target encoding, rather than going on with the same mess.

> If you want to support multiple encodings, the only safe locale choice
> is (and always has been) C.  If you doubt this, troll the archives for
> awhile --- for example, searching for locale+encoding in pgsql-bugs
> should provide plenty of amusing reading matter.  8.3 is just refusing
> to do things that are known to be unsafe in previous releases.
>
>             regards, tom lane


--
[ Clemens Schwaighofer                      -----=====:::::~ ]
[ IT Engineer/Manager, TEQUILA\ Japan IT Group               ]
[                6-17-2 Ginza Chuo-ku, Tokyo 104-8167, JAPAN ]
[ Tel: +81-(0)3-3545-7703            Fax: +81-(0)3-3545-7343 ]
[ http://www.tequila.co.jp                                   ]

Re: Question for Postgres 8.3

From
"Hiroshi Saito"
Date:
Hi.

----- Original Message -----
From: "Clemens Schwaighofer" <cs@tequila.co.jp>


> hi,
>
> on the page: <http://www.postgresql.org/docs/8.3/static/release-8-3.html>
>
> in point E.1.2.3:
>
> * Disallow database encodings that are inconsistent with the server's
> locale setting (Tom)
>
> does this mean, if my server LOCALE is for example UTF-8.en_US, and I want
> to create a EUC_JP database it gets rejected? do I missunderstand that?
>
> Normaly my servers have default locale set to UTF-8.en_US but also have the
> locales for UTF-8.ja_JP and EUC_JP there, 99.9% of my databases are utf-8,
> but I have some clients that created EUC_JP databases, will the upgrade
> affect this?

When using server encoding for EUC_JP and UTF-8 simultaneously, it is not allowed
except LOCALE=C. Moreover, it can also use SQL_ASCII satisfactory natural.
please initdb --no-locale.

Regards,
Hiroshi Saito


Re: Question for Postgres 8.3

From
David Wall
Date:
> If you want to support multiple encodings, the only safe locale choice
> is (and always has been) C.  If you doubt this, troll the archives for
> awhile --- for example, searching for locale+encoding in pgsql-bugs
> should provide plenty of amusing reading matter.
This is most interesting.  I think some of use UTF-8 under the
impression that it would support unicode/java (and thus US-ascii)
easily, and of course then allow for foreign language encodings when we
need to internationalize.  Thank goodness we only plan to I18N for a
decade and never got around to it!

David

Re: Question for Postgres 8.3

From
"Hiroshi Saito"
Date:
> I'm surprised your clients haven't been screaming about bogus sorting
> and upper/lowercasing behavior.

Umm, This is a mere information. (Since you dislike, this may be a noise.?)

As for upper/lower.
It does not change, even if uses which locale in us Japanese.
(there is nothing with the alphabet)
Then, In spite of using a small letter and a capital letter properly clearly.

As for sorting.
It has a difference, if sorting of a dictionary is desired. However, each is the not
Japanese order of the JIS X 4061:1996(Collation of Japanese character string),
probably. but, I think that LOCAL=C is clear as a sorting of a dictionary.
It seems that selection is good by the situation well.

See, Although you may be unable to see (However, this is Japanes shift-jis)
http://winpg.jp/~saito/pg_bug/Japanese_LowerUpper_sort.txt

Regards,
Hiroshi Saito





Re: Question for Postgres 8.3

From
"Hiroshi Saito"
Date:
Hi.

This reply is very glad for me!:-)
However, There is still a locale problem.
Anyway, thanks!!!!

Regards,
Hiroshi Saito

----- Original Message -----
From: "Clemens Schwaighofer" <cs@tequila.co.jp>

>>> I'm surprised your clients haven't been screaming about bogus sorting
>>> and upper/lowercasing behavior.
>>
>> Umm, This is a mere information. (Since you dislike, this may be a noise.?)
>>
>> As for upper/lower.
>> It does not change, even if uses which locale in us Japanese. (there is
>> nothing with the alphabet) Then, In spite of using a small letter and a
>> capital letter properly clearly.
>>
>> As for sorting.
>> It has a difference, if sorting of a dictionary is desired. However,
>> each is the not
>> Japanese order of the JIS X 4061:1996(Collation of Japanese character
>> string), probably. but, I think that LOCAL=C is clear as a sorting of a
>> dictionary. It seems that selection is good by the situation well.
>> See, Although you may be unable to see (However, this is Japanes shift-jis)
>> http://winpg.jp/~saito/pg_bug/Japanese_LowerUpper_sort.txt
>
> もしかしたらLinux LOCALEはUTF-8.en_USとPostgresのLOCALEはCその日本語の
> ソートは正しくない?僕はいつでもLOCAL=Cある、と今テストしました。ソート
> は正しいです。
>
> ありがとうございます。
>
> --
> [ Clemens Schwaighofer                      -----=====:::::~ ]
> [ IT Engineer/Manager, TEQUILA\ Japan IT Group               ]
> [                6-17-2 Ginza Chuo-ku, Tokyo 104-8167, JAPAN ]
> [ Tel: +81-(0)3-3545-7703            Fax: +81-(0)3-3545-7343 ]
> [ http://www.tequila.co.jp                                   ]
>
>


Re: Question for Postgres 8.3

From
Clemens Schwaighofer
Date:
On 02/05/2008 03:38 PM, Hiroshi Saito wrote:
> Hi.
>
> This reply is very glad for me!:-)
> However, There is still a locale problem.

If I use an EUC database on an UTF8 system? Well I will try out and see
if it breaks everything.

> Anyway, thanks!!!!
>
> Regards,
> Hiroshi Saito
>
> ----- Original Message ----- From: "Clemens Schwaighofer"
> <cs@tequila.co.jp>
>
>>>> I'm surprised your clients haven't been screaming about bogus sorting
>>>> and upper/lowercasing behavior.
>>>
>>> Umm, This is a mere information. (Since you dislike, this may be a
>>> noise.?)
>>>
>>> As for upper/lower.
>>> It does not change, even if uses which locale in us Japanese. (there is
>>> nothing with the alphabet) Then, In spite of using a small letter and a
>>> capital letter properly clearly.
>>>
>>> As for sorting.
>>> It has a difference, if sorting of a dictionary is desired. However,
>>> each is the not
>>> Japanese order of the JIS X 4061:1996(Collation of Japanese character
>>> string), probably. but, I think that LOCAL=C is clear as a sorting of a
>>> dictionary. It seems that selection is good by the situation well.
>>> See, Although you may be unable to see (However, this is Japanes
>>> shift-jis)
>>> http://winpg.jp/~saito/pg_bug/Japanese_LowerUpper_sort.txt
>>
>> もしかしたらLinux LOCALEはUTF-8.en_USとPostgresのLOCALEはCその日本語の
>> ソートは正しくない?僕はいつでもLOCAL=Cある、と今テストしました。ソート
>> は正しいです。
>>
>> ありがとうございます。
>>
>> --
>> [ Clemens Schwaighofer                      -----=====:::::~ ]
>> [ IT Engineer/Manager, TEQUILA\ Japan IT Group               ]
>> [                6-17-2 Ginza Chuo-ku, Tokyo 104-8167, JAPAN ]
>> [ Tel: +81-(0)3-3545-7703            Fax: +81-(0)3-3545-7343 ]
>> [ http://www.tequila.co.jp                                   ]
>>
>>
>


--
[ Clemens Schwaighofer                      -----=====:::::~ ]
[ IT Engineer/Manager, TEQUILA\ Japan IT Group               ]
[                6-17-2 Ginza Chuo-ku, Tokyo 104-8167, JAPAN ]
[ Tel: +81-(0)3-3545-7703            Fax: +81-(0)3-3545-7343 ]
[ http://www.tequila.co.jp                                   ]


Attachment

Re: Question for Postgres 8.3

From
rihad
Date:
> If you want to support multiple encodings, the only safe locale choice
> is (and always has been) C.

I should be ashamed for asking this, but would someone care to tell me
how encoding differs from locale?

My postgresql FreeBSD rcNG script reads:

postgresql_initdb_flags=${postgresql_initdb_flags:-"--encoding=utf-8
--lc-collate=C"}

As I understand it collation is part of locale, but encoding is
"something else"?

Thanks.

Re: Question for Postgres 8.3

From
Gregory Stark
Date:
"rihad" <rihad@mail.ru> writes:

>> If you want to support multiple encodings, the only safe locale choice
>> is (and always has been) C.
>
> I should be ashamed for asking this, but would someone care to tell me how
> encoding differs from locale?

One you missed is a character set, which is just a set of possible characters
(not bytes, abstract things called characters).

An encoding is a mapping from a series of binary bytes to a series of
characters from a character set, like UTF-8 or Big5 or just plain ascii.

A locale is a set of rules for how to sort (collation), format dates, numbers,
currencies, etc like es_US or jp_JP

The problem is that a locale needs to know what the string it's looking is at
to decide how to sort it, so it has to be designed for a particular encoding.
In Unix that encoding is tacked on the end like en_US.UTF-8.

C is a bit of special case since it sorts based on the binary representation
rather than the characters. That's true for any 1-byte encoding based locale
but C is more predictable when you actually have binary data.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning