Thread: Question for Postgres 8.3
hi, on the page: <http://www.postgresql.org/docs/8.3/static/release-8-3.html> in point E.1.2.3: * Disallow database encodings that are inconsistent with the server's locale setting (Tom) does this mean, if my server LOCALE is for example UTF-8.en_US, and I want to create a EUC_JP database it gets rejected? do I missunderstand that? Normaly my servers have default locale set to UTF-8.en_US but also have the locales for UTF-8.ja_JP and EUC_JP there, 99.9% of my databases are utf-8, but I have some clients that created EUC_JP databases, will the upgrade affect this? [ Clemens Schwaighofer -----=====:::::~ ] [ IT Engineer/Manager, TEQUILA\ Japan IT Group ] [ 6-17-2 Ginza Chuo-ku, Tokyo 104-8167, JAPAN ] [ Tel: +81-(0)3-3545-7703 Fax: +81-(0)3-3545-7343 ] [ http://www.tequila.co.jp ]
Clemens Schwaighofer <cs@tequila.co.jp> writes: > * Disallow database encodings that are inconsistent with the server's > locale setting (Tom) > does this mean, if my server LOCALE is for example UTF-8.en_US, and I want > to create a EUC_JP database it gets rejected? do I missunderstand that? Nope, you have it correctly. > Normaly my servers have default locale set to UTF-8.en_US but also have the > locales for UTF-8.ja_JP and EUC_JP there, 99.9% of my databases are utf-8, > but I have some clients that created EUC_JP databases, will the upgrade > affect this? I'm surprised your clients haven't been screaming about bogus sorting and upper/lowercasing behavior. If you want to support multiple encodings, the only safe locale choice is (and always has been) C. If you doubt this, troll the archives for awhile --- for example, searching for locale+encoding in pgsql-bugs should provide plenty of amusing reading matter. 8.3 is just refusing to do things that are known to be unsafe in previous releases. regards, tom lane
On 02/05/2008 11:38 AM, Tom Lane wrote: > Clemens Schwaighofer <cs@tequila.co.jp> writes: >> * Disallow database encodings that are inconsistent with the server's >> locale setting (Tom) > >> does this mean, if my server LOCALE is for example UTF-8.en_US, and I want >> to create a EUC_JP database it gets rejected? do I missunderstand that? > > Nope, you have it correctly. okay, good, as I finally have a reason to change this then on those view databases. >> Normaly my servers have default locale set to UTF-8.en_US but also have the >> locales for UTF-8.ja_JP and EUC_JP there, 99.9% of my databases are utf-8, >> but I have some clients that created EUC_JP databases, will the upgrade >> affect this? > > I'm surprised your clients haven't been screaming about bogus sorting > and upper/lowercasing behavior. well, originally they were on a server that is also in EUC_JP, but they need to move away to one of my servers that is pure UTF-8, and I cannot change this setting because of my other databases, so probably better to convert them over to UTF-8 and adept the scripts to convert to the target encoding, rather than going on with the same mess. > If you want to support multiple encodings, the only safe locale choice > is (and always has been) C. If you doubt this, troll the archives for > awhile --- for example, searching for locale+encoding in pgsql-bugs > should provide plenty of amusing reading matter. 8.3 is just refusing > to do things that are known to be unsafe in previous releases. > > regards, tom lane -- [ Clemens Schwaighofer -----=====:::::~ ] [ IT Engineer/Manager, TEQUILA\ Japan IT Group ] [ 6-17-2 Ginza Chuo-ku, Tokyo 104-8167, JAPAN ] [ Tel: +81-(0)3-3545-7703 Fax: +81-(0)3-3545-7343 ] [ http://www.tequila.co.jp ]
Hi. ----- Original Message ----- From: "Clemens Schwaighofer" <cs@tequila.co.jp> > hi, > > on the page: <http://www.postgresql.org/docs/8.3/static/release-8-3.html> > > in point E.1.2.3: > > * Disallow database encodings that are inconsistent with the server's > locale setting (Tom) > > does this mean, if my server LOCALE is for example UTF-8.en_US, and I want > to create a EUC_JP database it gets rejected? do I missunderstand that? > > Normaly my servers have default locale set to UTF-8.en_US but also have the > locales for UTF-8.ja_JP and EUC_JP there, 99.9% of my databases are utf-8, > but I have some clients that created EUC_JP databases, will the upgrade > affect this? When using server encoding for EUC_JP and UTF-8 simultaneously, it is not allowed except LOCALE=C. Moreover, it can also use SQL_ASCII satisfactory natural. please initdb --no-locale. Regards, Hiroshi Saito
> If you want to support multiple encodings, the only safe locale choice > is (and always has been) C. If you doubt this, troll the archives for > awhile --- for example, searching for locale+encoding in pgsql-bugs > should provide plenty of amusing reading matter. This is most interesting. I think some of use UTF-8 under the impression that it would support unicode/java (and thus US-ascii) easily, and of course then allow for foreign language encodings when we need to internationalize. Thank goodness we only plan to I18N for a decade and never got around to it! David
> I'm surprised your clients haven't been screaming about bogus sorting > and upper/lowercasing behavior. Umm, This is a mere information. (Since you dislike, this may be a noise.?) As for upper/lower. It does not change, even if uses which locale in us Japanese. (there is nothing with the alphabet) Then, In spite of using a small letter and a capital letter properly clearly. As for sorting. It has a difference, if sorting of a dictionary is desired. However, each is the not Japanese order of the JIS X 4061:1996(Collation of Japanese character string), probably. but, I think that LOCAL=C is clear as a sorting of a dictionary. It seems that selection is good by the situation well. See, Although you may be unable to see (However, this is Japanes shift-jis) http://winpg.jp/~saito/pg_bug/Japanese_LowerUpper_sort.txt Regards, Hiroshi Saito
Hi. This reply is very glad for me!:-) However, There is still a locale problem. Anyway, thanks!!!! Regards, Hiroshi Saito ----- Original Message ----- From: "Clemens Schwaighofer" <cs@tequila.co.jp> >>> I'm surprised your clients haven't been screaming about bogus sorting >>> and upper/lowercasing behavior. >> >> Umm, This is a mere information. (Since you dislike, this may be a noise.?) >> >> As for upper/lower. >> It does not change, even if uses which locale in us Japanese. (there is >> nothing with the alphabet) Then, In spite of using a small letter and a >> capital letter properly clearly. >> >> As for sorting. >> It has a difference, if sorting of a dictionary is desired. However, >> each is the not >> Japanese order of the JIS X 4061:1996(Collation of Japanese character >> string), probably. but, I think that LOCAL=C is clear as a sorting of a >> dictionary. It seems that selection is good by the situation well. >> See, Although you may be unable to see (However, this is Japanes shift-jis) >> http://winpg.jp/~saito/pg_bug/Japanese_LowerUpper_sort.txt > > もしかしたらLinux LOCALEはUTF-8.en_USとPostgresのLOCALEはCその日本語の > ソートは正しくない?僕はいつでもLOCAL=Cある、と今テストしました。ソート > は正しいです。 > > ありがとうございます。 > > -- > [ Clemens Schwaighofer -----=====:::::~ ] > [ IT Engineer/Manager, TEQUILA\ Japan IT Group ] > [ 6-17-2 Ginza Chuo-ku, Tokyo 104-8167, JAPAN ] > [ Tel: +81-(0)3-3545-7703 Fax: +81-(0)3-3545-7343 ] > [ http://www.tequila.co.jp ] > >
On 02/05/2008 03:38 PM, Hiroshi Saito wrote: > Hi. > > This reply is very glad for me!:-) > However, There is still a locale problem. If I use an EUC database on an UTF8 system? Well I will try out and see if it breaks everything. > Anyway, thanks!!!! > > Regards, > Hiroshi Saito > > ----- Original Message ----- From: "Clemens Schwaighofer" > <cs@tequila.co.jp> > >>>> I'm surprised your clients haven't been screaming about bogus sorting >>>> and upper/lowercasing behavior. >>> >>> Umm, This is a mere information. (Since you dislike, this may be a >>> noise.?) >>> >>> As for upper/lower. >>> It does not change, even if uses which locale in us Japanese. (there is >>> nothing with the alphabet) Then, In spite of using a small letter and a >>> capital letter properly clearly. >>> >>> As for sorting. >>> It has a difference, if sorting of a dictionary is desired. However, >>> each is the not >>> Japanese order of the JIS X 4061:1996(Collation of Japanese character >>> string), probably. but, I think that LOCAL=C is clear as a sorting of a >>> dictionary. It seems that selection is good by the situation well. >>> See, Although you may be unable to see (However, this is Japanes >>> shift-jis) >>> http://winpg.jp/~saito/pg_bug/Japanese_LowerUpper_sort.txt >> >> もしかしたらLinux LOCALEはUTF-8.en_USとPostgresのLOCALEはCその日本語の >> ソートは正しくない?僕はいつでもLOCAL=Cある、と今テストしました。ソート >> は正しいです。 >> >> ありがとうございます。 >> >> -- >> [ Clemens Schwaighofer -----=====:::::~ ] >> [ IT Engineer/Manager, TEQUILA\ Japan IT Group ] >> [ 6-17-2 Ginza Chuo-ku, Tokyo 104-8167, JAPAN ] >> [ Tel: +81-(0)3-3545-7703 Fax: +81-(0)3-3545-7343 ] >> [ http://www.tequila.co.jp ] >> >> > -- [ Clemens Schwaighofer -----=====:::::~ ] [ IT Engineer/Manager, TEQUILA\ Japan IT Group ] [ 6-17-2 Ginza Chuo-ku, Tokyo 104-8167, JAPAN ] [ Tel: +81-(0)3-3545-7703 Fax: +81-(0)3-3545-7343 ] [ http://www.tequila.co.jp ]
Attachment
> If you want to support multiple encodings, the only safe locale choice > is (and always has been) C. I should be ashamed for asking this, but would someone care to tell me how encoding differs from locale? My postgresql FreeBSD rcNG script reads: postgresql_initdb_flags=${postgresql_initdb_flags:-"--encoding=utf-8 --lc-collate=C"} As I understand it collation is part of locale, but encoding is "something else"? Thanks.
"rihad" <rihad@mail.ru> writes: >> If you want to support multiple encodings, the only safe locale choice >> is (and always has been) C. > > I should be ashamed for asking this, but would someone care to tell me how > encoding differs from locale? One you missed is a character set, which is just a set of possible characters (not bytes, abstract things called characters). An encoding is a mapping from a series of binary bytes to a series of characters from a character set, like UTF-8 or Big5 or just plain ascii. A locale is a set of rules for how to sort (collation), format dates, numbers, currencies, etc like es_US or jp_JP The problem is that a locale needs to know what the string it's looking is at to decide how to sort it, so it has to be designed for a particular encoding. In Unix that encoding is tacked on the end like en_US.UTF-8. C is a bit of special case since it sorts based on the binary representation rather than the characters. That's true for any 1-byte encoding based locale but C is more predictable when you actually have binary data. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning