Re: Unicode vs SQL_ASCII DBs - Mailing list pgsql-general
From | John Sidney-Woollett |
---|---|
Subject | Re: Unicode vs SQL_ASCII DBs |
Date | |
Msg-id | 1415.192.168.0.64.1075715841.squirrel@mercury.wardbrook.com Whole thread Raw |
In response to | Re: Unicode vs SQL_ASCII DBs (Kris Jurka <books@ejurka.com>) |
Responses |
Re: Unicode vs SQL_ASCII DBs
|
List | pgsql-general |
Kris, thanks for you feedback. Can you give me any further info on the questions below? Kris Jurka said: >> 3) If I want accented characters to sort correctly, must I select >> UNICODE >> (or the appropriate ISO 8859 char set) over SQL_ASCII? > > You are confusing encoding with locale. Locales determines the correct > sort order and you must choose an encoding that works with your locale. Except that in my test, the two differently encoded databases were in the same 7.4.1 cluster with the same locale, yet they sorted the *same* data differently - implying the encoding is a factor. Any idea why would that be? here is output from pg_controldata: pg_control version number: 72 Catalog version number: 200310211 Database cluster state: in production pg_control last modified: Mon 02 Feb 2004 11:21:29 GMT Current log file ID: 0 Next log file segment: 2 Latest checkpoint location: 0/124B958 Prior checkpoint location: 0/1149DFC Latest checkpoint's REDO location: 0/124B958 Latest checkpoint's UNDO location: 0/0 Latest checkpoint's StartUpID: 16 Latest checkpoint's NextXID: 527327 Latest checkpoint's NextOID: 26472 Time of latest checkpoint: Mon 02 Feb 2004 11:21:27 GMT Database block size: 8192 Blocks per segment of large relation: 131072 Maximum length of identifiers: 64 Maximum number of function arguments: 32 Date/time type storage: floating-point numbers Maximum length of locale name: 128 LC_COLLATE: en_GB.UTF-8 LC_CTYPE: en_GB.UTF-8 and Name | Owner | Encoding ---------------+----------+----------- johntest | postgres | UNICODE johntest2 | postgres | SQL_ASCII template0 | postgres | SQL_ASCII template1 | postgres | SQL_ASCII > Other things to note: > > LOWER()/UPPER() only work correctly in a single byte encoding (not > unicode) Are there any other gotchas that I need to be aware of with a UNICODE encoded database? I save mention by Tom Lane of a bug: [QUOTE] The bug turns out not to be Fedora-specific at all. I believe it will happen on any platform if you are using both a multibyte database encoding (such as Unicode) *and* a non-C locale. PG 7.4 has a more restricted form of the bug --- it's not locale specific but does still require a multibyte encoding. [END QUOTE] I basically need "english" sorting, and accented character support without any JDBC access/conversion problems. Do you think that my current DB locale (en_GB.UTF-8) and UNICODE encoded database the best solution? Or can you suggest something better? Thanks John Sidney-Woollett
pgsql-general by date: