Re: Unicode vs SQL_ASCII DBs - Mailing list pgsql-general

From John Sidney-Woollett
Subject Re: Unicode vs SQL_ASCII DBs
Date
Msg-id 1415.192.168.0.64.1075715841.squirrel@mercury.wardbrook.com
Whole thread Raw
In response to Re: Unicode vs SQL_ASCII DBs  (Kris Jurka <books@ejurka.com>)
Responses Re: Unicode vs SQL_ASCII DBs  (Kris Jurka <books@ejurka.com>)
List pgsql-general
Kris, thanks for you feedback. Can you give me any further info on the
questions below?

Kris Jurka said:
>> 3) If I want accented characters to sort correctly, must I select
>> UNICODE
>> (or the appropriate ISO 8859 char set) over SQL_ASCII?
>
> You are confusing encoding with locale.  Locales determines the correct
> sort order and you must choose an encoding that works with your locale.

Except that in my test, the two differently encoded databases were in the
same 7.4.1 cluster with the same locale, yet they sorted the *same* data
differently - implying the encoding is a factor.

Any idea why would that be?

here is output from pg_controldata:

pg_control version number:            72
Catalog version number:               200310211
Database cluster state:               in production
pg_control last modified:             Mon 02 Feb 2004 11:21:29 GMT
Current log file ID:                  0
Next log file segment:                2
Latest checkpoint location:           0/124B958
Prior checkpoint location:            0/1149DFC
Latest checkpoint's REDO location:    0/124B958
Latest checkpoint's UNDO location:    0/0
Latest checkpoint's StartUpID:        16
Latest checkpoint's NextXID:          527327
Latest checkpoint's NextOID:          26472
Time of latest checkpoint:            Mon 02 Feb 2004 11:21:27 GMT
Database block size:                  8192
Blocks per segment of large relation: 131072
Maximum length of identifiers:        64
Maximum number of function arguments: 32
Date/time type storage:               floating-point numbers
Maximum length of locale name:        128
LC_COLLATE:                           en_GB.UTF-8
LC_CTYPE:                             en_GB.UTF-8

and

     Name      |  Owner   | Encoding
---------------+----------+-----------
 johntest      | postgres | UNICODE
 johntest2     | postgres | SQL_ASCII
 template0     | postgres | SQL_ASCII
 template1     | postgres | SQL_ASCII

> Other things to note:
>
> LOWER()/UPPER() only work correctly in a single byte encoding (not
> unicode)

Are there any other gotchas that I need to be aware of with a UNICODE
encoded database?

I save mention by Tom Lane of a bug: [QUOTE] The bug turns out not to be
Fedora-specific at all.  I believe it will happen on any platform if you
are using both a multibyte database encoding (such as Unicode) *and* a
non-C locale. PG 7.4 has a more restricted form of the bug --- it's not
locale specific but does still require a multibyte encoding. [END QUOTE]

I basically need "english" sorting, and accented character support without
any JDBC access/conversion problems. Do you think that my current DB
locale (en_GB.UTF-8) and UNICODE encoded database the best solution? Or
can you suggest something better?

Thanks

John Sidney-Woollett




pgsql-general by date:

Previous
From: Bernd Helmle
Date:
Subject: Re: Search across multiple sources
Next
From: Kris Jurka
Date:
Subject: Re: Unicode vs SQL_ASCII DBs