
Re: Unicode vs SQL_ASCII DBs - Mailing list pgsql-general

From	John Sidney-Woollett
Subject	Re: Unicode vs SQL_ASCII DBs
Date	February 2, 2004 09:04:01
Msg-id	1415.192.168.0.64.1075715841.squirrel@mercury.wardbrook.com
In response to	Re: Unicode vs SQL_ASCII DBs (Kris Jurka <books@ejurka.com>)
Responses	Re: Unicode vs SQL_ASCII DBs
List	pgsql-general

Kris, thanks for your feedback. Can you give me any further info on the
questions below?

Kris Jurka said:
>> 3) If I want accented characters to sort correctly, must I select
>> UNICODE
>> (or the appropriate ISO 8859 char set) over SQL_ASCII?
>
> You are confusing encoding with locale.  The locale determines the correct
> sort order and you must choose an encoding that works with your locale.

Except that in my test, the two differently encoded databases were in the
same 7.4.1 cluster with the same locale, yet they sorted the *same* data
differently - implying the encoding is a factor.

Any idea why that would be?

Here is the output from pg_controldata:

pg_control version number:            72
Catalog version number:               200310211
Database cluster state:               in production
pg_control last modified:             Mon 02 Feb 2004 11:21:29 GMT
Current log file ID:                  0
Next log file segment:                2
Latest checkpoint location:           0/124B958
Prior checkpoint location:            0/1149DFC
Latest checkpoint's REDO location:    0/124B958
Latest checkpoint's UNDO location:    0/0
Latest checkpoint's StartUpID:        16
Latest checkpoint's NextXID:          527327
Latest checkpoint's NextOID:          26472
Time of latest checkpoint:            Mon 02 Feb 2004 11:21:27 GMT
Database block size:                  8192
Blocks per segment of large relation: 131072
Maximum length of identifiers:        64
Maximum number of function arguments: 32
Date/time type storage:               floating-point numbers
Maximum length of locale name:        128
LC_COLLATE:                           en_GB.UTF-8
LC_CTYPE:                             en_GB.UTF-8

and

     Name      |  Owner   | Encoding
---------------+----------+-----------
 johntest      | postgres | UNICODE
 johntest2     | postgres | SQL_ASCII
 template0     | postgres | SQL_ASCII
 template1     | postgres | SQL_ASCII
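
One possible explanation, sketched in Python under an assumption the thread does not confirm: that the text in the SQL_ASCII database arrived as Latin-1 bytes. SQL_ASCII performs no encoding conversion or validation, so the server stores whatever bytes the client sends and later passes them to the en_GB.UTF-8 collation routine, which expects UTF-8. The sketch only shows that the same word becomes different bytes, and that the Latin-1 form is not valid UTF-8; how a UTF-8 locale then orders such invalid sequences is platform-dependent. The helper is_valid_utf8 is hypothetical.

```python
# Assumption: UNICODE database holds UTF-8; SQL_ASCII database holds
# whatever the client sent, here taken to be Latin-1.
word = "été"

utf8_bytes = word.encode("utf-8")      # stored by the UNICODE database
latin1_bytes = word.encode("latin-1")  # possibly stored by SQL_ASCII


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(utf8_bytes.hex(" "))    # c3 a9 74 c3 a9
print(latin1_bytes.hex(" "))  # e9 74 e9
print(is_valid_utf8(utf8_bytes), is_valid_utf8(latin1_bytes))  # True False
```

If this is what happened, the two databases hand different byte strings for the *same* visible data to the same locale, which would account for the different sort results.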

> Other things to note:
>
> LOWER()/UPPER() only work correctly in a single byte encoding (not
> unicode)
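
Kris's note is consistent with case conversion that works one byte at a time. The Python sketch below models that idea with a hypothetical per_byte_upper helper that treats each byte as a Latin-1 character; it is a simplified model, not PostgreSQL's code. On Latin-1 bytes it uppercases "été" correctly, but on UTF-8 bytes the two bytes of "é" are each handled on their own and the accented letters come back unchanged.

```python
def per_byte_upper(data: bytes) -> bytes:
    """Uppercase one byte at a time, treating each byte as Latin-1.

    Hypothetical model of a single-byte toupper() loop.
    """
    out = bytearray()
    for b in data:
        up = chr(b).upper()
        # Keep the byte unless it maps to exactly one Latin-1 character
        # (e.g. 'ß' -> 'SS' or 'ÿ' -> 'Ÿ' cannot fit in one byte).
        if len(up) == 1 and ord(up) < 256:
            out.append(ord(up))
        else:
            out.append(b)
    return bytes(out)


word = "été"
print(per_byte_upper(word.encode("latin-1")).decode("latin-1"))  # ÉTÉ
print(per_byte_upper(word.encode("utf-8")).decode("utf-8"))      # éTé
```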

Are there any other gotchas that I need to be aware of with a UNICODE
encoded database?

I saw a mention by Tom Lane of a bug: [QUOTE] The bug turns out not to be
Fedora-specific at all.  I believe it will happen on any platform if you
are using both a multibyte database encoding (such as Unicode) *and* a
non-C locale. PG 7.4 has a more restricted form of the bug --- it's not
locale specific but does still require a multibyte encoding. [END QUOTE]

I basically need "english" sorting, and accented character support without
any JDBC access/conversion problems. Do you think my current DB locale
(en_GB.UTF-8) with a UNICODE-encoded database is the best solution? Or
can you suggest something better?
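
To make "english" sorting concrete: plain byte order (what the C locale gives, and for UTF-8 the same as code-point order) puts accented words after "z", while a locale such as en_GB.UTF-8 broadly compares letters first and accents and case later. The Python sketch below approximates that with a hypothetical english_key (strip accents, casefold, then break ties on the original string). It is a rough two-level stand-in chosen for illustration, not glibc's actual en_GB collation.

```python
import unicodedata


def fold(s: str) -> str:
    """Strip accents (NFD, drop combining marks) and casefold."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def english_key(s: str) -> tuple:
    # Primary level ignores accents and case; ties fall back to code points.
    return (fold(s), s)


words = ["zebra", "été", "ete", "Eton"]
print(sorted(words))                   # ['Eton', 'ete', 'zebra', 'été']
print(sorted(words, key=english_key))  # ['ete', 'été', 'Eton', 'zebra']
```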

Thanks

John Sidney-Woollett
