Re: [GENERAL] Multi-Language Support and/or UTF-8 UNICODE - Mailing list pgsql-general

From Tatsuo Ishii
Subject Re: [GENERAL] Multi-Language Support and/or UTF-8 UNICODE
Msg-id 20000215100555K.t-ishii@sra.co.jp
In response to Multi-Language Support and/or UTF-8 UNICODE  (RK Street <R.K.Street@rl.ac.uk>)
List pgsql-general
> I have been reading in the doc directory of the 6.5.1 tree for information
> about UNICODE and UTF-8 support and still have a few questions.
> It is not clear to me whether Unicode 2.x and utf-8 or UCS-2 encodings are
> available and working okay at this time.  Can anyone explain?

As stated in README.mb, we support UTF-8, not UCS-2.
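
To make the difference concrete, here is a small Python sketch (not PostgreSQL code) that encodes the same three characters both ways. It assumes Python 3, which has no codec named "ucs-2"; for characters in the Basic Multilingual Plane, "utf-16-be" yields the same bytes as big-endian UCS-2. The sample string is my own choice.

```python
# Sample text: LATIN CAPITAL A, e with acute accent, and a CJK ideograph.
text = "A\u00e9\u65e5"

# UTF-8 is variable width: 1, 2 and 3 bytes for these characters.
utf8_bytes = text.encode("utf-8")

# UCS-2 is fixed width: always 2 bytes per character.
# Assumption: "utf-16-be" stands in for big-endian UCS-2 (identical for BMP text).
ucs2_bytes = text.encode("utf-16-be")

print(utf8_bytes.hex())
print(ucs2_bytes.hex())
```

Note that the UCS-2 form contains zero bytes even for plain ASCII characters, while the UTF-8 form does not.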

> I get the impression that UTF-8 is available for the backend but not the
> frontend.  I also get the impression that only ISO 8859-1 through 5 so far
> work.  If UTF-8 and ISO-8859-7 are not available on the client, how do you
> get the non ISO-8859-1 data into and out of the database ?

Sorry, but I don't understand your point. Which one are you talking
about, UNICODE or ISO 8859-X? Or do you expect automatic encoding
conversion between UNICODE and ISO 8859-X? That is not available right
now. If you create your database with UNICODE encoding (createdb -E
UNICODE, for example), you must use UTF-8 for both the backend and the
frontend.
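
Since there is no automatic conversion, a client holding ISO 8859-7 data would have to convert it to UTF-8 itself before sending it to a UNICODE database. A minimal Python sketch of that step follows. The function name to_utf8 and the sample bytes are hypothetical, and this only illustrates the conversion, not any PostgreSQL client API.

```python
def to_utf8(raw: bytes, source_encoding: str = "iso8859_7") -> bytes:
    """Re-encode bytes from a legacy single-byte encoding into UTF-8.

    Hypothetical helper: decode with the source encoding, then encode as UTF-8.
    """
    return raw.decode(source_encoding).encode("utf-8")


# Greek small letters alpha, beta, gamma as stored in ISO 8859-7.
greek_iso = bytes([0xE1, 0xE2, 0xE3])

# The same text as UTF-8, ready to be sent to a UNICODE database.
greek_utf8 = to_utf8(greek_iso)
print(greek_utf8.hex())
```

The reverse step (decode as UTF-8, encode as ISO 8859-7) would be needed when reading the data back for an ISO 8859-7 display.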

> Could I build the database so that the default format is UNICODE if the
> user takes no further action regardless of any locale settings ?

If you build PostgreSQL with "configure --with-mb=UNICODE", then
you don't need to worry about it.  If you configured with an encoding
other than UNICODE, you can still do:

    initdb -e UNICODE

Lastly, if you ran initdb with an encoding other than UNICODE, you can
still create a UNICODE database with:

    createdb -E UNICODE

Note that the options above will change in the coming 7.0 release.

> What happens when you do backups, searches and sorting ?  Are
> there any restrictions on table and column names (do they have to be
> 7-bit ASCII for instance) ?

No restrictions, I believe. Note that sorting is done according to
the physical value of the UTF-8 bytes.
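
As a rough illustration of what byte-value sorting means in practice, the Python sketch below sorts a few strings by their UTF-8 bytes. The word list is my own example, and this imitates the ordering only; it is not how the backend is implemented.

```python
# Sample words, including accented Latin and CJK text.
words = ["zebra", "Zebra", "\u00e9clair", "apple", "\u65e5\u672c"]

# Sort by the raw UTF-8 byte sequence of each string.
by_bytes = sorted(words, key=lambda s: s.encode("utf-8"))

# Python compares str values by code point; UTF-8 byte order preserves
# code point order, so the two results are the same.
by_code_point = sorted(words)

print(by_bytes)
```

Here "Zebra" sorts before "apple" and the accented word sorts after "zebra", so the result is code point order rather than any language-specific collation.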
--
Tatsuo Ishii
