Thread: RE: [INTERFACES] JDBC and character sets

RE: [INTERFACES] JDBC and character sets

From
Peter Mount
Date:
If I understand this correctly, if I make sure the driver converts the
strings (in the correct methods) into UTF-8, then unicode support will
work?

I'm wondering, as I haven't delved into Unicode with the driver yet. If
this is the case, it will be a simple thing to implement.

Peter

-- 
Peter Mount
Enterprise Support
Maidstone Borough Council
Any views stated are my own, and not those of Maidstone Borough Council.


-----Original Message-----
From: Tatsuo Ishii [mailto:t-ishii@sra.co.jp]
Sent: Tuesday, June 22, 1999 3:14 PM
To: David Warnock
Cc: t-ishii@sra.co.jp; pgsql-interfaces@postgreSQL.org
Subject: Re: [INTERFACES] JDBC and character sets 


>I think I saw from the list of developers that you wrote a lot of the
>multi-byte code. Is that correct? If so my grateful thanks.

Yes, I'm responsible for the multi-byte code.

>Are there any limitations or gotchas about using unicode everywhere?
>
>Specifically
>
>1. Column length. Is this measured in unicode characters or do I need to
>increase the length of Varchars? ie is a varchar(10) certain to hold 10
>unicode characters?

When you define varchar(n), n should be counted in bytes, not
characters. We assume Unicode is input in UTF-8 encoding. In UTF-8, 10
ASCII chars take 10 bytes, so varchar(10) will hold 10 Unicode chars if
they are all ASCII. However, non-ASCII ISO8859 chars (accented letters,
for example) typically take 2 bytes each, and KANJI take 3 bytes each.
You can use octet_length() to measure the size of a Unicode string in
bytes.
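
On the client side, the same byte counts can be checked before a value is sent. The following is a minimal sketch, assuming the string will be stored as UTF-8; the class and method names (Utf8Length, utf8Bytes, fits) are hypothetical and not part of the JDBC driver.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the PostgreSQL JDBC driver.
final class Utf8Length {

    /** Number of bytes the string occupies once encoded as UTF-8. */
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** True if the UTF-8 form of s fits in a column limited to maxBytes bytes. */
    static boolean fits(String s, int maxBytes) {
        return utf8Bytes(s) <= maxBytes;
    }

    public static void main(String[] args) {
        // Ten ASCII letters: one byte each.
        System.out.println(utf8Bytes("abcdefghij"));
        // U+00E9 (e with acute accent, in the ISO 8859-1 range): two bytes.
        System.out.println(utf8Bytes("\u00e9"));
        // Two kanji: three bytes each.
        System.out.println(utf8Bytes("\u6f22\u5b57"));
        // Four kanji need 12 bytes, so they do not fit in 10 bytes.
        System.out.println(fits("\u6f22\u5b57\u6f22\u5b57", 10));
    }
}
```

A check like this only mirrors the byte-based limit described above; it does not replace octet_length() on the server.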

>2. Indexing. What sort order will I get from an index or an order by for
>unicode characters.

It will be sorted in Unicode code point order.

>Can this be customised?

Currently no.

>Generally I try to do any
>really important sorting in Java where I can use the correct sort order
>for the locale.
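
The difference between the two orderings can be sketched in Java. This is an illustration only: the class name SortOrders and the sample words are assumptions. Java's natural String ordering compares UTF-16 code units, which matches code point order for the characters used here, and java.text.Collator supplies the locale-aware order.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only.
final class SortOrders {

    /** Sorts by String's natural order (UTF-16 code unit comparison). */
    static List<String> byCodeUnit(List<String> in) {
        List<String> out = new ArrayList<>(in);
        Collections.sort(out);
        return out;
    }

    /** Sorts using the collation rules of the given locale. */
    static List<String> byLocale(List<String> in, Locale locale) {
        List<String> out = new ArrayList<>(in);
        Collections.sort(out, Collator.getInstance(locale));
        return out;
    }

    public static void main(String[] args) {
        // "\u00c9mile" starts with E with acute accent (U+00C9).
        List<String> words = Arrays.asList("Zebra", "apple", "\u00c9mile");
        // Code point style order: 'Z' (0x5A) < 'a' (0x61) < U+00C9.
        System.out.println(byCodeUnit(words));
        // Locale order: letters compared first, case and accents only as tie-breakers.
        System.out.println(byLocale(words, Locale.FRENCH));
    }
}
```

With these sample words the first call yields Zebra, apple, Émile and the second yields apple, Émile, Zebra, which is why sorting on the client can give a different result from ORDER BY.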

>3. Upper/lowercase. I have been using separate columns for uppercase
>versions of names etc again so that the case changes can be done by the
>client which will know the correct rules for the locale where the data
>is entered. What do upper/lower case functions in Postgresql do with
>unicode?

I think it will depend on the locale. I'm not sure, but I've heard
there is a Unicode locale. If it really exists, you could do:

configure --with-mb=UNICODE --with-locale

so that upper/lower works for Unicode.
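
For the client-side approach David describes, case changes in Java take an explicit locale. A minimal sketch follows, assuming the uppercase copy is computed before the row is stored; the class name CaseByLocale is hypothetical.

```java
import java.util.Locale;

// Hypothetical helper for illustration only.
final class CaseByLocale {

    /** Uppercases text using the case rules of the given locale. */
    static String upper(String text, Locale locale) {
        return text.toUpperCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        // English rules: "i" becomes plain "I".
        System.out.println(upper("i", Locale.ENGLISH));
        // Turkish rules: "i" becomes dotted capital I (U+0130).
        System.out.println(upper("i", turkish).equals("\u0130"));
    }
}
```

The same input gives different results under different locales, which is the reason for computing the uppercase column where the locale is known.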

>4. Are there any limitations on what I use to write triggers? Can all
>the different ways work reliably with unicode?

I'm not sure, but it should work with triggers.
--
Tatsuo Ishii


Re: [INTERFACES] JDBC and character sets

From
David Warnock
Date:
Peter,

> If I understand this correctly, if I make sure the driver converts the
> strings (in the correct methods) into UTF-8, then unicode support will
> work?
> 
> I'm wondering, as I haven't delved into Unicode with the driver yet. If
> this is the case, it will be a simple thing to implement.

Ooooh yes please. This would be a great thing for us. If you need any
help testing please tell us.

Regards

Dave

-- 
David Warnock
Sundayta Ltd


Re: [INTERFACES] JDBC and character sets

From
Herouth Maoz
Date:
At 18:22 +0300 on 22/06/1999, David Warnock wrote:


> > If I understand this correctly, if I make sure the driver converts the
> > strings (in the correct methods) into UTF-8, then unicode support will
> > work?
> >
> > I'm wondering, as I haven't delved into Unicode with the driver yet. If
> > this is the case, it will be a simple thing to implement.
>
> Ooooh yes please. This would be a great thing for us. If you need any
> help testing please tell us.

Well, what if we haven't compiled the system for unicode? Make sure the
driver doesn't kill us in such a case. We use ISO8859-8 for our data. If
you were to convert it to utf-8, it would certainly not work as we expect.
Especially if the database is not compiled for unicode.
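
The failure mode can be sketched in Java: text encoded as UTF-8 but read back as a single-byte ISO 8859 charset comes out garbled. This is an illustration under assumptions. ISO-8859-1 stands in for ISO8859-8 because it is one of the charsets every Java platform is required to support, and the class name CharsetMismatch is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class CharsetMismatch {

    /** Encodes text with one charset and decodes the bytes with another. */
    static String roundTrip(String text, Charset writer, Charset reader) {
        return new String(text.getBytes(writer), reader);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // four characters, the last is U+00E9

        // Same charset on both sides: the text survives unchanged.
        String same = roundTrip(original,
                StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(same.equals(original));

        // Written as UTF-8, read as ISO-8859-1: U+00E9 becomes two characters.
        String mixed = roundTrip(original,
                StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(mixed.length());
    }
}
```

The mismatched round trip returns five characters instead of four, so a driver that always encodes to UTF-8 would need to know the encoding the database was built for.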

Herouth

--
Herouth Maoz, Internet developer.
Open University of Israel - Telem project
http://telem.openu.ac.il/~herutma