Thread: JDBC and character sets

JDBC and character sets

From
David Warnock
Date:
Hi,

We have java applications currently using either Postgresql 6.4 or
Interbase.  We use Interbase as we need unicode.

The support for unicode in Postgresql 6.5 is unclear to me. I can also
see no mention of Unicode in the jdbc driver documentation.

Can I install postgresql 6.5 so as to be able to save and restore the
unicode from java?

If so how?

Thanks

Dave

-- 
David Warnock
Sundayta Ltd


Re: [INTERFACES] JDBC and character sets

From
Tatsuo Ishii
Date:
>We have java applications currently using either Postgresql 6.4 or
>Interbase.  We use Interbase as we need unicode.
>
>The support for unicode in Postgresql 6.5 is unclear to me. I can also
>see no mention of Unicode in the jdbc driver documentation.
>
>Can I install postgresql 6.5 so as to be able to save and restore the
>unicode from java?
>
>If so how?

I'm not sure about JDBC, but definitely PostgreSQL supports Unicode if 
you enable MB option.

configure --with-mb=UNICODE
--
Tatsuo Ishii


Re: [INTERFACES] JDBC and character sets

From
David Warnock
Date:
Tatsuo,

Many thanks for your reply.

I think I saw from the list of developers that you wrote a lot of the
multi-byte code. Is that correct? If so, my grateful thanks.

Are there any limitations or gotchas about using unicode everywhere?

Specifically

1. Column length. Is this measured in unicode characters or do I need to
increase the length of Varchars? ie is a varchar(10) certain to hold 10
unicode characters?

2. Indexing. What sort order will I get from an index or an ORDER BY for
unicode characters? Can this be customised? Generally I try to do any
really important sorting in Java where I can use the correct sort order
for the locale.

3. Upper/lowercase. I have been using separate columns for uppercase
versions of names etc again so that the case changes can be done by the
client which will know the correct rules for the locale where the data
is entered. What do upper/lower case functions in Postgresql do with
unicode?

4. Are there any limitations on what I use to write triggers? Can all
the different ways work reliably with unicode?

Many many thanks.

Dave

-- 
David Warnock
Sundayta Ltd


Re: [INTERFACES] JDBC and character sets

From
Tatsuo Ishii
Date:
>I think I saw from the list of developers that you wrote a lot of the
>multi-byte code. Is that correct? If so, my grateful thanks.

Yes, I'm responsible for the multi-byte code.

>Are there any limitations or gotchas about using unicode everywhere?
>
>Specifically
>
>1. Column length. Is this measured in unicode characters or do I need to
>increase the length of Varchars? ie is a varchar(10) certain to hold 10
>unicode characters?

When you define varchar(n), n is counted in bytes, not characters.
We assume Unicode is input in UTF-8 encoding. In UTF-8, 10 ASCII chars
take 10 bytes, so varchar(10) will hold 10 Unicode chars only if they
are all ASCII. Non-ASCII ISO 8859-1 chars (accented letters, for
example) take 2 bytes each, and KANJI take 3 bytes each. You can use
octet_length() to measure the size of a Unicode string in bytes.
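
To make the byte counting above concrete, here is a minimal Java sketch that measures how many bytes a string needs in UTF-8 before it is sent to the server. The class and method names (Utf8Length, utf8Length, fitsInBytes) are made up for this illustration and are not part of PostgreSQL or the JDBC driver; it only assumes the column limit is counted in bytes as described.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any driver: sizes strings in UTF-8 bytes.
class Utf8Length {
    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if the string fits a column whose limit is counted in bytes.
    static boolean fitsInBytes(String s, int maxBytes) {
        return utf8Length(s) <= maxBytes;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("abcdefghij"));         // 10 ASCII chars: 10 bytes
        System.out.println(utf8Length("na\u00efve"));         // 5 chars, one accented: 6 bytes
        System.out.println(utf8Length("\u65e5\u672c\u8a9e")); // 3 kanji: 9 bytes
        System.out.println(fitsInBytes("\u65e5\u672c\u8a9e\u65e5", 10)); // 4 kanji, 12 bytes: false
    }
}
```

A client could call fitsInBytes before an INSERT to decide whether a value fits the declared length, or size the column for the worst case of 3 bytes per character.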

>2. Indexing. What sort order will I get from an index or an ORDER BY for
>unicode characters?

It will be sorted in Unicode code point order.

>Can this be customised?

Currently no.

>Generally I try to do any
>really important sorting in Java where I can use the correct sort order
>for the locale.
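
As a rough illustration of the difference between code point order and locale-aware sorting on the Java side, here is a small sketch using the standard java.text.Collator. The class and method names (LocaleSort, sortNatural, sortForLocale) are invented for this example. Note that Java's natural String order compares UTF-16 code units, which matches code point order for the characters used here but not for every string containing supplementary characters.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: compares plain ordering with locale-aware ordering.
class LocaleSort {
    // Natural String order (UTF-16 code units), similar to code point order.
    static List<String> sortNatural(List<String> in) {
        List<String> out = new ArrayList<>(in);
        Collections.sort(out);
        return out;
    }

    // Order according to the collation rules of the given locale.
    static List<String> sortForLocale(List<String> in, Locale locale) {
        List<String> out = new ArrayList<>(in);
        out.sort(Collator.getInstance(locale));
        return out;
    }

    public static void main(String[] args) {
        // "\u00c4pfel" is "Apfel" with an umlaut on the first letter.
        List<String> words = Arrays.asList("zebra", "\u00c4pfel", "apple");
        System.out.println(sortNatural(words));                  // umlaut word sorts last
        System.out.println(sortForLocale(words, Locale.GERMAN)); // "zebra" sorts last
    }
}
```

With the plain ordering the umlaut word lands after "zebra" because its first code point is larger; the German collator places it among the words starting with "a".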

>3. Upper/lowercase. I have been using separate columns for uppercase
>versions of names etc again so that the case changes can be done by the
>client which will know the correct rules for the locale where the data
>is entered. What do upper/lower case functions in Postgresql do with
>unicode?

I think it will be related to the locale. I'm not sure, but I've heard
about a Unicode locale. If it really exists, you could do:

configure --with-mb=UNICODE --with-locale

so that upper/lower works for Unicode.
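
For the client-side approach mentioned in the question, a minimal Java sketch of locale-dependent case conversion might look like the following. It uses only the standard String.toUpperCase(Locale); the class and method names (LocaleCase, upperFor) are invented for illustration, and nothing here says how the server-side upper()/lower() functions behave.

```java
import java.util.Locale;

// Hypothetical helper: upper-cases a value with the rules of a given locale.
class LocaleCase {
    static String upperFor(String s, Locale locale) {
        return s.toUpperCase(locale);
    }

    public static void main(String[] args) {
        // German sharp s expands to "SS" when upper-cased.
        System.out.println(upperFor("stra\u00dfe", Locale.GERMAN)); // STRASSE
        // In Turkish, lowercase "i" maps to dotted capital I (U+0130).
        System.out.println(upperFor("i", Locale.forLanguageTag("tr")).equals("\u0130")); // true
        // In English the same letter maps to plain "I".
        System.out.println(upperFor("i", Locale.ENGLISH)); // I
    }
}
```

The result of such a conversion could be stored in a separate uppercase column, as described in the question, so that the database never has to apply case rules itself.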

>4. Are there any limitations on what I use to write triggers? Can all
>the different ways work reliably with unicode?

I'm not sure but it should work with triggers.
--
Tatsuo Ishii


Re: [INTERFACES] JDBC and character sets

From
David Warnock
Date:
Tatsuo,

Many thanks for all your answers. We will be giving postgresql a full
trial soon and then hopefully the last fully closed source component of
our own software will be gone so we can look at releasing more than just
tools as open source.

Regards

Dave

-- 
David Warnock
Sundayta Ltd


Re: [INTERFACES] JDBC and character sets

From
Tatsuo Ishii
Date:
>Many thanks for all your answers. We will be giving postgresql a full
>trial soon and then hopefully the last fully closed source component of
>our own software will be gone so we can look at releasing more than just
>tools as open source.

It's my pleasure to support you regarding Unicode. There seem to be
very few people who try to use MB+Unicode.  Please let me know if you
have questions or problems so that I can enhance MB/Unicode support!

regards,
--
Tatsuo Ishii