Thread: JDBC and character sets
Hi,

We have Java applications currently using either PostgreSQL 6.4 or Interbase. We use Interbase as we need Unicode.

The support for Unicode in PostgreSQL 6.5 is unclear to me. I can also see no mention of Unicode in the JDBC driver documentation.

Can I install PostgreSQL 6.5 so as to be able to save and restore Unicode from Java?

If so, how?

Thanks
Dave
--
David Warnock
Sundayta Ltd
>We have java applications currently using either Postgresql 6.4 or
>Interbase. We use Interbase as we need unicode.
>
>The support for unicode in Postgresql 6.5 is unclear to me. I can also
>see no mention of Unicode in the jdbc driver documentation.
>
>Can I install postgresql 6.5 so as to be able to save and restore the
>unicode from java?
>
>If so how?

I'm not sure about JDBC, but PostgreSQL definitely supports Unicode if you enable the multi-byte (MB) option when you build it:

    configure --with-mb=UNICODE

--
Tatsuo Ishii
Tatsuo,

Many thanks for your reply.

I think I saw from the list of developers that you wrote a lot of the multi-byte code. Is that correct? If so, my grateful thanks.

Are there any limitations or gotchas about using Unicode everywhere?

Specifically:

1. Column length. Is this measured in Unicode characters, or do I need to increase the length of varchars? That is, is a varchar(10) certain to hold 10 Unicode characters?

2. Indexing. What sort order will I get from an index or an ORDER BY for Unicode characters? Can this be customised? Generally I try to do any really important sorting in Java, where I can use the correct sort order for the locale.

3. Upper/lowercase. I have been using separate columns for uppercase versions of names etc., again so that the case changes can be done by the client, which will know the correct rules for the locale where the data is entered. What do the upper/lower case functions in PostgreSQL do with Unicode?

4. Are there any limitations on what I use to write triggers? Can all the different ways work reliably with Unicode?

Many many thanks.
Dave
--
David Warnock
Sundayta Ltd
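The client-side approach described in points 2 and 3 above (sorting and case conversion done in Java with the locale's rules) might look roughly like the sketch below. It uses only the standard `java.text.Collator` and `String.toUpperCase(Locale)` APIs; the class and method names (`ClientSideLocale`, `sortForLocale`, `upperForLocale`) are hypothetical and are not taken from the thread or from any PostgreSQL or JDBC code.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: names are illustrative only.
class ClientSideLocale {

    // Returns a copy of the list sorted with the collation rules of the given locale.
    static List<String> sortForLocale(List<String> names, Locale locale) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Collator.getInstance(locale));
        return sorted;
    }

    // Upper-cases a value with the given locale's rules, e.g. to fill a
    // separate "uppercase name" column before the row is sent to the database.
    static String upperForLocale(String value, Locale locale) {
        return value.toUpperCase(locale);
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Zebra", "apple");
        // A locale-aware collator puts "apple" before "Zebra".
        System.out.println(sortForLocale(names, Locale.ENGLISH));
        // German sharp s expands to "SS" when upper-cased.
        System.out.println(upperForLocale("stra\u00dfe", Locale.GERMAN));
    }
}
```

Doing this work on the client keeps the results independent of whatever collation and case rules the database server was built with.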
>I think I saw from the list of developers that you wrote a lot of the
>multi-byte code. Is that correct? If so my grateful thanks.

Yes, I'm responsible for the multi-byte code.

>Are there any limitations or gotchas about using unicode everywhere?
>
>Specifically
>
>1. Column length. Is this measured in unicode characters or do I need to
>increase the length of Varchars? ie is a varchar(10) certain to hold 10
>unicode characters?

When you define varchar(n), n is counted in bytes, not characters. We assume Unicode is input in UTF-8 encoding. In UTF-8, 10 ASCII chars take 10 bytes, so varchar(10) will hold 10 Unicode chars if they are all ASCII. However, non-ASCII ISO 8859-1 chars (accented Latin letters, for example) take 2 bytes for each letter. If you use kanji, it is 3 bytes for each letter. You can use octet_length() to measure the size of a Unicode string in bytes.

>2. Indexing. What sort order will I get from an index or an order by for
>unicode characters.

It will be sorted in Unicode code point order.

>Can this be customised.

Currently no.

>Generally I try to do any
>really important sorting in Java where I can use the correct sort order
>for the locale.

>3. Upper/lowercase. I have been using separate columns for uppercase
>versions of names etc again so that the case changes can be done by the
>client which will know the correct rules for the locale where the data
>is entered. What do upper/lower case functions in Postgresql do with
>unicode?

I think it will depend on the locale. I'm not sure, but I've heard of a Unicode locale. If it really exists, you could do:

    configure --with-mb=UNICODE --with-locale

so that upper/lower works for Unicode.

>4. Are there any limitations on what I use to write triggers? Can all
>the different ways work reliably with unicode?

I'm not sure, but it should work with triggers.

--
Tatsuo Ishii
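Since varchar(n) is counted in bytes here, a Java client can check how many bytes a string will need in UTF-8 before sending it. The following is a minimal sketch using only `String.getBytes` with `StandardCharsets.UTF_8` from the standard library; the class and method names (`Utf8Length`, `utf8Length`, `fitsInBytes`) are hypothetical, and it assumes the server stores the value as UTF-8 as described above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: names are illustrative only.
class Utf8Length {

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if the string fits a column whose limit is counted in bytes,
    // e.g. maxBytes = 10 for a varchar(10) on a byte-counting server.
    static boolean fitsInBytes(String s, int maxBytes) {
        return utf8Length(s) <= maxBytes;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("abcdefghij"));     // 10 ASCII chars -> 10
        System.out.println(utf8Length("\u00e9"));         // e with acute accent -> 2
        System.out.println(utf8Length("\u6f22\u5b57"));   // two kanji -> 6
        System.out.println(fitsInBytes("\u6f22\u5b57\u6f22\u5b57", 10)); // 12 bytes -> false
    }
}
```

On the server side, octet_length() gives the same byte count for a stored value, so the two can be compared when sizing columns.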
Tatsuo, Many thanks for all your answers. We will be giving postgresql a full trial soon and then hopefully the last fully closed source component of our own software will be gone so we can look at releasing more than just tools as open source. Regards Dave -- David Warnock Sundayta Ltd
>Many thanks for all your answers. We will be giving postgresql a full
>trial soon and then hopefully the last fully closed source component of
>our own software will be gone so we can look at releasing more than just
>tools as open source.

It's my pleasure to support you regarding Unicode. There seem to be very few people who try to use MB+Unicode. Please let me know if you have questions or problems so that I can enhance MB/Unicode support!

regards,
--
Tatsuo Ishii