Thread: JDBC encoding problem

JDBC encoding problem

From: Kurt Overberg
I'm having a rather strange problem that I'm hoping someone can help me
with.  I'm using Struts 1.0/JSP on Debian Linux under Tomcat 4.1.x and
the Blackdown JVM.  I'm attempting to convert my current SQL_ASCII
database to UNICODE.  I'm new to this, so I am most likely making a few
mistakes.  Here's what I've done so far:

o  Converted the database encoding to UNICODE.  I'm pretty sure this part
worked okay (did a pg_dump, ran iconv -f 8859_1 -t UTF-8 on the dump, then
created a new db with encoding UNICODE and reloaded; no errors on reload).

sparky:~$ psql -l
         List of databases
    Name    |  Owner   | Encoding
-----------+----------+-----------
  unitest   | kurt     | UNICODE
  template1 | postgres | SQL_ASCII
(2 rows)


o  set client_encoding to 'UTF8';

o  In my JSP files, I set the following at the top of each:

<%@ page language="java" pageEncoding="UTF-8" %>


Now, to test this, I go to a Japanese page, copy some text, then paste
it into a form that gets submitted to the server and saved into the DB.
Then I try to display what I got back from the database.  It comes out
garbled.  HOWEVER, if I leave the 'pageEncoding' out of my display .jsp
file, it still comes out garbled UNTIL I set UTF-8 manually in my
browser's Character Encoding settings (both Mozilla and IE).  Then the
Japanese characters render fine (just like I entered them).

Very strange.  What's confusing is that when I set the pageEncoding to
'UTF-8', the characters don't render properly, and as far as I can tell,
that's the same as setting the encoding manually in the browser.  I must
be doing something wrong because I get the same results in IE and Mozilla
(recent build).

What may be the problem: I don't do anything differently when getting
the data out of the database, just the standard
resultset.getString("column");  Do I need to change that call to handle
the potentially UTF-8 encoded strings?  I can't find anything on that at
all with Google/Usenet.

Any and all help, suggestions or pointers would be greatly appreciated.

Thanks!

/kurt




Re: JDBC encoding problem

From: Anders Hermansen
* Kurt Overberg (kurt@hotdogrecords.com) wrote:
> I'm having a rather strange problem that I'm hoping someone can help me
> with.  I'm using Struts 1.0/JSP on Debian Linux under Tomcat 4.1.x and
> the Blackdown JVM.  I'm attempting to convert my current SQL_ASCII
> database to UNICODE.  I'm new to this, so I am most likely making a few
> mistakes.  Here's what I've done so far:
>
> o  Converted the database encoding to UNICODE.  I'm pretty sure this part
> worked okay (did a pg_dump, ran iconv -f 8859_1 -t UTF-8 on the dump, then
> created a new db with encoding UNICODE and reloaded; no errors on reload).
>
> sparky:~$ psql -l
>         List of databases
>    Name    |  Owner   | Encoding
> -----------+----------+-----------
>  unitest   | kurt     | UNICODE
>  template1 | postgres | SQL_ASCII
> (2 rows)

Ok.

> o  set client_encoding to 'UTF8';

As I read in another thread, client_encoding does not matter for the
JDBC driver. It will change it to UNICODE when you connect. It probably
does this because all Java strings are Unicode.

But it will probably matter for your psql connections, if any.

> o  In my JSP files, I set the following at the top of each:
>
> <%@ page language="java" pageEncoding="UTF-8" %>

Try changing this to

<%@ page language="java" pageEncoding="UTF-8" contentType="text/html;charset=UTF-8" %>

> Now, to test this, I go to a Japanese page, copy some text, then paste
> it into a form that gets submitted to the server and saved into the DB.
> Then I try to display what I got back from the database.  It comes out
> garbled.  HOWEVER, if I leave the 'pageEncoding' out of my display .jsp
> file, it still comes out garbled UNTIL I set UTF-8 manually in my
> browser's Character Encoding settings (both Mozilla and IE).  Then the
> Japanese characters render fine (just like I entered them).
>
> Very strange.  What's confusing is that when I set the pageEncoding to
> 'UTF-8', the characters don't render properly, and as far as I can tell,
> that's the same as setting the encoding manually in the browser.  I must
> be doing something wrong because I get the same results in IE and Mozilla
> (recent build).
>
> What may be the problem: I don't do anything differently when getting
> the data out of the database, just the standard
> resultset.getString("column");  Do I need to change that call to handle
> the potentially UTF-8 encoded strings?  I can't find anything on that at
> all with Google/Usenet.

Have you tried putting Unicode characters into the db using psql, and
then displaying them through the web tier?

I have used Tomcat as the webapp server for many applications, and it
defaults to the ISO-8859-1 character set for POST forms. Strange it is.
You can change this by calling
request.setCharacterEncoding("UTF-8");
before you read any data from your form.
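
To illustrate the ISO-8859-1 default, here is a minimal sketch in plain
Java (no servlet API involved) of what happens when UTF-8 form bytes are
decoded as ISO-8859-1. The class and method names are made up for this
example, and it assumes Java 7 or later for StandardCharsets. It only
simulates the wrong decoding; in a real webapp the fix is still the
request.setCharacterEncoding("UTF-8") call.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Tomcat or the JDBC driver.
class PostDecodingDemo {

    // Simulates a container decoding UTF-8 form bytes as ISO-8859-1.
    static String decodeAsLatin1(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Undoes the wrong decoding: ISO-8859-1 maps every byte to exactly
    // one char, so the original bytes can be recovered and re-read as UTF-8.
    static String repair(String garbled) {
        byte[] original = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String typed = "\u65E5\u672C\u8A9E";            // three Japanese characters
        byte[] wire = typed.getBytes(StandardCharsets.UTF_8);
        String garbled = decodeAsLatin1(wire);

        System.out.println(garbled.length());             // 9: one char per UTF-8 byte
        System.out.println(repair(garbled).equals(typed)); // true
    }
}
```

If the garbled form is what reaches the database, no setting on the
display page will bring the original characters back.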

Maybe the pageEncoding="UTF-8" changes this? I have not used that option
before.

Please check again that the data you put in the database using JDBC
is not garbage due to character set conversion.
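
One way to do that check, as a rough sketch: dump the code points of the
string you get back from resultset.getString(...). The helper below is
hypothetical and plain Java, with no database involved. Correctly stored
Japanese text should show code points such as U+65E5, while text that was
mangled by an ISO-8859-1 decode shows only values below U+0100.

```java
// Hypothetical helper for inspecting strings read back from the database.
class CodePointDump {

    // Formats each Unicode code point of s as U+XXXX, separated by spaces.
    static String dump(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
            i += Character.charCount(cp); // step over surrogate pairs as one unit
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A correctly decoded value:
        System.out.println(dump("\u65E5\u672C"));        // U+65E5 U+672C
        // The first character after a wrong ISO-8859-1 decode of its UTF-8 bytes:
        System.out.println(dump("\u00E6\u0097\u00A5"));  // U+00E6 U+0097 U+00A5
    }
}
```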

> Any and all help, suggestions or pointers would be greatly appreciated.


I hope this helps,
Anders

--
Anders Hermansen
YoYo Mobile as