Strange encoding problems - Mailing list pgsql-jdbc

From Shilad Sen
Subject Strange encoding problems
Msg-id 20030910052721.GA5724@nokomis.shilad.com
List pgsql-jdbc
Greetings,

I'm trying to track down an encoding problem I'm having.

I've created a database encoded in latin1 (createdb -E latin1).  I've imported
some data.  I believe this all worked, because when I use psql, the encoded
characters "look" correct.

When a query returns the funny character (in this case Latin1 233, an e with
an acute accent), what I get back looks to me like the UTF-8 encoding of the
character, with each of its bytes treated as a separate latin1 character.

For example, under PostgreSQL 7.2, when I set the charSet property to latin1, I
get back the latin1 characters 195 and 169 where 233 should be.  Note that
these two values are the bytes of the UTF-8 encoding of character 233
(0xC3 0xA9).
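
To make those byte values concrete, here is a minimal Java sketch that
reproduces the symptom outside of any database.  It only illustrates the
encoding arithmetic; the class and method names (MojibakeDemo,
utf8ReadAsLatin1) are made up for this example and are not part of the JDBC
driver.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encode a string as UTF-8, then (incorrectly) decode the resulting
    // bytes as Latin-1, so every UTF-8 byte becomes its own character.
    static String utf8ReadAsLatin1(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // code point 233, e with an acute accent
        String garbled = utf8ReadAsLatin1(eAcute);
        // Print the numeric value of each character in the garbled string.
        for (char c : garbled.toCharArray()) {
            System.out.println((int) c); // prints 195, then 169
        }
    }
}
```

The single character 233 turns into the two characters 195 and 169, which
matches what comes back from the query.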

Under PostgreSQL 7.3, I get back the same two characters, 195 and 169, this
time decoded as UTF-8.  This probably has to do with the fact that the 7.3
server is multi-byte.

I'm using the 7.3.4 JDBC drivers.

I've poked through the source, and an even stranger thing is that under 7.2
the wire encoding for the character is 2 bytes long, while under 7.3 the wire
encoding is 4 bytes.  Again, in both cases, the decoded value is two
characters that, when treated as bytes, would form the UTF-8 encoding of 233.
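
One hypothesis that would account for the 2-byte versus 4-byte lengths is
that the character gets UTF-8 encoded twice somewhere along the way.  The
sketch below only shows that the byte counts are consistent with that idea;
it is an assumption, not something confirmed from the driver or server
source, and the names (WireLengthDemo, encodeTwice) are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class WireLengthDemo {
    // Encode to UTF-8, misread the bytes as Latin-1, then encode the
    // misread string to UTF-8 again (the assumed double encoding).
    static byte[] encodeTwice(String s) {
        byte[] once = s.getBytes(StandardCharsets.UTF_8);
        String misread = new String(once, StandardCharsets.ISO_8859_1);
        return misread.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // code point 233
        byte[] once = eAcute.getBytes(StandardCharsets.UTF_8);
        byte[] twice = encodeTwice(eAcute);
        System.out.println(once.length);  // 2 bytes: C3 A9
        System.out.println(twice.length); // 4 bytes: C3 83 C2 A9
        // Decoding the 4 bytes as UTF-8 yields the two characters 195 and 169.
        String decoded = new String(twice, StandardCharsets.UTF_8);
        System.out.println((int) decoded.charAt(0) + " " + (int) decoded.charAt(1));
    }
}
```

If this is what is happening, the 2-byte form decoded as latin1 and the
4-byte form decoded as UTF-8 would both produce the characters 195 and 169.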

Wow - this is kind of confusing to explain.

It's possible that I totally missed the boat on some configuration step.
Anybody have any insights?

Shilad
