Greetings,
I'm trying to track down an encoding problem I'm having.
I've created a database encoded in latin1 (createdb -E latin1). I've imported
some data. I believe this all worked, because when I use psql, the encoded
characters "look" correct.
When a query returns the funny character (in this case Latin1 233, an e with
an acute accent), it looks to me like I get back the UTF-8 encoding of the
character, with each byte of that encoding treated as a separate Latin1
character.
For example, under PostgreSQL 7.2, when I set the charSet property to latin1,
I get back the Latin1 characters 195 and 169 where 233 should be. Note that
these two values are exactly the two bytes of the UTF-8 encoding of
character 233.
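
To make the 195/169 observation concrete, here is a small standalone sketch
(no database involved) of what I think is happening. The class and method
names are made up for illustration, and it assumes a JVM that has
java.nio.charset.StandardCharsets (Java 7 or later).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1Utf8Demo {
    // Encode a string as UTF-8 and return each byte as an unsigned value (0-255).
    static int[] utf8Bytes(String s) {
        byte[] raw = s.getBytes(StandardCharsets.UTF_8);
        int[] out = new int[raw.length];
        for (int i = 0; i < raw.length; i++) {
            out[i] = raw[i] & 0xFF;
        }
        return out;
    }

    // Encode as UTF-8, then (wrongly) decode those bytes as Latin-1.
    // Returns the code points of the resulting string.
    static int[] misreadAsLatin1(String s) {
        byte[] raw = s.getBytes(StandardCharsets.UTF_8);
        String wrong = new String(raw, StandardCharsets.ISO_8859_1);
        return wrong.chars().toArray();
    }

    public static void main(String[] args) {
        String eAcute = "\u00E9"; // character 233, e with an acute accent
        System.out.println(Arrays.toString(utf8Bytes(eAcute)));       // prints [195, 169]
        System.out.println(Arrays.toString(misreadAsLatin1(eAcute))); // prints [195, 169]
    }
}
```

The second call yields a two-character string (195, 169) instead of the
single character 233, which matches what I see coming back.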
Under PostgreSQL 7.3, I get back the UTF-8 characters 195 and 169. This
probably has to do with the fact that the server is multi-byte.
I'm using the 7.3.4 JDBC drivers.
I've poked through the source, and an even stranger thing is that under 7.2
the wire encoding for the character is 2 bytes long, while under 7.3 the wire
encoding is 4 bytes. Again, in both cases the decoded value is two
characters that, when treated as bytes, form the UTF-8 encoding of 233.
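
One way to end up with 4 bytes for a single character is to apply UTF-8
encoding twice. The sketch below only shows the arithmetic; whether the
server or driver actually does this is a guess on my part, and the class and
method names are hypothetical (again assuming Java 7 or later).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DoubleEncodingDemo {
    // UTF-8 encode, misread the bytes as Latin-1, then UTF-8 encode again.
    // Returns the final bytes as unsigned values (0-255).
    static int[] doubleEncode(String s) {
        byte[] once = s.getBytes(StandardCharsets.UTF_8);
        String misread = new String(once, StandardCharsets.ISO_8859_1);
        byte[] twice = misread.getBytes(StandardCharsets.UTF_8);
        int[] out = new int[twice.length];
        for (int i = 0; i < twice.length; i++) {
            out[i] = twice[i] & 0xFF;
        }
        return out;
    }

    // Decode unsigned byte values as UTF-8 and return the code points.
    static int[] decodeUtf8(int[] unsigned) {
        byte[] raw = new byte[unsigned.length];
        for (int i = 0; i < unsigned.length; i++) {
            raw[i] = (byte) unsigned[i];
        }
        return new String(raw, StandardCharsets.UTF_8).chars().toArray();
    }

    public static void main(String[] args) {
        int[] wire = doubleEncode("\u00E9"); // start from character 233
        System.out.println(Arrays.toString(wire));             // prints [195, 131, 194, 169]
        System.out.println(Arrays.toString(decodeUtf8(wire))); // prints [195, 169]
    }
}
```

Decoding those 4 bytes as UTF-8 gives back the two characters 195 and 169,
which is consistent with what I see under 7.3.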
Wow - this is kind of confusing to explain.
It's possible that I totally missed the boat on some configuration step.
Anybody have any insights?
Shilad