How are Unicode characters stored internally, in Postgres? - Mailing list pgsql-jdbc

From Kyung Lee
Subject How are Unicode characters stored internally, in Postgres?
Date
Msg-id 20030309044313.26775.qmail@web40307.mail.yahoo.com
Responses Re: How are Unicode characters stored internally, in
List pgsql-jdbc
I have come across an interesting problem that I
hope someone can help me solve.

PROBLEM: (short version)
I have entered Unicode characters (more
specifically, Chinese characters) into a postgres
db in 2 "different" formats. One works for some
applications, and one works for others. So, I
would like some additional information as to how
Chinese (or Unicode, in general) characters are
stored internally in postgres.

ENVIRONMENT:
I have postgres 7.3.2. My database is encoded as
UNICODE. I am using Java and the jdbc3 driver.

PROBLEM: (Extended Version)
I have entered Chinese characters into a
unicode-encoded postgres db in 2 different
"ways". Let me explain. When I parse a file,
containing Chinese characters, those characters
go into the db one "way". When I use an HTML form
to submit characters into the db, those
characters go into the db a different "way". How
do I know this? When I retrieve the characters,
and try to display them in a browser, the first
way (from a parsed file) just shows question
marks, but the second way (from an HTML form)
shows the characters correctly. When I use psql
to view the data that came from the parsed file,
I do not see question marks, but something that
looks like some sort of encoding. That encoding is
different from what I see for the data submitted
by the HTML form.
Now ultimately, I am trying to display the
Chinese characters in Flash. Flash has Unicode
support, and assumes UTF-8 character encoding.
Now, when I send the characters from the first
way, Flash displays all the Chinese characters
correctly. When I send the Chinese characters
from the second way, it only shows some of the
characters, and the others are not displayed at
all. Let's forget about the Flash issue; I only
mention it to point out the 2 different ways (I
think) the Chinese characters are stored in
postgres.
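
To illustrate how the same Chinese text can end up as
question marks in one place and as "some sort of
encoding" in another, here is a minimal Java sketch. It
is not a diagnosis of the setup described above; it only
shows two common ways a string can be damaged when the
charset used to read bytes differs from the one used to
write them. The class name EncodingDemo and its method
names are hypothetical, and ISO-8859-1 is just an assumed
example of a mismatched charset.

```java
import java.nio.charset.StandardCharsets;

final class EncodingDemo {
    // Encode to ISO-8859-1 and back. Characters that ISO-8859-1
    // cannot represent are replaced with '?' during encoding.
    static String throughLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Encode to UTF-8, then (wrongly) decode the bytes as ISO-8859-1.
    // Each UTF-8 byte becomes one character, so the text looks garbled
    // but no information is lost.
    static String utf8ReadAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Undo utf8ReadAsLatin1: recover the original bytes, decode as UTF-8.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Render a string as a list of code points, to compare what two
    // retrieval paths actually returned.
    static String codePoints(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String chinese = "\u4e2d\u6587"; // two Chinese characters
        System.out.println("original:        " + codePoints(chinese));
        System.out.println("through Latin-1: " + codePoints(throughLatin1(chinese)));
        String garbled = utf8ReadAsLatin1(chinese);
        System.out.println("UTF-8 as Latin-1: " + codePoints(garbled));
        System.out.println("repaired:        " + codePoints(repair(garbled)));
    }
}
```

Dumping the code points of the strings returned by each
of the two paths (file import and HTML form) in this way
would show whether they really differ in the database, or
only in how each client interprets them.
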
So, this leads me to a few questions:

1. If I don't specify a client-encoding param,
via an environment variable or as a param on the
postgres driver, what is the default client
encoding when the db is encoded as UNICODE?

2. I noticed something in the postgres
documentation. In the section discussing Multibyte
Support
(http://www.postgresql.org/docs/view.php?version=7.3&file=multibyte.html,
Table 7-2), it shows UNICODE as an available
client encoding, but not when the server is
encoded as UNICODE. Why is that? Other server
encodings have the same listed as client
encodings (e.g. SQL_ASCII as a server encoding
can have SQL_ASCII as the client encoding as
well).


Sorry for the long message, and thanks in advance
for any help.


