Thread: How are Unicode characters stored internally, in Postgres?

From
Kyung Lee
Date:
I have come across an interesting problem that I
hope someone can help me solve.

PROBLEM: (short version)
I have entered Unicode characters (more
specifically, Chinese characters) into a postgres
db in 2 "different" formats. One works for some
applications, and one works for others. So, I
would like some additional information as to how
Chinese (or Unicode, in general) characters are
stored internally in postgres.

ENVIRONMENT:
I have postgres 7.3.2. My database is encoded as
UNICODE. I am using Java and the JDBC3 driver.

PROBLEM: (Extended Version)
I have entered Chinese characters into a
Unicode-encoded postgres db in 2 different
"ways". Let me explain. When I parse a file
containing Chinese characters, those characters
go into the db one "way". When I use an HTML form
to submit characters into the db, those
characters go into the db a different "way".

How do I know this? When I retrieve the characters
and try to display them in a browser, the first
way (from a parsed file) just shows question
marks, but the second way (from an HTML form)
shows the characters correctly. When I use psql
to view the data that came from the parsed file,
it is not question marks, but looks like some sort
of encoding. That encoding is different from the
encoding of the data submitted by the HTML form.

Ultimately, I am trying to display the
Chinese characters in Flash. Flash has Unicode
support and assumes UTF-8 character encoding.
When I send the characters from the first
way, it displays all the Chinese characters
correctly. When I send the Chinese
characters from the second way, it only shows
some of the characters, and the others are
not displayed at all. Let's forget about the
Flash issue; it was only mentioned to point out
the 2 different ways (I think) the Chinese
characters are stored in postgres.

So, this leads me to a few questions:

1. If I don't specify a client encoding, either
via an environment variable or as a parameter on
the postgres driver, what is the default when the
db is encoded as UNICODE?

2. I noticed something in the postgres
documentation. In the section discussing Multibyte
Support
(http://www.postgresql.org/docs/view.php?version=7.3&file=multibyte.html,
Table 7-2), it shows UNICODE as an available
client encoding, but not when the server is
encoded as UNICODE. Why is that? Other server
encodings have the same encoding listed as a
client encoding (e.g. SQL_ASCII as a server
encoding can have SQL_ASCII as the client
encoding as well).
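
[Illustration] One possible cause of two different stored forms, which
nothing in this thread confirms, is that the two input paths decode the
same bytes with different charsets before the text reaches the driver.
The sketch below is a minimal, database-free demonstration of that
effect. The class name EncodingSketch and its helper methods are
hypothetical, and the choice of ISO-8859-1 as the "wrong" charset is an
assumption made only for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingSketch {
    // Render bytes as space-separated lowercase hex so two results can be compared by eye.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Decode raw input bytes with an assumed charset, then re-encode as UTF-8.
    // This approximates (as an assumption) what ends up stored when a Java
    // String is written to a database whose encoding is UNICODE (UTF-8).
    public static byte[] storedBytes(byte[] raw, Charset assumed) {
        String decoded = new String(raw, assumed);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes so the source file's
        // own encoding cannot affect the demonstration.
        String text = "\u4e2d\u6587";
        byte[] fileBytes = text.getBytes(StandardCharsets.UTF_8);

        // Path A: the bytes are decoded with the charset they were written in.
        byte[] wayA = storedBytes(fileBytes, StandardCharsets.UTF_8);
        // Path B: the same bytes are decoded with a mismatched charset.
        byte[] wayB = storedBytes(fileBytes, StandardCharsets.ISO_8859_1);

        System.out.println("decoded as UTF-8:      " + hex(wayA));
        System.out.println("decoded as ISO-8859-1: " + hex(wayB));
    }
}
```

Path A round-trips to the original 6 bytes, while path B produces 12
bytes, because every original byte is treated as a separate Latin-1
character and then encoded again. Comparing the byte sequences of the
two stored forms in the same way may show which input path, if either,
is decoding with a mismatched charset.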


Sorry for the long message, and thanks in advance
for any help.


Re: How are Unicode characters stored internally, in

From
Dave Cramer
Date:
I think your observation 2 below refers to client translation.

The table shows which translations are possible, which is why UNICODE
-> UNICODE isn't there: when the client and server use the same
encoding, nothing needs to be translated.

Dave
On Sat, 2003-03-08 at 23:43, Kyung Lee wrote:
> 2. I noticed something in the postgres
> documentation. In the section discussing Multibyte
> Support
> (http://www.postgresql.org/docs/view.php?version=7.3&file=multibyte.html,
> Table 7-2), it shows UNICODE as an available
> client encoding, but not when the server is
> encoded as UNICODE. Why is that? Other server
> encodings have the same encoding listed as a
> client encoding (e.g. SQL_ASCII as a server
> encoding can have SQL_ASCII as the client
> encoding as well).
--
Dave Cramer <Dave@micro-automation.net>