Thread: PostgreSQL JDBC Driver versus Encoding

PostgreSQL JDBC Driver versus Encoding

From
João Paulo Pires
Date:

Hello,

I have a PostgreSQL database whose encoding is SQL_ASCII.

I copied data into it from a MySQL database whose encoding is UTF-8, using Java code, and the data in the destination database now contains strange characters that look like encoding problems.

Is there any additional parameter we can set in the PostgreSQL connection string to deal with these problems?

Or could the problem be related to the encoding used to connect to the MySQL database?

Best regards,

João Paulo Pires
Developer

Re: PostgreSQL JDBC Driver versus Encoding

From
Oliver Jowett
Date:
João Paulo Pires wrote:

> I have a PostgreSQL database whose encoding is SQL_ASCII.

This is the cause of your problem. SQL_ASCII provides essentially no
encoding information at all, so the JDBC driver does not know how to
translate between the database and Java's internal UTF-16
representation. If you are going to be inserting anything other than
7-bit-ASCII text into it, you are going to have problems.
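
To illustrate the general failure mode, here is a minimal sketch in plain
Java. It is not a description of what the driver does internally; it only
shows what happens when bytes written as UTF-8 are later decoded under a
different single-byte charset. The class and method names are made up for
the example, and ISO-8859-1 is just an assumed stand-in for "whatever
charset the reader ends up guessing".

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encode the text as UTF-8, then decode the very same bytes as
    // ISO-8859-1 (an assumed wrong guess, chosen for illustration only).
    static String misdecode(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "Jo\u00e3o" is "João"; the escape keeps the source file pure ASCII.
        String original = "Jo\u00e3o";
        String garbled = misdecode(original);

        // 7-bit ASCII text is unaffected by the wrong guess.
        System.out.println("Joao".equals(misdecode("Joao")));
        // Non-ASCII text is not: one character became two.
        System.out.println(original.length() + " -> " + garbled.length());
    }
}
```

With these inputs, "João" comes back as "JoÃ£o": the two UTF-8 bytes of
"ã" (0xC3 0xA3) are each read as a separate character. Pure ASCII text
passes through unchanged, which is why the problem only shows up on
accented data.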

If you can, try using a database encoding that can represent the data
you are inserting. UTF8 / UNICODE is probably a good choice unless you
have special requirements. See
http://www.postgresql.org/docs/8.2/static/multibyte.html especially the
bit that says:

> The SQL_ASCII setting behaves considerably differently from the other
> settings. When the server character set is SQL_ASCII, the server
> interprets byte values 0-127 according to the ASCII standard, while
> byte values 128-255 are taken as uninterpreted characters. No encoding
> conversion will be done when the setting is SQL_ASCII. Thus, this
> setting is not so much a declaration that a specific encoding is in
> use, as a declaration of ignorance about the encoding. In most cases,
> if you are working with any non-ASCII data, it is unwise to use the
> SQL_ASCII setting, because PostgreSQL will be unable to help you by
> converting or validating non-ASCII characters.

-O

Re: PostgreSQL JDBC Driver versus Encoding

From
John R Pierce
Date:
João Paulo Pires wrote:
> Hello,
>
> I have a PostgreSQL database whose encoding is SQL_ASCII.
>
> I copied data into it from a MySQL database whose encoding is UTF-8,
> using Java code, and the data in the destination database now contains
> strange characters that look like encoding problems.
>
> Is there any additional parameter we can set in the PostgreSQL
> connection string to deal with these problems?
>
> Or could the problem be related to the encoding used to connect to
> the MySQL database?
>
You can't represent every character that UTF-8 can encode in plain
ASCII. Characters common to both will carry over, but ASCII only defines
128 characters. You'll need to convert the database to UTF-8, then redo
your inserts.
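
As a rough way to see which values are affected, the sketch below checks
whether a string can be represented in US-ASCII at all, using the standard
java.nio.charset API. The class and method names are hypothetical, and
this is only a diagnostic idea, not a substitute for converting the
database.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class AsciiCheck {
    // Returns true only if every character of the text exists in US-ASCII.
    static boolean fitsInAscii(String text) {
        // Encoders are not thread-safe, so create a fresh one per call.
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file itself pure ASCII.
        String[] samples = { "Pires", "Jo\u00e3o", "S\u00e3o Paulo" };
        for (String sample : samples) {
            System.out.println(sample + " fits in ASCII: " + fitsInAscii(sample));
        }
    }
}
```

Any value for which fitsInAscii returns false is one that a 7-bit-only
store cannot hold faithfully, so those are the rows worth re-checking
after the inserts are redone.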