Thread: Character Encoding Confusion

Character Encoding Confusion

From
"Markus Wollny"
Date:
Hi!

We've been using PostgreSQL (currently 7.4) via ODBC with ColdFusion
until now and didn't experience any problems. But now we want to move
from ColdFusion 4.5 to ColdFusion MX, thus abandoning ODBC and migrating
to JDBC.

As ODBC seems to be blissfully unaware of any character encodings
whatsoever, so were we - our databases are encoded in SQL_ASCII,
although we have stored German special chars (ÄÖÜäöü and ß), and from
what I have read so far, these are stored as multibyte and thus exceed
the SQL_ASCII specification.

With ODBC we never noticed the mistake we'd made. Now with
JDBC/ColdFusion MX 6.1, we see all sorts of weird characters on our
web-application, but not the ones which are stored in the database.

I tried setting different character sets for the JDBC-driver, using the
URL-syntax
jdbc:postgresql://123.456.789.012:5432/database?charSet=characterSet
with charSet=iso-8859-1 or charSet=UTF-8 for example, but that just
didn't change anything.

Now is there some way to elegantly resolve the issue without dropping
and recreating the databases in order to change the encoding? Can we
somehow get the JDBC-driver to act just as the ODBC-driver did -
silently passing on the "bad" characters without changing anything?

And if there is just no way to avoid that, what's the correct procedure
for changing the encoding anyway? How would I be able to migrate the
current data without any data-loss and with the least possible downtime?

Kind regards

    Markus

Re: Character Encoding Confusion

From
Kris Jurka
Date:

On Mon, 8 Mar 2004, Markus Wollny wrote:

> Hi!
>
> As ODBC seems to be blissfully unaware of any character encodings
> whatsoever, so were we - our databases are encoded in SQL_ASCII,
> although we have stored german special chars (ÄÖÜäöü and ß), and from
> what I have read so far, these are stored as multibyte and thus exceed
> the SQL-ASCII specification.
>
> With ODBC we never noticed the mistake we'd made. Now with
> JDBC/ColdFusion MX 6.1, we see all sorts of weird characters on our
> web-application, but not the ones which are stored in the database.
>
> I tried setting different character sets for the JDBC-driver, using the
> URL-syntax
> jdbc:postgresql://123.456.789.012:5432/database?charSet=characterSet
> with charSet=iso-8859-1 or charSet=UTF-8 for example, but that just
> didn't change anything.
>
> Now is there some way to elegantly resolve the issue without dropping
> and recreating the databases in order to change the encoding? Can we
> somehow get the JDBC-driver to act just as the ODBC-driver did -
> silently passing on the "bad" characters without changing anything?
>

The JDBC driver needs the data to be encoded correctly; the ?charSet= option
only works on 7.2 and earlier databases, because back then multibyte support
was not compiled in by default.  This will require a dump and reload.

Kris Jurka
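
Before a dump and reload, it helps to know which encoding the bytes in the
SQL_ASCII database are actually in, since the server never checked them. The
following is a minimal Python sketch, not part of the original thread, that
classifies a byte string taken from a dump and converts it to UTF-8. The
function names classify_bytes and to_utf8 are hypothetical, and the fallback
to ISO 8859-15 (LATIN9) is an assumption about this data; a single-byte
sequence can occasionally also be valid UTF-8, so the result is a hint rather
than proof.

```python
def classify_bytes(raw: bytes) -> str:
    """Return a rough label for how the raw bytes appear to be encoded."""
    try:
        raw.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8, so most likely a single-byte encoding
        # such as LATIN1 or LATIN9.
        return "8-bit"


def to_utf8(raw: bytes, fallback: str = "iso8859_15") -> bytes:
    """Re-encode raw bytes as UTF-8.

    Bytes that already decode as UTF-8 are kept as they are; anything else
    is assumed (hypothetically) to be in the fallback encoding.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        text = raw.decode(fallback)
    return text.encode("utf-8")


if __name__ == "__main__":
    sample = b"Gr\xfc\xdfe"  # "Grüße" as single-byte LATIN9 data
    print(classify_bytes(sample))
    print(to_utf8(sample).decode("utf-8"))
```

Running something like this over a sample of the dumped rows would show
whether the data is uniformly in one encoding or a mix, which decides what
encoding to declare when reloading.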

Re: Character Encoding Confusion

From
Peter Eisentraut
Date:
Markus Wollny wrote:
> As ODBC seems to be blissfully unaware of any character encodings
> whatsoever, so were we - our databases are encoded in SQL_ASCII,
> although we have stored german special chars (ÄÖÜäöü and ß), and from
> what I have read so far, these are stored as multibyte and thus
> exceed the SQL-ASCII specification.

SQL_ASCII is not a real encoding, it simply means to pass bytes through
without looking at them.  If you want to get sensible behavior with
German characters, you should use LATIN9 as the server encoding.
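
To illustrate what "pass bytes through without looking at them" means in
practice, here is a small Python sketch (an illustration added to the thread,
not something the posters wrote). It shows that the same German characters
are different byte sequences in LATIN9 and UTF-8, and that decoding with the
wrong encoding yields the kind of weird characters described above. The
helper name misread is hypothetical.

```python
text = "ÄÖÜäöüß"

# The same seven characters as bytes in two encodings.
latin9_bytes = text.encode("iso8859_15")  # one byte per character
utf8_bytes = text.encode("utf-8")         # two bytes per character here


def misread(raw: bytes, wrong_encoding: str) -> str:
    """Decode raw bytes with an encoding they were not written in."""
    return raw.decode(wrong_encoding)


# UTF-8 bytes read as LATIN9: no error, but the text is garbled.
garbled = misread(utf8_bytes, "iso8859_15")

# LATIN9 bytes read as UTF-8: the byte sequence is not valid UTF-8.
try:
    misread(latin9_bytes, "utf-8")
    latin9_is_valid_utf8 = True
except UnicodeDecodeError:
    latin9_is_valid_utf8 = False

if __name__ == "__main__":
    print(len(latin9_bytes), len(utf8_bytes))
    print(repr(garbled))
    print(latin9_is_valid_utf8)
```

A SQL_ASCII database stores either byte sequence unchanged, so the outcome
depends entirely on what each client assumes when it reads the bytes back.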