Thread: Mixed client_encoding database question

Mixed client_encoding database question

From

"J. Hondius"

Date:

11 January 2011, 05:01:15

Hi all,

We have a product  with a PostgreSQLdatabase (unicode)
It has a (legacy) client app that uses client_encoding = win1521
The legacy app will be around for several years.
It cannot change to unicode.

There also is a new client app, written in java.
It uses client_encoding = unicode.

The problem: if someone inputs a non-win1252 character via the new java
app, the legacy app fails.
PostgreSQL returns an error: character blabla of encodig UTF-8 has no
equivalent in WIN1252.

What are my options to stop the legacy app from failing?
-Is there a setting to tell PostgreSQL to fail the character gracefully
by substuting it?
-Should i filter out all non win1252 charachters at the new java app and
substitute them? (not so nice, i'd like to have the chars in the db)
-?more?

Greetings, Joek

Re: Mixed client_encoding database question

From

Craig Ringer

Date:

11 January 2011, 06:00:13

On 01/11/2011 02:32 AM, J. Hondius wrote:

> What are my options to stop the legacy app from failing?

IMO there aren't any good ones except "fix the legacy app not to be
brain-dead about character encodings".

> -Is there a setting to tell PostgreSQL to fail the character gracefully
> by substuting it?

The problem there is that a round-trip via the client is no longer a
safe no-change operation.

UPDATE tablename SET x = x WHERE ....;

no longer means the same as:

SELECT x FROM tablename WHERE ....;
-- client reads value of 'x'
UPDATE tablename SET x = 'value-from-client';

This is particularly critical in cases where the client may issue an
UPDATE that lists values for all fields, changed and not, when making a
change to a field. This is common in ORM systems, which usually list all
fields mapped by the object without caring which ones might actually be
different from the DB.

If you can verify that the legacy app never UPDATEs fields that might
contain textual data containing unicode chars inserted/updated via the
Java app, you could probably get away with this approach. Alas, I don't
think you can do it via Pg settings, you'd have to modify your legacy
app's SQL to remap the chars, which you probably can't do. Maybe you
could modify the Pg ODBC driver to do the remapping you desire -
assuming your legacy app talks to the database via ODBC.

> -Should i filter out all non win1252 charachters at the new java app and

... thus prolonging your pain. Once  the legacy app is gone, you'll be
stuck with the limitations of the Java app, which by then you probably
won't able to or allowed to change to permit unicode for one reason or
another.

If you *do* decide to filter out unsupported chars, make sure it's
configurable in the app without source code changes and that both modes
get covered by your unit tests. Be sure it's ALWAYS ready to switch to
unicode as soon as the legacy stuff is gone.

This is probably what I'd do in your situation, if I couldn't fix or
retire the legacy app. I'd just make really, really sure I could always
enable full char encoding support later.

> substitute them? (not so nice, i'd like to have the chars in the db)

Ick.

--
Craig Ringer

Re: Mixed client_encoding database question

From

rsmogura

Date:

11 January 2011, 10:33:15

 Hi,

 I suggest you to recreate database in WIN1521 encoding, so Java will
 not write unsupported characters. As well you can configure JDBC in way
 it will allow You to change client encoding, so you can change it to
 WIN. In both of above You will get error when you will try to write
 unsupported characters.

 You may also do
 http://www.postgresql.org/docs/8.4/interactive/sql-createconversion.html

 Radosław Smogura,
 http://softperience.eu

 On Mon, 10 Jan 2011 16:32:13 +0100, J. Hondius wrote:
> Hi all,
>
> We have a product  with a PostgreSQLdatabase (unicode)
> It has a (legacy) client app that uses client_encoding = win1521
> The legacy app will be around for several years.
> It cannot change to unicode.
>
> There also is a new client app, written in java.
> It uses client_encoding = unicode.
>
> The problem: if someone inputs a non-win1252 character via the new
> java app, the legacy app fails.
> PostgreSQL returns an error: character blabla of encodig UTF-8 has no
> equivalent in WIN1252.
>
> What are my options to stop the legacy app from failing?
> -Is there a setting to tell PostgreSQL to fail the character
> gracefully by substuting it?

> -Should i filter out all non win1252 charachters at the new java app
> and substitute them? (not so nice, i'd like to have the chars in the
> db)
> -?more?
>
> Greetings, Joek

Re: Mixed client_encoding database question

From

Alban Hertroys

Date:

11 January 2011, 14:01:56

On 10 Jan 2011, at 16:32, J. Hondius wrote:

> Hi all,
>
> We have a product  with a PostgreSQLdatabase (unicode)
> It has a (legacy) client app that uses client_encoding = win1521
> The legacy app will be around for several years.
> It cannot change to unicode.
>
> There also is a new client app, written in java.
> It uses client_encoding = unicode.

I think you may be able to set the client encoding at the user level. In that case you could use a different user for
thelegacy app than you use for the java app and have the appropriate client encoding for both cases. 

try: ALTER ROLE foo SET client_encoding TO 'win1251';

Note: This is just off the top of my head, I haven't actually verified any of this.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.

!DSPAM:737,4d2c9b0a11871070112136!