Re: Why is an ISO-8859-8 database allowing values not within that set? - Mailing list pgsql-general

From Craig Ringer
Subject Re: Why is an ISO-8859-8 database allowing values not within that set?
Msg-id 500BED19.1050208@ringerc.id.au
In response to Re: Why is an ISO-8859-8 database allowing values not within that set?  (Herouth Maoz <herouth@unicell.co.il>)
List pgsql-general
On 07/22/2012 03:58 PM, Herouth Maoz wrote:
Thanks. That makes sense. The default client encoding on the reports database is ISO-8859-8, so I guess when I don't set it using \encoding, it does exactly what you say.

OK, so I'm still looking for a way to convert illegal characters into something that won't collide with my encoding (asterisks or whatever).


As far as I know, PostgreSQL's encoding handling functions do not offer substitution for unsupported characters, nor does the built-in client<->server charset translation feature. You could do it with a regular expression replacement of any character not in a class that contains every char valid in the target encoding. It feels like a very clunky approach though.
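A rough sketch of that regular-expression idea, written in standalone Python rather than SQL so it can be run without a database. The helper names (valid_chars, make_replacer) are made up for illustration, and the approach as written assumes a single-byte target encoding such as ISO-8859-8, where every valid character can be found by decoding each of the 256 byte values:

```python
import re


def valid_chars(encoding):
    """Return every character that a single-byte encoding can represent."""
    chars = []
    for b in range(256):
        try:
            chars.append(bytes([b]).decode(encoding))
        except UnicodeDecodeError:
            # This byte value is unassigned in the encoding; skip it.
            pass
    return chars


def make_replacer(encoding, substitute="*"):
    """Build a function that replaces characters outside the encoding."""
    # Spell each valid character as a \UXXXXXXXX escape so that characters
    # like ']' or '^' cannot break the character class.
    char_class = "".join("\\U%08x" % ord(c) for c in valid_chars(encoding))
    pattern = re.compile("[^" + char_class + "]")
    return lambda text: pattern.sub(substitute, text)


replace_invalid = make_replacer("iso8859_8")
cleaned = replace_invalid("שלום café €")
# The cleaned string can now be encoded without raising an error.
encoded = cleaned.encode("iso8859_8")
```

The same negated character class could in principle be handed to regexp_replace on the server side, but the class is long, which is part of why the approach feels clunky.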

An alternative is to use a procedural language that DOES support lossy character encoding conversions. I don't think plpython does and plpgsql certainly doesn't if PostgreSQL itself doesn't. I'd be amazed if plperl didn't support lossy conversions, but I haven't done much with Perl in years.
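For comparison, plain Python's codec machinery does accept an error handler when encoding, so a lossy conversion can at least be sketched outside the database. Whether it behaves the same way inside a plpython function is an assumption not checked here. The handler name "asterisk" and the function to_lossy are hypothetical names chosen for this example:

```python
import codecs


def _asterisk_handler(err):
    """Codec error handler: substitute '*' for each unencodable character."""
    if isinstance(err, UnicodeEncodeError):
        # err.start:err.end may span a run of several bad characters.
        return ("*" * (err.end - err.start), err.end)
    raise err


# Register under a made-up name so it can be passed as errors="asterisk".
codecs.register_error("asterisk", _asterisk_handler)


def to_lossy(text, encoding="iso8859_8"):
    """Round-trip text through the encoding, replacing what does not fit."""
    return text.encode(encoding, errors="asterisk").decode(encoding)


result = to_lossy("שלום café €")
```

The built-in errors="replace" handler does the same job with "?" as the substitute, if asterisks are not a requirement.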

It'd be handy if Pg's client<->server conversion supported lossy conversions for this kind of thing. Honestly, though, I'm not sad it doesn't, because people would misuse it to make error messages they didn't understand go away, then come back later and complain that PostgreSQL ate their data.

--
Craig Ringer
