Re: [GENERAL] postgres & server encodings - Mailing list pgsql-admin

From Tom Lane
Subject Re: [GENERAL] postgres & server encodings
Msg-id 17412.1123608663@sss.pgh.pa.us
In response to Re: [GENERAL] postgres & server encodings  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
List pgsql-admin
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> The problem only shows up when you have mixed data -- say, you have two
> applications, one website in PHP which inserts data in Latin-1, and a
> Windows app which inserts in UTF-8.  In this case your data will be a
> mess to fix, and there's no way a single conversion will get it right.
> You will have to manually separate the parts that are UTF8 from the
> Latin1, and import them separately.  Not a position I'd like to be in.

The only helpful tip I can think of is that you can try to import data
into a UTF8 database and see if it gets rejected as badly encoded; this
will at least give you a weak tool to separate what's what.
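
The same check can be approximated outside the database before loading anything. What follows is only a sketch, not part of PostgreSQL: the helper name split_by_utf8_validity and the sample rows are made up for illustration, and it assumes you can get at the raw bytes of each row. Strict UTF-8 decoding plays the role of the UTF8 database rejecting badly encoded input.

```python
def split_by_utf8_validity(rows):
    """Partition byte strings into (decodes as UTF-8, does not decode).

    Hypothetical helper for illustration; `rows` is any iterable of bytes.
    """
    valid, invalid = [], []
    for raw in rows:
        try:
            raw.decode("utf-8")  # strict error handling is the default
        except UnicodeDecodeError:
            invalid.append(raw)  # cannot be UTF-8, so probably Latin-1
        else:
            valid.append(raw)    # UTF-8, plain ASCII, or a lucky coincidence
    return valid, invalid


# Made-up sample: the same word stored by two differently configured clients.
sample = [
    "caf\u00e9".encode("utf-8"),    # b'caf\xc3\xa9'
    "caf\u00e9".encode("latin-1"),  # b'caf\xe9'
    b"plain ascii",
]
valid, invalid = split_by_utf8_validity(sample)
```

The filter is weak for the reason given above: pure ASCII rows pass either way, and a Latin-1 row whose high bytes happen to form a legal UTF-8 sequence lands in the "valid" pile too.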

I'm afraid the reverse direction won't help much --- in single-byte
encodings such as Latin1 there are no encoding errors, and so you can't
do any simple filtering to check in that direction.  In the end you're
going to have to eyeball a lot of data for plausibility :-(
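
A short illustration of why the reverse check finds nothing, using Python's latin-1 codec as a stand-in for a single-byte encoding (a sketch only; the variable names are arbitrary): every possible byte value decodes to some character, so UTF-8 data read as Latin-1 produces garbage rather than an error.

```python
# All 256 byte values are legal Latin-1, so decoding can never fail.
every_byte = bytes(range(256))
as_text = every_byte.decode("latin-1")

# UTF-8 bytes misread as Latin-1 decode "successfully" into mojibake.
utf8_bytes = "\u00e9".encode("utf-8")   # e-acute as UTF-8: b'\xc3\xa9'
misread = utf8_bytes.decode("latin-1")  # two characters, U+00C3 and U+00A9
```

Nothing here raises, which is exactly the problem: only the look of the result (an A-tilde followed by a copyright sign where an accented e was expected) hints that the encoding guess was wrong.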

            regards, tom lane
