Re: Locale/encoding problem/question - Mailing list pgsql-general

From Tom Lane
Subject Re: Locale/encoding problem/question
Msg-id 5345.1154696391@sss.pgh.pa.us
In response to Re: Locale/encoding problem/question  (henka@cityweb.co.za)
Responses Re: Locale/encoding problem/question  (henka@cityweb.co.za)
List pgsql-general
henka@cityweb.co.za writes:
>> It should be in the dump file, almost the first line. Locale is of no
>> interest to pg_dump, you'll have to decide how you want it.

> Yes:  UTF-8 and the other is LATIN1

Note that this represents what the original server *thought* the
encoding was.  But it's not at all impossible that the server thought
the data was LATIN1 when it was really UTF8.  (The other way around is
less plausible because the server would have been able to detect
encoding errors.)  If you were using clients that treated the data
as UTF8 without paying attention to what the server thought, you'd
not have realized you were mislabeling the data.

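But if you tried to load data marked as LATIN1 into a server using
UTF8, it'd have applied a LATIN1-to-UTF8 conversion.  If the bytes were
really UTF8 already, every multibyte character gets converted a second
time, one byte at a time, and then everything's hosed.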
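
To make the failure concrete, here is a minimal sketch of that double
conversion in Python.  The sample string is an assumption chosen for
illustration, not data from the dump in question, and Python's codecs
stand in for the server's conversion routines.

```python
# Illustrative only: the sample text is an assumption, not data from the thread.
original = "café"

# What the clients really stored: UTF8 bytes, although the server labeled them LATIN1.
stored = original.encode("utf-8")

# Loading a dump labeled LATIN1 into a UTF8 server: each byte is read as one
# LATIN1 character and then converted to UTF8 again.
reloaded = stored.decode("latin-1").encode("utf-8")

print(stored)                    # the two-byte sequence for the accented letter
print(reloaded)                  # the same letter now occupies four bytes
print(reloaded.decode("utf-8"))  # what a UTF8 client would display afterwards
```

Each accented character comes back as two unrelated characters, which is
the usual symptom of data that was converted once too often.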

I'd suggest actually inspecting the data in the dump file: it's not that
hard to tell UTF8 from LATIN1 if you look at the byte sequences.
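
One way to do that inspection mechanically is sketched below in Python.
The function name and the three labels are hypothetical, and the check is
only a heuristic: it relies on the fact that LATIN1 text containing
accented characters almost never happens to form valid UTF8 sequences.

```python
def classify(raw: bytes) -> str:
    """Guess how the bytes of a dump file are encoded (heuristic sketch)."""
    if all(b < 0x80 for b in raw):
        # No high-bit bytes at all: UTF8 and LATIN1 are byte-for-byte identical here.
        return "ascii"
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        # High-bit bytes that do not form valid UTF8 sequences; LATIN1 is plausible.
        return "not-utf8"
    # Every high-bit byte belongs to a well-formed UTF8 multibyte sequence.
    return "utf8"


# Usage sketch (the file name is a placeholder):
#   with open("dump.sql", "rb") as f:
#       print(classify(f.read()))
```

Reading the file in binary mode matters, since the point is to look at
the raw bytes rather than at whatever the local tools decode them into.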

Or you could just take the file marked LATIN1, edit it to change the
client_encoding setting to say the data is UTF8, and see if you can
load it.  If it's not UTF8, 8.1.4 will almost certainly detect a ton of
encoding errors.
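
If hand-editing a large dump is awkward, the relabeling can be sketched
in Python as below.  This assumes a plain-text dump whose header contains
a line of the form SET client_encoding = 'LATIN1'; so check your file
first.  The function name is hypothetical.  The dump is handled as bytes
so that the data itself is never re-encoded along the way.

```python
import re


def relabel(dump: bytes, new_encoding: bytes = b"UTF8") -> bytes:
    """Rewrite only the first client_encoding line; leave all other bytes alone."""
    # Assumed line format: SET client_encoding = 'LATIN1';
    pattern = re.compile(rb"^(SET client_encoding = ')[^']*(';)", re.MULTILINE)
    return pattern.sub(rb"\g<1>" + new_encoding + rb"\g<2>", dump, count=1)


# Usage sketch (file names are placeholders):
#   with open("dump.sql", "rb") as src, open("dump-utf8.sql", "wb") as dst:
#       dst.write(relabel(src.read()))
```

Writing to a new file keeps the original dump intact in case the load
does report encoding errors.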

            regards, tom lane
