> henka@cityweb.co.za writes:
>>> It should be in the dump file, almost the first line. Locale is of no
>>> interest to pg_dump, you'll have to decide how you want it.
>
>> Yes: UTF-8 and the other is LATIN1
>
> Note that this represents what the original server *thought* the
> encoding was. But it's not at all impossible that the server thought
> the data was LATIN1 when it was really UTF8. (The other way around is
> less plausible because the server would have been able to detect
> encoding errors.) If you were using clients that treated the data
> as UTF8 without paying attention to what the server thought, you'd
> not have realized you were mislabeling the data.
>
> But, if you tried to load data marked as LATIN1 into a server using
> UTF8, it'd have applied a LATIN1 to UTF8 conversion, and then
> everything's hosed.
>
> I'd suggest actually inspecting the data in the dump file: it's not that
> hard to tell UTF8 from LATIN1 if you look at the byte sequences.
>
> Or you could just take the file marked LATIN1, edit it to change the
> client_encoding setting to say the data is UTF8, and see if you can
> load it. If it's not UTF8, 8.1.4 will almost certainly detect a ton of
> encoding errors.
Thanks Tom, your suggestion worked.
Just to document this for others, this is what I did:
- created a new empty DB cluster with LATIN1 encoding: initdb -E LATIN1 -D data
- edited the dump file that was marked UTF8 and changed its client_encoding
setting to LATIN1 (doing the reverse resulted in encoding errors during
restore).
- restored the database from the edited dump
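For anyone who would rather script the edit than open a large dump in an editor, here is a rough Python sketch of the relabelling step. It assumes a plain-text dump containing a line of the form SET client_encoding = '...'; (check your own file first), and the function name is just something I made up for illustration. It works on raw bytes so the data itself is never re-encoded.

```python
import re


def relabel_client_encoding(dump: bytes, new_encoding: str) -> bytes:
    """Return the dump with its first client_encoding setting relabelled.

    Assumption: the dump is plain text and contains a line such as
    SET client_encoding = 'UTF8';  Only that label is changed; the
    data bytes are left exactly as they were.
    """
    pattern = re.compile(rb"^(SET client_encoding = ')[^']*(';)", re.MULTILINE)
    replacement = rb"\g<1>" + new_encoding.encode("ascii") + rb"\g<2>"
    new_dump, count = pattern.subn(replacement, dump, count=1)
    if count == 0:
        raise ValueError("no client_encoding setting found in dump")
    return new_dump
```

To use it, read the dump with open(path, "rb"), pass the bytes through the function, and write the result to a new file so the original dump stays untouched.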
So it looks like it was the reverse: the server thought the data was UTF8,
when in fact it was LATIN1.
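Tom's other suggestion, looking at the byte sequences, can also be done mechanically before attempting a restore. The sketch below is my own rough check, not anything from PostgreSQL: it only tests whether the bytes are valid UTF-8. Bytes that fail the test are not UTF8, but that alone does not prove they are LATIN1 rather than some other single-byte encoding, and pure ASCII data tells you nothing either way.

```python
def guess_encoding(data: bytes) -> str:
    """Classify raw dump bytes as 'ascii-only', 'probably-utf8' or 'not-utf8'."""
    # Pure 7-bit ASCII is identical in UTF8 and LATIN1, so it is inconclusive.
    if all(b < 0x80 for b in data):
        return "ascii-only"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # High bytes that do not form valid UTF-8 sequences.
        return "not-utf8"
    # Non-ASCII bytes that all form valid UTF-8 sequences; LATIN1 text
    # rarely passes this test by accident.
    return "probably-utf8"
```

For a big dump it is probably enough to run this on the lines that contain non-ASCII bytes rather than on the whole file at once.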
Regards