Latin1 to UTF-8 ? - Mailing list pgsql-general

From Aarni Ruuhimäki
Subject Latin1 to UTF-8 ?
Msg-id 200708031537.20276.aarni@kymi.com
Responses Re: Latin1 to UTF-8 ?  (Peter Eisentraut <peter_e@gmx.net>)
List pgsql-general
Hi,

I've set up a new CentOS server with PostgreSQL 8.2.4 and initdb'ed it with
UTF-8.

That went OK and it runs fine.

I have a problem with encodings, however, mainly with the Russian Cyrillic
characters.

When I test-dumped some dbs from the old FC / Pg 8.0.2 server, all Latin1, I
noticed that some of the dumps show in the Konqueror file browser as 'Plain
Text Documents' and some as 'C++ Source Files'. Both kinds have Latin1 as the
client encoding at the top of the file. Changing that gives errors, as expected.

Looking into the plain text dumps I see all Cyrillic characters as Р...
and these go in and display fine from the new server's UTF-8 environment.

Some of the 'C++' files have the Cyrillics as 'îñåòèòåëåé'. Some have both
'îñåòèòåëåé' and Р... and of course the 'îñåò' characters come out wrong
and unreadable in the browser. (Not sure if you can see the single-quoted ones,
but they look something like Hebrew or similar.)

I have no idea what browsers, encodings or even keyboard layouts were used
when the data was inserted by users through their web interfaces ...

I tried the -F p switch as the earlier version has no -E for dumps. Same
output. Also with pg_dumpall.

I tried various encodings with iconv too.

So, what would be the proper way to convert the dumps to UTF-8 ? Or any other
solution ? Any other tool to work with the problem files ?
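
To illustrate the kind of conversion I mean, here is a minimal Python sketch.
It rests on a guess, not on anything I have confirmed: that the unreadable
strings are Windows-1251 (cp1251) bytes that were stored unchanged in the
Latin1 databases. The helper name reinterpret is made up for this example.

```python
def reinterpret(text: str, stored_as: str = "latin-1", actually: str = "cp1251") -> str:
    """Recover the raw bytes behind a mislabelled string, then decode them
    with the encoding they were (assumed to be) written in."""
    return text.encode(stored_as).decode(actually)


# Sample taken from one of the problem dumps.
sample = "îñåòèòåëåé"

# Assumption: the bytes are really cp1251, not Latin1.
print(reinterpret(sample))  # prints: осетителей
```

If the guess holds, the result is readable Russian and can be written out as
UTF-8. Dumps that mix both representations could not be converted wholesale
this way, since the parts that already display correctly would be damaged.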

BR,

Aarni
--
Aarni Ruuhimäki

