Thread: Character encoding conversion

From: Mike Blackwell
I have a database which was originally created with LATIN1 encoding.
I'd like to move it to UTF8. The data loads OK (with COPY), but I am
getting "invalid byte sequence for encoding ..." messages when accessing
the data.

Is there a way to automatically convert the offending characters, or to
easily locate them in a pg_dump file so they can be converted by hand?

Mike
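
A rough way to handle the "locate them by hand" part is the Python sketch below. It is not a PostgreSQL tool, and the script name in the comment is hypothetical. It reads a plain-text dump as raw bytes and reports every line that does not decode as UTF-8, along with the offset of the first bad byte. In a LATIN1 dump, accented characters such as é are single bytes in the 0x80-0xFF range. These usually make a line invalid UTF-8, although an occasional pair of LATIN1 characters happens to form a valid UTF-8 sequence and will slip through.

```python
import sys
from typing import Iterable, Iterator, Tuple


def find_invalid_utf8_lines(lines: Iterable[bytes]) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (line_number, byte_offset, raw_line) for each line that is not valid UTF-8.

    line_number is 1-based; byte_offset is where the first bad byte sits in that line.
    """
    for lineno, raw in enumerate(lines, start=1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as exc:
            yield lineno, exc.start, raw


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_bad_utf8.py dump.sql
    # Open in binary mode so nothing is decoded before we check it.
    with open(sys.argv[1], "rb") as dump:
        for lineno, offset, raw in find_invalid_utf8_lines(dump):
            print(f"line {lineno}, byte {offset}: {raw[:80]!r}")
```

Taking an iterable of byte lines rather than a path keeps the check easy to reuse on any source, such as a file object, a pipe, or a list in a test.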



Re: Character encoding conversion

From: Shoaib Mir
On Feb 9, 2008 7:28 PM, Mike Blackwell <maiku41@sbcglobal.net> wrote:
> I have a database which was originally created with LATIN1 encoding.
> I'd like to move it to UTF8. The data loads OK (with COPY), but I am
> getting "invalid byte sequence for encoding ..." messages when accessing
> the data.
>
> Is there a way to automatically convert the offending characters, or to
> easily locate them in a pg_dump file so they can be converted by hand?

Try running your dump file through 'iconv' to convert it all to UTF8 before restoring it into a UTF8 database.

--
Shoaib Mir
Fujitsu Australia Software Technology
shoaibm[@]fast.fujitsu.com.au
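
For reference, the iconv step is along the lines of 'iconv -f LATIN1 -t UTF-8 dump.sql > dump_utf8.sql' (file names hypothetical). Below is the same conversion as a Python sketch. It assumes the entire dump really is LATIN1. If parts of it are already UTF-8, converting them again would double-encode those characters.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Reinterpret LATIN1 (ISO-8859-1) bytes as text and re-encode them as UTF-8."""
    # Every byte 0x00-0xFF maps to a LATIN1 character, so this decode cannot fail.
    return data.decode("latin-1").encode("utf-8")


def convert_dump(src_path: str, dst_path: str, chunk_size: int = 1 << 20) -> None:
    """Convert a whole dump file chunk by chunk, roughly like iconv -f LATIN1 -t UTF-8."""
    # LATIN1 uses one byte per character, so a chunk boundary can never split a character.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(latin1_to_utf8(chunk))
```

One caveat applies if this is a plain-text pg_dump. Such dumps normally carry a SET client_encoding line near the top. If that line still names LATIN1 after the file has been converted, the server will convert the already-UTF8 bytes a second time and store garbled text. In that case, change the line to UTF8 as well.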

Re: Character encoding conversion

From: Michael Fuhr
On Sat, Feb 09, 2008 at 08:28:58AM -0600, Mike Blackwell wrote:
> I have a database which was originally created with LATIN1 encoding.
> I'd like to move it to UTF8. The data loads OK (with COPY), but I am
> getting "invalid byte sequence for encoding ..." messages when accessing
> the data.

What's the complete error message?  What's the output of the following
query in a session that gets the error?

select name, setting from pg_settings where name ~ 'encoding';

> Is there a way to automatically convert the offending characters, or to
> easily locate them in a pg_dump file so they can be converted by hand?

You said the data loaded without error, right?  Let's see if we can
figure out why queries are failing before considering what corrective
action to take.

--
Michael Fuhr

Re: Character encoding conversion

From: Tom Lane
Mike Blackwell <maiku41@sbcglobal.net> writes:
> I have a database which was originally created with LATIN1 encoding.
> I'd like to move it to UTF8. The data loads OK (with COPY), but I am
> getting "invalid byte sequence for encoding ..." messages when accessing
> the data.

How exactly did you move the data?  And what PG version are we talking
about?

> Is there a way to automatically convert the offending characters, or to
> easily locate them in a pg_dump file so they can be converted by hand?

If you've got client_encoding and server_encoding set up correctly,
the conversion should happen automatically during the COPY. I think you
messed that up somehow, but it's not clear just how.

            regards, tom lane
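
Tom's point can be sketched as a toy Python model. This is an illustration only, not PostgreSQL's actual conversion code, and server_receive and the codec table are hypothetical names. The server interprets incoming COPY bytes according to client_encoding and converts them to server_encoding. The same LATIN1 bytes convert cleanly when client_encoding says LATIN1, and they fail when the client claims to be sending UTF8.

```python
# Python codec names for the two PostgreSQL encodings discussed in this thread.
PG_TO_PYTHON_CODEC = {"LATIN1": "latin-1", "UTF8": "utf-8"}


def server_receive(raw: bytes, client_encoding: str, server_encoding: str = "UTF8") -> bytes:
    """Toy model of the server's encoding conversion: interpret the incoming
    bytes according to client_encoding, then re-encode them in server_encoding."""
    try:
        text = raw.decode(PG_TO_PYTHON_CODEC[client_encoding])
    except UnicodeDecodeError as exc:
        # Loosely modeled on PostgreSQL's "invalid byte sequence" error.
        bad = raw[exc.start]
        raise ValueError(
            f'invalid byte sequence for encoding "{client_encoding}": 0x{bad:02x}'
        ) from exc
    return text.encode(PG_TO_PYTHON_CODEC[server_encoding])


row = b"caf\xe9"  # "café" as stored in a LATIN1 database
print(server_receive(row, client_encoding="LATIN1"))  # b'caf\xc3\xa9'
try:
    server_receive(row, client_encoding="UTF8")
except ValueError as err:
    print(err)  # invalid byte sequence for encoding "UTF8": 0xe9
```

In practice, this means telling the server what the data actually contains before the COPY runs. You can do that with SET client_encoding = 'LATIN1'; in the session, or with the PGCLIENTENCODING environment variable when loading through psql.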