Thread: Character encoding conversion
I have a database which was originally created with LATIN1 encoding. I'd like to move it to UTF8. The data will load ok (COPY) but I am getting "invalid byte sequence for encoding..." messages when accessing the data.

Is there a way to automatically convert the offending characters, or to easily locate them in a pg_dump file so they can be converted by hand?

Mike
On Feb 9, 2008 7:28 PM, Mike Blackwell <maiku41@sbcglobal.net> wrote:
Try running 'iconv' on your dump file to convert it all to UTF8 before restoring it into a UTF8 database.
--
Shoaib Mir
Fujitsu Australia Software Technology
shoaibm[@]fast.fujitsu.com.au
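
As a rough illustration of the iconv suggestion above, the following Python sketch performs the same kind of LATIN1-to-UTF8 re-encoding on a plain-text dump file. The function name latin1_to_utf8 and the file names are hypothetical, and the sketch assumes every byte in the dump really is LATIN1.

```python
def latin1_to_utf8(src_path, dst_path, chunk_size=1 << 16):
    """Re-encode a LATIN1 text file as UTF-8, streaming in chunks.

    Assumption: the source file is entirely LATIN1. Every byte value is a
    valid LATIN1 character, so decoding never fails, but bytes that were
    really in some other encoding would be silently mis-converted.
    """
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding="latin-1", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    latin1_to_utf8("dump_latin1.sql", "dump_utf8.sql")
```

On the command line, something like "iconv -f LATIN1 -t UTF-8 dump_latin1.sql > dump_utf8.sql" should be roughly equivalent. Plain-text dumps usually carry a "SET client_encoding" line near the top; if it still names LATIN1 after the file has been converted, it would presumably need to be changed to UTF8 as well, otherwise the server may convert the data a second time.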
On Sat, Feb 09, 2008 at 08:28:58AM -0600, Mike Blackwell wrote:
> I have a database which was originally created with LATIN1 encoding.
> I'd like to move it to UTF8. The data will load ok (COPY) but I am
> getting "invalid byte sequence for encoding..." messages when accessing
> the data.

What's the complete error message? What's the output of the following query in a session that gets the error?

select name, setting from pg_settings where name ~ 'encoding';

> Is there a way to automatically convert the offending characters, or to
> easily locate them in a pg_dump file so they can be converted by hand?

You said the data loaded without error, right? Let's see if we can figure out why queries are failing before considering what corrective action to take.

--
Michael Fuhr
Mike Blackwell <maiku41@sbcglobal.net> writes:
> I have a database which was originally created with LATIN1 encoding.
> I'd like to move it to UTF8. The data will load ok (COPY) but I am
> getting "invalid byte sequence for encoding..." messages when accessing
> the data.

How exactly did you move the data? And what PG version are we talking about?

> Is there a way to automatically convert the offending characters, or to
> easily locate them in a pg_dump file so they can be converted by hand?

If you've got client_encoding and server_encoding set up correctly, it should happen automatically during the COPY. I think you messed that up somehow, but it's not clear just how.

regards, tom lane
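
On the original question of locating the offending characters in a pg_dump file, one possible approach is to scan a plain-text dump for lines that do not decode as UTF-8. The sketch below is only an illustration: the function name find_invalid_utf8_lines and the file name are hypothetical, and it assumes a plain-text (not custom-format) dump.

```python
def find_invalid_utf8_lines(path):
    """Return a list of (line_number, byte_offset, raw_line) tuples,
    one for each line of the file that is not valid UTF-8.

    byte_offset is the position within the line of the first invalid byte.
    """
    bad = []
    # Read as bytes so that nothing is decoded (or rejected) implicitly.
    with open(path, "rb") as f:
        for lineno, raw in enumerate(f, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError as exc:
                bad.append((lineno, exc.start, raw))
    return bad


if __name__ == "__main__":
    # Hypothetical file name, for illustration only.
    for lineno, offset, raw in find_invalid_utf8_lines("dump.sql"):
        print(f"line {lineno}, byte {offset}: {raw!r}")
```

Lines reported this way are candidates for conversion by hand. Pure ASCII lines are valid in both LATIN1 and UTF8, so only lines containing accented or other non-ASCII characters should show up.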