Thread: How do I change the server encoding?
I have a server that has LATIN1 encoding. I want to convert it to run UTF encoding. How do I do that? Simply changing the encoding in a dump file does not work.
On Mon, 24 Feb 2003, Joseph Shraibman wrote:
> I have a server that has LATIN1 encoding. I want to convert it to run UTF encoding. How
> do I do that? Simply changing the encoding in a dump file does not work.

So have you done both of these:
- dropped and recreated your db with encoding 'utf-8'
- converted your dumps to utf-8, or added set client_encoding to 'latin1' in the dumps

--
Antti Haapala
Hello,

I have the same question as Joseph Shraibman. I have dumped the db and created a new db with utf-8 encoding. My database should be converted from SQL_ASCII to utf-8.

I have added this line to my dumps:

SET CLIENT_ENCODING TO 'SQL_ASCII';

Now when I load the dump into my db, I get these errors on tables with text:

psql:tcom-database.sql:7111: ERROR: copy: line 1, Invalid UNICODE character sequence found (0xe96500)
psql:tcom-database.sql:7111: lost synchronization with server, resetting connection
psql:tcom-database.sql:7409: ERROR: copy: line 1, Invalid UNICODE character sequence found (0xe97265)
psql:tcom-database.sql:7409: lost synchronization with server, resetting connection
psql:tcom-database.sql:7456: ERROR: copy: line 3, Invalid UNICODE character sequence found (0xe90007)
psql:tcom-database.sql:7456: lost synchronization with server, resetting connection
psql:tcom-database.sql:7468: ERROR: copy: line 6, Invalid UNICODE character sequence found (0xe97300)

Any ideas? Thanks for your help.

Philippe Kiener

On 25.2.2003 8:55, "Antti Haapala" <antti.haapala@iki.fi> wrote:
> On Mon, 24 Feb 2003, Joseph Shraibman wrote:
>
>> I have a server that has LATIN1 encoding. I want to convert it to run UTF
>> encoding. How
>> do I do that? Simply changing the encoding in a dump file does not work.
>
> So have you done both of these:
> - dropped and recreated your db with encoding 'utf-8'
> - converted your dumps to utf-8 or
>   added set client_encoding to 'latin1' in the dumps
Philippe Kiener writes:
> My database should be transform from SQL_ASCII to utf-8
>
> I have added that line to my dumps:
>
> SET CLIENT_ENCODING TO 'SQL_ASCII';
>
> Now when I load the dump into my db, I get that error on tables with text:
>
> psql:tcom-database.sql:7111: ERROR: copy: line 1, Invalid UNICODE character
> sequence found (0xe96500)

The client encoding SQL_ASCII means that the data will be passed through unchanged. Try setting it to LATIN1.

--
Peter Eisentraut peter_e@gmx.net
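A minimal sketch of why unchanged pass-through fails, assuming the dump holds LATIN1 text: 0xE9 is "é" in LATIN1, but in UTF-8 it is the lead byte of a three-byte sequence and must be followed by two continuation bytes (0x80 to 0xBF). In the reported value 0xe96500, the next byte is 0x65 ("e"), which is not a continuation byte. The sample byte string and the helper names below are made up for illustration and are not part of PostgreSQL.

```python
# Hypothetical sample: the LATIN1 bytes for "née", as they might appear in a dump.
latin1_bytes = b"n\xe9e"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode LATIN1 bytes as UTF-8 (what a LATIN1 client encoding asks for)."""
    return data.decode("latin-1").encode("utf-8")


# Passed through unchanged, the LATIN1 bytes are not a valid UTF-8 sequence.
print(is_valid_utf8(latin1_bytes))
# After conversion, "é" becomes the two-byte sequence 0xC3 0xA9.
print(latin1_to_utf8(latin1_bytes))
```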
Peter Eisentraut wrote:
> Philippe Kiener writes:
>
>>My database should be transform from SQL_ASCII to utf-8
>>
>>I have added that line to my dumps:
>>
>>SET CLIENT_ENCODING TO 'SQL_ASCII';
>>
>>Now when I load the dump into my db, I get that error on tables with text:
>>
>>psql:tcom-database.sql:7111: ERROR: copy: line 1, Invalid UNICODE character
>>sequence found (0xe96500)
>
> The client encoding SQL_ASCII means that the data will be passed through
> unchanged. Try setting it to LATIN1.

I tried with latin1 and it didn't work.
Joseph Shraibman wrote:

After further experimenting I think the problem is in psql. When I try

update mytable set firstname = 'Oné' where ukey = 12911;

it works with a latin1 database, but when I try it on a unicode database:

utfowl=# update mytable set firstname = 'Oné' where ukey = 12911;
utfowl'#

It thinks there is an open quote or something. This happens even if I set the client encoding to latin1. Of course dumps are read with the copy command, but maybe it is the same problem.
On Tue, 25 Feb 2003, Joseph Shraibman wrote:
> Peter Eisentraut wrote:
> > Philippe Kiener writes:
> >>
> >>My database should be transform from SQL_ASCII to utf-8
> >>
> >>I have added that line to my dumps:
> >>
> >>SET CLIENT_ENCODING TO 'SQL_ASCII';
> >>
> >>Now when I load the dump into my db, I get that error on tables with text:
> >>
> >>psql:tcom-database.sql:7111: ERROR: copy: line 1, Invalid UNICODE character
> >>sequence found (0xe96500)
> >
> > The client encoding SQL_ASCII means that the data will be passed through
> > unchanged. Try setting it to LATIN1.
> >
> I tried with latin1 and it didn't work.

Hmm... still caused errors? I think that because newer dumps have those \connects, you need to add explicit char set settings after all of those.

The better way would be converting the whole dump with iconv, though. Iconv comes by default with many unixen. For example, the command

iconv -f iso-8859-1 -t utf-8 < text_dump > text_dump_converted

will convert your dump from latin1 to utf-8.

--
Antti Haapala
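Where iconv is not available, the same whole-file conversion can be sketched in Python. This is only an illustration of the idea, assuming the dump really is ISO-8859-1 throughout; the function name `convert_dump` and the sample data are hypothetical.

```python
import codecs
import io


def convert_dump(src, dst, from_enc="iso-8859-1", to_enc="utf-8",
                 chunk_size=64 * 1024):
    """Copy binary stream src to dst, re-encoding from from_enc to to_enc.

    An incremental decoder is used so that a multi-byte character split
    across two chunks is still decoded correctly.
    """
    decoder = codecs.getincrementaldecoder(from_enc)()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(decoder.decode(chunk).encode(to_enc))
    # Flush anything the decoder is still holding back.
    dst.write(decoder.decode(b"", final=True).encode(to_enc))


if __name__ == "__main__":
    # Made-up fragment of a dump containing the LATIN1 byte for "é".
    src = io.BytesIO(b"COPY mytable FROM stdin;\n12911\tOn\xe9\n\\.\n")
    dst = io.BytesIO()
    convert_dump(src, dst)
    print(dst.getvalue())
```

For real files, open the input with mode "rb" and the output with mode "wb" and pass the two file objects in the same way.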
Joseph Shraibman wrote:
> Joseph Shraibman wrote:
> After further experimenting I think the problem is in psql. When I try
> update mytable set firstname = 'Oné' where ukey = 12911;
>
> It works with a latin1 database, but when I try it on a unicode database:
>
> utfowl=# update mytable set firstname = 'Oné' where ukey = 12911;
> utfowl'#
>
> It thinks there is an open quote or something. This is even if I set
> the client encoding to be latin1. Of course dumps are read with the
> copy command but maybe it is the same problem.

I solved the problem. "set client_encoding = 'latin1';" does not work, but "\encoding latin1" does. I suggest that pg_dump put a "\encoding <encoding>" after every \connect in the dump. I would do this myself but I can't figure out where that is done in the dump program.

I did modify pg_dump.c so the encoding used during the dump can be specified on the command line, but since that isn't what solved the problem I'm not sure there is a point to having it. Is anyone interested?
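Until pg_dump does this itself, the suggestion can be applied to an existing text dump by post-processing it. The following is a rough sketch, not a pg_dump feature: it assumes every \connect command sits at the start of its own line, and the function name `add_encoding_after_connect` is hypothetical. It works on bytes so the dump's own encoding is left untouched.

```python
def add_encoding_after_connect(lines, encoding=b"latin1"):
    """Yield the dump lines, adding a psql \\encoding command after each \\connect.

    lines is an iterable of byte strings (for example a file opened in "rb" mode).
    """
    for line in lines:
        yield line
        if line.startswith(b"\\connect"):
            # Make sure the inserted command starts on its own line.
            if not line.endswith(b"\n"):
                yield b"\n"
            yield b"\\encoding " + encoding + b"\n"


if __name__ == "__main__":
    # Made-up dump fragment for illustration.
    dump = [
        b"\\connect - joseph\n",
        b"COPY mytable FROM stdin;\n",
        b"12911\tOn\xe9\n",
        b"\\.\n",
    ]
    for out_line in add_encoding_after_connect(dump):
        print(out_line)
```

The output can then be written to a new file and loaded with psql as usual.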