Thread: How do I change the server encoding?

How do I change the server encoding?

From
Joseph Shraibman
Date:
I have a server that has LATIN1 encoding.  I want to convert it to run UTF encoding.  How
do I do that?  Simply changing the encoding in a dump file does not work.


Re: How do I change the server encoding?

From
Antti Haapala
Date:
On Mon, 24 Feb 2003, Joseph Shraibman wrote:

> I have a server that has LATIN1 encoding.  I want to convert it to run UTF encoding.  How
> do I do that?  Simply changing the encoding in a dump file does not work.

So have you done both of these:
    - dropped and recreated your db with encoding 'utf-8'
    - converted your dumps to utf-8 or
      added set client_encoding to 'latin1' in the dumps

--
Antti Haapala


Re: How do I change the server encoding?

From
Philippe Kiener
Date:
Hello
I have the same question as Joseph Shraibman.
I have dumped the db and created a new db with utf-8 encoding.

My database should be transformed from SQL_ASCII to utf-8.

I have added this line to my dumps:

SET CLIENT_ENCODING TO 'SQL_ASCII';

Now when I load the dump into my db, I get this error on tables with text:

psql:tcom-database.sql:7111: ERROR:  copy: line 1, Invalid UNICODE character sequence found (0xe96500)
psql:tcom-database.sql:7111: lost synchronization with server, resetting connection
psql:tcom-database.sql:7409: ERROR:  copy: line 1, Invalid UNICODE character sequence found (0xe97265)
psql:tcom-database.sql:7409: lost synchronization with server, resetting connection
psql:tcom-database.sql:7456: ERROR:  copy: line 3, Invalid UNICODE character sequence found (0xe90007)
psql:tcom-database.sql:7456: lost synchronization with server, resetting connection
psql:tcom-database.sql:7468: ERROR:  copy: line 6, Invalid UNICODE character sequence found (0xe97300)



Any ideas?

Thanks for your help.

Philippe Kiener


On 25.2.2003 8:55, "Antti Haapala" <antti.haapala@iki.fi> wrote:

>
> On Mon, 24 Feb 2003, Joseph Shraibman wrote:
>
>> I have a server that has LATIN1 encoding.  I want to convert it to run UTF encoding.  How
>> do I do that?  Simply changing the encoding in a dump file does not work.
>
> So have you done both of these:
> - dropped and recreated your db with encoding 'utf-8'
> - converted your dumps to utf-8 or
>  added set client_encoding to 'latin1' in the dumps


Re: How do I change the server encoding?

From
Peter Eisentraut
Date:
Philippe Kiener writes:

> My database should be transformed from SQL_ASCII to utf-8
>
> I have added this line to my dumps:
>
> SET CLIENT_ENCODING TO 'SQL_ASCII';
>
> Now when I load the dump into my db, I get this error on tables with text:
>
> psql:tcom-database.sql:7111: ERROR:  copy: line 1, Invalid UNICODE character
> sequence found (0xe96500)

The client encoding SQL_ASCII means that the data will be passed through
unchanged.  Try setting it to LATIN1.

--
Peter Eisentraut   peter_e@gmx.net
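
To make the point about pass-through concrete, here is a minimal Python sketch. It assumes the dump really contains Latin-1 text (the thread suggests this but does not confirm it) and uses the first two bytes from the reported error, 0xe9 0x65. The helper name is_valid_utf8 is made up for illustration.

```python
# Bytes taken from the error above: 0xe9 0x65 (Latin-1 "é" followed by "e").
raw = bytes([0xE9, 0x65])

def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Passed through unchanged, the bytes are not valid UTF-8: 0xE9 announces a
# three-byte sequence, but 0x65 is not a continuation byte (0x80-0xBF).
passthrough_ok = is_valid_utf8(raw)

# Assumption: the text is Latin-1. Decoding it as such and re-encoding
# yields a valid UTF-8 byte sequence for the same characters.
converted = raw.decode("latin-1").encode("utf-8")
converted_ok = is_valid_utf8(converted)

print(passthrough_ok, converted, converted_ok)
```

This is roughly the conversion the server is expected to perform once the client encoding is declared as LATIN1 instead of SQL_ASCII.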


Re: How do I change the server encoding?

From
Joseph Shraibman
Date:
Peter Eisentraut wrote:
> Philippe Kiener writes:
>
>
>>My database should be transformed from SQL_ASCII to utf-8
>>
>>I have added this line to my dumps:
>>
>>SET CLIENT_ENCODING TO 'SQL_ASCII';
>>
>>Now when I load the dump into my db, I get this error on tables with text:
>>
>>psql:tcom-database.sql:7111: ERROR:  copy: line 1, Invalid UNICODE character
>>sequence found (0xe96500)
>
>
> The client encoding SQL_ASCII means that the data will be passed through
> unchanged.  Try setting it to LATIN1.
>
I tried with latin1 and it didn't work.


Re: How do I change the server encoding?

From
Joseph Shraibman
Date:
Joseph Shraibman wrote:
After further experimenting I think the problem is in psql.  When I try
update mytable set firstname = 'Oné' where ukey =  12911;

It works with a latin1 database, but when I try it on a unicode database:

utfowl=# update mytable set firstname = 'Oné' where ukey =  12911;
utfowl'#

It thinks there is an open quote or something.  This is even if I set the client encoding
to be latin1.  Of course dumps are read with the copy command but maybe it is the same
problem.
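
One plausible explanation for the dangling-quote prompt, sketched in Python: if the client scans the input as UTF-8 while the bytes are actually Latin-1, the byte 0xE9 looks like the start of a three-byte character, so the scanner skips the closing quote that follows it. This is an illustrative model only. It is not taken from the psql source, and the function names are hypothetical.

```python
def utf8_char_len(lead: int) -> int:
    """Character length implied by a UTF-8 lead byte (1 for anything else)."""
    if lead >= 0xF0:
        return 4
    if lead >= 0xE0:
        return 3
    if lead >= 0xC0:
        return 2
    return 1

def quote_left_open(line: bytes, multibyte: bool) -> bool:
    """Hypothetical scanner: report whether a single quote is still open."""
    in_quote = False
    i = 0
    while i < len(line):
        if line[i] == 0x27:  # single quote
            in_quote = not in_quote
        # A multibyte-aware scanner jumps over the whole "character".
        i += utf8_char_len(line[i]) if multibyte else 1
    return in_quote

# The statement from the message above, as Latin-1 bytes.
stmt = "update mytable set firstname = 'Oné' where ukey =  12911;".encode("latin-1")

open_when_scanned_as_utf8 = quote_left_open(stmt, multibyte=True)
open_when_scanned_bytewise = quote_left_open(stmt, multibyte=False)
print(open_when_scanned_as_utf8, open_when_scanned_bytewise)
```

Under this model the UTF-8 scan swallows the closing quote together with 0xE9, which would match the utfowl'# continuation prompt.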


Re: How do I change the server encoding?

From
Antti Haapala
Date:
On Tue, 25 Feb 2003, Joseph Shraibman wrote:

> Peter Eisentraut wrote:
> > Philippe Kiener writes:
> >>
> >>My database should be transformed from SQL_ASCII to utf-8
> >>
> >>I have added this line to my dumps:
> >>
> >>SET CLIENT_ENCODING TO 'SQL_ASCII';
> >>
> >>Now when I load the dump into my db, I get this error on tables with text:
> >>
> >>psql:tcom-database.sql:7111: ERROR:  copy: line 1, Invalid UNICODE character
> >>sequence found (0xe96500)
> >
> >
> > The client encoding SQL_ASCII means that the data will be passed through
> > unchanged.  Try setting it to LATIN1.
> >
> I tried with latin1 and it didn't work.

Hmm... did it still cause errors? I think that because newer dumps contain
\connect commands, you need to add an explicit character set setting after
each of them.

A better way would be to convert the whole dump with iconv, though.
iconv ships by default with many Unix systems. For example, the command

    iconv -f iso-8859-1 -t utf-8 < text_dump > text_dump_converted

will convert your dump from latin1 to utf-8.

--
Antti Haapala
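
For systems without iconv, the same conversion can be sketched in Python. This is a minimal example, assuming the dump is a plain-text file that is entirely ISO-8859-1; the function name convert_dump is made up, and the tiny file written here only stands in for a real dump.

```python
import os
import tempfile

def convert_dump(src_path: str, dst_path: str,
                 src_encoding: str = "iso-8859-1",
                 dst_encoding: str = "utf-8") -> None:
    """Re-encode a text dump line by line, leaving line endings untouched."""
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        for line in src:
            dst.write(line)

# Demonstration on a throwaway file containing Latin-1 "Oné".
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "text_dump")
    dst = os.path.join(tmp, "text_dump_converted")
    with open(src, "wb") as f:
        f.write(b"On\xe9\n")
    convert_dump(src, dst)
    with open(dst, "rb") as f:
        result = f.read()

print(result)
```

As with iconv, the converted dump should then be loaded into a database created with the UTF-8 encoding.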


Re: How do I change the server encoding? SOLVED

From
Joseph Shraibman
Date:
Joseph Shraibman wrote:
> Joseph Shraibman wrote:
> After further experimenting I think the problem is in psql.  When I try
> update mytable set firstname = 'Oné' where ukey =  12911;
>
> It works with a latin1 database, but when I try it on a unicode database:
>
> utfowl=# update mytable set firstname = 'Oné' where ukey =  12911;
> utfowl'#
>
> It thinks there is an open quote or something.  This is even if I set
> the client encoding to be latin1.  Of course dumps are read with the
> copy command but maybe it is the same problem.
>
I solved the problem.  "set client_encoding = 'latin1';" does not work, but "\encoding
latin1" does.  I suggest that pg_dump put a  "\encoding <encoding>" after every \connect
in the dump.  I would do this myself but I can't figure out where that is done in the dump
program.

I did modify pg_dump.c so the encoding used during the dump can be specified on the
command line, but since that isn't what solved the problem I'm not sure there is a point
to having it.  Is anyone interested?
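
Until pg_dump does this itself, the workaround above can be applied to an existing dump with a small script. The following is a rough Python sketch, not a pg_dump patch: the function name is hypothetical, and the sample text (user names, table definition) is invented for the demonstration.

```python
def add_encoding_after_connect(dump_text: str, encoding: str = "latin1") -> str:
    """Insert a psql '\\encoding <encoding>' line after every \\connect line."""
    out = []
    for line in dump_text.splitlines(keepends=True):
        out.append(line)
        if line.startswith("\\connect"):
            if not line.endswith("\n"):
                out.append("\n")  # keep the inserted command on its own line
            out.append(f"\\encoding {encoding}\n")
    return "".join(out)

# Invented sample standing in for a real dump.
sample = (
    "\\connect - joseph\n"
    "CREATE TABLE mytable (ukey integer);\n"
    "\\connect - postgres\n"
)

patched = add_encoding_after_connect(sample)
print(patched)
```

A Latin-1 dump can be read and written with Python's "latin-1" codec before and after this step without changing any bytes, since that codec maps every byte to exactly one character.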