Thread: Questions about encoding between two databases

Questions about encoding between two databases

From
Archibald Zimonyi
Date:
Hello,

I am running version 7.4.x and am going to upgrade to version 8.3.x.
From everything I have read, I should have no problem with the format of
the pg_dump file (for dumping and restoring purposes), but I am
having problems with encoding (which I was fairly sure I would). I have
searched the web for solutions, and one solution given (in a thread where
Tom Lane answered) was to set the correct encoding in the version 8.3.x
database.

However, the default encoding in the version 8.3.x instance is
currently UTF8 and I am happy with that. The encoding for most of the
databases in the version 7.4.x was LATIN1. Is there any way I can ignore
the LATIN1 encoding and force the database to accept the UTF8 encoding of
the new version 8.3.x instance?

I get the message below when I run the psql -f <file> <database> command.

psql:aranzo20090812:30: ERROR:  encoding LATIN1 does not match server's
locale en_US.UTF-8
DETAIL:  The server's LC_CTYPE setting requires encoding UTF8.

Any help would be appreciated.

Archie

Re: Questions about encoding between two databases

From
Adrian Klaver
Date:
On Thursday 20 August 2009 11:45:30 pm Archibald Zimonyi wrote:
> Hello,
>
> I am sitting on version 7.4.x and am going to upgrade to version 8.3.x.
> From all I can read I should have no problem with actual format of the
> pgdump file (for actual dumping and restoring purposes) but I am
> having problems with encoding (which I was fairly sure I would). I have
> searched the web for solutions and one solution given (in one thread where
> Tom Lane answered) was to set the correct encoding in the version 8.3.x
> database.
>
> However, the default encoding in the version 8.3.x instance is
> currently UTF8 and I am happy with that. The encoding for most of the
> databases in the version 7.4.x was LATIN1. Is there any way I can ignore
> the LATIN1 encoding and force the database to accept the UTF8 encoding of
> the new version 8.3.x instance?
>
> I get the below message when I try the psql -f <file> <database> command.
>
> psql:aranzo20090812:30: ERROR:  encoding LATIN1 does not match server's
> locale en_US.UTF-8
> DETAIL:  The server's LC_CTYPE setting requires encoding UTF8.
>
> Any help would be appreciated.
>
> Archie

To get the question out of the way, is there a reason you are not upgrading to
the latest version, 8.4?

Suggestion below is untested:
Use pg_dump from 8.3.x to dump from 7.4 database.

From here:
http://www.postgresql.org/docs/8.3/interactive/app-pgdump.html

"
-E encoding
--encoding=encoding

    Create the dump in the specified character set encoding. By default, the
dump is created in the database encoding. (Another way to get the same result
is to set the PGCLIENTENCODING environment variable to the desired dump
encoding.)  "

Use the encoding switch to create the dump in UTF8.


--
Adrian Klaver
aklaver@comcast.net
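
A minimal sketch of the suggestion above, not a tested procedure. It assumes
the 8.3 client tools are installed on the new machine and that the 7.4 server
accepts TCP connections from it. The host name old-db.example.org and the
database name mydb are placeholders, not values from this thread. The -h, -E
and -f switches are the ones documented on the pg_dump page linked above.

```shell
# Hypothetical names: replace with the real 7.4 host and database.
OLD_HOST="old-db.example.org"
DB_NAME="mydb"
OUT_FILE="${DB_NAME}_utf8.sql"

# Build the command first so it can be inspected before anything is run.
# -h points the 8.3 pg_dump at the remote 7.4 server,
# -E asks for the dump to be written in UTF8 instead of the database encoding,
# -f names the output file.
DUMP_CMD="pg_dump -h $OLD_HOST -E UTF8 -f $OUT_FILE $DB_NAME"
echo "$DUMP_CMD"

# Uncomment to actually run it from the machine that has the 8.3 tools:
# $DUMP_CMD
```

Because pg_dump is an ordinary client program, it does not have to run on the
same machine as the server it dumps from.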

Re: Questions about encoding between two databases

From
Archibald Zimonyi
Date:

On Fri, 21 Aug 2009, Adrian Klaver wrote:

> On Thursday 20 August 2009 11:45:30 pm Archibald Zimonyi wrote:
>> Hello,
>>
>> I am sitting on version 7.4.x and am going to upgrade to version 8.3.x.
>> From all I can read I should have no problem with actual format of the
>> pgdump file (for actual dumping and restoring purposes) but I am
>> having problems with encoding (which I was fairly sure I would). I have
>> searched the web for solutions and one solution given (in one thread where
>> Tom Lane answered) was to set the correct encoding in the version 8.3.x
>> database.
>>
>> However, the default encoding in the version 8.3.x instance is
>> currently UTF8 and I am happy with that. The encoding for most of the
>> databases in the version 7.4.x was LATIN1. Is there any way I can ignore
>> the LATIN1 encoding and force the database to accept the UTF8 encoding of
>> the new version 8.3.x instance?
>>
>> I get the below message when I try the psql -f <file> <database> command.
>>
>> psql:aranzo20090812:30: ERROR:  encoding LATIN1 does not match server's
>> locale en_US.UTF-8
>> DETAIL:  The server's LC_CTYPE setting requires encoding UTF8.
>>
>> Any help would be appreciated.
>>
>> Archie
>
> To get the question out of the way, is there a reason you are not upgrading to
> latest version, 8.4?
>
Yes, I use Debian stable, which as far as I know only has 8.3.x as
its latest version. But it shouldn't really matter in this case as I would
most likely have the same problem with 8.4.x.

> Suggestion below is untested:
> Use pg_dump from 8.3.x to dump from 7.4 database.
>
The two versions are located on two different machines, so probably not
possible.

> From here:
> http://www.postgresql.org/docs/8.3/interactive/app-pgdump.html
>
> "
> -E encoding
> --encoding=encoding
>
>    Create the dump in the specified character set encoding. By default, the
> dump is created in the database encoding. (Another way to get the same result
> is to set the PGCLIENTENCODING environment variable to the desired dump
> encoding.)  "
>
> Use the encoding switch to create the dump in UTF8.
>
I will look at this PGCLIENTENCODING variable to see if I can set that in
7.4.x but does anyone know the answer to it already? Would it work?

Will that also work with pg_dumpall?

Thanks for the response so far.

Archie

Re: Questions about encoding between two databases

From
Archibald Zimonyi
Date:
Hello,

I tried changing the client_encoding setting but there was no difference
in the result.

I went into the generated dump file and (more out of wishful thinking than
anything else) tried to simply change the encoding from LATIN1 to UTF8 and
then load the file. It did not complain about an incorrect encoding setting
for the load; however, it complained that the characters were not valid UTF8
characters (which was more or less what I guessed would happen).

So back to square one again.

Archie


Re: Questions about encoding between two databases

From
Tom Lane
Date:
Archibald Zimonyi <arsi@aranzo.netg.se> writes:
> I went into the generated dump file and (more wish then anything else)
> tried to simply change the encoding from LATIN1 to UTF8 and then load the
> file, it did not complain about incorrect encoding setting for the load,
> however it complained that the characters did not match true UTF8
> characters (which was almost what I guessed would happen).

Indeed.  Do *not* change the client_encoding setting in the dump file.
You can edit the ENCODING options in the CREATE DATABASE commands
though.  (Didn't we explain this to you already?)

            regards, tom lane
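
A minimal sketch of the edit described above, run against a made-up three-line
excerpt rather than a real dump. The database name mydb and the exact shape of
the CREATE DATABASE line are assumptions for illustration. Only the ENCODING
option on CREATE DATABASE lines is rewritten; the client_encoding line is left
alone, so the server is still told that the data bytes in the file are LATIN1
and can convert them as it loads them into the UTF8 database.

```shell
workdir=$(mktemp -d)

# Hypothetical excerpt of a pg_dumpall file; a real dump contains much more.
cat > "$workdir/dumpall.sql" <<'EOF'
CREATE DATABASE mydb WITH TEMPLATE = template0 ENCODING = 'LATIN1';
\connect mydb
SET client_encoding = 'LATIN1';
EOF

# Rewrite ENCODING only on CREATE DATABASE lines.
# LC_ALL=C makes sed treat the file as raw bytes, so LATIN1 data lines
# in a real dump pass through untouched.
LC_ALL=C sed "/^CREATE DATABASE /s/ENCODING = 'LATIN1'/ENCODING = 'UTF8'/" \
    "$workdir/dumpall.sql" > "$workdir/dumpall_utf8db.sql"

# Show the result: the database is created as UTF8, client_encoding is unchanged.
cat "$workdir/dumpall_utf8db.sql"
```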

Re: Questions about encoding between two databases

From
Archibald Zimonyi
Date:
Hello,

> Archibald Zimonyi <arsi@aranzo.netg.se> writes:
>> I went into the generated dump file and (more wish then anything else)
>> tried to simply change the encoding from LATIN1 to UTF8 and then load the
>> file, it did not complain about incorrect encoding setting for the load,
>> however it complained that the characters did not match true UTF8
>> characters (which was almost what I guessed would happen).
>
> Indeed.  Do *not* change the client_encoding setting in the dump file.
> You can edit the ENCODING options in the CREATE DATABASE commands
> though.  (Didn't we explain this to you already?)
>
>             regards, tom lane
>
Well, I did send this query with an incorrect email address so it got
stuck and was never posted properly, so I have not seen any such reply.
Can you please explain again?

You say to edit the ENCODING options in the CREATE DATABASE commands, yet
these commands are themselves in the dump file. I don't understand.

But yes, after my change, the database schemas were all created with UTF8,
so that part worked. Of course, the actual text, which was LATIN1 before,
failed for those characters whose UTF8 encoding differs from LATIN1, so the
load still fails.

I will try using iconv as suggested in another reply, but shouldn't that
then mean I need to change the client_encoding (so that it matches)?

Archie

Re: Questions about encoding between two databases

From
Alvaro Herrera
Date:
Archibald Zimonyi wrote:
>
> Hello,
>
> >Archibald Zimonyi <arsi@aranzo.netg.se> writes:
> >>I went into the generated dump file and (more wish then anything else)
> >>tried to simply change the encoding from LATIN1 to UTF8 and then load the
> >>file, it did not complain about incorrect encoding setting for the load,
> >>however it complained that the characters did not match true UTF8
> >>characters (which was almost what I guessed would happen).
> >
> >Indeed.  Do *not* change the client_encoding setting in the dump file.
> >You can edit the ENCODING options in the CREATE DATABASE commands
> >though.  (Didn't we explain this to you already?)
> >

> Well, I did send this query with an incorrect email address so it
> got stuck and was never posted properly, so I have not seen any such
> reply. Can you please explain again?

Search the archives: http://archives.postgresql.org/

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Questions about encoding between two databases

From
Archibald Zimonyi
Date:
Hello,

iconv seemed to work fine. I converted the dump file from LATIN1 to UTF8
and kept the changes in the client_encoding (in the dump file) and loaded
them all into the database.

No complaints. I still need to verify the result but at least I got no
restore errors based on character encoding.

Thanks for the tips.

Archie
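
A minimal sketch of the iconv approach described above, using a tiny stand-in
file instead of a real dump. The file names and the table t are made up for
illustration. The point it shows is that the bytes in the file and the
client_encoding declared in it have to agree: once iconv has rewritten the
bytes as UTF-8, the SET client_encoding line has to say UTF8 as well.

```shell
workdir=$(mktemp -d)

# Stand-in for a LATIN1 dump; \351 is the single LATIN1 byte for e-acute.
printf "SET client_encoding = 'LATIN1';\nINSERT INTO t VALUES ('caf\351');\n" \
    > "$workdir/dump_latin1.sql"

# Re-encode the whole file. ISO-8859-1 is the iconv name for LATIN1.
iconv -f ISO-8859-1 -t UTF-8 "$workdir/dump_latin1.sql" > "$workdir/dump_utf8.sql"

# The bytes are now UTF-8, so the declared client_encoding must match.
sed "s/^SET client_encoding = 'LATIN1';/SET client_encoding = 'UTF8';/" \
    "$workdir/dump_utf8.sql" > "$workdir/dump_utf8_fixed.sql"

# Sanity check: a UTF-8 to UTF-8 pass fails if any byte sequence is invalid.
iconv -f UTF-8 -t UTF-8 "$workdir/dump_utf8_fixed.sql" > /dev/null \
    && echo "valid UTF-8"
```

The last check only confirms that the file is well-formed UTF-8. It does not
prove that every character was converted to the intended one, so the restored
data still needs to be inspected.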
