Thread: Questions about encoding between two databases
Hello,

I am running version 7.4.x and am going to upgrade to version 8.3.x. From all I can read, I should have no problem with the actual format of the pg_dump file (for dumping and restoring purposes), but I am having problems with encoding (which I was fairly sure I would). I have searched the web for solutions, and one solution given (in a thread where Tom Lane answered) was to set the correct encoding in the version 8.3.x database.

However, the default encoding in the version 8.3.x instance is currently UTF8 and I am happy with that. The encoding for most of the databases in version 7.4.x was LATIN1. Is there any way I can ignore the LATIN1 encoding and force the database to accept the UTF8 encoding of the new version 8.3.x instance?

I get the message below when I try the psql -f <file> <database> command.

psql:aranzo20090812:30: ERROR: encoding LATIN1 does not match server's locale en_US.UTF-8
DETAIL: The server's LC_CTYPE setting requires encoding UTF8.

Any help would be appreciated.

Archie
On Thursday 20 August 2009 11:45:30 pm Archibald Zimonyi wrote:
> [...]
> Is there any way I can ignore the LATIN1 encoding and force the
> database to accept the UTF8 encoding of the new version 8.3.x instance?
> [...]

To get the question out of the way, is there a reason you are not upgrading to latest version, 8.4?

Suggestion below is untested:
Use pg_dump from 8.3.x to dump from 7.4 database.

From here:
http://www.postgresql.org/docs/8.3/interactive/app-pgdump.html

"
-E encoding
--encoding=encoding

    Create the dump in the specified character set encoding. By default, the dump is created in the database encoding. (Another way to get the same result is to set the PGCLIENTENCODING environment variable to the desired dump encoding.)
"

Use the encoding switch to create the dump in UTF8.

-- 
Adrian Klaver
aklaver@comcast.net
On Fri, 21 Aug 2009, Adrian Klaver wrote:
> To get the question out of the way, is there a reason you are not upgrading to
> latest version, 8.4?

Yes, I use Debian stable, which as far as I know only has 8.3.x as its latest version. But it shouldn't really matter in this case, as I would most likely have the same problem with 8.4.x.

> Suggestion below is untested:
> Use pg_dump from 8.3.x to dump from 7.4 database.

The two versions are located on two different machines, so probably not possible.

> From here:
> http://www.postgresql.org/docs/8.3/interactive/app-pgdump.html
>
> "
> -E encoding
> --encoding=encoding
>
> Create the dump in the specified character set encoding. By default, the
> dump is created in the database encoding. (Another way to get the same result
> is to set the PGCLIENTENCODING environment variable to the desired dump
> encoding.) "
>
> Use the encoding switch to create the dump in UTF8.

I will look at this PGCLIENTENCODING variable to see if I can set it in 7.4.x, but does anyone know the answer already? Would it work?

Will that also work with pg_dumpall?

Thanks for the response so far.

Archie
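On the two-machines concern: pg_dump is an ordinary client program, so the 8.3.x binary can in principle connect to the 7.4 server over the network with -h, provided that server accepts TCP connections from the new machine (an assumption about this setup). Below is a minimal Python sketch of how such a call could be assembled. The host name, database name, and output file are hypothetical placeholders, and nothing here has been run against a 7.4 server. The second helper sets PGCLIENTENCODING, which the quoted documentation names as the alternative to -E; whether that is enough for pg_dumpall in this case is untested.

```python
import os


def build_pg_dump_command(host, dbname, outfile, encoding="UTF8"):
    """Return the argv for running pg_dump against a remote server.

    -h selects the remote host, -E the encoding of the dump,
    -f the output file. All values passed in are placeholders.
    """
    return ["pg_dump", "-h", host, "-E", encoding, "-f", outfile, dbname]


def env_with_client_encoding(encoding="UTF8", base_env=None):
    """Return a copy of the environment with PGCLIENTENCODING set."""
    env = dict(os.environ if base_env is None else base_env)
    env["PGCLIENTENCODING"] = encoding
    return env


if __name__ == "__main__":
    # Hypothetical names; replace them with the real host and database.
    cmd = build_pg_dump_command("old-server.example", "mydb", "mydb_utf8.sql")
    print(" ".join(cmd))
    # To actually run it (requires pg_dump on PATH and network access):
    #   import subprocess
    #   subprocess.run(cmd, check=True, env=env_with_client_encoding("UTF8"))
```

The command is built as a list so that no shell quoting is involved when it is later passed to subprocess.run.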
Hello,

I tried changing the client_encoding setting but there was no difference in the result.

I went into the generated dump file and (more wish than anything else) tried to simply change the encoding from LATIN1 to UTF8 and then load the file. It did not complain about an incorrect encoding setting for the load; however, it complained that the characters were not valid UTF8 characters (which was almost what I guessed would happen).

So back to square one again.

Archie
Archibald Zimonyi <arsi@aranzo.netg.se> writes:
> I went into the generated dump file and (more wish then anything else)
> tried to simply change the encoding from LATIN1 to UTF8 and then load the
> file, it did not complain about incorrect encoding setting for the load,
> however it complained that the characters did not match true UTF8
> characters (which was almost what I guessed would happen).

Indeed. Do *not* change the client_encoding setting in the dump file. You can edit the ENCODING options in the CREATE DATABASE commands though. (Didn't we explain this to you already?)

regards, tom lane
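To illustrate why relabelling the dump fails: the bytes in the file are still LATIN1, so declaring them to be UTF8 does not make them valid UTF8. The following is a small Python sketch of that effect, using a made-up sample string; it does not involve PostgreSQL at all and only mimics the validity check that rejected the load.

```python
# Sample text with a non-ASCII character (hypothetical data).
text = "café"

# This is what a LATIN1 dump file contains: one byte (0xE9) for "é".
latin1_bytes = text.encode("latin-1")

# Relabelling: pretend the same bytes are UTF-8 without converting them.
try:
    latin1_bytes.decode("utf-8")
    relabel_ok = True
except UnicodeDecodeError:
    # 0xE9 opens a multi-byte UTF-8 sequence that is never completed.
    relabel_ok = False

# Converting: read the bytes as LATIN1, then write them out as UTF-8.
utf8_bytes = latin1_bytes.decode("latin-1").encode("utf-8")

print("relabel accepted:", relabel_ok)
print("latin1 bytes:", latin1_bytes)
print("utf8 bytes:  ", utf8_bytes)
```

When client_encoding is left at LATIN1, the conversion step is what the server does on load, which is why only the ENCODING option of CREATE DATABASE needs editing.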
Hello,

> Indeed. Do *not* change the client_encoding setting in the dump file.
> You can edit the ENCODING options in the CREATE DATABASE commands
> though. (Didn't we explain this to you already?)
>
> regards, tom lane

Well, I did send this query with an incorrect email address, so it got stuck and was never posted properly, and I have not seen any such reply. Can you please explain again? You mention the ENCODING options in the CREATE DATABASE commands, yet these commands exist in the dump file. I don't understand.

But yes, after my change, the database schemas were all created with UTF8, so that part worked. Of course the actual text, which was LATIN1 before, failed for those characters where UTF8 differs from LATIN1, so it still fails.

I will try using iconv as suggested in another reply, but shouldn't that then mean I need to change the client_encoding (so that it matches)?

Archie
Archibald Zimonyi wrote:
> Well, I did send this query with an incorrect email address so it
> got stuck and was never posted properly, so I have not seen any such
> reply. Can you please explain again?

Search the archives: http://archives.postgresql.org/

-- 
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Hello,

iconv seemed to work fine. I converted the dump file from LATIN1 to UTF8, kept the changes to the client_encoding (in the dump file), and loaded everything into the database. No complaints.

I still need to verify the result, but at least I got no restore errors based on character encoding.

Thanks for the tips.

Archie
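For readers without iconv at hand, the same two steps (convert the bytes, then make the declared encodings describe the new bytes) can be sketched in Python. This is an illustration under assumptions, not the procedure used above: the exact form of the SET client_encoding and CREATE DATABASE lines in a 7.4 dump is assumed, the function names are made up, and the whole file is read into memory, which may not suit large dumps.

```python
import re

# Matches the encoding declaration on lines that start with
# "SET client_encoding = " or "CREATE DATABASE ... ENCODING = ".
# The assumed line formats are a guess at what the dump contains.
_DECLARATION = re.compile(
    r"^(?P<head>(?:SET\s+client_encoding\s*=\s*"
    r"|CREATE\s+DATABASE\b.*\bENCODING\s*=\s*))'LATIN1'",
    re.IGNORECASE | re.MULTILINE,
)


def convert_dump_bytes(latin1_bytes):
    """Convert dump contents from LATIN1 to UTF8 and update declarations."""
    text = latin1_bytes.decode("latin-1")
    # Only declaration lines are rewritten; data rows are left alone.
    text = _DECLARATION.sub(r"\g<head>'UTF8'", text)
    return text.encode("utf-8")


def convert_dump_file(src_path, dst_path):
    """Read a LATIN1 dump file and write a UTF8 copy (paths are placeholders)."""
    with open(src_path, "rb") as src:
        converted = convert_dump_bytes(src.read())
    with open(dst_path, "wb") as dst:
        dst.write(converted)


if __name__ == "__main__":
    sample = (
        b"SET client_encoding = 'LATIN1';\n"
        b"CREATE DATABASE mydb WITH TEMPLATE = template0 ENCODING = 'LATIN1';\n"
        b"INSERT INTO t VALUES ('caf\xe9 LATIN1');\n"
    )
    print(convert_dump_bytes(sample).decode("utf-8"))
```

Once the file's bytes have been converted, client_encoding has to say UTF8, because it describes the data actually being sent; leaving it at LATIN1 would make the server convert already-converted text a second time.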