Thread: v8.3 + UTF8 errors when restoring DB
New to the list, so please forgive me in advance :)
I've been running 8.2 on windows server 2003 for quite some time now. The database that I take care of stores records with various languages in it (russian, chinese, etc) and has been working fine.
I did a PG_dump of the database tonight, and went to restore it to my newly created database (WIN2152 encoding), which worked fine, but when I go to retrieve the data via my code (.NET c#), I get errors like :
character 0x8f of encoding "WIN1252" has no equivalent in "UTF8"
character 0x81 of encoding "WIN1252" has no equivalent in "UTF8"
character 0x9d of encoding "WIN1252" has no equivalent in "UTF8"
The database driver is set to unicode mode, so I turned that off, but of course end up with gibberish instead of the russian and chinese that were there before.
So then I tried creating a new database with unicode formatting and restoring my 8.2 dump file to that, when I do that, I get these errors (using PSQL):
ERROR: invalid byte sequence for encoding "UTF8": 0x92
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
CONTEXT: COPY cin_tbl_data, line 6
ERROR: invalid byte sequence for encoding "UTF8": 0x99
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
so I just don't know what to do. my 8.2 database used SET client_encoding = 'SQL_ASCII'; according to the dump file - so I can't wrap my head around why I can't get this data to come out.
Any help would be very much appreciated.
Just as an update to this matter, I created another new database with SQL_ASCII encoding and imported the data, and it worked fine, however, I don’t think that’s the best solution because SQL_ASCII’s not a good way to have the database encoding set, right?
Thanks again for any suggestions.
Mitch
From: Mitchell D. Russell [mailto:mitchell.russell@gmail.com]
Sent: Sunday, March 09, 2008 5:46 AM
To: 'pgsql-general@postgresql.org'
Subject: v8.3 + UTF8 errors when restoring DB
New to the list, so please forgive me in advance :)
I've been running 8.2 on windows server 2003 for quite some time now. The database that I take care of stores records with various languages in it (russian, chinese, etc) and has been working fine.
I did a PG_dump of the database tonight, and went to restore it to my newly created database (WIN2152 encoding), which worked fine, but when I go to retrieve the data via my code (.NET c#), I get errors like :
character 0x8f of encoding "WIN1252" has no equivalent in "UTF8"
character 0x81 of encoding "WIN1252" has no equivalent in "UTF8"
character 0x9d of encoding "WIN1252" has no equivalent in "UTF8"
The database driver is set to unicode mode, so I turned that off, but of course end up with gibberish instead of the russian and chinese that were there before.
So then I tried creating a new database with unicode formatting and restoring my 8.2 dump file to that, when I do that, I get these errors (using PSQL):
ERROR: invalid byte sequence for encoding "UTF8": 0x92
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
CONTEXT: COPY cin_tbl_data, line 6
ERROR: invalid byte sequence for encoding "UTF8": 0x99
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
so I just don't know what to do. my 8.2 database used SET client_encoding = 'SQL_ASCII'; according to the dump file - so I can't wrap my head around why I can't get this data to come out.
Any help would be very much appreciated.
On Sun, Mar 09, 2008 at 06:12:14AM -0400, Mitchell D. Russell wrote: > Just as an update to this matter, I created another new database with > SQL_ASCII encoding and imported the data, and it worked fine, however, I > don't think that's the best solution because SQL_ASCII's not a good way to > have the database encoding set, right? SQL_ASCII means "no encoding". Everything becomes a bunch of bytes and there's no checking whatsoever. > I did a PG_dump of the database tonight, and went to restore it to my newly > created database (WIN2152 encoding), which worked fine, but when I go to > retrieve the data via my code (.NET c#), I get errors like : Any particular reason you're using that encoding? Why not just use UTF-8? Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Please line up in a tree and maintain the heap invariant while > boarding. Thank you for flying nlogn airlines.
Attachment
New to the list, so please forgive me in advance :)
I've been running 8.2 on windows server 2003 for quite some time now. The database that I take care of stores records with various languages in it (russian, chinese, etc) and has been working fine.
I did a PG_dump of the database tonight, and went to restore it to my newly created database (WIN2152 encoding), which worked fine, but when I go to retrieve the data via my code (.NET c#), I get errors like :
character 0x8f of encoding "WIN1252" has no equivalent in "UTF8"
character 0x81 of encoding "WIN1252" has no equivalent in "UTF8"
character 0x9d of encoding "WIN1252" has no equivalent in "UTF8"
The database driver is set to unicode mode, so I turned that off, but of course end up with gibberish instead of the russian and chinese that were there before.
So then I tried creating a new database with unicode formatting and restoring my 8.2 dump file to that, when I do that, I get these errors (using PSQL):
ERROR: invalid byte sequence for encoding "UTF8": 0x92
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
CONTEXT: COPY cin_tbl_data, line 6
ERROR: invalid byte sequence for encoding "UTF8": 0x99
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
so I just don't know what to do. my 8.2 database used SET client_encoding = 'SQL_ASCII'; according to the dump file - so I can't wrap my head around why I can't get this data to come out.
Any help would be very much appreciated.
Well, you partially hosed yourself when you did the dump using a 'client_encoding' of 'SQL_ASCII'. If the 8.2 database is still around, redo pg_dump using the '-E' parameter to set the client_encoding to UTF-8, then import the data into your 8.3 database created with a UTF-8 encoding.
If you don't have the 8.2 server around any more, it's a bit more complicated. How did you do the pg_dump (what format)?
-- Mail to my list address MUST be sent via the mailing list. All other mail to my list address will bounce.
Dean:
I did the dump as so:
Psql –Upostgres databasename > c:\temp\dump.sql
I think the database was set to SQL_ASCII before I dumped it, because when I did the 2nd restore last night to a new SQL_ASCII database, it worked fine, so that’s where I am at right now, the database is running fine on 8.3 as SQL_ASCII but I don’t want to keep it like that.
So what do you suggest I do now? Make another 8.3 database in UTF8 and dump the 8.3 SQL_ASCII one using –E with UTF8? What will it do to the characters that can’t be interpreted?
Thanks
Mitch
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Dean Gibson (DB Administrator)
Sent: Sunday, March 09, 2008 9:58 AM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] v8.3 + UTF8 errors when restoring DB
Well, you partially hosed yourself when you did the dump using a 'client_encoding' of 'SQL_ASCII'. If the 8.2 database is still around, redo pg_dump using the '-E' parameter to set the client_encoding to UTF-8, then import the data into your 8.3 database created with a UTF-8 encoding.
If you don't have the 8.2 server around any more, it's a bit more complicated. How did you do the pg_dump (what format)?
--
Mail to my list address MUST be sent via the mailing list.
All other mail to my list address will bounce.
I assume you meant pg_dump, not psql.Dean:
I did the dump as so: psql –Upostgres databasename > c:\temp\dump.sql
I don't think your idea will work.
I think the database was set to SQL_ASCII before I dumped it, because when I did the 2nd restore last night to a new SQL_ASCII database, it worked fine, so that’s where I am at right now, the database is running fine on 8.3 as SQL_ASCII but I don’t want to keep it like that.
So what do you suggest I do now? Make another 8.3 database in UTF8 and dump the 8.3 SQL_ASCII one using –E with UTF8? What will it do to the characters that can’t be interpreted?
Here's what I would try: Your 'dump.sql' file is an ASCII file, which you can edit. Hopefully you have an editor that can handle it (or if on Linux, you can use the 'sed' program to do what I'm about to suggest):
The file should contain a "SET client_encoding = SQL_ASCII" line near the beginning. Make a copy of that file, and change that line to "SET client_encoding = WIN1252" in the copied file, and then try importing the modified dump file into a UTF-8 database.
If that doesn't work, I have another suggestion if you are running on Linux.
-- Mail to my list address MUST be sent via the mailing list. All other mail to my list address will bounce.
On 09/03/2008 18:49, Mitchell D. Russell wrote: > I think the database was set to SQL_ASCII before I dumped it, because > when I did the 2^nd restore last night to a new SQL_ASCII database, it > worked fine, so that’s where I am at right now, the database is running > fine on 8.3 as SQL_ASCII but I don’t want to keep it like that. As an aside, I'm wondering if you're confusing the encoding of the database and the encoding used by the client (in this case pg_dump). They are two different things; the database encoding is set at creation time, while the encoding used by the client is usually set when that client connects (with a "SET client_encoding" statement); and Postgres translates happily between the two. Forgive me if I'm wrong....just reading between the lines. :-) Ray. --------------------------------------------------------------- Raymond O'Donnell, Director of Music, Galway Cathedral, Ireland rod@iol.ie ---------------------------------------------------------------
Mitchell D. Russell escribió: > I did a PG_dump of the database tonight, and went to restore it to my newly > created database (WIN2152 encoding), which worked fine, but when I go to > retrieve the data via my code (.NET c#), I get errors like : > > character 0x8f of encoding "WIN1252" has no equivalent in "UTF8" > character 0x81 of encoding "WIN1252" has no equivalent in "UTF8" > character 0x9d of encoding "WIN1252" has no equivalent in "UTF8" The problem is that your data is invalid, according to Wikipedia: According to the information on Microsoft's and the Unicode Consortium's websites, positions 81, 8D, 8F, 90, and 9D are unused. So obviously they cannot be converted into any other encoding. See the table of Win-1252 chars here: http://en.wikipedia.org/wiki/Windows-1252 Most likely, your source data is not Win1252 but some other different encoding, so the pg_dump/pg_restore into a Win1252 database mangled it. Perhaps you oughta initdb the new database with the same encoding that the original database had, or declare the client_encoding on the pg_restore step. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.