The following bug has been logged online:
Bug reference: 1076
Logged by: mike
Email address: michael_godshall@gmachs.com
PostgreSQL version: 7.4
Operating system: Windows/Cygwin
Description: Unicode Errors using Copy command
Details:
Hello,
I have a database that I upgraded from 7.3 to 7.4.1. When I restored the
backups, I received Unicode error messages while the script was restoring a
few tables. Those tables were created successfully but had no data in them.
I dropped the database with the errors and re-created it using SQL_ASCII as
the encoding. When I re-issued the restore command, everything was restored
successfully.
Next, in psql, I did the following:
1) SET client_encoding = 'unicode';
2) CREATE TABLE unicode.Foo (...);
   (Here I copied the SQL statement that creates one of the tables that
   failed to import when the default encoding was Unicode, changing only the
   table name.)
3) INSERT INTO unicode.Foo
   SELECT * FROM sql_ascii.Foo;
The statements executed without error and the data from my sql_ascii encoded
table was successfully copied into the new unicode table. I did a select *
from unicode.foo and can see the non-english punctuation in the table now.
Thus there seems to be a problem with converting SQL_ASCII to Unicode within
the COPY command. I found a few postings in pgsql-bugs questioning whether
this was a problem in 7.4, but no confirmation and no word on whether anyone
is currently working on it.
Examples of error messages I received when issuing the Copy command are the
following:
1)
ERROR: invalid byte sequence for encoding "UNICODE": 0XE56C73
CONTEXT: COPY volume_reports_copy_of_public_table, line 18808, column
transfereename: "Vralstad"
(Please note that I do not know how to reproduce the small "o" that is
supposed to appear above the first letter "a" in this name.)
2)
ERROR: Unicode characters greater than or equal to 0x10000 are not supported
CONTEXT: COPY merged_results, line 1150, column how_make_better: " ...Konig
was..."
(Again, I do not know how to reproduce the two small dots that should appear
above the letter "o" in that name/word.)
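
For reference, here is a minimal Python sketch of what the two error messages
may be reacting to. It assumes (this is not confirmed anywhere above) that the
stored data is really LATIN1 (ISO 8859-1) text. The helper name is_valid_utf8
is made up for this illustration. The bytes 0xE5 0x6C 0x73 from the first
error are then the characters "åls", and the "ö" in the second example is the
single byte 0xF6.

```python
# Bytes reported in the first error message: 0xE5 0x6C 0x73.
raw = bytes.fromhex("e56c73")

# ASSUMPTION: the data is LATIN1 (ISO 8859-1). Under that assumption the
# three bytes decode to U+00E5 (a with ring above) followed by "ls".
as_latin1 = raw.decode("latin-1")


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if the bytes form well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Read as UTF-8, 0xE5 announces a 3-byte sequence, but 0x6C and 0x73 are
# plain ASCII rather than continuation bytes, so the sequence is invalid.
valid_as_is = is_valid_utf8(raw)

# Re-encoding from LATIN1 to UTF-8 yields a well-formed sequence.
fixed = as_latin1.encode("utf-8")

# Second error: U+00F6 (o with diaeresis) is the single byte 0xF6 in LATIN1.
# Its top bits match 11110xxx, the pattern of a 4-byte UTF-8 lead byte,
# which is the form used for code points >= 0x10000.
o_umlaut = "\u00f6".encode("latin-1")
lead_is_four_byte = (o_umlaut[0] & 0xF8) == 0xF0

print(as_latin1, valid_as_is, fixed, is_valid_utf8(fixed), lead_is_four_byte)
```

If that assumption holds, the errors would be consistent with LATIN1 bytes
being checked as if they were already Unicode (UTF-8), rather than being
converted first.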
Version: Postgresql 7.4.1 on i686-pc-cygwin, compiled by GCC gcc (GCC) 3.3.1
(cygming special).
OS - Windows 2000 SP3.
I would like to make the default encoding for this database Unicode. Would
it be best to do what I did above for every table in the database, drop the
original tables, rename the new versions to the original names, back up the
database, and restore the backup as a new database with the default Unicode
encoding?
Any other suggestions?
Mike