BUG #1076: Unicode Errors using Copy command - Mailing list pgsql-bugs

From PostgreSQL Bugs List
Subject BUG #1076: Unicode Errors using Copy command
Date
Msg-id 20040210032746.7D014CF486E@www.postgresql.com
List pgsql-bugs
The following bug has been logged online:

Bug reference:      1076
Logged by:          mike

Email address:      michael_godshall@gmachs.com

PostgreSQL version: 7.4

Operating system:   Windows/Cygwin

Description:        Unicode Errors using Copy command

Details:

Hello,

I have a database I upgraded from 7.3 to 7.4.1.  When I restored the backups,
I received Unicode error messages while the script was restoring a few
tables.  Those tables were created successfully but had no data in them.

I dropped the database with the errors and re-created it using SQL_ASCII as
the encoding, then re-issued the restore command; everything was restored
successfully.

Next, in psql, I did the following:
1) set client_encoding = 'unicode';
2) Create Table unicode.Foo(...);
   (I copied the SQL statement that creates one of the tables that failed to
   import when the default encoding was Unicode, changing only the table
   name.)
3) Insert into unicode.Foo
     Select * from sql_ascii.Foo;

The statements executed without error, and the data from my SQL_ASCII-encoded
table was successfully copied into the new Unicode table.  I did a select *
from unicode.foo and can now see the non-English punctuation in the table.

Thus there seems to be a problem with converting SQL_ASCII to Unicode within
the Copy command.  I found a few postings in pgsql-bugs questioning whether
or not this was a problem in 7.4, but no confirmation, and no word on whether
anyone is currently working on it.

Examples of the error messages I received when issuing the Copy command:

1)
ERROR:  invalid byte sequence for encoding "UNICODE": 0XE56C73
CONTEXT: COPY volume_reports_copy_of_public_table, line 18808, column
transfereename: "Vralstad"
(Please note that I do not know how to reproduce the small "o" that is
supposed to appear above the first "a" in this name.)

2)
ERROR: Unicode characters greater than or equal to 0x10000 are not supported
CONTEXT: COPY merged_results, line 1150, column how_make_better: " ...Konig
was..."
(Again, I do not know how to reproduce the two small dots that should appear
above the letter "o" in that name/word.)
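
The bytes in the first message can be inspected outside the database.  The
following is only a sketch, and it assumes (the report does not confirm this)
that the dumped text is in ISO-8859-1 (LATIN1).  It shows that 0xE56C73 is not
valid UTF-8 but does decode as LATIN1 text.  It also shows that the LATIN1 byte
for an "o" with two dots has the bit pattern of a four-byte UTF-8 lead byte.
Four-byte sequences encode code points at or above 0x10000, which may be why
the second message appears.  The helper name try_decode is made up for this
illustration.

```python
def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# Bytes quoted in the first error message: 0xE56C73.
raw = bytes.fromhex("E56C73")

as_utf8 = try_decode(raw, "utf-8")      # 0xE5 starts a 3-byte sequence, but 0x6C is not a continuation byte
as_latin1 = try_decode(raw, "latin-1")  # assumption: the data is really LATIN1

print("as UTF-8  :", as_utf8)
print("as LATIN1 :", as_latin1)

# Assumption: the second name contains an o with two dots, stored as one LATIN1 byte.
o_umlaut = "\u00f6".encode("latin-1")[0]

# In UTF-8, a byte of the form 11110xxx introduces a four-byte sequence.
looks_like_4byte_lead = (o_umlaut & 0xF8) == 0xF0

print("byte:", hex(o_umlaut), "looks like a 4-byte lead:", looks_like_4byte_lead)
```

If the LATIN1 assumption holds, the rows themselves are intact and the input
was simply declared with the wrong encoding.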

Version: Postgresql 7.4.1 on i686-pc-cygwin, compiled by GCC gcc (GCC) 3.3.1
(cygming special).
OS - Windows 2000 SP3.


I would like to make the default encoding for this database Unicode.  Would
it be best to do what I did above for every table in the database, drop the
original tables, rename the new versions to the original names, back up the
database, and then restore the backup as a new database with the default
Unicode encoding?
Any other suggestions?
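
One alternative, given only as a rough sketch, is to convert a plain-text dump
file to UTF-8 before restoring it into a Unicode database.  It assumes the
dump is a plain SQL script (not a custom-format archive) and that its text is
ISO-8859-1 (LATIN1); neither is confirmed above.  The function name
transcode_dump and the file names in the usage note are hypothetical.

```python
def transcode_dump(src_path, dst_path, source_encoding="latin-1"):
    """Copy a plain-text dump, re-encoding it from source_encoding to UTF-8.

    newline="" keeps the original line endings untouched, so only the
    character encoding changes.
    """
    with open(src_path, "r", encoding=source_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)
```

Usage would look like transcode_dump("backup.sql", "backup_utf8.sql"), followed
by restoring backup_utf8.sql into the new database.  It may also be enough to
leave the file alone and put SET client_encoding = 'LATIN1'; at the top of the
script, so that the server does the conversion during Copy.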

Mike
