Multibyte problem with COPY FROM [Fwd: Re: postgres 7.2 and unicode] - Mailing list pgsql-general
From | Oliver Elphick |
---|---|
Subject | Multibyte problem with COPY FROM [Fwd: Re: postgres 7.2 and unicode] |
Date | |
Msg-id | 1017324778.1228.389.camel@linda |
List | pgsql-general |
I have confirmed that this problem occurs for me as well. On trying to import the 7.1 pg_dump data in the attachment I get

    $ psql junk </tmp/linksdb
    DROP DATABASE
    CREATE DATABASE
    You are now connected to database comanagers.
    CREATE
    CREATE
    ERROR: copy: line 1, Unicode >= 0x10000 is not supoorted
    lost synchronization with server, resetting connection

The line where the error occurs includes this character sequence (according to od -xc, with words reversed into string order):

    28 45 73 70 61 F1 61 29
    (  E  s  p  a     a  )

(I think that F1 is supposed to be n~ in the middle of Espan~a.)

I guess that F1 61 29 is being interpreted as a single character, since three bytes would be needed for it to be above 0x10000. So for some reason the Unicode dumped by 7.1 is not the same as the Unicode expected by 7.2.

Can anyone offer a solution, please?

PostgreSQL has been configured thus:

    $ /usr/lib/postgresql/bin/pg_config --configure
    --with-template=linux --prefix=/usr/lib/postgresql
    --enable-unicode-conversion --with-includes=/usr/include/tcl8.3
    --includedir=/usr/include/postgresql --with-python --with-openssl
    --with-gnu-ld --disable-rpath --enable-odbc --with-unixodbc
    --with-CXX --enable-recode --with-tcl --with-perl --with-pam
    --enable-multibyte --enable-debug --enable-syslog --enable-locale
    --with-tclconfig=/usr/lib/tcl8.3 --with-tkconfig=/usr/lib/tk8.3
    --with-maxbackends=64 --with-pgport=5432

-----Forwarded Message-----

From: Craig Sanders <cas@taz.net.au>
To: Oliver Elphick <olly@lfix.co.uk>
Subject: Re: postgres 7.2 and unicode
Date: 28 Mar 2002 22:50:51 +1100

On Thu, Mar 28, 2002 at 10:13:35AM +0000, Oliver Elphick wrote:
> I haven't heard of such a problem

i've been searching the web and list archives since i discovered this. haven't seen anything even remotely related to it.

> Could you extract the data properly before the upgrade? Perhaps the
> pg_dump format is wrong?

yes, the data was dumped properly. there's no problem dumping the data.
the problem occurs when trying to read it back in with COPY (as is done by the postgres package upgrade scripts).

> Can you (in the new database) insert and extract data through the CGI
> forms as you did before?

i believe so, but i haven't confirmed this for myself yet (i didn't write the database or the CGI scripts, i just look after the server it's on).

> Please send me an extract from the dump, showing the creation of the
> database and the table, and some of the dud lines

i have attached a file called linksdb containing the sql code to create a database, sequence and table, and some sample records. these were extracted from the db.out file created by the upgrade procedure.

there were several hundred records in the linksdb table, but i've extracted only the ones with characters between 0xe1 and 0xfa. AFAIK, not all of them cause a problem. some do. the first line (containing "España") definitely causes the COPY command to die with

    "copy: line 190, Unicode >= 0x10000 is not supoorted"

i forced an import by writing a little perl script which used s/// and tr/// to translate away the bad characters - but that is no solution...one of the databases is specifically for a web site promoting multi-lingual web sites for ethnic community groups, so unicode is essential.

in case it is of use, i have also attached linksdb.orig which is the complete contents of the linksdb table.

craig

--
craig sanders <cas@taz.net.au>

Fabricati Diem, PVNC.
 -- motto of the Ankh-Morpork City Watch
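
The byte sequence quoted from od can be examined with a short script. The sketch below is not taken from the thread: it assumes the 7.1 dump actually holds ISO 8859-1 (Latin-1) bytes rather than UTF-8, which the thread does not confirm, and the helper names (`utf8_sequence_length`, `is_valid_utf8`, `latin1_to_utf8`) are made up for illustration. In standard UTF-8, a lead byte in the range 0xF0 to 0xF4 announces a four-byte sequence, which is how code points at or above 0x10000 are encoded, so a lone 0xF1 would plausibly trigger the error shown.

```python
# Bytes from the od output in the message: "(Espa?a)" with 0xF1 in the middle.
raw = bytes([0x28, 0x45, 0x73, 0x70, 0x61, 0xF1, 0x61, 0x29])


def utf8_sequence_length(lead_byte: int) -> int:
    """Length announced by a UTF-8 lead byte, or 0 if it is not a valid lead byte."""
    if lead_byte < 0x80:
        return 1  # plain ASCII
    if 0xC2 <= lead_byte <= 0xDF:
        return 2  # code points U+0080..U+07FF
    if 0xE0 <= lead_byte <= 0xEF:
        return 3  # code points U+0800..U+FFFF
    if 0xF0 <= lead_byte <= 0xF4:
        return 4  # code points U+10000 and above
    return 0  # continuation byte or byte never used in UTF-8


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode bytes ASSUMED to be ISO 8859-1 as UTF-8 (assumption, not from the thread)."""
    return data.decode("latin-1").encode("utf-8")


if __name__ == "__main__":
    print(utf8_sequence_length(0xF1))  # 0xF1 claims to start a 4-byte sequence
    print(is_valid_utf8(raw))          # the quoted bytes are not well-formed UTF-8
    print(latin1_to_utf8(raw))         # the same text with n-tilde as two UTF-8 bytes
```

If the Latin-1 assumption holds, running the whole dump file through a conversion like `latin1_to_utf8` before feeding it to psql might let COPY accept it without throwing characters away, unlike the s/// and tr/// workaround. Whether that matches what 7.1 actually wrote would need checking against the real data.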