Thread: COPY doesn't works when containing 'ñ' or 'à' characters on db
I finally noticed that when data contains 'ñ' or 'à' it's impossible to parse through:

COPY products FROM '/var/lib/postgres/dadesi.txt' USING DELIMITERS '|' \g

it causes:

SELECT edicion FROM products;
edicion
-----------------
España|Nacional <-------puts on the same cell even though there's a '|' in the middle!!!

but changing 'ñ' for 'n':

SELECT edicion FROM products;
edicion
-----------------
Espana <---------------it separates cells ok

So what's my solution for COPYing a text file containing such characters?

best regards,
jaume
Jaume Teixi <teixi@6tems.com> writes:
> I finally noticed that when data contains 'ñ' or 'à' it's impossible to
> parse through:
> COPY products FROM '/var/lib/postgres/dadesi.txt' USING DELIMITERS '|' \g
> it causes:
> SELECT edicion FROM products;
> edicion
> -----------------
> España|Nacional <-------puts on the same cell even though there's a '|' in
> the middle!!!

Very odd. What LOCALE and multibyte encodings are you using, if any?
This seems like it must be a multibyte issue, but I can't guess what.

Also, which Postgres version are you running? If you said, I missed it.

regards, tom lane
On Mon, 26 Feb 2001 22:16:35 -0500 Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jaume Teixi <teixi@6tems.com> writes:
> > I finally noticed that when data contains 'ñ' or 'à' it's impossible to
> > parse through:
> >
> > COPY products FROM '/var/lib/postgres/dadesi.txt' USING DELIMITERS '|' \g
> >
> > it causes:
> >
> > SELECT edicion FROM products;
> > edicion
> > -----------------
> > España|Nacional <-------puts on the same cell even though there's a '|' in
> > the middle!!!

Thanks to Oliver Elphick, I finally managed to create the database with:

CREATE DATABASE "demo" WITH ENCODING = 'SQL_ASCII'

and the data was imported OK. Great, thanks!
Tom Lane wrote:
>Jaume Teixi <teixi@6tems.com> writes:
>> I finally noticed that when data contains 'ñ' or 'à' it's impossible to
>> parse through:
>
>> COPY products FROM '/var/lib/postgres/dadesi.txt' USING DELIMITERS '|' \g
>
>> it causes:
>
>> SELECT edicion FROM products;
>> edicion
>> -----------------
>> España|Nacional <-------puts on the same cell even though there's a '|' in
>> the middle!!!
>
>Very odd. What LOCALE and multibyte encodings are you using, if any?
>This seems like it must be a multibyte issue, but I can't guess what.
>
>Also, which Postgres version are you running? If you said, I missed it.

I think this happens when the front-end encoding is SQL_ASCII and the
database is using UNICODE. Then there are misunderstandings between
front-end and back-end, so that a single character with the eighth bit
set may be sent by the front-end and interpreted by the back-end as the
first half of a UNICODE two-byte character.

--
Oliver Elphick    Oliver.Elphick@lfix.co.uk
Isle of Wight     http://www.lfix.co.uk/oliver
PGP: 1024R/32B8FAA1: 97 EA 1D 47 72 3F 28 47 6B 7E 39 CC 56 E4 C1 47
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839 932A 614D 4C34 3E1D 0C1C
========================================
"If we confess our sins, he is faithful and just to forgive us our sins,
and to cleanse us from all unrighteousness." I John 1:9
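The mismatch Oliver describes can be illustrated outside the database. The following is a minimal Python sketch, not anything from the thread: it assumes the input file is ISO 8859-1 (Latin-1) and the database expects UTF-8, which the thread has not confirmed at this point. The helper names `is_valid_utf8` and `latin1_to_utf8` are hypothetical.

```python
# Sketch: the same text encoded as single-byte Latin-1 and as UTF-8.
text = "España|Nacional"

latin1_bytes = text.encode("latin-1")  # 'ñ' is the single byte 0xF1
utf8_bytes = text.encode("utf-8")      # 'ñ' is the two bytes 0xC3 0xB1


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode bytes as UTF-8, assuming the input really is Latin-1."""
    return data.decode("latin-1").encode("utf-8")


print(is_valid_utf8(latin1_bytes))                  # False: 0xF1 'a' is not a valid sequence
print(is_valid_utf8(latin1_to_utf8(latin1_bytes)))  # True
```

If the file really is Latin-1, re-encoding it before loading would be one alternative to creating the database as SQL_ASCII; whether that suits a given setup depends on what the clients expect to read back.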
"Oliver Elphick" <olly@lfix.co.uk> writes: > I think this happens when the front-end encoding is SQL_ASCII and the > database is using UNICODE. Then, there are misunderstandings between > front-end and back-end, so that a single character with the eighth bit > set may be sent by the front-end and interpreted by the back-end as the > first half of a UNICODE two-byte character. I wondered about that, but his examples had one or more characters between the eighth-bit-set character and the '|', so this doesn't seem to explain the problem. Still, if it went away after moving to ASCII encoding, it clearly is a multibyte issue of some sort. regards, tom lane
Re: [HACKERS] Re: COPY doesn't works when containing 'ñ' or 'à' characters on db
From: Tatsuo Ishii
> "Oliver Elphick" <olly@lfix.co.uk> writes:
> > I think this happens when the front-end encoding is SQL_ASCII and the
> > database is using UNICODE. Then, there are misunderstandings between
> > front-end and back-end, so that a single character with the eighth bit
> > set may be sent by the front-end and interpreted by the back-end as the
> > first half of a UNICODE two-byte character.
>
> I wondered about that, but his examples had one or more characters
> between the eighth-bit-set character and the '|', so this doesn't seem
> to explain the problem.

No. From Jaume's example:

> SELECT edicion FROM products;
> edicion
> -----------------
> España|Nacional <-------puts on the same cell even though there's a '|' in
> the middle!!!

In ISO 8859-1, 'ñ' is \361 == 0xf1. UTF-8 assumes that:

if (the first byte) & 0xe0 == 0xe0, then the letter consists of 3 bytes.

So PostgreSQL believes that "ña|" is one UTF-8 letter and eats up the '|'.

My guess is that Jaume made a UNICODE database but fed it ISO 8859-1 data,
or data in a similar single-byte Latin encoding.

I'm wondering why so many people are using a UTF-8 database even though they
do not understand what UTF-8 is :-)

I hope 7.1 will resolve this kind of confusion by enabling automatic
encoding conversion between UTF-8 and other encodings.
--
Tatsuo Ishii
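The rule Tatsuo quotes can be sketched in a few lines of Python. This is a rough simulation, not the backend's code: `char_len` and `split_fields` are hypothetical names, and the two-byte branch is an assumption taken from the standard UTF-8 lead-byte pattern, since the message only states the three-byte rule. Note also that current UTF-8 treats 0xF1 as the lead byte of a four-byte sequence, which this sketch deliberately does not model.

```python
def char_len(first_byte: int) -> int:
    """Byte length implied by a lead byte, following the rule quoted above."""
    if (first_byte & 0xE0) == 0xE0:
        return 3  # rule from the message: treated as a 3-byte letter
    if (first_byte & 0xE0) == 0xC0:
        return 2  # assumption: standard UTF-8 two-byte lead (110xxxxx)
    return 1      # plain single-byte character


def split_fields(line: bytes, delimiter: bytes = b"|") -> list:
    """Split a line on a delimiter, advancing one multibyte 'letter' at a time."""
    fields = []
    current = bytearray()
    i = 0
    while i < len(line):
        n = char_len(line[i])
        chunk = line[i:i + n]
        if n == 1 and chunk == delimiter:
            # Only a delimiter seen as its own single-byte letter splits a field.
            fields.append(bytes(current))
            current = bytearray()
        else:
            current += chunk
        i += n
    fields.append(bytes(current))
    return fields


# Latin-1 input: 0xF1 swallows the following 'a' and '|', so there is one field.
print(split_fields(b"Espa\xf1a|Nacional"))      # [b'Espa\xf1a|Nacional']
# The same text in UTF-8 splits as expected.
print(split_fields(b"Espa\xc3\xb1a|Nacional"))  # [b'Espa\xc3\xb1a', b'Nacional']
```

This also shows why a character sitting between the 'ñ' and the '|' does not help: under the three-byte rule the 'a' is consumed as the second byte and the '|' as the third.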
I haven't been following this thread very carefully, but I just remembered a
similar problem we had that is probably related. We did a dump from a UTF-8
db containing English, Japanese, and Korean data. When the dump was done in
the default mode (e.g., via COPY statements), we could not restore it. It
would die on certain characters. We then tried dumping with the -nd flags.
This fixed the problem for us, although the restore is a lot slower.

--Rainer

> -----Original Message-----
> From: pgsql-admin-owner@postgresql.org
> [mailto:pgsql-admin-owner@postgresql.org]On Behalf Of Tom Lane
> Sent: Tuesday, February 27, 2001 12:17 PM
> To: Jaume Teixi
> Cc: pgsql-hackers@postgresql.org; pgsql-admin@postgresql.org; Richard T.
> Robino; Stefan Huber
> Subject: Re: [ADMIN] COPY doesn't works when containing 'ñ' or 'à'
> characters on db
>
>
> Jaume Teixi <teixi@6tems.com> writes:
> > I finally noticed that when data contains 'ñ' or 'à' it's impossible to
> > parse through:
>
> > COPY products FROM '/var/lib/postgres/dadesi.txt' USING
> DELIMITERS '|' \g
>
> > it causes:
>
> > SELECT edicion FROM products;
> > edicion
> > -----------------
> > España|Nacional <-------puts on the same cell even though there's a '|' in
> > the middle!!!
>
> Very odd. What LOCALE and multibyte encodings are you using, if any?
> This seems like it must be a multibyte issue, but I can't guess what.
>
> Also, which Postgres version are you running? If you said, I missed it.
>
> regards, tom lane
Hi all,

Is there any way to get the debug (-d2) log files to mark each transaction
with a unique ID? We're trying to debug deadlocks, and the transactions seem
to be mixed together somewhat.

Thanks,
--Rainer
"Rainer Mager" <rmager@vgkk.com> writes: > Is there anyway to get the debug (-d2) log files to mark each transaction > with a unique ID. Not per-transaction, but there's an option to include the backend PID, which should help. regards, tom lane
Hi all,

We have an application that runs on both Postgres and Oracle. One problem
we've been facing is maintaining the installed/default database for the
application. Once it is up and running, things are fine, but since we
primarily develop on Postgres, we sometimes hit problems when it is time to
convert all of our work to Oracle.

I was wondering if anyone knows of any tools that take a Postgres dump and
convert it to something Oracle can accept?

Thanks,
--Rainer