Thread: COPY doesn't work when containing 'ñ' or 'à' characters on db

COPY doesn't work when containing 'ñ' or 'à' characters on db

From

Jaume Teixi

Date:

26 February 2001, 08:26:19

I finally realized that when the data contains 'ñ' or 'à', it's impossible to
parse it through:

COPY products FROM '/var/lib/postgres/dadesi.txt' USING DELIMITERS '|' \g

it causes:

SELECT edicion FROM products;
     edicion
-----------------
 España|Nacional <------- stays in a single cell even though there's a '|' in
the middle!!!


But after changing 'ñ' to 'n':

SELECT edicion FROM products;
     edicion
-----------------
 Espana <--------------- the cells are separated OK


So what is the solution for running COPY on a text file containing such characters?


best regards,
jaume

Re: COPY doesn't work when containing 'ñ' or 'à' characters on db

From

Tom Lane

Date:

26 February 2001, 22:16:48

Jaume Teixi <teixi@6tems.com> writes:
> I finally realized that when the data contains 'ñ' or 'à', it's impossible to
> parse it through:

> COPY products FROM '/var/lib/postgres/dadesi.txt' USING DELIMITERS '|' \g

> it causes:

> SELECT edicion FROM products;
>      edicion
> -----------------
>  España|Nacional <------- stays in a single cell even though there's a '|' in
> the middle!!!

Very odd.  What LOCALE and multibyte encodings are you using, if any?
This seems like it must be a multibyte issue, but I can't guess what.

Also, which Postgres version are you running?  If you said, I missed it.

            regards, tom lane

Re: SOLVED: COPY doesn't work when containing 'ñ' or 'à' characters on db

From

Jaume Teixi

Date:

27 February 2001, 04:16:23

On Mon, 26 Feb 2001 22:16:35 -0500 Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Jaume Teixi <teixi@6tems.com> writes:
> > I finally realized that when the data contains 'ñ' or 'à', it's impossible to
> > parse it through:
>
> > COPY products FROM '/var/lib/postgres/dadesi.txt' USING DELIMITERS '|' \g
>
> > it causes:
>
> > SELECT edicion FROM products;
> >      edicion
> > -----------------
> >  España|Nacional <------- stays in a single cell even though there's a '|' in
> > the middle!!!


Thanks to Oliver Elphick, I finally managed to create the database with:
    CREATE DATABASE "demo" WITH ENCODING = 'SQL_ASCII'

and the data was imported OK. Great, thanks!

Re: COPY doesn't work when containing 'ñ' or 'à' characters on db

From

"Oliver Elphick"

Date:

27 February 2001, 11:47:15

Tom Lane wrote:
  >Jaume Teixi <teixi@6tems.com> writes:
  >> I finally realized that when the data contains 'ñ' or 'à', it's impossible to
  >> parse it through:
  >
  >> COPY products FROM '/var/lib/postgres/dadesi.txt' USING DELIMITERS '|' \g
  >
  >> it causes:
  >
  >> SELECT edicion FROM products;
  >>      edicion
  >> -----------------
  >>  España|Nacional <------- stays in a single cell even though there's a '|' in
  >> the middle!!!
  >
  >Very odd.  What LOCALE and multibyte encodings are you using, if any?
  >This seems like it must be a multibyte issue, but I can't guess what.
  >
  >Also, which Postgres version are you running?  If you said, I missed it.

I think this happens when the front-end encoding is SQL_ASCII and the
database is using UNICODE.  Then, there are misunderstandings between
front-end and back-end, so that a single character with the eighth bit
set may be sent by the front-end and interpreted by the back-end as the
first half of a UNICODE two-byte character.

--
Oliver Elphick                                Oliver.Elphick@lfix.co.uk
Isle of Wight                              http://www.lfix.co.uk/oliver

Re: COPY doesn't work when containing 'ñ' or 'à' characters on db

From

Tom Lane

Date:

27 February 2001, 12:30:58

"Oliver Elphick" <olly@lfix.co.uk> writes:
> I think this happens when the front-end encoding is SQL_ASCII and the
> database is using UNICODE.  Then, there are misunderstandings between
> front-end and back-end, so that a single character with the eighth bit
> set may be sent by the front-end and interpreted by the back-end as the
> first half of a UNICODE two-byte character.

I wondered about that, but his examples had one or more characters
between the eighth-bit-set character and the '|', so this doesn't seem
to explain the problem.

Still, if it went away after moving to ASCII encoding, it clearly is
a multibyte issue of some sort.

            regards, tom lane

Re: [HACKERS] Re: COPY doesn't work when containing 'ñ' or 'à' characters on db

From

Tatsuo Ishii

Date:

28 February 2001, 09:41:14

> "Oliver Elphick" <olly@lfix.co.uk> writes:
> > I think this happens when the front-end encoding is SQL_ASCII and the
> > database is using UNICODE.  Then, there are misunderstandings between
> > front-end and back-end, so that a single character with the eighth bit
> > set may be sent by the front-end and interpreted by the back-end as the
> > first half of a UNICODE two-byte character.
>
> I wondered about that, but his examples had one or more characters
> between the eighth-bit-set character and the '|', so this doesn't seem
> to explain the problem.

No.

From Jaume's example:

> SELECT edicion FROM products;
>      edicion
> -----------------
>  España|Nacional <------- stays in a single cell even though there's a '|' in
> the middle!!!

In ISO 8859-1, 'ñ' is \361 == 0xf1. UTF-8 decoding assumes that:

     if ((the first byte) & 0xe0) == 0xe0, then the character consists of 3
     bytes.

So PostgreSQL believes that "ña|" is one UTF-8 character and eats up
the '|'.
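
To make the byte arithmetic concrete, here is a minimal Python sketch of the length rule as stated in this message. It is an illustration only, not PostgreSQL's source code: the names `mblen_as_described` and `split_fields` are hypothetical, and the 2-byte fallback for other high-bit bytes is an assumption. Note that standard UTF-8 treats 0xf1 as the lead byte of a 4-byte sequence; the sketch deliberately follows the 3-byte rule given above.

```python
def mblen_as_described(first_byte):
    """Length in bytes of the character starting at first_byte.

    Follows the rule quoted in the message; NOT PostgreSQL's actual code.
    """
    if (first_byte & 0x80) == 0:
        return 1  # plain 7-bit ASCII
    if (first_byte & 0xE0) == 0xE0:
        return 3  # the case described in the message
    return 2      # assumption: any other high-bit byte starts a 2-byte character


def split_fields(line, delimiter=b"|"):
    """Split a line on a one-byte delimiter, one 'character' at a time.

    A delimiter byte that falls inside a multibyte character is not seen
    as a delimiter, which is the failure mode discussed in this thread.
    """
    fields = []
    current = bytearray()
    i = 0
    while i < len(line):
        n = mblen_as_described(line[i])
        chunk = line[i:i + n]
        if n == 1 and chunk == delimiter:
            fields.append(bytes(current))
            current = bytearray()
        else:
            current += chunk
        i += n
    fields.append(bytes(current))
    return fields


latin1_line = "España|Nacional".encode("iso-8859-1")  # 'ñ' is the single byte 0xf1
utf8_line = "España|Nacional".encode("utf-8")         # 'ñ' is the two bytes 0xc3 0xb1

print(len(split_fields(latin1_line)))         # 1: 0xf1, 'a' and '|' are taken as one character
print(len(split_fields(utf8_line)))           # 2
print(len(split_fields(b"Espana|Nacional")))  # 2
```

With ISO 8859-1 input the '|' is swallowed and a single field comes back; the same text encoded as UTF-8, or written with a plain 'n', splits into two fields.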

My guess is that Jaume created a UNICODE database but fed it data in
ISO 8859-1 or a similar single-byte Latin encoding.
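
If that guess is right, one way around the problem is to re-encode the input file as UTF-8 before running COPY. A minimal Python sketch, assuming the file really is ISO 8859-1 (this is not confirmed in the thread); the function names are hypothetical and the file helper is shown but not called.

```python
from pathlib import Path


def latin1_to_utf8(data):
    """Re-encode bytes from ISO 8859-1 to UTF-8.

    Assumption: the input is ISO 8859-1. Every byte value is valid in that
    encoding, so a wrong guess produces wrong characters, not an error.
    """
    return data.decode("iso-8859-1").encode("utf-8")


def reencode_file(src, dst):
    """Write a UTF-8 copy of an ISO 8859-1 file (both paths are chosen by the caller)."""
    Path(dst).write_bytes(latin1_to_utf8(Path(src).read_bytes()))


# In-memory example using the line from this thread.
converted = latin1_to_utf8(b"Espa\xf1a|Nacional")
print(converted.decode("utf-8"))
```

After conversion 'ñ' is stored as the two bytes 0xc3 0xb1, so the '|' that follows is no longer consumed as part of a multibyte character.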

I wonder why so many people use UTF-8 databases even though they do
not understand what UTF-8 is :-) I hope 7.1 will resolve this kind of
confusion by enabling automatic encoding conversion between UTF-8
and other encodings.
--
Tatsuo Ishii

RE: COPY doesn't work when containing 'ñ' or 'à' characters on db

From

"Rainer Mager"

Date:

28 February 2001, 18:02:01

I haven't been following this thread very carefully, but I just remembered a
similar problem we had that is probably related. We did a dump from a UTF-8
db containing English, Japanese, and Korean data. When the dump was done in
the default mode (i.e., via COPY statements), we could not restore it. It
would die on certain characters. We then tried dumping with the -nd flags.
This fixed the problem for us, although the restore is a lot slower.

--Rainer

> -----Original Message-----
> From: pgsql-admin-owner@postgresql.org
> [mailto:pgsql-admin-owner@postgresql.org]On Behalf Of Tom Lane
> Sent: Tuesday, February 27, 2001 12:17 PM
> To: Jaume Teixi
> Cc: pgsql-hackers@postgresql.org; pgsql-admin@postgresql.org; Richard T.
> Robino; Stefan Huber
> Subject: Re: [ADMIN] COPY doesn't work when containing 'ñ' or 'à'
> characters on db
>
>
> Jaume Teixi <teixi@6tems.com> writes:
> > I finally realized that when the data contains 'ñ' or 'à', it's impossible to
> > parse it through:
>
> > COPY products FROM '/var/lib/postgres/dadesi.txt' USING DELIMITERS '|' \g
>
> > it causes:
>
> > SELECT edicion FROM products;
> >      edicion
> > -----------------
> >  España|Nacional <------- stays in a single cell even though there's a '|' in
> > the middle!!!
>
> Very odd.  What LOCALE and multibyte encodings are you using, if any?
> This seems like it must be a multibyte issue, but I can't guess what.
>
> Also, which Postgres version are you running?  If you said, I missed it.
>
>             regards, tom lane

log files

From

"Rainer Mager"

Date:

04 May 2001, 01:16:55

Hi all,

    Is there any way to get the debug (-d2) log files to mark each transaction
with a unique ID? We're trying to debug deadlocks, and the transactions seem
to be somewhat mixed together.

Thanks,

--Rainer

Re: log files

From

Tom Lane

Date:

04 May 2001, 10:04:59

"Rainer Mager" <rmager@vgkk.com> writes:
>     Is there any way to get the debug (-d2) log files to mark each transaction
> with a unique ID?

Not per-transaction, but there's an option to include the backend PID,
which should help.

            regards, tom lane

Postgres <-> Oracle

From

"Rainer Mager"

Date:

13 May 2001, 18:56:46

Hi all,

    We have an application that runs on both Postgres and Oracle. One problem
we've been facing is maintaining the installed/default database for the
application. Once it is up and running, things are fine, but since we
primarily develop on Postgres, we sometimes hit problems when it is time to
convert all of our work to Oracle. I was wondering if anyone knows of any
tools that take a Postgres dump and convert it to something Oracle can
accept?

    Thanks,

--Rainer