Thread: Two small questions re/ COPY CSV data into table

Two small questions re/ COPY CSV data into table

From
Matthias Apitz
Date:
Hello,

I'm using COPY to load CSV data into the table in PG 10.x and have to simple
questions:

1. I do use as command:

   COPY adm_cat FROM '/home/sisis/PostgreSQL/PDcsv/adm_cat.load' WITH DELIMITER '|' CSV ;

   which works fine. When I read the PG docs the syntax should be

   ... WITH DELIMITER '|', FORMAT CSV ;

   But this gives an syntax error as:

   LINE 1: .../PostgreSQL/PDcsv/adm_cat.load' WITH DELIMITER '|', FORMAT C...

   and the ',' is marked as its place. Why?
  
2. The CSV export is done by some Sybase tool which escapes the
   delimiter as '\|', i.e. putting a backslash before the delimiter. I
   found no way that COPY understands this excaping. Any ideas?

Thanks

    matthias
-- 
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub
May, 9: Спаси́бо освободители! Thank you very much, Russian liberators!



Re: Two small questions re/ COPY CSV data into table

From
Andrew Gierth
Date:
>>>>> "Matthias" == Matthias Apitz <guru@unixarea.de> writes:

 Matthias>    ... WITH DELIMITER '|', FORMAT CSV ;

I think you misread the docs; the new-style syntax would be

COPY ... WITH (DELIMITER '|', FORMAT CSV);

where the parens are not optional. The old-style syntax with no parens
after WITH is what you were using before.

 Matthias> 2. The CSV export is done by some Sybase tool which escapes
 Matthias> the delimiter as '\|', i.e. putting a backslash before the
 Matthias> delimiter. I found no way that COPY understands this
 Matthias> excaping. Any ideas?

That sounds like the file is not actually a CSV - why do you think it
is?

PG accepts two formats (actually 3 if you count binary format which is
rarely used):

TEXT: records are delimited by newlines, columns are delimited by a
delimiter character (default tab), there are no quotation marks, any
newline, backslash, or delimiter in the data must be escaped as a
backslash-sequence (e.g. \n or \012)

CSV: columns may be quoted (in which case delimiters and newlines inside
them are ignored), records are delimited by newlines _outside_ quoted
fields, there are no backslash-sequences or escapes outside of quoted
fields, quote characters inside quoted fields are doubled (though
there's an option to change this).

PG follows the CSV spec at https://www.ietf.org/rfc/rfc4180.txt fairly
closely.

-- 
Andrew (irc:RhodiumToad)



Re: Two small questions re/ COPY CSV data into table

From
Matthias Apitz
Date:
El día Tuesday, June 04, 2019 a las 11:32:45AM +0100, Andrew Gierth escribió:

>  Matthias> 2. The CSV export is done by some Sybase tool which escapes
>  Matthias> the delimiter as '\|', i.e. putting a backslash before the
>  Matthias> delimiter. I found no way that COPY understands this
>  Matthias> excaping. Any ideas?
> 
> That sounds like the file is not actually a CSV - why do you think it
> is?
> 
> ...

Well, it's not strictly CSV, but it is what the Sybase tool produces. The
delimiter is set to '|' and a data example line looks like:

0|1| cat $1 \| lpr -Pprinter |3|4

I do load this now with COPY in mode TEXT and modify the data before
with:

 sed 's/|/\v/g' < table-from-Sybase  | sed 's/\\\v/|/g' > table-for-copy

Works fine.

    matthias

-- 
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub
May, 9: Спаси́бо освободители! Thank you very much, Russian liberators!



Re: Two small questions re/ COPY CSV data into table

From
Andrew Gierth
Date:
>>>>> "Matthias" == Matthias Apitz <guru@unixarea.de> writes:

 >> That sounds like the file is not actually a CSV - why do you think
 >> it is?

 Matthias> Well, it's not strictly CSV,

In other words it's not a CSV at all.

 Matthias> I do load this now with COPY in mode TEXT and modify the data
 Matthias> before with:

 Matthias>  sed 's/|/\v/g' < table-from-Sybase  | sed 's/\\\v/|/g' > table-for-copy

What on earth is this supposed to achieve?

-- 
Andrew (irc:RhodiumToad)



Re: Two small questions re/ COPY CSV data into table

From
Matthias Apitz
Date:
El día Tuesday, June 04, 2019 a las 04:41:47PM +0100, Andrew Gierth escribió:

>  Matthias> I do load this now with COPY in mode TEXT and modify the data
>  Matthias> before with:
> 
>  Matthias>  sed 's/|/\v/g' < table-from-Sybase  | sed 's/\\\v/|/g' > table-for-copy
> 
> What on earth is this supposed to achieve?

It first translates any char '|' to vtab and then any '\vtab' (i.e. a
backslash followed by a vtab) back to the char |

The new DELIMITER for PG is then vtab and the | is just a char in the
data (in the example above a pipe of two UNIX cmd). Do you do UNIX?

    matthias

-- 
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub
May, 9: Спаси́бо освободители! Thank you very much, Russian liberators!



Re: Two small questions re/ COPY CSV data into table

From
Andrew Gierth
Date:
>>>>> "Matthias" == Matthias Apitz <guru@unixarea.de> writes:

 Matthias> sed 's/|/\v/g' < table-from-Sybase  | sed 's/\\\v/|/g' > table-for-copy
 
 >> What on earth is this supposed to achieve?

 Matthias> It first translates any char '|' to vtab and then any '\vtab'
 Matthias> (i.e. a backslash followed by a vtab) back to the char |

 Matthias> The new DELIMITER for PG is then vtab and the | is just a char in the
 Matthias> data (in the example above a pipe of two UNIX cmd).

Yes, but why?

If you use COPY ... WITH DELIMITER '|' (i.e. text mode, not CSV mode)
then the \| is accepted as being a literal | and the unescaped | is
treated as a delimiter. What is the point of the substitutions?

-- 
Andrew (irc:RhodiumToad)



Re: Two small questions re/ COPY CSV data into table

From
Matthias Apitz
Date:
El día Tuesday, June 04, 2019 a las 05:56:49PM +0100, Andrew Gierth escribió:

>  Matthias> The new DELIMITER for PG is then vtab and the | is just a char in the
>  Matthias> data (in the example above a pipe of two UNIX cmd).
> 
> Yes, but why?
> 
> If you use COPY ... WITH DELIMITER '|' (i.e. text mode, not CSV mode)
> then the \| is accepted as being a literal | and the unescaped | is
> treated as a delimiter. What is the point of the substitutions?

In the original data record in Sybase there is a column containing, i.e.
some UNIX print command to be read and executed by the application:

    cmd | lpr ...

it is exported into the file as

    cmd \| lpr ...

because the export delimiter is set to '|';

the COPY in text mode was complaining because it interpreted the |
of the sequence \| as an delimiter and the number of columns did not
matched.

I will provide tomorrow the exact input line, the exact COPY command and
the error.

    matthias

-- 
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub
May, 9: Спаси́бо освободители! Thank you very much, Russian liberators!



Re: Two small questions re/ COPY CSV data into table

From
Andrew Gierth
Date:
>>>>> "Matthias" == Matthias Apitz <guru@unixarea.de> writes:

 Matthias> it is exported into the file as

 Matthias>     cmd \| lpr ...

 Matthias> because the export delimiter is set to '|';

 Matthias> the COPY in text mode was complaining because it interpreted
 Matthias> the | of the sequence \| as an delimiter and the number of
 Matthias> columns did not matched.

 Matthias> I will provide tomorrow the exact input line, the exact COPY
 Matthias> command and the error.

This worked for me:

postgres=# copy ct(a,b,c,d,e) from stdin with delimiter '|';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 0|1| cat $1 \| lpr -Pprinter |3|4
>> \.
COPY 1

-- 
Andrew (irc:RhodiumToad)



Re: Two small questions re/ COPY CSV data into table

From
Francisco Olarte
Date:
Mathias:

On Tue, Jun 4, 2019 at 8:35 PM Matthias Apitz <guru@unixarea.de> wrote:
> El día Tuesday, June 04, 2019 a las 05:41:43PM +0200, Francisco Olarte escribió:

> > On Tue, Jun 4, 2019 at 5:03 PM Matthias Apitz <guru@unixarea.de> wrote:
> > > Well, it's not strictly CSV, but it is what the Sybase tool produces. The
> > > delimiter is set to '|' and a data example line looks like:
> > > 0|1| cat $1 \| lpr -Pprinter |3|4
> > > I do load this now with COPY in mode TEXT and modify the data before
> > > with:
> > >  sed 's/|/\v/g' < table-from-Sybase  | sed 's/\\\v/|/g' > table-for-copy
> > You are lucky it is not CSV, it is much better, it's similar to the
> > text format originally used by postgres. Your problem is just it uses
> > | instead of TAB for field delimiter.

> > What I d not know is why you use \v, doesn't sed use \t for TAB?
> I was thinking about \t and was afraid, that in ~2 GByte char fields
> even some \t could be used in the data. That's why I decided to use \v
> which is more unlikely to be used in our data.

Fine then. I haven't got much sed-foo under my belt, but if you do
this in perl ( I learned perl a while ago, forgot all awk and sed to
free some brain cells ;-> ) or something similar you can use high code
points or control chars ( and I do not know if sed can do several
replacements in a pass, but perl certainly can ). Something like (
untested ) "perl -pe 's/\\\\/\001/g; s/\\\|/\002/; s/\|/\t/;
s/\002/|/; s/\001/\\\\/", the advantage over the pipe approach may be
noticeable on big files, as it cuts a lot of context switches ( OTOH
it'll run single-core ).

( the sample tries to do
    scaped backslash to soh
    scaped pipe to stx
    pipe to tab
    stx to unescaped pipe
    soh to escaped backlash
)

In fact you can do "tab to backslash t" after phase 2 and tab would be
ok in the input.

But, as I said before and others have also pointed, COPY is perfectly
happy to use pipe as delimiter and havinf it escaped with backslash on
input.

> (You didn't cc'ed the list, by intention?)

No, I forgot to hit reply-all because I did not pay enough attention,
my fault. Doing it now in case some one thinks this is interesting.

Francisco Olarte.



Re: Two small questions re/ COPY CSV data into table

From
Matthias Apitz
Date:
El día Tuesday, June 04, 2019 a las 07:20:54PM +0200, Matthias Apitz escribió:

> El día Tuesday, June 04, 2019 a las 05:56:49PM +0100, Andrew Gierth escribió:
> 
> > If you use COPY ... WITH DELIMITER '|' (i.e. text mode, not CSV mode)
> > then the \| is accepted as being a literal | and the unescaped | is
> > treated as a delimiter. What is the point of the substitutions?
> 
> ...

> I will provide tomorrow the exact input line, the exact COPY command and
> the error.

It works exactly as Andrew states. My error was not caused by treating
the '|' of '\|' as DELIMITER, but by not having WITH (NULL '' ...) in the COPY cmd.
I misinterpreted the error message as not matching number of columns.
Sorry for the noise.

Thanks

    matthias
-- 
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub
May, 9: Спаси́бо освободители! Thank you very much, Russian liberators!