Re: multiline CSV fields - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: multiline CSV fields
Date
Msg-id 41947F3C.2050906@dunslane.net
Whole thread Raw
In response to Re: multiline CSV fields  (Bruce Momjian <pgman@candle.pha.pa.us>)
Responses Re: multiline CSV fields  (Bruce Momjian <pgman@candle.pha.pa.us>)
List pgsql-hackers
This example should fail  on data line 2 or 3 on any platform, 
regardless of the platform's line-end convention, although I haven't 
tested on Windows.

cheers

andrew

[andrew@aloysius inst]$ bin/psql -e -f csverr.sql ; od -c 
/tmp/csverrtest.csv
create table csverrtest (a int, b text, c int);
CREATE TABLE
insert into csverrtest values(1,'a',1);
INSERT 122471 1
insert into csverrtest values(2,'foo\r\nbar',2);
INSERT 122472 1
insert into csverrtest values(3,'baz\nblurfl',3);
INSERT 122473 1
insert into csverrtest values(4,'d',4);
INSERT 122474 1
insert into csverrtest values(5,'e',5);
INSERT 122475 1
copy csverrtest to '/tmp/csverrtest.csv' csv;
COPY
truncate csverrtest;
TRUNCATE TABLE
copy csverrtest from '/tmp/csverrtest.csv' csv;
psql:cvserr.sql:9: ERROR:  literal carriage return found in data
HINT:  Use "\r" to represent carriage return.
CONTEXT:  COPY csverrtest, line 2: "2,"foo"
drop table csverrtest;
DROP TABLE
0000000   1   ,   a   ,   1  \n   2   ,   "   f   o   o  \r  \n   b   a
0000020   r   "   ,   2  \n   3   ,   "   b   a   z  \n   b   l   u   r
0000040   f   l   "   ,   3  \n   4   ,   d   ,   4  \n   5   ,   e   ,
0000060   5  \n
0000062
[andrew@aloysius inst]$

Bruce Momjian wrote:

>Can I see an example of such a failure line?
>
>---------------------------------------------------------------------------
>
>Andrew Dunstan wrote:
>  
>
>>Darcy Buskermolen has drawn my attention to unfortunate behaviour of 
>>COPY CSV with fields containing embedded line end chars if the embedded 
>>sequence isn't the same as those of the file containing the CSV data. In 
>>that case we error out when reading the data in. This means there are 
>>cases where we can produce a CSV data file which we can't read in, which 
>>is not at all pleasant.
>>
>>Possible approaches to the problem:
>>. make it a documented limitation
>>. have a "csv read" mode for backend/commands/copy.c:CopyReadLine() that 
>>relaxes some of the restrictions on inconsistent line endings
>>. escape embedded line end chars
>>
>>The last really isn't an option, because the whole point of CSVs is to 
>>play with other programs, and my understanding is that those that 
>>understand multiline fields (e.g. Excel) expect them not to be escaped, 
>>and do not produce them escaped.
>>
>>So right now I'm tossing up in my head between the first two options. Or 
>>maybe there's another solution I haven't thought of.
>>
>>Thoughts?
>>
>>cheers
>>
>>andrew
>>
>>---------------------------(end of broadcast)---------------------------
>>TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>>
>>    
>>
>
>  
>


pgsql-hackers by date:

Previous
From: Oleg Bartunov
Date:
Subject: Re: ltree PostgreSQL Module
Next
From: Patrick B Kelly
Date:
Subject: Re: multiline CSV fields