Re: multiline CSV fields - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: multiline CSV fields
Date
Msg-id 200411290332.iAT3WPX22551@candle.pha.pa.us
Whole thread Raw
In response to Re: multiline CSV fields  (Andrew Dunstan <andrew@dunslane.net>)
Responses Re: multiline CSV fields
List pgsql-hackers
OK, what solutions do we have for this?  Not being able to load dumped
data is a serious bug.  I have added this to the open items list:
* fix COPY CSV with \r,\n in data

My feeling is that if we are in a quoted string we just process whatever
characters we find, even passing through an EOL.  I realize it might not
mark missing quote errors well but that seems minor compared to not
loading valid data.

---------------------------------------------------------------------------

Andrew Dunstan wrote:
> 
> This example should fail  on data line 2 or 3 on any platform, 
> regardless of the platform's line-end convention, although I haven't 
> tested on Windows.
> 
> cheers
> 
> andrew
> 
> [andrew@aloysius inst]$ bin/psql -e -f csverr.sql ; od -c 
> /tmp/csverrtest.csv
> create table csverrtest (a int, b text, c int);
> CREATE TABLE
> insert into csverrtest values(1,'a',1);
> INSERT 122471 1
> insert into csverrtest values(2,'foo\r\nbar',2);
> INSERT 122472 1
> insert into csverrtest values(3,'baz\nblurfl',3);
> INSERT 122473 1
> insert into csverrtest values(4,'d',4);
> INSERT 122474 1
> insert into csverrtest values(5,'e',5);
> INSERT 122475 1
> copy csverrtest to '/tmp/csverrtest.csv' csv;
> COPY
> truncate csverrtest;
> TRUNCATE TABLE
> copy csverrtest from '/tmp/csverrtest.csv' csv;
> psql:cvserr.sql:9: ERROR:  literal carriage return found in data
> HINT:  Use "\r" to represent carriage return.
> CONTEXT:  COPY csverrtest, line 2: "2,"foo"
> drop table csverrtest;
> DROP TABLE
> 0000000   1   ,   a   ,   1  \n   2   ,   "   f   o   o  \r  \n   b   a
> 0000020   r   "   ,   2  \n   3   ,   "   b   a   z  \n   b   l   u   r
> 0000040   f   l   "   ,   3  \n   4   ,   d   ,   4  \n   5   ,   e   ,
> 0000060   5  \n
> 0000062
> [andrew@aloysius inst]$
> 
> Bruce Momjian wrote:
> 
> >Can I see an example of such a failure line?
> >
> >---------------------------------------------------------------------------
> >
> >Andrew Dunstan wrote:
> >  
> >
> >>Darcy Buskermolen has drawn my attention to unfortunate behaviour of 
> >>COPY CSV with fields containing embedded line end chars if the embedded 
> >>sequence isn't the same as those of the file containing the CSV data. In 
> >>that case we error out when reading the data in. This means there are 
> >>cases where we can produce a CSV data file which we can't read in, which 
> >>is not at all pleasant.
> >>
> >>Possible approaches to the problem:
> >>. make it a documented limitation
> >>. have a "csv read" mode for backend/commands/copy.c:CopyReadLine() that 
> >>relaxes some of the restrictions on inconsistent line endings
> >>. escape embedded line end chars
> >>
> >>The last really isn't an option, because the whole point of CSVs is to 
> >>play with other programs, and my understanding is that those that 
> >>understand multiline fields (e.g. Excel) expect them not to be escaped, 
> >>and do not produce them escaped.
> >>
> >>So right now I'm tossing up in my head between the first two options. Or 
> >>maybe there's another solution I haven't thought of.
> >>
> >>Thoughts?
> >>
> >>cheers
> >>
> >>andrew
> >>
> >>---------------------------(end of broadcast)---------------------------
> >>TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
> >>
> >>    
> >>
> >
> >  
> >
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Adding Reply-To: to Lists configuration ...
Next
From: "Marc G. Fournier"
Date:
Subject: Re: Adding Reply-To: to Lists configuration