Re: multiline CSV fields - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: multiline CSV fields |
Date | |
Msg-id | 200411290332.iAT3WPX22551@candle.pha.pa.us Whole thread Raw |
In response to | Re: multiline CSV fields (Andrew Dunstan <andrew@dunslane.net>) |
Responses |
Re: multiline CSV fields
|
List | pgsql-hackers |
OK, what solutions do we have for this? Not being able to load dumped data is a serious bug. I have added this to the open items list: * fix COPY CSV with \r,\n in data My feeling is that if we are in a quoted string we just process whatever characters we find, even passing through an EOL. I realize it might not mark missing quote errors well but that seems minor compared to not loading valid data. --------------------------------------------------------------------------- Andrew Dunstan wrote: > > This example should fail on data line 2 or 3 on any platform, > regardless of the platform's line-end convention, although I haven't > tested on Windows. > > cheers > > andrew > > [andrew@aloysius inst]$ bin/psql -e -f csverr.sql ; od -c > /tmp/csverrtest.csv > create table csverrtest (a int, b text, c int); > CREATE TABLE > insert into csverrtest values(1,'a',1); > INSERT 122471 1 > insert into csverrtest values(2,'foo\r\nbar',2); > INSERT 122472 1 > insert into csverrtest values(3,'baz\nblurfl',3); > INSERT 122473 1 > insert into csverrtest values(4,'d',4); > INSERT 122474 1 > insert into csverrtest values(5,'e',5); > INSERT 122475 1 > copy csverrtest to '/tmp/csverrtest.csv' csv; > COPY > truncate csverrtest; > TRUNCATE TABLE > copy csverrtest from '/tmp/csverrtest.csv' csv; > psql:cvserr.sql:9: ERROR: literal carriage return found in data > HINT: Use "\r" to represent carriage return. > CONTEXT: COPY csverrtest, line 2: "2,"foo" > drop table csverrtest; > DROP TABLE > 0000000 1 , a , 1 \n 2 , " f o o \r \n b a > 0000020 r " , 2 \n 3 , " b a z \n b l u r > 0000040 f l " , 3 \n 4 , d , 4 \n 5 , e , > 0000060 5 \n > 0000062 > [andrew@aloysius inst]$ > > Bruce Momjian wrote: > > >Can I see an example of such a failure line? > > > >--------------------------------------------------------------------------- > > > >Andrew Dunstan wrote: > > > > > >>Darcy Buskermolen has drawn my attention to unfortunate behaviour of > >>COPY CSV with fields containing embedded line end chars if the embedded > >>sequence isn't the same as those of the file containing the CSV data. In > >>that case we error out when reading the data in. This means there are > >>cases where we can produce a CSV data file which we can't read in, which > >>is not at all pleasant. > >> > >>Possible approaches to the problem: > >>. make it a documented limitation > >>. have a "csv read" mode for backend/commands/copy.c:CopyReadLine() that > >>relaxes some of the restrictions on inconsistent line endings > >>. escape embedded line end chars > >> > >>The last really isn't an option, because the whole point of CSVs is to > >>play with other programs, and my understanding is that those that > >>understand multiline fields (e.g. Excel) expect them not to be escaped, > >>and do not produce them escaped. > >> > >>So right now I'm tossing up in my head between the first two options. Or > >>maybe there's another solution I haven't thought of. > >> > >>Thoughts? > >> > >>cheers > >> > >>andrew > >> > >>---------------------------(end of broadcast)--------------------------- > >>TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > >> > >> > >> > > > > > > > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
pgsql-hackers by date: