Re: multiline CSV fields - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: multiline CSV fields
Date
Msg-id 4193C3F3.9090009@dunslane.net
In response to Re: multiline CSV fields  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: multiline CSV fields  (Patrick B Kelly <pbk@patrickbkelly.org>)
List pgsql-hackers

Tom Lane wrote:

>Andrew Dunstan <andrew@dunslane.net> writes:
>>Patrick B Kelly wrote:
>>>Actually, when I try to export a sheet with multi-line cells from
>>>Excel, it tells me that this feature is incompatible with the CSV
>>>format and will not include them in the CSV file.
>>
>>It probably depends on the version. I have just tested with Excel 2000
>>on a WinXP machine and it both read and wrote these files.
>
>I'd be inclined to define Excel 2000 as broken, honestly, if it's
>writing unescaped newlines as data.  To support this would mean throwing
>away most of our ability to detect incorrectly formatted CSV files.
>A simple error like a missing close quote would look to the machine like
>the rest of the file is a single long data line where all the newlines
>are embedded in data fields.  How likely is it that you'll get a useful
>error message out of that?  Most likely the error message would point to
>the end of the file, or at least someplace well removed from the actual
>mistake.
>
>I would vote in favor of removing the current code that attempts to
>support unquoted newlines, and waiting to see if there are complaints.

This feature was specifically requested when we discussed what sort of 
CSVs we would handle.

And it does in fact work, as long as the newlines embedded in the data
use the same style as the line endings of the file.
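
To make both sides of this concrete, here is a minimal sketch using
Python's standard csv module. It is only an illustration and has nothing
to do with the COPY code itself; the sample data and the parse() helper
are made up for the example. It shows a quoted multi-line field being
read correctly, and then what a lenient reader does when the close quote
is missing.

```python
import csv
import io

def parse(text, strict=False):
    """Parse CSV text with the stdlib reader and return the list of rows."""
    # newline="" keeps the original line endings untouched for the reader.
    return list(csv.reader(io.StringIO(text, newline=""), strict=strict))

# Well-formed input: the second column of row 1 spans two physical lines.
good = 'id,note\n1,"line one\nline two"\n2,plain\n'

# Same data, but the close quote on row 1 is missing.
bad = 'id,note\n1,"line one\n2,plain\n3,more\n'

good_rows = parse(good)
bad_rows = parse(bad)

print(len(good_rows), good_rows[1])
print(len(bad_rows), bad_rows[-1])
```

With the missing quote, the lenient reader folds every following line
into one field and reports nothing. Passing strict=True makes it raise
csv.Error instead, but only once it reaches the end of the input, which
is the "error points at end of file" problem Tom describes.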

I just had an idea. How about we add a new CSV option, MULTILINE? If it
is absent, then on output we would not emit unescaped LF/CR characters,
and on input we would not allow fields with embedded unescaped LF/CR
characters. In both cases we could simply raise an error for now, with
perhaps an 8.1 TODO to provide some other behaviour.
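
A rough sketch of the semantics I have in mind, written in Python rather
than against the COPY code so it is easy to try. Everything here is
hypothetical: the names read_rows, write_rows and MultilineFieldError
are invented for the illustration, and fields are assumed to be strings.
Note that the check runs only after a record has been parsed, so it
shows the option's behaviour and does not by itself give earlier
detection of a missing close quote.

```python
import csv
import io

class MultilineFieldError(ValueError):
    """Raised when a field holds CR/LF but MULTILINE was not requested."""

def _reject_embedded_newlines(row, multiline, where):
    # With MULTILINE set, embedded CR/LF is allowed and nothing is checked.
    if multiline:
        return
    for field in row:
        if "\n" in field or "\r" in field:
            raise MultilineFieldError(
                f"embedded CR/LF in field at {where}; MULTILINE not specified"
            )

def read_rows(text, multiline=False):
    """Parse CSV text, erroring out on multi-line fields unless allowed."""
    reader = csv.reader(io.StringIO(text, newline=""))
    rows = []
    for row in reader:
        # line_num is the last physical line consumed for this record.
        _reject_embedded_newlines(row, multiline, f"input line {reader.line_num}")
        rows.append(row)
    return rows

def write_rows(rows, multiline=False):
    """Render rows as CSV text, erroring out on multi-line fields unless allowed."""
    out = io.StringIO(newline="")
    writer = csv.writer(out)
    for i, row in enumerate(rows, start=1):
        _reject_embedded_newlines(row, multiline, f"output row {i}")
        writer.writerow(row)
    return out.getvalue()
```

So read_rows(data) fails on the first record whose field contains a
newline, while read_rows(data, multiline=True) accepts it; write_rows
behaves the same way on output.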

Or we could drop the whole multiline "feature" for now and make the 
whole thing an 8.1 item, although it would be a bit of a pity when it 
does work in what will surely be the most common case.

cheers

andrew



