Re: Copy From CSV feature request? - Mailing list pgsql-hackers

From Richard Huxton
Subject Re: Copy From CSV feature request?
Date
Msg-id 437064FF.5020608@archonet.com
Whole thread Raw
In response to Copy From CSV feature request?  (mike <mike@thegodshalls.com>)
List pgsql-hackers
mike wrote:
> I import some of my data into my postgres database, win32 platform, via
> the COPY table FROM with CSV.  My CSV file is created from a Crystal
> Report (v.9).  I run the report and have Crystal export the results into
> a CSV file (using the default settings).
> 
> I have some data which looks like this when stored in the source
> application (m$ sql server 2000) and Crystal:
> 
> Line 1000 1/1/2004  Company2  Person2             Misc
> Line 1001 1/1/2004  Company3  Person3 " Nickname  Misc
> Line 1002 1/1/2004  Company3  Person3 " Nickname  Misc
> Line 1003 1/1/2004  Company4  Person4             Misc
> Line 1004 1/1/2004  Company5  Person5             Misc
> 
> When I export the report as a CSV file Crystal exports it like this:
> 
> Line 1000 "1/1/2004","Company2","Person2","Misc"
> Line 1001 "1/1/2004","Company3","Person3 " Nickname","Misc"                                          ^^^
This is just wrong. Double-quotes inside double-quoted fields need to be 
escaped by doubling. As, e.g. here:  http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm

> Line 1002 "1/1/2004","Company3","Person3 " Nickname","Misc"
> Line 1003 "1/1/2004","Company4","Person4","Misc"
> Line 1004 "1/1/2004","Company4","Person4","Misc"
> 
> Now when I execute the COPY function it aborts on Line 1002 with the
> error "extra data after last expected column".  I am not sure why it
> does not abort on Line 1001.

It's probably seeing the end-of-line as part of the last field on line 1001.

> If Crystal is exporting the data in the correct format should COPY FROM
> CSV be modified to handle an odd number of text qualifiers in a row?  
> Does anyone know if this is a valid format (no escape character
> automatically inserted)?

Nope - it's a bad format. Not that there is actually a standard 
definition of what CSV is, AFAIK.

> Could COPY be modified to accept the data without having to insert a "
> someplace in the row so that the original data appears the same?

How does PG determine that what you've got is an unescaped " rather than 
a missing comma?

> Would it be too much of a performance hit to do this?
> 
> How other applications handle it:
> 
> Only Excel 2003 seems to display the data correctly.  

Well, it silently guesses what you want and gets it right this 
particular time.

> If I open the CSV file using OpenOffice Calc 2.0 it combines Line 1001
> and Line 1002 into one row.
> 
> If I import the data back into Crystal the data after Person3 does not
> appear.

There's a sign that you've got a bug. If Crystal can't read what it 
writes then I'm not sure you can expect anyone else to do so reliably.

> If I import it into Access 2003 it ships Line 1001 and Line 1002 into an
> import error table.

Quite right too. It's one of the areas where Access does the right thing :-)

> Worth adding to the TODO or not a good feature?

I'm not a developer, but it strikes me as double plus ungood.

--  Richard Huxton  Archonet Ltd


pgsql-hackers by date:

Previous
From: "Pavel Stehule"
Date:
Subject: Re: Supporting NULL elements in arrays
Next
From: Sreejesh O S
Date:
Subject: Is there any other way to compile pgsql without gmake