Thread: Copy From CSV feature request?

Copy From CSV feature request?

From
mike
Date:
I import some of my data into my postgres database, win32 platform, via
the COPY table FROM with CSV.  My CSV file is created from a Crystal
Report (v.9).  I run the report and have Crystal export the results into
a CSV file (using the default settings).

I have some data which looks like this when stored in the source
application (m$ sql server 2000) and Crystal:

Line 1000 1/1/2004  Company2  Person2             Misc
Line 1001 1/1/2004  Company3  Person3 " Nickname  Misc
Line 1002 1/1/2004  Company3  Person3 " Nickname  Misc
Line 1003 1/1/2004  Company4  Person4             Misc
Line 1004 1/1/2004  Company5  Person5             Misc

When I export the report as a CSV file Crystal exports it like this:

Line 1000 "1/1/2004","Company2","Person2","Misc"
Line 1001 "1/1/2004","Company3","Person3 " Nickname","Misc"
Line 1002 "1/1/2004","Company3","Person3 " Nickname","Misc"
Line 1003 "1/1/2004","Company4","Person4","Misc"
Line 1004 "1/1/2004","Company4","Person4","Misc"

Now when I execute the COPY function it aborts on Line 1002 with the
error "extra data after last expected column".  I am not sure why it
does not abort on Line 1001.

If Crystal is exporting the data in the correct format should COPY FROM
CSV be modified to handle an odd number of text qualifiers in a row?  
Does anyone know if this is a valid format (no escape character
automatically inserted)?

Could COPY be modified to accept the data without having to insert a "
someplace in the row so that the original data appears the same?

Would it be too much of a performance hit to do this?

How other applications handle it:

Only Excel 2003 seems to display the data correctly.  

If I open the CSV file using OpenOffice Calc 2.0 it combines Line 1001
and Line 1002 into one row.

If I import the data back into Crystal the data after Person3 does not
appear.

If I import it into Access 2003 it ships Line 1001 and Line 1002 into an
import error table.

Worth adding to the TODO or not a good feature?


Re: Copy From CSV feature request?

From
Richard Huxton
Date:
mike wrote:
> I import some of my data into my postgres database, win32 platform, via
> the COPY table FROM with CSV.  My CSV file is created from a Crystal
> Report (v.9).  I run the report and have Crystal export the results into
> a CSV file (using the default settings).
> 
> I have some data which looks like this when stored in the source
> application (m$ sql server 2000) and Crystal:
> 
> Line 1000 1/1/2004  Company2  Person2             Misc
> Line 1001 1/1/2004  Company3  Person3 " Nickname  Misc
> Line 1002 1/1/2004  Company3  Person3 " Nickname  Misc
> Line 1003 1/1/2004  Company4  Person4             Misc
> Line 1004 1/1/2004  Company5  Person5             Misc
> 
> When I export the report as a CSV file Crystal exports it like this:
> 
> Line 1000 "1/1/2004","Company2","Person2","Misc"
> Line 1001 "1/1/2004","Company3","Person3 " Nickname","Misc"                                          ^^^
This is just wrong. Double-quotes inside double-quoted fields need to be 
escaped by doubling. As, e.g. here:  http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm

> Line 1002 "1/1/2004","Company3","Person3 " Nickname","Misc"
> Line 1003 "1/1/2004","Company4","Person4","Misc"
> Line 1004 "1/1/2004","Company4","Person4","Misc"
> 
> Now when I execute the COPY function it aborts on Line 1002 with the
> error "extra data after last expected column".  I am not sure why it
> does not abort on Line 1001.

It's probably seeing the end-of-line as part of the last field on line 1001.

> If Crystal is exporting the data in the correct format should COPY FROM
> CSV be modified to handle an odd number of text qualifiers in a row?  
> Does anyone know if this is a valid format (no escape character
> automatically inserted)?

Nope - it's a bad format. Not that there is actually a standard 
definition of what CSV is, AFAIK.

> Could COPY be modified to accept the data without having to insert a "
> someplace in the row so that the original data appears the same?

How does PG determine that what you've got is an unescaped " rather than 
a missing comma?

> Would it be too much of a performance hit to do this?
> 
> How other applications handle it:
> 
> Only Excel 2003 seems to display the data correctly.  

Well, it silently guesses what you want and gets it right this 
particular time.

> If I open the CSV file using OpenOffice Calc 2.0 it combines Line 1001
> and Line 1002 into one row.
> 
> If I import the data back into Crystal the data after Person3 does not
> appear.

There's a sign that you've got a bug. If Crystal can't read what it 
writes then I'm not sure you can expect anyone else to do so reliably.

> If I import it into Access 2003 it ships Line 1001 and Line 1002 into an
> import error table.

Quite right too. It's one of the areas where Access does the right thing :-)

> Worth adding to the TODO or not a good feature?

I'm not a developer, but it strikes me as double plus ungood.

--  Richard Huxton  Archonet Ltd