Re: Copy From CSV feature request? - Mailing list pgsql-hackers
From | Richard Huxton |
---|---|
Subject | Re: Copy From CSV feature request? |
Date | |
Msg-id | 437064FF.5020608@archonet.com Whole thread Raw |
In response to | Copy From CSV feature request? (mike <mike@thegodshalls.com>) |
List | pgsql-hackers |
mike wrote: > I import some of my data into my postgres database, win32 platform, via > the COPY table FROM with CSV. My CSV file is created from a Crystal > Report (v.9). I run the report and have Crystal export the results into > a CSV file (using the default settings). > > I have some data which looks like this when stored in the source > application (m$ sql server 2000) and Crystal: > > Line 1000 1/1/2004 Company2 Person2 Misc > Line 1001 1/1/2004 Company3 Person3 " Nickname Misc > Line 1002 1/1/2004 Company3 Person3 " Nickname Misc > Line 1003 1/1/2004 Company4 Person4 Misc > Line 1004 1/1/2004 Company5 Person5 Misc > > When I export the report as a CSV file Crystal exports it like this: > > Line 1000 "1/1/2004","Company2","Person2","Misc" > Line 1001 "1/1/2004","Company3","Person3 " Nickname","Misc" ^^^ This is just wrong. Double-quotes inside double-quoted fields need to be escaped by doubling. As, e.g. here: http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm > Line 1002 "1/1/2004","Company3","Person3 " Nickname","Misc" > Line 1003 "1/1/2004","Company4","Person4","Misc" > Line 1004 "1/1/2004","Company4","Person4","Misc" > > Now when I execute the COPY function it aborts on Line 1002 with the > error "extra data after last expected column". I am not sure why it > does not abort on Line 1001. It's probably seeing the end-of-line as part of the last field on line 1001. > If Crystal is exporting the data in the correct format should COPY FROM > CSV be modified to handle an odd number of text qualifiers in a row? > Does anyone know if this is a valid format (no escape character > automatically inserted)? Nope - it's a bad format. Not that there is actually a standard definition of what CSV is, AFAIK. > Could COPY be modified to accept the data without having to insert a " > someplace in the row so that the original data appears the same? How does PG determine that what you've got is an unescaped " rather than a missing comma? > Would it be too much of a performance hit to do this? > > How other applications handle it: > > Only Excel 2003 seems to display the data correctly. Well, it silently guesses what you want and gets it right this particular time. > If I open the CSV file using OpenOffice Calc 2.0 it combines Line 1001 > and Line 1002 into one row. > > If I import the data back into Crystal the data after Person3 does not > appear. There's a sign that you've got a bug. If Crystal can't read what it writes then I'm not sure you can expect anyone else to do so reliably. > If I import it into Access 2003 it ships Line 1001 and Line 1002 into an > import error table. Quite right too. It's one of the areas where Access does the right thing :-) > Worth adding to the TODO or not a good feature? I'm not a developer, but it strikes me as double plus ungood. -- Richard Huxton Archonet Ltd
pgsql-hackers by date: