Thread: Re: COPY formatting
Karel, Andrew, Fernando:

> On Wed, Mar 17, 2004 at 11:02:38AM -0500, Tom Lane wrote:
> > Karel Zak <zakkr@zf.jcu.cz> writes:
> > > The formatting function API can be pretty simple:
> > >   text *my_copy_format(text *attrdata, int direction,
> > >         int nattrs, int attr, oid attrtype, oid relation)

No offense, but isn't this whole thing more appropriate for a client program? Like the pg_import and pg_export projects on GBorg? Has anyone looked at those projects?

I can see making a special provision for CSV in COPY, just because it's such a universal format. But I personally don't see that a complex, sophisticated import/export formatter belongs on the SQL command line. Particularly since most users will want a GUI to handle it.

And, BTW, I deal with CSV *all the time* for my insurance clients, and I can tell you that that format hasn't changed in 20 years. We can hard-code it if it's easier.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco
> And, BTW, I deal with CSV *all the time* for my insurance clients, and I can
> tell you that that format hasn't changed in 20 years. We can hard-code it
> if it's easier.

Well, many of my clients consider CSV "Character Separated Value", not Comma... Thus I get data like this:

"Hello","Good Bye"
Hello Good Bye
Hello,Good Bye
"This", "They're"
This They're
"This" "Is" "A" 1

Dealing with all of these different nuances may or may not be beyond the scope of COPY, but it seems that it could be something that it can handle.

Python has a csv module that allows you to assign dialects to any specific type of import you are performing.

Sincerely,

Joshua D. Drake

--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL
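As a rough illustration of the dialect idea mentioned above, here is a minimal sketch using Python's standard csv module. The `TabSeparated` class, the `tab_separated` dialect name, and the `parse` helper are hypothetical names chosen for this example, and the sample rows assume that the unquoted variants in the list were tab-delimited; none of this is part of any COPY proposal in the thread.

```python
import csv
import io


class TabSeparated(csv.Dialect):
    """Hypothetical dialect: tab-delimited, optionally double-quoted fields."""
    delimiter = "\t"
    quotechar = '"'
    doublequote = True
    skipinitialspace = False
    lineterminator = "\n"
    quoting = csv.QUOTE_MINIMAL


# Register the dialect under a name so readers can refer to it by string.
csv.register_dialect("tab_separated", TabSeparated)


def parse(text, dialect="excel", **fmtparams):
    """Parse delimited text into a list of rows using the given dialect."""
    return list(csv.reader(io.StringIO(text, newline=""), dialect, **fmtparams))


# Comma-separated, with and without quotes (the default "excel" dialect).
comma_rows = parse('"Hello","Good Bye"\nHello,Good Bye\n')

# Tab-separated rows, parsed with the custom dialect.
tab_rows = parse('Hello\tGood Bye\n"This"\t"Is"\t"A"\t1\n', "tab_separated")

# A space after the comma, handled by overriding one format parameter.
spaced_rows = parse('"This", "They\'re"\n', skipinitialspace=True)
```

Each input variant gets its own named dialect or parameter override, so the caller decides per import which rules apply instead of relying on one hard-coded format.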
Joshua D. Drake wrote:

>> And, BTW, I deal with CSV *all the time* for my insurance clients,
>> and I can tell you that that format hasn't changed in 20 years. We
>> can hard-code it if it's easier.
>
> Well many of my clients consider CSV "Character Separated Value" not
> Comma... Thus I get data like this:
>
> "Hello","Good Bye"
> Hello Good Bye
> Hello,Good Bye
> "This", "They're"
> This They're
> "This" "Is" "A" 1

*nod*

I too have seen these and other variants over the years, including some that use single quote instead of double quote as the quote char, and \ as the escape char.

My suggested scheme for beefing up COPY was made with all these variants in mind.

cheers

andrew
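For reference, the single-quote and backslash-escape variant described above can be read on the client side with Python's csv module. This is only a sketch under assumptions: the `read_single_quoted` helper and the sample input are made up for illustration, and it says nothing about how COPY itself would implement such options.

```python
import csv
import io


def read_single_quoted(text):
    """Read comma-delimited text that quotes with ' and escapes with a backslash."""
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=",",
        quotechar="'",       # single quote instead of double quote
        escapechar="\\",     # backslash escapes the next character
        doublequote=False,   # a doubled quote is NOT treated as an escaped quote
    )
    return list(reader)


# Sample input (assumed): an escaped quote and an embedded comma inside quotes.
rows = read_single_quoted("'It\\'s','a,b',3\n")
```

The quote character, escape character, and doubling rule are independent parameters here, which is what lets one reader cover several of the variants mentioned in the thread.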
Thomas, Andrew, Karel,

Thomas is correct: many applications which read or make CSVs will accept a newline if it is enclosed in a quote.

> > I *have* seen monstrosities like fields that do not begin with the quote
> > character but then break into a quote, e.g.:
> >
> > 1,2,a,123"abc""def",6,7,8

This I have never seen. It looks like a hackish error to me. What application is it from? Frankly, I would expect any CSV reader to error out on the above, and would be annoyed if it did not.

Overall, I assert again that approaching this issue through COPY enhancements is really not the way to go. We should be looking at a client utility, like pg_import and pg_export. The primary purpose of COPY is bulk loads for backup/restore, and I'm against doing a lot of tinkering which might make it less efficient or introduce new issues into what's currently very reliable.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco
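To make the quoted-newline case concrete, here is a small sketch of how one client-side reader (Python's csv module) treats a line break inside a quoted field. The `read_rows` helper and the sample data are assumptions for illustration only; other CSV readers may behave differently.

```python
import csv
import io


def read_rows(text):
    """Parse CSV text with the default dialect and return all records."""
    # newline="" leaves line endings untouched so the reader sees them as-is.
    return list(csv.reader(io.StringIO(text, newline="")))


# Three physical lines, but the first record spans two of them because the
# line break sits inside a double-quoted field.
sample = '1,"line one\nline two",3\n4,5,6\n'
rows = read_rows(sample)
physical_lines = sample.count("\n")
```

The point of the sketch is that the number of records and the number of physical lines can differ, so a loader cannot simply split its input on newlines before parsing fields.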
Josh Berkus wrote:

> Overall, I assert again that approaching this issue through COPY enhancements
> is really not the way to go. We should be looking at a client utility,
> like pg_import and pg_export. The primary purpose of COPY is bulk loads
> for backup/restore, and I'm against doing a lot of tinkering which might make
> it less efficient or introduce new issues into what's currently very
> reliable.

That's not unreasonable. I floated my idea as an alternative to a much more radical proposal. If we decided against it we should remove the TODO item.

As against that, if we don't do this then I think we should embrace these utility programs more, possibly bringing them into the distribution.

cheers

andrew
Andrew Dunstan wrote:

> Josh Berkus wrote:
>
> > Overall, I assert again that approaching this issue through COPY enhancements
> > is really not the way to go. We should be looking at a client utility,
> > like pg_import and pg_export. The primary purpose of COPY is bulk loads
> > for backup/restore, and I'm against doing a lot of tinkering which might make
> > it less efficient or introduce new issues into what's currently very
> > reliable.
>
> That's not unreasonable. I floated my idea as an alternative to a much
> more radical proposal. If we decided against it we should remove the
> TODO item.
>
> As against that, if we don't do this then I think we should embrace
> these utility programs more, possibly bringing them into the distribution.

CSV seems to be the most widely requested conversion format. Anything else is probably a one-off job that should be done in perl or sed.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073