Thread: Re: COPY formatting

Re: COPY formatting

From
Josh Berkus
Date:
Karel, Andrew, Fernando:

> On Wed, Mar 17, 2004 at 11:02:38AM -0500, Tom Lane wrote:
> > Karel Zak <zakkr@zf.jcu.cz> writes:
> > >  The formatting function API can be pretty simple:
> > >  text *my_copy_format(text *attrdata, int direction,
> > >              int nattrs, int attr, oid attrtype, oid relation)

No offense, but isn't this whole thing more appropriate for a client program?
Like the pg_import and pg_export projects on GBorg?   Has anyone looked at
those projects?

I can see making a special provision for CSV in COPY, just because it's such a
universal format.   But I personally don't see that a complex, sophisticated
import/export formatter belongs on the SQL command line.   Particularly since
most users will want a GUI to handle it.

And, BTW, I deal with CSV *all the time* for my insurance clients, and I can
tell you that that format hasn't changed in 20 years.   We can hard-code it
if it's easier.

--
-Josh Berkus
Aglio Database Solutions
San Francisco



Re: COPY formatting

From
"Joshua D. Drake"
Date:
>
> And, BTW, I deal with CSV *all the time* for my insurance clients, and I can
> tell you that that format hasn't changed in 20 years.   We can hard-code it
> if it's easier.

Well many of my clients consider CSV "Character Separated Value" not
Comma... Thus I get data like this:

"Hello","Good Bye"
Hello    Good Bye
Hello,Good Bye
"This", "They're"
This    They're
"This"    "Is"    "A"    1


Dealing with all of these different nuances may or may not be beyond
the scope of COPY, but it seems like something that it could handle.

Python has a csv module that allows you to assign dialects to any
specific type of import you are performing.
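
As a rough illustration of the dialect idea, here is a minimal sketch
using Python's standard csv module on a few of the sample rows above.
The dialect names "comma_spaced" and "tab_separated" and the helper
parse() are made up for this example, and it assumes the wide gaps in
the samples were tab characters:

```python
import csv
import io

# Hypothetical dialect names, registered for this example only.
# "excel" is the module's built-in comma/double-quote dialect.
csv.register_dialect("comma_spaced", delimiter=",", quotechar='"',
                     skipinitialspace=True)
csv.register_dialect("tab_separated", delimiter="\t", quotechar='"')


def parse(text, dialect):
    """Parse text with the named dialect and return a list of rows."""
    return list(csv.reader(io.StringIO(text), dialect=dialect))


# Plain comma-separated input, with and without quotes.
comma_rows = parse('"Hello","Good Bye"\nHello,Good Bye\n', "excel")

# Comma followed by a space before the opening quote.
spaced_rows = parse('"This", "They\'re"\n', "comma_spaced")

# Assumption: the separator in this sample was a tab.
tab_rows = parse('"This"\t"Is"\t"A"\t1\n', "tab_separated")

for rows in (comma_rows, spaced_rows, tab_rows):
    print(rows)
```

Each input variant needs only a different dialect name; the parsing
code itself stays the same.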

Sincerely,

Joshua D. Drake





--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL


Re: COPY formatting

From
Andrew Dunstan
Date:

Joshua D. Drake wrote:

>>
>> And, BTW, I deal with CSV *all the time* for my insurance clients, 
>> and I can tell you that that format hasn't changed in 20 years.   We 
>> can hard-code it if it's easier.
>
>
> Well many of my clients consider CSV "Character Separated Value" not 
> Comma... Thus I get data like this:
>
> "Hello","Good Bye"
> Hello    Good Bye
> Hello,Good Bye
> "This", "They're"
> This    They're
> "This"    "Is"    "A"    1


*nod* I too have seen these and other variants over the years, including 
some that use single quote instead of double quote as the quote char, 
and \ as the escape char.

My suggested scheme for beefing up COPY was made with all these variants 
in mind.
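
To make the variants concrete, here is a small sketch in Python (not
the COPY syntax under discussion). It assumes the variants differ only
in their delimiter, quote character and escape character; the helper
name read_variant() is hypothetical:

```python
import csv
import io


def read_variant(text, delimiter=",", quote='"', escape=None):
    """Parse text using one delimiter/quote/escape combination."""
    reader = csv.reader(
        io.StringIO(text),
        delimiter=delimiter,
        quotechar=quote,
        escapechar=escape,
        # When an explicit escape character is given, do not also treat
        # a doubled quote as an escaped quote.
        doublequote=escape is None,
    )
    return list(reader)


# Conventional form: double quotes, embedded quote written twice.
standard = read_variant('"say ""hi""",2\n')

# Variant: single quote as the quote char, backslash as the escape char.
single_quoted = read_variant(r"'It\'s','fine'" + "\n",
                             quote="'", escape="\\")

print(standard)
print(single_quoted)
```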

cheers

andrew



Re: COPY formatting

From
Josh Berkus
Date:
Thomas, Andrew, Karel,

Thomas is correct: many applications which read or make CSVs will accept a 
newline if it is enclosed in a quote.   
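
For example, Python's csv module is one reader that behaves this way.
A minimal sketch with made-up sample data:

```python
import csv
import io

# One logical record whose second field contains a newline inside quotes.
raw = '1,"first line\nsecond line",3\n'

# The reader keeps consuming physical lines until the quote is closed.
rows = list(csv.reader(io.StringIO(raw)))

# Writing the row back quotes the field again because it holds a newline.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(rows)
round_tripped = out.getvalue()

print(rows)
print(round_tripped == raw)
```

The two physical lines come back as a single three-field record.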

> > I *have* seen monstrosities like fields that do not begin with the quote
> > character but then break into a quote, e.g.:
> >
> > 1,2,a,123"abc""def",6,7,8

This I have never seen.   It looks like a hackish error to me.   What 
application is it from?

Frankly, I would expect any CSV reader to error out on the above, and would be 
annoyed if it did not.
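
Not every reader is that strict. Python's csv module, in its default
mode, accepts the line and keeps the stray quotes as literal
characters. Below is a minimal sketch of the strict behaviour described
above; split_strict() is a hypothetical helper written for this
example, not part of any library, and it handles a single line only:

```python
import csv


def split_strict(line, delimiter=",", quote='"'):
    """Split one CSV line, rejecting quotes inside unquoted fields."""
    fields = []
    i, n = 0, len(line)
    while True:
        buf = []
        if i < n and line[i] == quote:
            # Quoted field: read up to the closing quote.
            i += 1
            while True:
                if i >= n:
                    raise ValueError("unterminated quoted field")
                if line[i] == quote:
                    if i + 1 < n and line[i + 1] == quote:
                        buf.append(quote)  # doubled quote is a literal
                        i += 2
                        continue
                    i += 1  # closing quote
                    break
                buf.append(line[i])
                i += 1
            if i < n and line[i] != delimiter:
                raise ValueError("text after closing quote at column %d" % i)
        else:
            # Unquoted field: a quote character here is an error.
            while i < n and line[i] != delimiter:
                if line[i] == quote:
                    raise ValueError("quote in unquoted field at column %d" % i)
                buf.append(line[i])
                i += 1
        fields.append("".join(buf))
        if i >= n:
            return fields
        i += 1  # skip the delimiter


sample = '1,2,a,123"abc""def",6,7,8'

# Lenient: the standard module accepts the line as it stands.
lenient = next(csv.reader([sample]))

# Strict: the sketch above refuses it.
try:
    split_strict(sample)
    rejected = False
except ValueError:
    rejected = True
```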

Overall, I assert again that approaching this issue through COPY enhancements 
is really not the way to go.    We should be looking at a client utility, 
like pg_import and pg_export.     The primary purpose of COPY is bulk loads 
for backup/restore, and I'm against doing a lot of tinkering which might make 
it less efficient or introduce new issues into what's currently very 
reliable.

-- 
-Josh Berkus
Aglio Database Solutions
San Francisco



Re: COPY formatting

From
Andrew Dunstan
Date:
Josh Berkus wrote:

>
>Overall, I assert again that approaching this issue through COPY enhancements 
>is really not the way to go.    We should be looking at a client utility, 
>like pg_import and pg_export.     The primary purpose of COPY is bulk loads 
>for backup/restore, and I'm against doing a lot of tinkering which might make 
>it less efficient or introduce new issues into what's currently very 
>reliable.
>
>  
>

That's not unreasonable. I floated my idea as an alternative to a much 
more radical proposal. If we decided against it we should remove the 
TODO item.

As against that, if we don't do this then I think we should embrace 
these utility programs more, possibly bringing them into the distribution.

cheers

andrew



Re: COPY formatting

From
Bruce Momjian
Date:
Andrew Dunstan wrote:
> Josh Berkus wrote:
> 
> >
> >Overall, I assert again that approaching this issue through COPY enhancements 
> >is really not the way to go.    We should be looking at a client utility, 
> >like pg_import and pg_export.     The primary purpose of COPY is bulk loads 
> >for backup/restore, and I'm against doing a lot of tinkering which might make 
> >it less efficient or introduce new issues into what's currently very 
> >reliable.
> >
> >  
> >
> 
> That's not unreasonable. I floated my idea as an alternative to a much 
> more radical proposal. If we decided against it we should remove the 
> TODO item.
> 
> As against that, if we don't do this then I think we should embrace 
> these utility programs more, possibly bringing them into the distribution.

CSV seems to be the most widely requested conversion format.  Anything
else is probably a one-off job that should be done in perl or sed.
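
As an illustration of such a one-off job, here is a minimal sketch
written in Python rather than perl or sed. The pipe-delimited input
format and the function name pipe_to_csv() are assumptions made up for
this example:

```python
import csv
import io


def pipe_to_csv(src, dst):
    """Convert unquoted, pipe-delimited lines from src into CSV on dst."""
    writer = csv.writer(dst, lineterminator="\n")
    for line in src:
        # Assumption: the input never contains a literal "|" inside a field.
        writer.writerow(line.rstrip("\n").split("|"))


src = io.StringIO('1|Hello, world|say "hi"\n')
dst = io.StringIO()
pipe_to_csv(src, dst)
converted = dst.getvalue()
print(converted, end="")
```

The writer adds quotes only where a field contains a comma or a quote,
so the result can be loaded by anything that reads ordinary CSV.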

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073