Thread: CSV mode option for pg_dump

CSV mode option for pg_dump

From

Andrew Dunstan

Date:

12 June 2006, 14:13:12

Something someone said on IRC just now triggered a little memory  ... I 
think we should provide an option to have pg_dump work in CSV mode 
rather than text mode. This probably doesn't have much importance in the 
case of text dumps, but in custom or tar dumps where you might want to 
get at individual data members, having an option for CSVs that you want 
to load into some other product might be nice.

This should be a pretty low cost item, I expect (good newbie project?)

thoughts?

cheers

andrew

Re: CSV mode option for pg_dump

From

"Joshua D. Drake"

Date:

12 June 2006, 14:53:32

Andrew Dunstan wrote:
> 
> Something someone said on IRC just now triggered a little memory  ... I 
> think we should provide an option to have pg_dump work in CSV mode 
> rather than text mode. This probably doesn't have much importance in the 
> case of text dumps, but in custom or tar dumps where you might want to 
> get at individual data members, having an option for CSVs that you want 
> to load into some other product might be nice.
> 
> This should be a pretty low cost item, I expect (good newbie project?)

Uhh... just about any application that can import CSV can import our 
dumps. It just tell it the delimiter is a tab.

Joshua D. Drake

> 
> thoughts?
> 
> cheers
> 
> andrew
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>       choose an index scan if your joining column's datatypes do not
>       match
> 


-- 
            === The PostgreSQL Company: Command Prompt, Inc. ===      Sales/Support: +1.503.667.4564 || 24x7/Emergency:
+1.800.492.2240     Providing the most comprehensive  PostgreSQL solutions since 1997
http://www.commandprompt.com/

Re: CSV mode option for pg_dump

From

Andrew Dunstan

Date:

12 June 2006, 14:57:55

Joshua D. Drake wrote:
> Andrew Dunstan wrote:
>>
>> Something someone said on IRC just now triggered a little memory  ... 
>> I think we should provide an option to have pg_dump work in CSV mode 
>> rather than text mode. This probably doesn't have much importance in 
>> the case of text dumps, but in custom or tar dumps where you might 
>> want to get at individual data members, having an option for CSVs 
>> that you want to load into some other product might be nice.
>>
>> This should be a pretty low cost item, I expect (good newbie project?)
>
> Uhh... just about any application that can import CSV can import our 
> dumps. It just tell it the delimiter is a tab.
>


No it won't, not if there are tabs in the data.

Why do you think we did CSV in the first place? Precisely because our 
dump does *not* work as a general export mechanism for arbitrary data.

cheers

andrew

Re: CSV mode option for pg_dump

From

Josh Berkus

Date:

12 June 2006, 14:58:27

Josh,

> > This should be a pretty low cost item, I expect (good newbie project?)
>
> Uhh... just about any application that can import CSV can import our
> dumps. It just tell it the delimiter is a tab.

Actually, there was an Summer of Code applcation to do this, but with all 
the other nifty stuff it wasn't accepted -- partly because we weren't sure 
that we wanted a CSV mode for dumps, partly because this should be a 
weekend fix, not a 3-month project.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

Re: CSV mode option for pg_dump

From

Tom Lane

Date:

12 June 2006, 15:00:34

Andrew Dunstan <andrew@dunslane.net> writes:
> Something someone said on IRC just now triggered a little memory  ... I 
> think we should provide an option to have pg_dump work in CSV mode 
> rather than text mode. This probably doesn't have much importance in the 
> case of text dumps, but in custom or tar dumps where you might want to 
> get at individual data members, having an option for CSVs that you want 
> to load into some other product might be nice.

This is silly.  You'd just COPY the particular table you want, not use
pg_dump.  pg_dump's already got an unreasonably large number of options
without adding ones that have essentially zero use.  Also, I think there
are sufficient grounds to worry about whether a CSV dump would always
reload correctly --- we already know that that's a poorly thought out
"standard".
        regards, tom lane

Re: CSV mode option for pg_dump

From

"Joshua D. Drake"

Date:

12 June 2006, 15:09:13

> No it won't, not if there are tabs in the data.
> 

<snipping noise>

Hmmm then would just double quoting the data work? At least in OOCalc 
(and IIRC Excel) there is the ability to select a text delimiter.

Joshua D. Drake

-- 
            === The PostgreSQL Company: Command Prompt, Inc. ===      Sales/Support: +1.503.667.4564 || 24x7/Emergency:
+1.800.492.2240     Providing the most comprehensive  PostgreSQL solutions since 1997
http://www.commandprompt.com/

Re: CSV mode option for pg_dump

From

Andrew Dunstan

Date:

12 June 2006, 15:16:14

Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>   
>> Something someone said on IRC just now triggered a little memory  ... I 
>> think we should provide an option to have pg_dump work in CSV mode 
>> rather than text mode. This probably doesn't have much importance in the 
>> case of text dumps, but in custom or tar dumps where you might want to 
>> get at individual data members, having an option for CSVs that you want 
>> to load into some other product might be nice.
>>     
>
> This is silly.  You'd just COPY the particular table you want, not use
> pg_dump.  pg_dump's already got an unreasonably large number of options
> without adding ones that have essentially zero use.  Also, I think there
> are sufficient grounds to worry about whether a CSV dump would always
> reload correctly --- we already know that that's a poorly thought out
> "standard".
>
>     
>   

Well, if you have dozens or hundreds of tables it might well be more 
convenient.

As for not reloading - I went to some trouble to make sure that we could 
reload what we dumped, exactly, unless the force options are used. I 
might have made a bug in that, but it isn't dependent on the particular 
CSV format used.

Naturally you won't have a use for it, but I suspect others might (in 
which case they had better speak up ;-) )

I suppose the alternative would be to write a little tool in perl or 
whatever to do the same thing for you. Maybe a good pgfoundry project.

cheers

andrew

Re: CSV mode option for pg_dump

From

"Jim C. Nasby"

Date:

12 June 2006, 16:06:46

On Mon, Jun 12, 2006 at 02:15:59PM -0400, Andrew Dunstan wrote:
> Naturally you won't have a use for it, but I suspect others might (in 
> which case they had better speak up ;-) )

I'd bet that those who would find this useful are far more likely to be
on -general and not in here.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: CSV mode option for pg_dump

From

"Bill Bartlett"

Date:

12 June 2006, 16:17:50

Here's me speaking up -- I'd definitely use it!   As a quick way to pull
data into Excel to do basic reports or analysis, a CSV format would be
great.  Some of our users currently pull data into Excel for quickie
analysis, but creating fixed-width data via psql requires them to parse
the data and dumping anything via pg_dump with any delimiter (tabs,
etc.) usually doesn't work due to the delimiters being embedded in the
real data.

- Bill

> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org 
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of 
> Andrew Dunstan
> Sent: Monday, June 12, 2006 2:16 PM
> To: Tom Lane
> Cc: PG Hackers
> Subject: Re: [HACKERS] CSV mode option for pg_dump
> 
> 
> Tom Lane wrote:
> > Andrew Dunstan <andrew@dunslane.net> writes:
> >   
> >> Something someone said on IRC just now triggered a little 
> memory  ... 
> >> I
> >> think we should provide an option to have pg_dump work in CSV mode 
> >> rather than text mode. This probably doesn't have much 
> importance in the 
> >> case of text dumps, but in custom or tar dumps where you 
> might want to 
> >> get at individual data members, having an option for CSVs 
> that you want 
> >> to load into some other product might be nice.
> >>     
> >
> > This is silly.  You'd just COPY the particular table you 
> want, not use 
> > pg_dump.  pg_dump's already got an unreasonably large number of 
> > options without adding ones that have essentially zero use. 
>  Also, I 
> > think there are sufficient grounds to worry about whether a 
> CSV dump 
> > would always reload correctly --- we already know that 
> that's a poorly 
> > thought out "standard".
> >
> >     
> >   
> 
> Well, if you have dozens or hundreds of tables it might well be more 
> convenient.
> 
> As for not reloading - I went to some trouble to make sure 
> that we could 
> reload what we dumped, exactly, unless the force options are used. I 
> might have made a bug in that, but it isn't dependent on the 
> particular 
> CSV format used.
> 
> Naturally you won't have a use for it, but I suspect others might (in 
> which case they had better speak up ;-) )
> 
> I suppose the alternative would be to write a little tool in perl or 
> whatever to do the same thing for you. Maybe a good pgfoundry project.
> 
> cheers
> 
> andrew
> 
> 
> ---------------------------(end of 
> broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>

Re: CSV mode option for pg_dump

From

"Joshua D. Drake"

Date:

12 June 2006, 17:13:57

Bill Bartlett wrote:
> Here's me speaking up -- I'd definitely use it!   As a quick way to pull
> data into Excel to do basic reports or analysis, a CSV format would be
> great. 

Why not just use ODBC?

Joshua D. Drake
-- 
            === The PostgreSQL Company: Command Prompt, Inc. ===      Sales/Support: +1.503.667.4564 || 24x7/Emergency:
+1.800.492.2240     Providing the most comprehensive  PostgreSQL solutions since 1997
http://www.commandprompt.com/

Re: CSV mode option for pg_dump

From

"Bill Bartlett"

Date:

12 June 2006, 17:29:34

Can't -- the main production database is over at a CoLo site with access
only available via SSH, and tightly-restricted SSH at that. Generally
one of the developers will SSH over to the server, pull out whatever
data is needed into a text file via psql or pg_dump, scp the file(s)
back here and send them to the user.

We're working on implementing a full-blown reporting system to eliminate
some of this need, but it's not quite there yet.

However, CSV is a very handy format to use to send data to other people
(especially "less technical" users), so even in the future the ability
to export to CSV would be handy.

- Bill

> -----Original Message-----
> From: Joshua D. Drake [mailto:jd@commandprompt.com] 
> Sent: Monday, June 12, 2006 4:15 PM
> To: Bill Bartlett
> Cc: 'Andrew Dunstan'; 'Tom Lane'; 'PG Hackers'
> Subject: Re: [HACKERS] CSV mode option for pg_dump
> 
> 
> Bill Bartlett wrote:
> > Here's me speaking up -- I'd definitely use it!   As a 
> quick way to pull
> > data into Excel to do basic reports or analysis, a CSV 
> format would be 
> > great.
> 
> Why not just use ODBC?
> 
> Joshua D. Drake
> -- 
> 
>              === The PostgreSQL Company: Command Prompt, Inc. ===
>        Sales/Support: +1.503.667.4564 || 24x7/Emergency: 
> +1.800.492.2240
>        Providing the most comprehensive  PostgreSQL solutions 
> since 1997
>                       http://www.commandprompt.com/
> 
>

Re: CSV mode option for pg_dump

From

Bruce Momjian

Date:

12 June 2006, 22:25:48

Already on TODO:

pg_dump:
         o %Add CSV output format


---------------------------------------------------------------------------

Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
> > Something someone said on IRC just now triggered a little memory  ... I 
> > think we should provide an option to have pg_dump work in CSV mode 
> > rather than text mode. This probably doesn't have much importance in the 
> > case of text dumps, but in custom or tar dumps where you might want to 
> > get at individual data members, having an option for CSVs that you want 
> > to load into some other product might be nice.
> 
> This is silly.  You'd just COPY the particular table you want, not use
> pg_dump.  pg_dump's already got an unreasonably large number of options
> without adding ones that have essentially zero use.  Also, I think there
> are sufficient grounds to worry about whether a CSV dump would always
> reload correctly --- we already know that that's a poorly thought out
> "standard".
> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly
> 

--  Bruce Momjian   http://candle.pha.pa.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: CSV mode option for pg_dump

From

"Matthew T. O'Connor"

Date:

13 June 2006, 09:15:56

Bill Bartlett wrote:
> Can't -- the main production database is over at a CoLo site with access
> only available via SSH, and tightly-restricted SSH at that. Generally
> one of the developers will SSH over to the server, pull out whatever
> data is needed into a text file via psql or pg_dump, scp the file(s)
> back here and send them to the user.

ODBC over an SSH tunnnel?

Re: CSV mode option for pg_dump

From

"Andrew Dunstan"

Date:

13 June 2006, 10:03:07

Matthew T. OConnor said:
> Bill Bartlett wrote:
>> Can't -- the main production database is over at a CoLo site with
>> access only available via SSH, and tightly-restricted SSH at that.
>> Generally one of the developers will SSH over to the server, pull out
>> whatever data is needed into a text file via psql or pg_dump, scp the
>> file(s) back here and send them to the user.
>
> ODBC over an SSH tunnnel?

I wish I could understand why people are so keen to make other people turn
handsprings in order to avoid a feature which, as Bruce points out, is
already on the TODO list, and which, by my 10 minute analysis, would involve
almost trivial code impact and risk. If this involved major impact I might
understand, but it really doesn't.

I know many people work in a Postgres only world. I wish everybody did, and
then we could just forget about things like CSV. They don't, so we can't.

I think I have said this before, but I'll say it again. From time to time
people thank me for things I have done for Postgres. The two things that
stand out BY FAR on the list of these are CSV import/export and dollar
quoting. This is a widely used feature.

cheers

andrew

Re: CSV mode option for pg_dump

From

Bruce Momjian

Date:

13 June 2006, 10:09:49

Andrew Dunstan wrote:
> Matthew T. OConnor said:
> > Bill Bartlett wrote:
> >> Can't -- the main production database is over at a CoLo site with
> >> access only available via SSH, and tightly-restricted SSH at that.
> >> Generally one of the developers will SSH over to the server, pull out
> >> whatever data is needed into a text file via psql or pg_dump, scp the
> >> file(s) back here and send them to the user.
> >
> > ODBC over an SSH tunnnel?
> 
> 
> I wish I could understand why people are so keen to make other people turn
> handsprings in order to avoid a feature which, as Bruce points out, is
> already on the TODO list, and which, by my 10 minute analysis, would involve
> almost trivial code impact and risk. If this involved major impact I might
> understand, but it really doesn't.
> 
> I know many people work in a Postgres only world. I wish everybody did, and
> then we could just forget about things like CSV. They don't, so we can't.
> 
> I think I have said this before, but I'll say it again. From time to time
> people thank me for things I have done for Postgres. The two things that
> stand out BY FAR on the list of these are CSV import/export and dollar
> quoting. This is a widely used feature.

I think the bottom line is that "ease of use" isn't as high enough on
the project's priority list as you (and others) think it should be.

I personally feel as you do that we should value "ease of use" more.

--  Bruce Momjian   http://candle.pha.pa.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: CSV mode option for pg_dump

From

Tom Lane

Date:

13 June 2006, 11:14:01

"Andrew Dunstan" <andrew@dunslane.net> writes:
> I wish I could understand why people are so keen to make other people turn
> handsprings in order to avoid a feature which, as Bruce points out, is
> already on the TODO list, and which, by my 10 minute analysis, would involve
> almost trivial code impact and risk. If this involved major impact I might
> understand, but it really doesn't.

Supporting all of the CSV options in pg_dump would involve major bloat
in its option set, and it already has far too many options.  If it were
just a matter of adding a "--csv" switch I wouldn't be complaining, but
there are half a dozen more sub-options, and it seems like every time we
turn around someone is finding a reason for another one.  Propagating
all that cruft through pg_dump would be a PITA, and updating it to track
future additions would be too.

Furthermore, the entire rationale for the feature is predicated on the
claim that programs other than pg_restore might find it useful.  But
this conveniently ignores the fact that if there are any such programs
in existence, what this will really do is BREAK them, because they won't
be able to cope with all the variants that pass for CSV.

My opinions would be less negative if I thought that CSV were a
well-defined format that would never change.  I don't believe that it
has either property, however, and so I'm against letting it get into our
dump file format.  I think we'll just live to regret it if we do.
        regards, tom lane

Re: CSV mode option for pg_dump

From

Bruce Momjian

Date:

13 June 2006, 11:21:10

Good point.  The number of CSV options would be hard to support for
pg_dump.  Any thoughts from anyone on how to do that cleanly?  Could we
just support the default behavior?

---------------------------------------------------------------------------

Tom Lane wrote:
> "Andrew Dunstan" <andrew@dunslane.net> writes:
> > I wish I could understand why people are so keen to make other people turn
> > handsprings in order to avoid a feature which, as Bruce points out, is
> > already on the TODO list, and which, by my 10 minute analysis, would involve
> > almost trivial code impact and risk. If this involved major impact I might
> > understand, but it really doesn't.
> 
> Supporting all of the CSV options in pg_dump would involve major bloat
> in its option set, and it already has far too many options.  If it were
> just a matter of adding a "--csv" switch I wouldn't be complaining, but
> there are half a dozen more sub-options, and it seems like every time we
> turn around someone is finding a reason for another one.  Propagating
> all that cruft through pg_dump would be a PITA, and updating it to track
> future additions would be too.
> 
> Furthermore, the entire rationale for the feature is predicated on the
> claim that programs other than pg_restore might find it useful.  But
> this conveniently ignores the fact that if there are any such programs
> in existence, what this will really do is BREAK them, because they won't
> be able to cope with all the variants that pass for CSV.
> 
> My opinions would be less negative if I thought that CSV were a
> well-defined format that would never change.  I don't believe that it
> has either property, however, and so I'm against letting it get into our
> dump file format.  I think we'll just live to regret it if we do.
> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match
> 

--  Bruce Momjian   http://candle.pha.pa.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: CSV mode option for pg_dump

From

Martijn van Oosterhout

Date:

13 June 2006, 11:35:07

On Tue, Jun 13, 2006 at 10:20:53AM -0400, Bruce Momjian wrote:
>
> Good point.  The number of CSV options would be hard to support for
> pg_dump.  Any thoughts from anyone on how to do that cleanly?  Could we
> just support the default behavior?

What this tells me is that we need a tool somewhere between psql and
pg_dump, say, pgquery. It's sole purpose in life is to generate output
from various queries. Because it's a seperate tool there's no question
of psql or pg_dump being able to parse them.

While you're at it, you could add modules to support many different
output styles, like CSV, XML, Excel format, HTML, etc.

This I beleive would take the load off psql to provide many different
output styles, as well as the load off pg_dump to produce
parsable-by-third-party output.

Thoughts?

Side note: In my experience Excel happily slurps up tab delimited
output, so I'm not sure why all of this is an issue in the first place.

Have a ncie day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: CSV mode option for pg_dump

From

Volkan YAZICI

Date:

13 June 2006, 11:48:33

On Jun 13 10:20, Bruce Momjian wrote:
> 
> Good point.  The number of CSV options would be hard to support for
> pg_dump.  Any thoughts from anyone on how to do that cleanly?  Could we
> just support the default behavior?

IMHO, it might be better if we'd support a syntax like
 pg_dump --csv=opt0,para0:opt2,opt3

This can save us from the pg_dump parameter pollution a little bit.

Furthermore, I think CSV format for the dump files can be maintained
better under an external project. (pgFoundry?) By this way, main
developers will be able to cope with their own core problems while
other users/developers can contribute on the CSV code easily. And if
any user will ever want to get CSV functionality in the pg_dump,
he/she will just issue a --csv parameter (with the above syntax) and
pg_dump will make a suitable dlopen() call for the related (CSV)
module. Anyway, this is just an idea for modularity; but the main
thing I try to underline is to give pg_dump a module functionality for
similar problems.

Regards.

Re: CSV mode option for pg_dump

From

Andrew Dunstan

Date:

13 June 2006, 11:53:14

Tom Lane wrote:

>there are half a dozen more sub-options, and it seems like every time we
>turn around someone is finding a reason for another one.  
>  
>

This is a bit unfair. The feature was introduced in 8.0, and slightly 
enhanced in 8.1. There have not been any additional CSV features this 
release cycle unless my memory is worse than I thought, and I at least 
have said previously that I will be resistant to the addition of further 
CSV options.

My thoughts regarding options for pg_dump was actually to provide a much 
smaller set than the full set available with COPY, specifically to 
provide for using a single rather than a double quote char, and optional 
header lines - no alternate escape or delimiter, and no FORCE QUOTE  
(FORCE NOT NULL isn't relevant as it is only useful for non-postgres 
derived data). At least that would be a reasonable starting point, and 
would I believe cater for the vast majority of uses.

cheers

andrew

Re: CSV mode option for pg_dump

From

"Joshua D. Drake"

Date:

13 June 2006, 12:08:39

Bruce Momjian wrote:
> Good point.  The number of CSV options would be hard to support for
> pg_dump.  Any thoughts from anyone on how to do that cleanly?  Could we
> just support the default behavior?

Although I don't see a real need for the feature, I do think that if we 
were to support "1" (well two if you include the already tab delimited) 
csv output it would be a large amount of bloat.

Perhaps we could pick "1" output, say comma delimted with quoted fields?

"foo","bar    ","baz"

Joshua D. Drake

> 
> ---------------------------------------------------------------------------
> 
> Tom Lane wrote:
>> "Andrew Dunstan" <andrew@dunslane.net> writes:
>>> I wish I could understand why people are so keen to make other people turn
>>> handsprings in order to avoid a feature which, as Bruce points out, is
>>> already on the TODO list, and which, by my 10 minute analysis, would involve
>>> almost trivial code impact and risk. If this involved major impact I might
>>> understand, but it really doesn't.
>> Supporting all of the CSV options in pg_dump would involve major bloat
>> in its option set, and it already has far too many options.  If it were
>> just a matter of adding a "--csv" switch I wouldn't be complaining, but
>> there are half a dozen more sub-options, and it seems like every time we
>> turn around someone is finding a reason for another one.  Propagating
>> all that cruft through pg_dump would be a PITA, and updating it to track
>> future additions would be too.
>>
>> Furthermore, the entire rationale for the feature is predicated on the
>> claim that programs other than pg_restore might find it useful.  But
>> this conveniently ignores the fact that if there are any such programs
>> in existence, what this will really do is BREAK them, because they won't
>> be able to cope with all the variants that pass for CSV.
>>
>> My opinions would be less negative if I thought that CSV were a
>> well-defined format that would never change.  I don't believe that it
>> has either property, however, and so I'm against letting it get into our
>> dump file format.  I think we'll just live to regret it if we do.
>>
>>             regards, tom lane
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 9: In versions below 8.0, the planner will ignore your desire to
>>        choose an index scan if your joining column's datatypes do not
>>        match
>>
> 


-- 
            === The PostgreSQL Company: Command Prompt, Inc. ===      Sales/Support: +1.503.667.4564 || 24x7/Emergency:
+1.800.492.2240     Providing the most comprehensive  PostgreSQL solutions since 1997
http://www.commandprompt.com/

Re: CSV mode option for pg_dump

From

Andrew Dunstan

Date:

13 June 2006, 12:17:13

Volkan YAZICI wrote:

>On Jun 13 10:20, Bruce Momjian wrote:
>  
>
>>Good point.  The number of CSV options would be hard to support for
>>pg_dump.  Any thoughts from anyone on how to do that cleanly?  Could we
>>just support the default behavior?
>>    
>>
>
>IMHO, it might be better if we'd support a syntax like
>
>  pg_dump --csv=opt0,para0:opt2,opt3
>
>This can save us from the pg_dump parameter pollution a little bit.
>
>Furthermore, I think CSV format for the dump files can be maintained
>better under an external project. (pgFoundry?) By this way, main
>developers will be able to cope with their own core problems while
>other users/developers can contribute on the CSV code easily. And if
>any user will ever want to get CSV functionality in the pg_dump,
>he/she will just issue a --csv parameter (with the above syntax) and
>pg_dump will make a suitable dlopen() call for the related (CSV)
>module. Anyway, this is just an idea for modularity; but the main
>thing I try to underline is to give pg_dump a module functionality for
>similar problems.
>
>  
>

There are some problems with this, though:

. FORCE QUOTE is table specific, and COPY will barf if you name a column 
that isn't on the table. Providing for this option would involve lots 
more code in pg_dump, as we'd have to filter the list according to the 
column names in each table.

. specifying arbitrary chars for quote, escape and delimiter could be 
tricky from the command line, especially if you want to specify a tab 
delimiter or backslash escape.

cheers

andrew

Re: CSV mode option for pg_dump

From

Andrew Dunstan

Date:

13 June 2006, 12:21:28

Martijn van Oosterhout wrote:

>
>Side note: In my experience Excel happily slurps up tab delimited
>output, so I'm not sure why all of this is an issue in the first place.
>
>  
>

I guess you experience doesn't run to data that has embedded tabs, for 
example.

There really is a reason we did this in the first place, and it wasn't 
for fun.

cheers

andrew

Re: CSV mode option for pg_dump

From

Rod Taylor

Date:

13 June 2006, 12:32:46

On Mon, 2006-06-12 at 16:28 -0400, Bill Bartlett wrote:
> Can't -- the main production database is over at a CoLo site with access
> only available via SSH, and tightly-restricted SSH at that. Generally
> one of the developers will SSH over to the server, pull out whatever
> data is needed into a text file via psql or pg_dump, scp the file(s)
> back here and send them to the user.

I don't get it. If you can use psql then you already have csv support.

psql -c 'COPY pg_class TO STDOUT WITH CSV' postgres > pg_class.csv

> > -----Original Message-----
> > From: Joshua D. Drake [mailto:jd@commandprompt.com] 
> > Sent: Monday, June 12, 2006 4:15 PM
> > To: Bill Bartlett
> > Cc: 'Andrew Dunstan'; 'Tom Lane'; 'PG Hackers'
> > Subject: Re: [HACKERS] CSV mode option for pg_dump
> > 
> > 
> > Bill Bartlett wrote:
> > > Here's me speaking up -- I'd definitely use it!   As a 
> > quick way to pull
> > > data into Excel to do basic reports or analysis, a CSV 
> > format would be 
> > > great.
> > 
> > Why not just use ODBC?
> > 
> > Joshua D. Drake
> > -- 
> > 
> >              === The PostgreSQL Company: Command Prompt, Inc. ===
> >        Sales/Support: +1.503.667.4564 || 24x7/Emergency: 
> > +1.800.492.2240
> >        Providing the most comprehensive  PostgreSQL solutions 
> > since 1997
> >                       http://www.commandprompt.com/
> > 
> > 
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly
> 
--

Re: CSV mode option for pg_dump

From

Steve Atkins

Date:

13 June 2006, 13:10:34

On Jun 13, 2006, at 7:34 AM, Martijn van Oosterhout wrote:
>
> What this tells me is that we need a tool somewhere between psql and
> pg_dump, say, pgquery. It's sole purpose in life is to generate output
> from various queries. Because it's a seperate tool there's no question
> of psql or pg_dump being able to parse them.
>
> While you're at it, you could add modules to support many different
> output styles, like CSV, XML, Excel format, HTML, etc.
>
> This I beleive would take the load off psql to provide many different
> output styles, as well as the load off pg_dump to produce
> parsable-by-third-party output.
>
> Thoughts?

Perl+DBD::Pg+CPAN does almost all of this already. Lots of support
for countless different output formats, and mostly fairly well battle- 
tested.

I suspect that a perl script to do all that would be dominated by  
commandline
option parsing, as all the hard work is in existing modules.

Would that be adequate, or do we really want to reimplement and  
maintain all
the output format complexity in our own code, in C?

Cheers,  Steve

Re: CSV mode option for pg_dump

From

"Joshua D. Drake"

Date:

13 June 2006, 13:13:56

Rod Taylor wrote:
> On Mon, 2006-06-12 at 16:28 -0400, Bill Bartlett wrote:
>> Can't -- the main production database is over at a CoLo site with access
>> only available via SSH, and tightly-restricted SSH at that. Generally
>> one of the developers will SSH over to the server, pull out whatever
>> data is needed into a text file via psql or pg_dump, scp the file(s)
>> back here and send them to the user.
> 
> I don't get it. If you can use psql then you already have csv support.
> 
> psql -c 'COPY pg_class TO STDOUT WITH CSV' postgres > pg_class.csv

If you data looks like this:

foo    barbaz    bing

You are o.k. You have three columns, tab delimited.

However if you data looks like this:

foo    bar    baz    bing

You have a problem.

foo is one column
bar and baz are a single column
bing is a single column

How does excel know that bar    baz is a single column? It doesn't because 
you told it to delimit on tabs and thus you have four columns as far as 
Excel is concerned.

An alternative although I don't know what kind of headaches it would 
cause is to have a text delimiter as well as a field delimter, e.g;

"foo"    "bar    baz"    "bing"

Sincerely,

Joshua D. Drake




> 
>>> -----Original Message-----
>>> From: Joshua D. Drake [mailto:jd@commandprompt.com] 
>>> Sent: Monday, June 12, 2006 4:15 PM
>>> To: Bill Bartlett
>>> Cc: 'Andrew Dunstan'; 'Tom Lane'; 'PG Hackers'
>>> Subject: Re: [HACKERS] CSV mode option for pg_dump
>>>
>>>
>>> Bill Bartlett wrote:
>>>> Here's me speaking up -- I'd definitely use it!   As a 
>>> quick way to pull
>>>> data into Excel to do basic reports or analysis, a CSV 
>>> format would be 
>>>> great.
>>> Why not just use ODBC?
>>>
>>> Joshua D. Drake
>>> -- 
>>>
>>>              === The PostgreSQL Company: Command Prompt, Inc. ===
>>>        Sales/Support: +1.503.667.4564 || 24x7/Emergency: 
>>> +1.800.492.2240
>>>        Providing the most comprehensive  PostgreSQL solutions 
>>> since 1997
>>>                       http://www.commandprompt.com/
>>>
>>>
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 1: if posting/reading through Usenet, please send an appropriate
>>        subscribe-nomail command to majordomo@postgresql.org so that your
>>        message can get through to the mailing list cleanly
>>


-- 
            === The PostgreSQL Company: Command Prompt, Inc. ===      Sales/Support: +1.503.667.4564 || 24x7/Emergency:
+1.800.492.2240     Providing the most comprehensive  PostgreSQL solutions since 1997
http://www.commandprompt.com/

Re: CSV mode option for pg_dump

From

PFC

Date:

13 June 2006, 13:24:53

From what I gather, the CSV format dump would only contain data.I think pg_dump is the friend of pg_restore. It dumps
everything 
 
including user defined functions, types, schemas etc. CSV does not fit  
with this.
Besides, people will probably want to dump into CSV the result of any  
query, to load it into excel, not just the full contents of a table.
So, why not create a separate tool, someone suggested pg_query for that,  
I second it.This tool would take a query and format options, and would output a file  
in whatever format chosen by the user (CSV, COPY format, xml, whatever)
A script language (python) can be used, which will significantly shorten  
development times and allow easy modularity, as it is easier to add a  
module to a python program than a C program.I would vote for Python because I love it and it has a very good postgres

adapter (psycopg2) which knows how to convers every postgres type to a  
native language type (yes, even multidimensional arrays of BOX get  
converted). And it's really fast at retrieving large volumes of data.
So you have a stable, fast tool for backup and restore (pg_dump) and a  
rapidly evolving, user-friendly and extendable tool for exporting data,  
and everyone is happy.
Mr Momijan talks about adding modular functionality to pg_dump. Is it  
really necessary ? What is the objective ? Is it to reuse code in pg_dump  
? I guess not ; if a user wants to dump, for instance, all the tables in a  
schema, implementing this logic in python is only a few lines of code  
(select from information_schema...)
To be realistic, output format modules should be written in script  
languages. Noone sane is eager to do string manipulation in C. Thus these  
modules would have to somehow fit with pg_dump, maybe with a pipe or  
something. This means designing another protocol. Reimplementing in a  
scripting langage the parts of pg_dump which will be reused by this  
project (mainly, enumerating tables and stuff) will be far easier.
Just look.

Python 2.4.2 (#1, Mar 30 2006, 14:34:35)
[GCC 3.4.4 (Gentoo 3.4.4-r1, ssp-3.4.4-1.0, pie-8.7.8)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

...opens a db connection...

>>> c.execute( "SELECT * FROM test.csv" )
>>> data = c.fetchall()
>>> data
[[1, datetime.date(2006, 6, 13), 'this\tcontains\ttabulations'], [2,  
datetime.date(2006, 6, 13), "this'contains'quotes"], [3,  
datetime.date(2006, 6, 13), 'this"contains"double quotes']]
>>> import csv, sys
>>> c = csv.writer( sys.stdout, dialect = csv.excel )
>>> c.writerows( data )
1,2006-06-13,this       contains        tabulations
2,2006-06-13,this'contains'quotes
3,2006-06-13,"this""contains""double quotes"

Re: CSV mode option for pg_dump

From

Andrew Dunstan

Date:

13 June 2006, 13:31:08

PFC wrote:

>
>     From what I gather, the CSV format dump would only contain data.
>     I think pg_dump is the friend of pg_restore. It dumps everything  
> including user defined functions, types, schemas etc. CSV does not 
> fit  with this.
>
>    

This is just nonsense. There is not the slightest reason that CSV data 
cannot be embedded in a text dump nor exist as the data members of a 
custom or tar dump with the corresponding COPY statements modified 
accordingly.

Really, let's get the facts straight, please.

cheers

andrew

Re: CSV mode option for pg_dump

From

"Joshua D. Drake"

Date:

13 June 2006, 13:46:41

> 
> Would that be adequate, or do we really want to reimplement and maintain 
> all
> the output format complexity in our own code, in C?

I think the point is that we should provide a native implementation 
because not everyone is crazy enough to use perl (blatant jab ;)). I 
would never expect a customer to write a perl or python script just to 
get their data in what is widely considered a standard business format 
that can be imported by their userland application.

The people on the hackers list, are NOT the target for this feature. The 
people on general, admin and novice are.

Sincerely,

Joshua D. Drake




> 
> Cheers,
>   Steve
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
> 
>               http://archives.postgresql.org
> 


-- 
            === The PostgreSQL Company: Command Prompt, Inc. ===      Sales/Support: +1.503.667.4564 || 24x7/Emergency:
+1.800.492.2240     Providing the most comprehensive  PostgreSQL solutions since 1997
http://www.commandprompt.com/

Re: CSV mode option for pg_dump

From

"Bill Bartlett"

Date:

13 June 2006, 13:52:33

> From: Rod Taylor [mailto:pg@rbt.ca] 
> Sent: Tuesday, June 13, 2006 11:31 AM
> 
> 
> On Mon, 2006-06-12 at 16:28 -0400, Bill Bartlett wrote:
> > Can't -- the main production database is over at a CoLo site with 
> > access only available via SSH, and tightly-restricted SSH at that. 
> > Generally one of the developers will SSH over to the 
> server, pull out 
> > whatever data is needed into a text file via psql or 
> pg_dump, scp the 
> > file(s) back here and send them to the user.
> 
> I don't get it. If you can use psql then you already have csv support.
> 
> psql -c 'COPY pg_class TO STDOUT WITH CSV' postgres > pg_class.csv

Ah - RTF-UPDATED-M on my part.  Most of my systems are still in PG 7.4.x
databases so I tend to stick with the 7.x docs, and I didn't notice the
"WITH CSV" option added in 8.0.  That, plus temp tables, will be very
useful.

However, I also agree with the need for a new "pg_query / pg_export"
program. A program geared solely towards exporting the results of a
query would allow many of the options that are needed for the
ever-growing variety of output formats (XML, CSV, HTML, XHTML, etc.) and
details for each format without needing to clutter up pg_dump with
things that really having nothing to do with backing up and restoring
data.  It could also allow a large range of options related to getting
data out (where, order by), many of which have also been discussed for
pg_dump recently.

- Bill

> 
> > > -----Original Message-----
> > > From: Joshua D. Drake [mailto:jd@commandprompt.com]
> > > Sent: Monday, June 12, 2006 4:15 PM
> > > To: Bill Bartlett
> > > Cc: 'Andrew Dunstan'; 'Tom Lane'; 'PG Hackers'
> > > Subject: Re: [HACKERS] CSV mode option for pg_dump
> > > 
> > > 
> > > Bill Bartlett wrote:
> > > > Here's me speaking up -- I'd definitely use it!   As a 
> > > quick way to pull
> > > > data into Excel to do basic reports or analysis, a CSV
> > > format would be
> > > > great.
> > > 
> > > Why not just use ODBC?
> > > 
> > > Joshua D. Drake
> > > --
> > > 
> > >              === The PostgreSQL Company: Command Prompt, Inc. ===
> > >        Sales/Support: +1.503.667.4564 || 24x7/Emergency:
> > > +1.800.492.2240
> > >        Providing the most comprehensive  PostgreSQL solutions
> > > since 1997
> > >                       http://www.commandprompt.com/
> > > 
> > > 
> -- 
>

Re: CSV mode option for pg_dump

From

Steve Atkins

Date:

13 June 2006, 13:53:47

On Jun 13, 2006, at 9:47 AM, Joshua D. Drake wrote:

>> Would that be adequate, or do we really want to reimplement and  
>> maintain all
>> the output format complexity in our own code, in C?
>
> I think the point is that we should provide a native implementation  
> because not everyone is crazy enough to use perl (blatant jab ;)).  
> I would never expect a customer to write a perl or python script  
> just to get their data in what is widely considered a standard  
> business format that can be imported by their userland application.

That wasn't what I was discussing, on two levels. Firstly, I wasn't  
suggesting that the end user write anything, secondly I was talking  
about the other output formats discussed (Excel, HTML...) rather than  
just CSV.

>
> The people on the hackers list, are NOT the target for this  
> feature. The people on general, admin and novice are.

I was referring to the other export formats mentioned (Excel,  
HTML...). We already support CSV export of single tables with the  
bundled software, don't we?

Cheers,  Steve

Re: CSV mode option for pg_dump

From

Andrew Dunstan

Date:

13 June 2006, 14:03:01

Steve Atkins wrote:

>
> Would that be adequate, or do we really want to reimplement and  
> maintain all
> the output format complexity in our own code, in C?
>
>

Code to produce CSVs is there, now, today, and has been since 8.0.  That 
is *not* what is at issue here. If you want to debate whether or not it 
should be in the backend you are 2 years too late. The ONLY question 
here is about whether or not to have it enabled as an option in pg_dump.

try these and see the interesting results: COPY pg_class TO '/tmp/pg_class.csv' CSV;
or in psql \copy pg_class to '/tmp/pg_class.csv' csv

As for XML which was also mentioned, you should be aware that there is a 
Google Summer of Code project to implement SQL/XML, so watch this space. 
(As for why that needs to be in the backend, see previous discussions)

cheers

andrew

Re: CSV mode option for pg_dump

From

Bruce Momjian

Date:

13 June 2006, 18:08:57

pg_dump CSV TODO item removed until we come up with something everyone
can agree on.

---------------------------------------------------------------------------

Joshua D. Drake wrote:
> > 
> > Would that be adequate, or do we really want to reimplement and maintain 
> > all
> > the output format complexity in our own code, in C?
> 
> I think the point is that we should provide a native implementation 
> because not everyone is crazy enough to use perl (blatant jab ;)). I 
> would never expect a customer to write a perl or python script just to 
> get their data in what is widely considered a standard business format 
> that can be imported by their userland application.
> 
> The people on the hackers list, are NOT the target for this feature. The 
> people on general, admin and novice are.
> 
> Sincerely,
> 
> Joshua D. Drake
> 
> 
> 
> 
> > 
> > Cheers,
> >   Steve
> > 
> > ---------------------------(end of broadcast)---------------------------
> > TIP 4: Have you searched our list archives?
> > 
> >               http://archives.postgresql.org
> > 
> 
> 
> -- 
> 
>              === The PostgreSQL Company: Command Prompt, Inc. ===
>        Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
>        Providing the most comprehensive  PostgreSQL solutions since 1997
>                       http://www.commandprompt.com/
> 
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
> 

--  Bruce Momjian   http://candle.pha.pa.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: CSV mode option for pg_dump

From

Andrew Dunstan

Date:

13 June 2006, 21:46:55


Bruce Momjian wrote:

>pg_dump CSV TODO item removed until we come up with something everyone
>can agree on.
>
>
>

That's a pity.

Just to show you how little is involved in what I was suggesting, a
prototype patch is attached - it's 182 lines of context diff, which is
pretty small for a new feature. It took me about an hour to write and I
have tested it against the regression db in both text and binary dump
modes, where it works without a hitch.

cheers

andrew


Index: src/bin/pg_dump/pg_dump.c
===================================================================
RCS file: /cvsroot/pgsql/src/bin/pg_dump/pg_dump.c,v
retrieving revision 1.438
diff -c -r1.438 pg_dump.c
*** src/bin/pg_dump/pg_dump.c    9 Jun 2006 19:46:09 -0000    1.438
--- src/bin/pg_dump/pg_dump.c    14 Jun 2006 00:32:03 -0000
***************
*** 113,118 ****
--- 113,123 ----
  /* flag to turn on/off dollar quoting */
  static int    disable_dollar_quoting = 0;

+ /* flag to control if using CSv */
+ static bool use_csv = false;
+
+ /* holder for CSV options */
+ static PQExpBuffer csv_opts;

  static void help(const char *progname);
  static NamespaceInfo *findNamespace(Oid nsoid, Oid objoid);
***************
*** 251,256 ****
--- 256,265 ----
          {"disable-triggers", no_argument, &disable_triggers, 1},
          {"use-set-session-authorization", no_argument, &use_setsessauth, 1},

+         /* long options with no short version */
+         {"csv", no_argument, NULL ,2},
+         {"csv-option",required_argument, NULL, 3},
+
          {NULL, 0, NULL, 0}
      };
      int            optindex;
***************
*** 285,290 ****
--- 294,301 ----
          }
      }

+     csv_opts = createPQExpBuffer();
+
      while ((c = getopt_long(argc, argv, "abcCdDE:f:F:h:in:oOp:RsS:t:uU:vWxX:Z:",
                              long_options, &optindex)) != -1)
      {
***************
*** 419,424 ****
--- 430,462 ----
                  break;
                  /* This covers the long options equivalent to -X xxx. */

+             case 2:  /* csv */
+                 use_csv = true;
+                 break;
+
+             case 3: /* csv-option */
+                 if (strcmp(optarg, "singlequote") == 0)
+                 {
+                     appendPQExpBuffer(csv_opts,"QUOTE AS '''' ");
+                 }
+                 else if (strcmp(optarg, "tabdelimiter") == 0)
+                 {
+                     appendPQExpBuffer(csv_opts,"DELIMITER AS E'\\t' ");
+                 }
+                 else if (strcmp(optarg, "header") == 0)
+                 {
+                     appendPQExpBuffer(csv_opts,"HEADER ");
+                 }
+                 else
+                 {
+                     fprintf(stderr,
+                             _("%s: invalid csv option -- %s\n"),
+                             progname, optarg);
+                     fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+                     exit(1);
+                 }
+
+
              case 0:
                  break;

***************
*** 463,468 ****
--- 501,518 ----
          exit(1);
      }

+     if ( use_csv == true && dumpInserts == true)
+     {
+         write_msg(NULL, "INSERT (-d, -D) and CSV (--csv) options cannot be used together\n");
+         exit(1);
+     }
+
+     if ( use_csv == false && strlen(csv_opts->data) > 0)
+     {
+         write_msg(NULL, "You must specify --csv to use --csv-option\n");
+         exit(1);
+     }
+
      /* open the output file */
      switch (format[0])
      {
***************
*** 714,719 ****
--- 764,771 ----
               "                           use SESSION AUTHORIZATION commands instead of\n"
               "                           OWNER TO commands\n"));

+     printf(_("  --csv                    use CSV mode\n"));
+     printf(_("  --csv-option=opt         one of header, tabdelimiter or singlequote\n"));
      printf(_("\nConnection options:\n"));
      printf(_("  -h, --host=HOSTNAME      database server host or socket directory\n"));
      printf(_("  -p, --port=PORT          database server port number\n"));
***************
*** 881,898 ****

      if (oids && hasoids)
      {
!         appendPQExpBuffer(q, "COPY %s %s WITH OIDS TO stdout;",
                            fmtQualifiedId(tbinfo->dobj.namespace->dobj.name,
                                           classname),
                            column_list);
      }
      else
      {
!         appendPQExpBuffer(q, "COPY %s %s TO stdout;",
                            fmtQualifiedId(tbinfo->dobj.namespace->dobj.name,
                                           classname),
                            column_list);
      }
      res = PQexec(g_conn, q->data);
      check_sql_result(res, g_conn, q->data, PGRES_COPY_OUT);
      PQclear(res);
--- 933,955 ----

      if (oids && hasoids)
      {
!         appendPQExpBuffer(q, "COPY %s %s WITH OIDS TO stdout",
                            fmtQualifiedId(tbinfo->dobj.namespace->dobj.name,
                                           classname),
                            column_list);
      }
      else
      {
!         appendPQExpBuffer(q, "COPY %s %s TO stdout",
                            fmtQualifiedId(tbinfo->dobj.namespace->dobj.name,
                                           classname),
                            column_list);
      }
+     if (use_csv)
+     {
+         appendPQExpBuffer(q, " CSV %s", csv_opts->data);
+     }
+     appendPQExpBuffer(q, ";");
      res = PQexec(g_conn, q->data);
      check_sql_result(res, g_conn, q->data, PGRES_COPY_OUT);
      PQclear(res);
***************
*** 1139,1147 ****
          /* must use 2 steps here 'cause fmtId is nonreentrant */
          appendPQExpBuffer(copyBuf, "COPY %s ",
                            fmtId(tbinfo->dobj.name));
!         appendPQExpBuffer(copyBuf, "%s %sFROM stdin;\n",
                            fmtCopyColumnList(tbinfo),
                        (tdinfo->oids && tbinfo->hasoids) ? "WITH OIDS " : "");
          copyStmt = copyBuf->data;
      }
      else
--- 1196,1208 ----
          /* must use 2 steps here 'cause fmtId is nonreentrant */
          appendPQExpBuffer(copyBuf, "COPY %s ",
                            fmtId(tbinfo->dobj.name));
!         appendPQExpBuffer(copyBuf, "%s %s",
                            fmtCopyColumnList(tbinfo),
                        (tdinfo->oids && tbinfo->hasoids) ? "WITH OIDS " : "");
+         appendPQExpBuffer(copyBuf, "FROM stdin");
+         if (use_csv)
+             appendPQExpBuffer(copyBuf, " CSV %s", csv_opts->data);
+         appendPQExpBuffer(copyBuf, ";\n");
          copyStmt = copyBuf->data;
      }
      else

Re: CSV mode option for pg_dump

From

Tom Lane

Date:

14 June 2006, 00:43:49

Andrew Dunstan <andrew@dunslane.net> writes:
> This is just nonsense. There is not the slightest reason that CSV data 
> cannot be embedded in a text dump nor exist as the data members of a 
> custom or tar dump with the corresponding COPY statements modified 
> accordingly.

Well, the really *core* question here is whether we trust the stability
of the CSV format definition (and code) enough to want to rely on it for
data dump/restore purposes.  I'm still a few years away from that,
myself.  AFAICT the raison d'etre of the CSV code is "emit whatever it
takes to satisfy this, that, and the other broken Microsoft application".
That's fine as an export tool, but as a dump/reload tool, nyet.  If you
put it in pg_dump you're just handing neophytes another foot-gun.
        regards, tom lane

Re: CSV mode option for pg_dump

From

"Andrew Dunstan"

Date:

14 June 2006, 08:59:57

Tom Lane said:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> This is just nonsense. There is not the slightest reason that CSV data
>>  cannot be embedded in a text dump nor exist as the data members of a
>> custom or tar dump with the corresponding COPY statements modified
>> accordingly.
>
> Well, the really *core* question here is whether we trust the stability
> of the CSV format definition (and code) enough to want to rely on it
> for data dump/restore purposes.  I'm still a few years away from that,
> myself.  AFAICT the raison d'etre of the CSV code is "emit whatever it
> takes to satisfy this, that, and the other broken Microsoft
> application". That's fine as an export tool, but as a dump/reload tool,
> nyet.  If you put it in pg_dump you're just handing neophytes another
> foot-gun.
>

Well, I'm staggered. Really.

The CSV definition and its lack of formality is a complete red herring in
this, as are references to Microsoft.

The real issue in what you say here is your suggestion that we might not be
able to reproduce the input in some cases via a COPY CSV round trip. If that
is so it's a bug of our (probably my) making, and must be fixed. I assert
that it is not so. In fact all the tests I did during development and since
were premised on recovering the input exactly. The only CSV option that
breaks things in that respect is FORCE NOT NULL, and it is designed for data
coming in from a non Postgres source, so it's not relevant here. Even FORCE
QUOTE won't break things because it never quotes a null value, and the only
semantic significance to us of quoting is the null-ness of the value.

If the code is broken then it should be discoverable by test or code
analysis. There is no need to refer to any other application or standard. So
if you or anyone think there is a case that will not reproduce the data
exactly when the same CSV options are used for output and input, I challenge
you or them to provide a single example.

You say you're a few years away from trusting the code. Well, it's not so
huge that it's beyond analysis, and I'll be happy to explain anything that
puzzles you. Perhaps more importantly, it has been in use now for 18 months.
We discovered one problem with embedded line feeds very early in the 8.0
release cycle, and fixed it. After that I have not heard of a single
problem. And I assure you this code is widely used.

It probably isn't used much as a round trip mechanism, probably in part
because we haven't provided it as a pg_dump option. So maybe we have a
chicken/egg scenario here. We do have some round trip regression tests in
the copy test, and those can be beefed up if necessary to increase your
confidence level.

I'm happy to debate details, but general assertions of "we can't trust this
code" don't seem worth much to me.

cheers

andrew

Re: CSV mode option for pg_dump

From

Bruce Momjian

Date:

14 June 2006, 09:48:07

I agree with Andrew.  If there are bugs in CSV, then we should fix them,
not avoid give users this usability improvement.  I know I have analyzed
the code and fixed a few problems with it in the past.

As far as pg_dump, I am thinking the most flexible solution would be to
allow an arbitrary WITH clause to be added to COPY, like
pg_dump --copy-with "CSV"

This would allow not only CSV control, but specification of non-CSV
options if desired.

---------------------------------------------------------------------------

Andrew Dunstan wrote:
> Tom Lane said:
> > Andrew Dunstan <andrew@dunslane.net> writes:
> >> This is just nonsense. There is not the slightest reason that CSV data
> >>  cannot be embedded in a text dump nor exist as the data members of a
> >> custom or tar dump with the corresponding COPY statements modified
> >> accordingly.
> >
> > Well, the really *core* question here is whether we trust the stability
> > of the CSV format definition (and code) enough to want to rely on it
> > for data dump/restore purposes.  I'm still a few years away from that,
> > myself.  AFAICT the raison d'etre of the CSV code is "emit whatever it
> > takes to satisfy this, that, and the other broken Microsoft
> > application". That's fine as an export tool, but as a dump/reload tool,
> > nyet.  If you put it in pg_dump you're just handing neophytes another
> > foot-gun.
> >
> 
> 
> Well, I'm staggered. Really.
> 
> The CSV definition and its lack of formality is a complete red herring in
> this, as are references to Microsoft.
> 
> The real issue in what you say here is your suggestion that we might not be
> able to reproduce the input in some cases via a COPY CSV round trip. If that
> is so it's a bug of our (probably my) making, and must be fixed. I assert
> that it is not so. In fact all the tests I did during development and since
> were premised on recovering the input exactly. The only CSV option that
> breaks things in that respect is FORCE NOT NULL, and it is designed for data
> coming in from a non Postgres source, so it's not relevant here. Even FORCE
> QUOTE won't break things because it never quotes a null value, and the only
> semantic significance to us of quoting is the null-ness of the value.
> 
> If the code is broken then it should be discoverable by test or code
> analysis. There is no need to refer to any other application or standard. So
> if you or anyone think there is a case that will not reproduce the data
> exactly when the same CSV options are used for output and input, I challenge
> you or them to provide a single example.
> 
> You say you're a few years away from trusting the code. Well, it's not so
> huge that it's beyond analysis, and I'll be happy to explain anything that
> puzzles you. Perhaps more importantly, it has been in use now for 18 months.
> We discovered one problem with embedded line feeds very early in the 8.0
> release cycle, and fixed it. After that I have not heard of a single
> problem. And I assure you this code is widely used.
> 
> It probably isn't used much as a round trip mechanism, probably in part
> because we haven't provided it as a pg_dump option. So maybe we have a
> chicken/egg scenario here. We do have some round trip regression tests in
> the copy test, and those can be beefed up if necessary to increase your
> confidence level.
> 
> I'm happy to debate details, but general assertions of "we can't trust this
> code" don't seem worth much to me.
> 
> cheers
> 
> andrew
> 
> 
> 
> 
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
> 

--  Bruce Momjian   http://candle.pha.pa.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: CSV mode option for pg_dump

From

Tom Lane

Date:

14 June 2006, 11:31:51

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I agree with Andrew.  If there are bugs in CSV, then we should fix them,
> not avoid give users this usability improvement.

The case for it being a usability improvement seems very weak to me;
no one has yet demonstrated an actual use-case where someone would pull
CSV data out of pg_dump output instead of just dumping the table
directly with COPY.  Now the "anti" case is admittedly hypothetical:
I'm supposing that we will eventually be bitten by portability problems
with CSV-style dumps not being loadable into future versions.  But given
the weak nature of the "pro" case, I think we should be conservative
and not take that risk.
        regards, tom lane

Re: CSV mode option for pg_dump

From

Bruce Momjian

Date:

14 June 2006, 11:41:06

Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I agree with Andrew.  If there are bugs in CSV, then we should fix them,
> > not avoid give users this usability improvement.
> 
> The case for it being a usability improvement seems very weak to me;
> no one has yet demonstrated an actual use-case where someone would pull
> CSV data out of pg_dump output instead of just dumping the table
> directly with COPY.  Now the "anti" case is admittedly hypothetical:
> I'm supposing that we will eventually be bitten by portability problems
> with CSV-style dumps not being loadable into future versions.  But given
> the weak nature of the "pro" case, I think we should be conservative
> and not take that risk.

Well, I saw little request for COPY CSV here, but IRC users were
reporting a lot of interest, and feedback from the release that added it
showed it was a major feature, so just because we haven't see use-case
here doesn't mean it doesn't exist.

As was stated before, the use-case for this is by people we don't
normally have contact with.

--  Bruce Momjian   http://candle.pha.pa.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: CSV mode option for pg_dump

From

Tom Lane

Date:

14 June 2006, 11:50:30

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> As was stated before, the use-case for this is by people we don't
> normally have contact with.

Or who don't exist.

Once again: give me a plausible use-case for pulling this data out of a
pg_dump output file.  You'd need data-massaging code anyway just to
extract the data, so why not expect that code to convert to CSV or
whatever other format you might want?  If you can think of use-cases like
this, why do you think the destination format is necessarily CSV and not
something else?  If it is something else, adding a CSV option to pg_dump
makes it *harder* not easier to write that massaging code, because now
it's got to cope with N dump formats not one.
        regards, tom lane

Re: CSV mode option for pg_dump

From

Bruce Momjian

Date:

14 June 2006, 11:55:16

Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > As was stated before, the use-case for this is by people we don't
> > normally have contact with.
> 
> Or who don't exist.
> 
> Once again: give me a plausible use-case for pulling this data out of a
> pg_dump output file.  You'd need data-massaging code anyway just to
> extract the data, so why not expect that code to convert to CSV or
> whatever other format you might want?  If you can think of use-cases like
> this, why do you think the destination format is necessarily CSV and not
> something else?  If it is something else, adding a CSV option to pg_dump
> makes it *harder* not easier to write that massaging code, because now
> it's got to cope with N dump formats not one.

I don't have to think of a use case.  I trusted the people who said we
needed CSV, so I trust them again if they say doing pg_dump with CSV
would be a good idea.

Also, my suggestion of --copy-with would allow CSV and other format
modifications with minimal code and complexity.

--  Bruce Momjian   http://candle.pha.pa.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: CSV mode option for pg_dump

From

Martijn van Oosterhout

Date:

14 June 2006, 12:04:23

On Tue, Jun 13, 2006 at 09:14:48AM -0700, Joshua D. Drake wrote:
> >I don't get it. If you can use psql then you already have csv support.
> >
> >psql -c 'COPY pg_class TO STDOUT WITH CSV' postgres > pg_class.csv
>
> If you data looks like this:
>
> foo    barbaz    bing
>
> You are o.k. You have three columns, tab delimited.
>
> However if you data looks like this:
>
> foo    bar    baz    bing
>
> You have a problem.

I'm not exactly sure what you're getting at here:

postgres=# create temp table x (a text, b text, c text);
CREATE TABLE
postgres=# insert into x values ('"blah"', 'hello       world', 'hmmm
postgres'# ');
INSERT 0 1
postgres=# \copy x to stdout csv
"""blah""",hello        world,"hmmm
"

> An alternative although I don't know what kind of headaches it would
> cause is to have a text delimiter as well as a field delimter, e.g;

Postgresql already delimits CSV fields as required, so I'm not sure what
you're asking here...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: CSV mode option for pg_dump

From

Martijn van Oosterhout

Date:

14 June 2006, 12:09:35

On Wed, Jun 14, 2006 at 10:55:04AM -0400, Bruce Momjian wrote:
> > Once again: give me a plausible use-case for pulling this data out of a
> > pg_dump output file.  You'd need data-massaging code anyway just to
> > extract the data, so why not expect that code to convert to CSV or
> > whatever other format you might want?  If you can think of use-cases like
> > this, why do you think the destination format is necessarily CSV and not
> > something else?  If it is something else, adding a CSV option to pg_dump
> > makes it *harder* not easier to write that massaging code, because now
> > it's got to cope with N dump formats not one.
>
> I don't have to think of a use case.  I trusted the people who said we
> needed CSV, so I trust them again if they say doing pg_dump with CSV
> would be a good idea.

I think the point is that it's hard to imagine a use case for CSV
support in pg_dump that isn't already better served by using psql. If
you add it to pg_dump, people have to write special code to extract it
anyway, so why can't they handle the tab delimited themselves.

OTOH, if you use psql you get straightforward CSV wit no garbage at
the beginning or end, just straight, raw, CSV.

$ psql postgres -c '\copy pg_namespace to stdout csv'
pg_toast,10,
pg_temp_1,10,
pg_catalog,10,"{kleptog=UC/kleptog,=U/kleptog}"
public,10,"{kleptog=UC/kleptog,=UC/kleptog}"
information_schema,10,"{kleptog=UC/kleptog,=U/kleptog}"

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: CSV mode option for pg_dump

From

Greg Stark

Date:

14 June 2006, 12:09:48

"Andrew Dunstan" <andrew@dunslane.net> writes:

> The CSV definition and its lack of formality is a complete red herring in
> this, as are references to Microsoft.

I think then you've missed the real issue.

> The real issue in what you say here is your suggestion that we might not be
> able to reproduce the input in some cases via a COPY CSV round trip. 

No that's not the suggestion. The question is: what do you do when the next
version of Excel comes out and can't parse your CSV files any more? If you
change your format to match then you won't be able to parse old pg_dump files
any more. You'll end up with an option for each csv variant and then how does
the user know which option to use to read a given pg_dump archive?

And before you ask, yes, virtually every version of Excel has changed its csv
file format.

I think the problem here is that pg_dump is serving double duty as a
postgres->postgres tool and some people using it as a kind of batch COPY. What
benefit does it buy you over a script that runs COPY for each table if that's
what you want?

-- 
greg

Re: CSV mode option for pg_dump

From

Andrew Dunstan

Date:

14 June 2006, 12:13:24

Bruce Momjian wrote:

>
>As was stated before, the use-case for this is by people we don't
>normally have contact with.
>
>  
>

I do think we need a use case for what we do.

The main use case seems to me to be where you are exporting a whole 
database or most of it with a very large number of tables, and it is 
convenient to have all the CSVs created for you rather than have to make 
them manually one at a time. You could get these out of, say, a tar 
format dump very easily.

I said near the beginning of this that a pgfoundry project to create a 
tool for this might be an alternative way to go. If that's the consensus 
then Ok. I just bristle a bit at the suggestion that we might not get 
back what we started with from a CSV dump, because we can, AFAIK.

cheers

andrew

Re: CSV mode option for pg_dump

From

Bruce Momjian

Date:

14 June 2006, 12:56:04

Andrew Dunstan wrote:
> 
> 
> Bruce Momjian wrote:
> 
> >
> >As was stated before, the use-case for this is by people we don't
> >normally have contact with.
> >
> >  
> >
> 
> I do think we need a use case for what we do.
> 
> The main use case seems to me to be where you are exporting a whole 
> database or most of it with a very large number of tables, and it is 
> convenient to have all the CSVs created for you rather than have to make 
> them manually one at a time. You could get these out of, say, a tar 
> format dump very easily.
> 
> I said near the beginning of this that a pgfoundry project to create a 
> tool for this might be an alternative way to go. If that's the consensus 
> then Ok. I just bristle a bit at the suggestion that we might not get 
> back what we started with from a CSV dump, because we can, AFAIK.

For me, the use case would be, what format do I want a dump in if it is
for long-term storage?  Do I want it in a PostgreSQL-native format, or
in a more universal format that can be loaded into PostgreSQL tomorrow,
and perhaps loaded into some other database, with modification, ten
years from now.

I just had that issue on my home system for file system backups, going
from cpio to ustar (POSIX.1-1988 / IEEE Std1003.2), but it seems that
POSIX.1-2001 would be best if my operating system supported it. 
(Perhaps cpio was better?)

Anyway, I never thought there would be a large demand for COPY CSV, but
obviously there is, so I have concluded that other people's environment
and skills are different enough from my own that I am willing to accept
the idea there is a use case when I don't understand it.  I will let the
people who work in those environments make that decision.

--  Bruce Momjian   http://candle.pha.pa.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: CSV mode option for pg_dump

From

Andrew Dunstan

Date:

14 June 2006, 13:04:17

Greg Stark wrote:

>And before you ask, yes, virtually every version of Excel has changed its csv
>file format.
>
>
>  
>

I'd be interested to hear of Excel variants we haven't catered for - our 
CSV parsing is some of the most flexible and complete around, IMNSHO.  
BTW, I suspect we won't see too much movement on this front in future, 
given the moves on the XML front.

cheers

andrew

Re: CSV mode option for pg_dump

From

Andrew Dunstan

Date:

14 June 2006, 19:27:04

I wrote:

>
> The main use case seems to me to be where you are exporting a whole 
> database or most of it with a very large number of tables, and it is 
> convenient to have all the CSVs created for you rather than have to 
> make them manually one at a time. You could get these out of, say, a 
> tar format dump very easily.
>
>

I just noticed that the data members all have \. and some blank lines at 
the end, so wash that out.

We now return you to normal -hacking

cheers

andrew