Thread: CSV mode option for pg_dump
Something someone said on IRC just now triggered a little memory ... I think we should provide an option to have pg_dump work in CSV mode rather than text mode. This probably doesn't have much importance in the case of text dumps, but in custom or tar dumps where you might want to get at individual data members, having an option for CSVs that you want to load into some other product might be nice. This should be a pretty low cost item, I expect (good newbie project?) thoughts? cheers andrew
Andrew Dunstan wrote: > > Something someone said on IRC just now triggered a little memory ... I > think we should provide an option to have pg_dump work in CSV mode > rather than text mode. This probably doesn't have much importance in the > case of text dumps, but in custom or tar dumps where you might want to > get at individual data members, having an option for CSVs that you want > to load into some other product might be nice. > > This should be a pretty low cost item, I expect (good newbie project?) Uhh... just about any application that can import CSV can import our dumps. Just tell it the delimiter is a tab. Joshua D. Drake > > thoughts? > > cheers > > andrew -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/
Joshua D. Drake wrote: > Andrew Dunstan wrote: >> >> Something someone said on IRC just now triggered a little memory ... >> I think we should provide an option to have pg_dump work in CSV mode >> rather than text mode. This probably doesn't have much importance in >> the case of text dumps, but in custom or tar dumps where you might >> want to get at individual data members, having an option for CSVs >> that you want to load into some other product might be nice. >> >> This should be a pretty low cost item, I expect (good newbie project?) > > Uhh... just about any application that can import CSV can import our > dumps. Just tell it the delimiter is a tab. > No it won't, not if there are tabs in the data. Why do you think we did CSV in the first place? Precisely because our dump does *not* work as a general export mechanism for arbitrary data. cheers andrew
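A minimal psql illustration of the point (a sketch using a throwaway temp table; literal tab characters in the output are shown as <TAB>):

    postgres=# create temp table t (a text, b text);
    CREATE TABLE
    postgres=# insert into t values (E'bar\tbaz', 'bing');
    INSERT 0 1
    postgres=# \copy t to stdout
    bar\tbaz<TAB>bing
    postgres=# \copy t to stdout csv
    bar<TAB>baz,bing

In text mode the embedded tab comes out as the two characters backslash-t (plus the other COPY escapes), which a naive tab-splitting importer will not undo; in CSV mode with the default comma delimiter the field keeps its content and stays unambiguous.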
Josh, > > This should be a pretty low cost item, I expect (good newbie project?) > > Uhh... just about any application that can import CSV can import our > dumps. Just tell it the delimiter is a tab. Actually, there was a Summer of Code application to do this, but with all the other nifty stuff it wasn't accepted -- partly because we weren't sure that we wanted a CSV mode for dumps, partly because this should be a weekend fix, not a 3-month project. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco
Andrew Dunstan <andrew@dunslane.net> writes: > Something someone said on IRC just now triggered a little memory ... I > think we should provide an option to have pg_dump work in CSV mode > rather than text mode. This probably doesn't have much importance in the > case of text dumps, but in custom or tar dumps where you might want to > get at individual data members, having an option for CSVs that you want > to load into some other product might be nice. This is silly. You'd just COPY the particular table you want, not use pg_dump. pg_dump's already got an unreasonably large number of options without adding ones that have essentially zero use. Also, I think there are sufficient grounds to worry about whether a CSV dump would always reload correctly --- we already know that that's a poorly thought out "standard". regards, tom lane
> No it won't, not if there are tabs in the data. > <snipping noise> Hmmm, then would just double-quoting the data work? At least in OOCalc (and IIRC Excel) there is the ability to select a text delimiter. Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/
Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: > >> Something someone said on IRC just now triggered a little memory ... I >> think we should provide an option to have pg_dump work in CSV mode >> rather than text mode. This probably doesn't have much importance in the >> case of text dumps, but in custom or tar dumps where you might want to >> get at individual data members, having an option for CSVs that you want >> to load into some other product might be nice. >> > > This is silly. You'd just COPY the particular table you want, not use > pg_dump. pg_dump's already got an unreasonably large number of options > without adding ones that have essentially zero use. Also, I think there > are sufficient grounds to worry about whether a CSV dump would always > reload correctly --- we already know that that's a poorly thought out > "standard". > > > Well, if you have dozens or hundreds of tables it might well be more convenient. As for not reloading - I went to some trouble to make sure that we could reload what we dumped, exactly, unless the force options are used. I might have made a bug in that, but it isn't dependent on the particular CSV format used. Naturally you won't have a use for it, but I suspect others might (in which case they had better speak up ;-) ) I suppose the alternative would be to write a little tool in perl or whatever to do the same thing for you. Maybe a good pgfoundry project. cheers andrew
On Mon, Jun 12, 2006 at 02:15:59PM -0400, Andrew Dunstan wrote: > Naturally you won't have a use for it, but I suspect others might (in > which case they had better speak up ;-) ) I'd bet that those who would find this useful are far more likely to be on -general and not in here. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Here's me speaking up -- I'd definitely use it! As a quick way to pull data into Excel to do basic reports or analysis, a CSV format would be great. Some of our users currently pull data into Excel for quickie analysis, but creating fixed-width data via psql requires them to parse the data and dumping anything via pg_dump with any delimiter (tabs, etc.) usually doesn't work due to the delimiters being embedded in the real data. - Bill > -----Original Message----- > From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of > Andrew Dunstan > Sent: Monday, June 12, 2006 2:16 PM > To: Tom Lane > Cc: PG Hackers > Subject: Re: [HACKERS] CSV mode option for pg_dump > > > Tom Lane wrote: > > Andrew Dunstan <andrew@dunslane.net> writes: > > > >> Something someone said on IRC just now triggered a little > memory ... > >> I > >> think we should provide an option to have pg_dump work in CSV mode > >> rather than text mode. This probably doesn't have much > importance in the > >> case of text dumps, but in custom or tar dumps where you > might want to > >> get at individual data members, having an option for CSVs > that you want > >> to load into some other product might be nice. > >> > > > > This is silly. You'd just COPY the particular table you > want, not use > > pg_dump. pg_dump's already got an unreasonably large number of > > options without adding ones that have essentially zero use. > Also, I > > think there are sufficient grounds to worry about whether a > CSV dump > > would always reload correctly --- we already know that > that's a poorly > > thought out "standard". > > > > > > > > Well, if you have dozens or hundreds of tables it might well be more > convenient. > > As for not reloading - I went to some trouble to make sure > that we could > reload what we dumped, exactly, unless the force options are used. I > might have made a bug in that, but it isn't dependent on the > particular > CSV format used. > > Naturally you won't have a use for it, but I suspect others might (in > which case they had better speak up ;-) ) > > I suppose the alternative would be to write a little tool in perl or > whatever to do the same thing for you. Maybe a good pgfoundry project. > > cheers > > andrew
Bill Bartlett wrote: > Here's me speaking up -- I'd definitely use it! As a quick way to pull > data into Excel to do basic reports or analysis, a CSV format would be > great. Why not just use ODBC? Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/
Can't -- the main production database is over at a CoLo site with access only available via SSH, and tightly-restricted SSH at that. Generally one of the developers will SSH over to the server, pull out whatever data is needed into a text file via psql or pg_dump, scp the file(s) back here and send them to the user. We're working on implementing a full-blown reporting system to eliminate some of this need, but it's not quite there yet. However, CSV is a very handy format to use to send data to other people (especially "less technical" users), so even in the future the ability to export to CSV would be handy. - Bill > -----Original Message----- > From: Joshua D. Drake [mailto:jd@commandprompt.com] > Sent: Monday, June 12, 2006 4:15 PM > To: Bill Bartlett > Cc: 'Andrew Dunstan'; 'Tom Lane'; 'PG Hackers' > Subject: Re: [HACKERS] CSV mode option for pg_dump > > > Bill Bartlett wrote: > > Here's me speaking up -- I'd definitely use it! As a > quick way to pull > > data into Excel to do basic reports or analysis, a CSV > format would be > > great. > > Why not just use ODBC? > > Joshua D. Drake > -- > > === The PostgreSQL Company: Command Prompt, Inc. === > Sales/Support: +1.503.667.4564 || 24x7/Emergency: > +1.800.492.2240 > Providing the most comprehensive PostgreSQL solutions > since 1997 > http://www.commandprompt.com/ > >
Already on TODO: pg_dump: o %Add CSV output format --------------------------------------------------------------------------- Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: > > Something someone said on IRC just now triggered a little memory ... I > > think we should provide an option to have pg_dump work in CSV mode > > rather than text mode. This probably doesn't have much importance in the > > case of text dumps, but in custom or tar dumps where you might want to > > get at individual data members, having an option for CSVs that you want > > to load into some other product might be nice. > > This is silly. You'd just COPY the particular table you want, not use > pg_dump. pg_dump's already got an unreasonably large number of options > without adding ones that have essentially zero use. Also, I think there > are sufficient grounds to worry about whether a CSV dump would always > reload correctly --- we already know that that's a poorly thought out > "standard". > > regards, tom lane -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bill Bartlett wrote: > Can't -- the main production database is over at a CoLo site with access > only available via SSH, and tightly-restricted SSH at that. Generally > one of the developers will SSH over to the server, pull out whatever > data is needed into a text file via psql or pg_dump, scp the file(s) > back here and send them to the user. ODBC over an SSH tunnel?
Matthew T. OConnor said: > Bill Bartlett wrote: >> Can't -- the main production database is over at a CoLo site with >> access only available via SSH, and tightly-restricted SSH at that. >> Generally one of the developers will SSH over to the server, pull out >> whatever data is needed into a text file via psql or pg_dump, scp the >> file(s) back here and send them to the user. > > ODBC over an SSH tunnel? I wish I could understand why people are so keen to make other people turn handsprings in order to avoid a feature which, as Bruce points out, is already on the TODO list, and which, by my 10 minute analysis, would involve almost trivial code impact and risk. If this involved major impact I might understand, but it really doesn't. I know many people work in a Postgres-only world. I wish everybody did, and then we could just forget about things like CSV. They don't, so we can't. I think I have said this before, but I'll say it again. From time to time people thank me for things I have done for Postgres. The two things that stand out BY FAR on the list of these are CSV import/export and dollar quoting. This is a widely used feature. cheers andrew
Andrew Dunstan wrote: > Matthew T. OConnor said: > > Bill Bartlett wrote: > >> Can't -- the main production database is over at a CoLo site with > >> access only available via SSH, and tightly-restricted SSH at that. > >> Generally one of the developers will SSH over to the server, pull out > >> whatever data is needed into a text file via psql or pg_dump, scp the > >> file(s) back here and send them to the user. > > > > ODBC over an SSH tunnel? > > I wish I could understand why people are so keen to make other people turn > handsprings in order to avoid a feature which, as Bruce points out, is > already on the TODO list, and which, by my 10 minute analysis, would involve > almost trivial code impact and risk. If this involved major impact I might > understand, but it really doesn't. > > I know many people work in a Postgres-only world. I wish everybody did, and > then we could just forget about things like CSV. They don't, so we can't. > > I think I have said this before, but I'll say it again. From time to time > people thank me for things I have done for Postgres. The two things that > stand out BY FAR on the list of these are CSV import/export and dollar > quoting. This is a widely used feature. I think the bottom line is that "ease of use" isn't as high on the project's priority list as you (and others) think it should be. I personally feel, as you do, that we should value "ease of use" more. -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
"Andrew Dunstan" <andrew@dunslane.net> writes: > I wish I could understand why people are so keen to make other people turn > handsprings in order to avoid a feature which, as Bruce points out, is > already on the TODO list, and which, by my 10 minute analysis, would involve > almost trivial code impact and risk. If this involved major impact I might > understand, but it really doesn't. Supporting all of the CSV options in pg_dump would involve major bloat in its option set, and it already has far too many options. If it were just a matter of adding a "--csv" switch I wouldn't be complaining, but there are half a dozen more sub-options, and it seems like every time we turn around someone is finding a reason for another one. Propagating all that cruft through pg_dump would be a PITA, and updating it to track future additions would be too. Furthermore, the entire rationale for the feature is predicated on the claim that programs other than pg_restore might find it useful. But this conveniently ignores the fact that if there are any such programs in existence, what this will really do is BREAK them, because they won't be able to cope with all the variants that pass for CSV. My opinions would be less negative if I thought that CSV were a well-defined format that would never change. I don't believe that it has either property, however, and so I'm against letting it get into our dump file format. I think we'll just live to regret it if we do. regards, tom lane
Good point. The number of CSV options would be hard to support for pg_dump. Any thoughts from anyone on how to do that cleanly? Could we just support the default behavior? --------------------------------------------------------------------------- Tom Lane wrote: > "Andrew Dunstan" <andrew@dunslane.net> writes: > > I wish I could understand why people are so keen to make other people turn > > handsprings in order to avoid a feature which, as Bruce points out, is > > already on the TODO list, and which, by my 10 minute analysis, would involve > > almost trivial code impact and risk. If this involved major impact I might > > understand, but it really doesn't. > > Supporting all of the CSV options in pg_dump would involve major bloat > in its option set, and it already has far too many options. If it were > just a matter of adding a "--csv" switch I wouldn't be complaining, but > there are half a dozen more sub-options, and it seems like every time we > turn around someone is finding a reason for another one. Propagating > all that cruft through pg_dump would be a PITA, and updating it to track > future additions would be too. > > Furthermore, the entire rationale for the feature is predicated on the > claim that programs other than pg_restore might find it useful. But > this conveniently ignores the fact that if there are any such programs > in existence, what this will really do is BREAK them, because they won't > be able to cope with all the variants that pass for CSV. > > My opinions would be less negative if I thought that CSV were a > well-defined format that would never change. I don't believe that it > has either property, however, and so I'm against letting it get into our > dump file format. I think we'll just live to regret it if we do. > > regards, tom lane -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Tue, Jun 13, 2006 at 10:20:53AM -0400, Bruce Momjian wrote: > > Good point. The number of CSV options would be hard to support for > pg_dump. Any thoughts from anyone on how to do that cleanly? Could we > just support the default behavior? What this tells me is that we need a tool somewhere between psql and pg_dump, say, pgquery. Its sole purpose in life is to generate output from various queries. Because it's a separate tool there's no question of psql or pg_dump being able to parse them. While you're at it, you could add modules to support many different output styles, like CSV, XML, Excel format, HTML, etc. This I believe would take the load off psql to provide many different output styles, as well as the load off pg_dump to produce parsable-by-third-party output. Thoughts? Side note: In my experience Excel happily slurps up tab delimited output, so I'm not sure why all of this is an issue in the first place. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ From each according to his ability. To each according to his ability to litigate.
On Jun 13 10:20, Bruce Momjian wrote: > > Good point. The number of CSV options would be hard to support for > pg_dump. Any thoughts from anyone on how to do that cleanly? Could we > just support the default behavior? IMHO, it might be better if we'd support a syntax like

    pg_dump --csv=opt0,para0:opt2,opt3

This can save us from the pg_dump parameter pollution a little bit. Furthermore, I think the CSV format for dump files can be maintained better under an external project. (pgFoundry?) This way, main developers will be able to concentrate on their own core problems while other users/developers can contribute to the CSV code easily. And if any user ever wants CSV functionality in pg_dump, he/she will just issue a --csv parameter (with the above syntax) and pg_dump will make a suitable dlopen() call for the related (CSV) module. Anyway, this is just an idea for modularity; but the main thing I try to underline is to give pg_dump modular functionality for similar problems. Regards.
Tom Lane wrote: >there are half a dozen more sub-options, and it seems like every time we >turn around someone is finding a reason for another one. > > This is a bit unfair. The feature was introduced in 8.0, and slightly enhanced in 8.1. There have not been any additional CSV features this release cycle unless my memory is worse than I thought, and I at least have said previously that I will be resistant to the addition of further CSV options. My thoughts regarding options for pg_dump were actually to provide a much smaller set than the full set available with COPY, specifically to provide for using a single rather than a double quote char, and optional header lines - no alternate escape or delimiter, and no FORCE QUOTE (FORCE NOT NULL isn't relevant as it is only useful for non-postgres derived data). At least that would be a reasonable starting point, and would I believe cater for the vast majority of uses. cheers andrew
Bruce Momjian wrote: > Good point. The number of CSV options would be hard to support for > pg_dump. Any thoughts from anyone on how to do that cleanly? Could we > just support the default behavior? Although I don't see a real need for the feature, I do think that if we were to support more than "1" CSV output format (well, two if you include the already-supported tab-delimited one) it would be a large amount of bloat. Perhaps we could pick "1" output, say comma-delimited with quoted fields? "foo","bar ","baz" Joshua D. Drake > > --------------------------------------------------------------------------- > > Tom Lane wrote: >> "Andrew Dunstan" <andrew@dunslane.net> writes: >>> I wish I could understand why people are so keen to make other people turn >>> handsprings in order to avoid a feature which, as Bruce points out, is >>> already on the TODO list, and which, by my 10 minute analysis, would involve >>> almost trivial code impact and risk. If this involved major impact I might >>> understand, but it really doesn't. >> Supporting all of the CSV options in pg_dump would involve major bloat >> in its option set, and it already has far too many options. If it were >> just a matter of adding a "--csv" switch I wouldn't be complaining, but >> there are half a dozen more sub-options, and it seems like every time we >> turn around someone is finding a reason for another one. Propagating >> all that cruft through pg_dump would be a PITA, and updating it to track >> future additions would be too. >> >> Furthermore, the entire rationale for the feature is predicated on the >> claim that programs other than pg_restore might find it useful. But >> this conveniently ignores the fact that if there are any such programs >> in existence, what this will really do is BREAK them, because they won't >> be able to cope with all the variants that pass for CSV. >> >> My opinions would be less negative if I thought that CSV were a >> well-defined format that would never change. I don't believe that it >> has either property, however, and so I'm against letting it get into our >> dump file format. I think we'll just live to regret it if we do. >> >> regards, tom lane >> > -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/
Volkan YAZICI wrote: >On Jun 13 10:20, Bruce Momjian wrote: > > >>Good point. The number of CSV options would be hard to support for >>pg_dump. Any thoughts from anyone on how to do that cleanly? Could we >>just support the default behavior? >> >> > >IMHO, it might be better if we'd support a syntax like > > pg_dump --csv=opt0,para0:opt2,opt3 > >This can save us from the pg_dump parameter pollution a little bit. > >Furthermore, I think the CSV format for dump files can be maintained >better under an external project. (pgFoundry?) This way, main >developers will be able to concentrate on their own core problems while >other users/developers can contribute to the CSV code easily. And if >any user ever wants CSV functionality in pg_dump, >he/she will just issue a --csv parameter (with the above syntax) and >pg_dump will make a suitable dlopen() call for the related (CSV) >module. Anyway, this is just an idea for modularity; but the main >thing I try to underline is to give pg_dump modular functionality for >similar problems. > > > There are some problems with this, though: . FORCE QUOTE is table specific, and COPY will barf if you name a column that isn't on the table. Providing for this option would involve lots more code in pg_dump, as we'd have to filter the list according to the column names in each table. . specifying arbitrary chars for quote, escape and delimiter could be tricky from the command line, especially if you want to specify a tab delimiter or backslash escape. cheers andrew
Martijn van Oosterhout wrote: > >Side note: In my experience Excel happily slurps up tab delimited >output, so I'm not sure why all of this is an issue in the first place. > > > I guess your experience doesn't run to data that has embedded tabs, for example. There really is a reason we did this in the first place, and it wasn't for fun. cheers andrew
On Mon, 2006-06-12 at 16:28 -0400, Bill Bartlett wrote: > Can't -- the main production database is over at a CoLo site with access > only available via SSH, and tightly-restricted SSH at that. Generally > one of the developers will SSH over to the server, pull out whatever > data is needed into a text file via psql or pg_dump, scp the file(s) > back here and send them to the user. I don't get it. If you can use psql then you already have csv support. psql -c 'COPY pg_class TO STDOUT WITH CSV' postgres > pg_class.csv > > -----Original Message----- > > From: Joshua D. Drake [mailto:jd@commandprompt.com] > > Sent: Monday, June 12, 2006 4:15 PM > > To: Bill Bartlett > > Cc: 'Andrew Dunstan'; 'Tom Lane'; 'PG Hackers' > > Subject: Re: [HACKERS] CSV mode option for pg_dump > > > > > > Bill Bartlett wrote: > > > Here's me speaking up -- I'd definitely use it! As a > > quick way to pull > > > data into Excel to do basic reports or analysis, a CSV > > format would be > > > great. > > > > Why not just use ODBC? > > > > Joshua D. Drake > > -- > > > > === The PostgreSQL Company: Command Prompt, Inc. === > > Sales/Support: +1.503.667.4564 || 24x7/Emergency: > > +1.800.492.2240 > > Providing the most comprehensive PostgreSQL solutions > > since 1997 > > http://www.commandprompt.com/ > > --
On Jun 13, 2006, at 7:34 AM, Martijn van Oosterhout wrote: > > What this tells me is that we need a tool somewhere between psql and > pg_dump, say, pgquery. Its sole purpose in life is to generate output > from various queries. Because it's a separate tool there's no question > of psql or pg_dump being able to parse them. > > While you're at it, you could add modules to support many different > output styles, like CSV, XML, Excel format, HTML, etc. > > This I believe would take the load off psql to provide many different > output styles, as well as the load off pg_dump to produce > parsable-by-third-party output. > > Thoughts? Perl+DBD::Pg+CPAN does almost all of this already. Lots of support for countless different output formats, and mostly fairly well battle-tested. I suspect that a perl script to do all that would be dominated by command-line option parsing, as all the hard work is in existing modules. Would that be adequate, or do we really want to reimplement and maintain all the output format complexity in our own code, in C? Cheers, Steve
Rod Taylor wrote: > On Mon, 2006-06-12 at 16:28 -0400, Bill Bartlett wrote: >> Can't -- the main production database is over at a CoLo site with access >> only available via SSH, and tightly-restricted SSH at that. Generally >> one of the developers will SSH over to the server, pull out whatever >> data is needed into a text file via psql or pg_dump, scp the file(s) >> back here and send them to the user. > > I don't get it. If you can use psql then you already have csv support. > > psql -c 'COPY pg_class TO STDOUT WITH CSV' postgres > pg_class.csv If your data looks like this (tab characters shown as <TAB>):

foo<TAB>barbaz<TAB>bing

you are o.k. You have three columns, tab delimited. However, if your data looks like this:

foo<TAB>bar<TAB>baz<TAB>bing

you have a problem: foo is one column, "bar<TAB>baz" is meant to be a single column (the data itself contains a tab), and bing is a single column. How does Excel know that "bar<TAB>baz" is a single column? It doesn't, because you told it to delimit on tabs, and thus you have four columns as far as Excel is concerned. An alternative, although I don't know what kind of headaches it would cause, is to have a text delimiter as well as a field delimiter, e.g.:

"foo" "bar baz" "bing"

Sincerely, Joshua D. Drake > >>> -----Original Message----- >>> From: Joshua D. Drake [mailto:jd@commandprompt.com] >>> Sent: Monday, June 12, 2006 4:15 PM >>> To: Bill Bartlett >>> Cc: 'Andrew Dunstan'; 'Tom Lane'; 'PG Hackers' >>> Subject: Re: [HACKERS] CSV mode option for pg_dump >>> >>> >>> Bill Bartlett wrote: >>>> Here's me speaking up -- I'd definitely use it! As a >>> quick way to pull >>>> data into Excel to do basic reports or analysis, a CSV >>> format would be >>>> great. >>> Why not just use ODBC? >>> >>> Joshua D. Drake >>> -- >>> >>> === The PostgreSQL Company: Command Prompt, Inc. === >>> Sales/Support: +1.503.667.4564 || 24x7/Emergency: >>> +1.800.492.2240 >>> Providing the most comprehensive PostgreSQL solutions >>> since 1997 >>> http://www.commandprompt.com/ >>> >>> >> -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/
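For what it's worth, the text delimiter Joshua describes is what COPY's CSV QUOTE character already provides, and FORCE QUOTE makes the quoting unconditional for the named columns. A sketch against a hypothetical table t(a, b, c) holding the values above (the embedded tab shown as <TAB>):

    postgres=# \copy t to stdout csv force quote a, b, c
    "foo","bar<TAB>baz","bing"

Note that with a comma delimiter the embedded tab needs no quoting at all; CSV mode only has to quote a field when it contains the delimiter, the quote character, or a line end.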
From what I gather, the CSV format dump would only contain data. I think pg_dump is the friend of pg_restore. It dumps everything including user defined functions, types, schemas etc. CSV does not fit with this. Besides, people will probably want to dump into CSV the result of any query, to load it into excel, not just the full contents of a table. So, why not create a separate tool -- someone suggested pg_query for that, and I second it. This tool would take a query and format options, and would output a file in whatever format chosen by the user (CSV, COPY format, xml, whatever). A script language (python) can be used, which will significantly shorten development times and allow easy modularity, as it is easier to add a module to a python program than a C program. I would vote for Python because I love it and it has a very good postgres adapter (psycopg2) which knows how to convert every postgres type to a native language type (yes, even multidimensional arrays of BOX get converted). And it's really fast at retrieving large volumes of data. So you have a stable, fast tool for backup and restore (pg_dump) and a rapidly evolving, user-friendly and extendable tool for exporting data, and everyone is happy. Mr Momjian talks about adding modular functionality to pg_dump. Is it really necessary? What is the objective? Is it to reuse code in pg_dump? I guess not; if a user wants to dump, for instance, all the tables in a schema, implementing this logic in python is only a few lines of code (select from information_schema...) To be realistic, output format modules should be written in script languages. No one sane is eager to do string manipulation in C. Thus these modules would have to somehow fit with pg_dump, maybe with a pipe or something. This means designing another protocol. Reimplementing in a scripting language the parts of pg_dump which will be reused by this project (mainly, enumerating tables and stuff) will be far easier. Just look.

Python 2.4.2 (#1, Mar 30 2006, 14:34:35)
[GCC 3.4.4 (Gentoo 3.4.4-r1, ssp-3.4.4-1.0, pie-8.7.8)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
...opens a db connection...
>>> c.execute( "SELECT * FROM test.csv" )
>>> data = c.fetchall()
>>> data
[[1, datetime.date(2006, 6, 13), 'this\tcontains\ttabulations'],
 [2, datetime.date(2006, 6, 13), "this'contains'quotes"],
 [3, datetime.date(2006, 6, 13), 'this"contains"double quotes']]
>>> import csv, sys
>>> c = csv.writer( sys.stdout, dialect = csv.excel )
>>> c.writerows( data )
1,2006-06-13,this	contains	tabulations
2,2006-06-13,this'contains'quotes
3,2006-06-13,"this""contains""double quotes"
PFC wrote: > > From what I gather, the CSV format dump would only contain data. > I think pg_dump is the friend of pg_restore. It dumps everything > including user defined functions, types, schemas etc. CSV does not > fit with this. > > This is just nonsense. There is not the slightest reason that CSV data cannot be embedded in a text dump nor exist as the data members of a custom or tar dump with the corresponding COPY statements modified accordingly. Really, let's get the facts straight, please. cheers andrew
> > Would that be adequate, or do we really want to reimplement and maintain > all > the output format complexity in our own code, in C? I think the point is that we should provide a native implementation because not everyone is crazy enough to use perl (blatant jab ;)). I would never expect a customer to write a perl or python script just to get their data in what is widely considered a standard business format that can be imported by their userland application. The people on the hackers list are NOT the target for this feature. The people on general, admin and novice are. Sincerely, Joshua D. Drake > > Cheers, > Steve -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/
> From: Rod Taylor [mailto:pg@rbt.ca] > Sent: Tuesday, June 13, 2006 11:31 AM > > > On Mon, 2006-06-12 at 16:28 -0400, Bill Bartlett wrote: > > Can't -- the main production database is over at a CoLo site with > > access only available via SSH, and tightly-restricted SSH at that. > > Generally one of the developers will SSH over to the > server, pull out > > whatever data is needed into a text file via psql or > pg_dump, scp the > > file(s) back here and send them to the user. > > I don't get it. If you can use psql then you already have csv support. > > psql -c 'COPY pg_class TO STDOUT WITH CSV' postgres > pg_class.csv Ah - RTF-UPDATED-M on my part. Most of my systems are still in PG 7.4.x databases so I tend to stick with the 7.x docs, and I didn't notice the "WITH CSV" option added in 8.0. That, plus temp tables, will be very useful. However, I also agree with the need for a new "pg_query / pg_export" program. A program geared solely towards exporting the results of a query would allow many of the options that are needed for the ever-growing variety of output formats (XML, CSV, HTML, XHTML, etc.) and details for each format without needing to clutter up pg_dump with things that really having nothing to do with backing up and restoring data. It could also allow a large range of options related to getting data out (where, order by), many of which have also been discussed for pg_dump recently. - Bill > > > > -----Original Message----- > > > From: Joshua D. Drake [mailto:jd@commandprompt.com] > > > Sent: Monday, June 12, 2006 4:15 PM > > > To: Bill Bartlett > > > Cc: 'Andrew Dunstan'; 'Tom Lane'; 'PG Hackers' > > > Subject: Re: [HACKERS] CSV mode option for pg_dump > > > > > > > > > Bill Bartlett wrote: > > > > Here's me speaking up -- I'd definitely use it! As a > > > quick way to pull > > > > data into Excel to do basic reports or analysis, a CSV > > > format would be > > > > great. > > > > > > Why not just use ODBC? > > > > > > Joshua D. Drake > > > -- > > > > > > === The PostgreSQL Company: Command Prompt, Inc. === > > > Sales/Support: +1.503.667.4564 || 24x7/Emergency: > > > +1.800.492.2240 > > > Providing the most comprehensive PostgreSQL solutions > > > since 1997 > > > http://www.commandprompt.com/ > > > > > > > -- >
On Jun 13, 2006, at 9:47 AM, Joshua D. Drake wrote: >> Would that be adequate, or do we really want to reimplement and >> maintain all >> the output format complexity in our own code, in C? > > I think the point is that we should provide a native implementation > because not everyone is crazy enough to use perl (blatant jab ;)). > I would never expect a customer to write a perl or python script > just to get their data in what is widely considered a standard > business format that can be imported by their userland application. That wasn't what I was discussing, on two levels. Firstly, I wasn't suggesting that the end user write anything; secondly, I was talking about the other output formats discussed (Excel, HTML...) rather than just CSV. > > The people on the hackers list are NOT the target for this > feature. The people on general, admin and novice are. I was referring to the other export formats mentioned (Excel, HTML...). We already support CSV export of single tables with the bundled software, don't we? Cheers, Steve
Steve Atkins wrote: > > Would that be adequate, or do we really want to reimplement and > maintain all > the output format complexity in our own code, in C? > > Code to produce CSVs is there, now, today, and has been since 8.0. That is *not* what is at issue here. If you want to debate whether or not it should be in the backend you are 2 years too late. The ONLY question here is about whether or not to have it enabled as an option in pg_dump. Try these and see the interesting results:

COPY pg_class TO '/tmp/pg_class.csv' CSV;

or in psql:

\copy pg_class to '/tmp/pg_class.csv' csv

As for XML, which was also mentioned, you should be aware that there is a Google Summer of Code project to implement SQL/XML, so watch this space. (As for why that needs to be in the backend, see previous discussions.) cheers andrew
pg_dump CSV TODO item removed until we come up with something everyone can agree on. --------------------------------------------------------------------------- Joshua D. Drake wrote: > > > > Would that be adequate, or do we really want to reimplement and maintain > > all > > the output format complexity in our own code, in C? > > I think the point is that we should provide a native implementation > because not everyone is crazy enough to use perl (blatant jab ;)). I > would never expect a customer to write a perl or python script just to > get their data in what is widely considered a standard business format > that can be imported by their userland application. > > The people on the hackers list are NOT the target for this feature. The > people on general, admin and novice are. > > Sincerely, > > Joshua D. Drake > > > > Cheers, > > Steve -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote: >pg_dump CSV TODO item removed until we come up with something everyone >can agree on. > > > That's a pity. Just to show you how little is involved in what I was suggesting, a prototype patch is attached - it's 182 lines of context diff, which is pretty small for a new feature. It took me about an hour to write and I have tested it against the regression db in both text and binary dump modes, where it works without a hitch. cheers andrew

Index: src/bin/pg_dump/pg_dump.c
===================================================================
RCS file: /cvsroot/pgsql/src/bin/pg_dump/pg_dump.c,v
retrieving revision 1.438
diff -c -r1.438 pg_dump.c
*** src/bin/pg_dump/pg_dump.c	9 Jun 2006 19:46:09 -0000	1.438
--- src/bin/pg_dump/pg_dump.c	14 Jun 2006 00:32:03 -0000
***************
*** 113,118 ****
--- 113,123 ----
  /* flag to turn on/off dollar quoting */
  static int disable_dollar_quoting = 0;

+ /* flag to control if using CSV */
+ static bool use_csv = false;
+
+ /* holder for CSV options */
+ static PQExpBuffer csv_opts;

  static void help(const char *progname);
  static NamespaceInfo *findNamespace(Oid nsoid, Oid objoid);
***************
*** 251,256 ****
--- 256,265 ----
  	{"disable-triggers", no_argument, &disable_triggers, 1},
  	{"use-set-session-authorization", no_argument, &use_setsessauth, 1},

+ 	/* long options with no short version */
+ 	{"csv", no_argument, NULL, 2},
+ 	{"csv-option", required_argument, NULL, 3},
+
  	{NULL, 0, NULL, 0}
  };
  int optindex;
***************
*** 285,290 ****
--- 294,301 ----
  	}
  }

+ csv_opts = createPQExpBuffer();
+
  while ((c = getopt_long(argc, argv, "abcCdDE:f:F:h:in:oOp:RsS:t:uU:vWxX:Z:",
  						long_options, &optindex)) != -1)
  {
***************
*** 419,424 ****
--- 430,462 ----
  			break;
  			/* This covers the long options equivalent to -X xxx. */

+ 		case 2:				/* csv */
+ 			use_csv = true;
+ 			break;
+
+ 		case 3:				/* csv-option */
+ 			if (strcmp(optarg, "singlequote") == 0)
+ 			{
+ 				appendPQExpBuffer(csv_opts, "QUOTE AS '''' ");
+ 			}
+ 			else if (strcmp(optarg, "tabdelimiter") == 0)
+ 			{
+ 				appendPQExpBuffer(csv_opts, "DELIMITER AS E'\\t' ");
+ 			}
+ 			else if (strcmp(optarg, "header") == 0)
+ 			{
+ 				appendPQExpBuffer(csv_opts, "HEADER ");
+ 			}
+ 			else
+ 			{
+ 				fprintf(stderr,
+ 						_("%s: invalid csv option -- %s\n"),
+ 						progname, optarg);
+ 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+ 				exit(1);
+ 			}
+
  		case 0:
  			break;
***************
*** 463,468 ****
--- 501,518 ----
  	exit(1);
  }

+ if (use_csv == true && dumpInserts == true)
+ {
+ 	write_msg(NULL, "INSERT (-d, -D) and CSV (--csv) options cannot be used together\n");
+ 	exit(1);
+ }
+
+ if (use_csv == false && strlen(csv_opts->data) > 0)
+ {
+ 	write_msg(NULL, "You must specify --csv to use --csv-option\n");
+ 	exit(1);
+ }
+
  /* open the output file */
  switch (format[0])
  {
***************
*** 714,719 ****
--- 764,771 ----
  		 "                           use SESSION AUTHORIZATION commands instead of\n"
  		 "                           OWNER TO commands\n"));

+ printf(_("  --csv                    use CSV mode\n"));
+ printf(_("  --csv-option=opt         one of header, tabdelimiter or singlequote\n"));
  printf(_("\nConnection options:\n"));
  printf(_("  -h, --host=HOSTNAME      database server host or socket directory\n"));
  printf(_("  -p, --port=PORT          database server port number\n"));
***************
*** 881,898 ****
  if (oids && hasoids)
  {
! 	appendPQExpBuffer(q, "COPY %s %s WITH OIDS TO stdout;",
  					  fmtQualifiedId(tbinfo->dobj.namespace->dobj.name,
  									 classname),
  					  column_list);
  }
  else
  {
! 	appendPQExpBuffer(q, "COPY %s %s TO stdout;",
  					  fmtQualifiedId(tbinfo->dobj.namespace->dobj.name,
  									 classname),
  					  column_list);
  }
  res = PQexec(g_conn, q->data);
  check_sql_result(res, g_conn, q->data, PGRES_COPY_OUT);
  PQclear(res);
--- 933,955 ----
  if (oids && hasoids)
  {
! 	appendPQExpBuffer(q, "COPY %s %s WITH OIDS TO stdout",
  					  fmtQualifiedId(tbinfo->dobj.namespace->dobj.name,
  									 classname),
  					  column_list);
  }
  else
  {
! 	appendPQExpBuffer(q, "COPY %s %s TO stdout",
  					  fmtQualifiedId(tbinfo->dobj.namespace->dobj.name,
  									 classname),
  					  column_list);
  }
+ if (use_csv)
+ {
+ 	appendPQExpBuffer(q, " CSV %s", csv_opts->data);
+ }
+ appendPQExpBuffer(q, ";");
  res = PQexec(g_conn, q->data);
  check_sql_result(res, g_conn, q->data, PGRES_COPY_OUT);
  PQclear(res);
***************
*** 1139,1147 ****
  	/* must use 2 steps here 'cause fmtId is nonreentrant */
  	appendPQExpBuffer(copyBuf, "COPY %s ",
  					  fmtId(tbinfo->dobj.name));
! 	appendPQExpBuffer(copyBuf, "%s %sFROM stdin;\n",
  					  fmtCopyColumnList(tbinfo),
  					  (tdinfo->oids && tbinfo->hasoids) ? "WITH OIDS " : "");
  	copyStmt = copyBuf->data;
  }
  else
--- 1196,1208 ----
  	/* must use 2 steps here 'cause fmtId is nonreentrant */
  	appendPQExpBuffer(copyBuf, "COPY %s ",
  					  fmtId(tbinfo->dobj.name));
! 	appendPQExpBuffer(copyBuf, "%s %s",
  					  fmtCopyColumnList(tbinfo),
  					  (tdinfo->oids && tbinfo->hasoids) ? "WITH OIDS " : "");
+ 	appendPQExpBuffer(copyBuf, "FROM stdin");
+ 	if (use_csv)
+ 		appendPQExpBuffer(copyBuf, " CSV %s", csv_opts->data);
+ 	appendPQExpBuffer(copyBuf, ";\n");
  	copyStmt = copyBuf->data;
  }
  else
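With the patch applied, invocation would look something like this (the database name is hypothetical; each --csv-option value maps straight onto COPY's CSV clause, e.g. header becomes HEADER and singlequote becomes QUOTE AS ''''):

    pg_dump --csv mydb > mydb.dump
    pg_dump --csv --csv-option=header --csv-option=singlequote -Fc mydb > mydb.custom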
Andrew Dunstan <andrew@dunslane.net> writes: > This is just nonsense. There is not the slightest reason that CSV data > cannot be embedded in a text dump nor exist as the data members of a > custom or tar dump with the corresponding COPY statements modified > accordingly. Well, the really *core* question here is whether we trust the stability of the CSV format definition (and code) enough to want to rely on it for data dump/restore purposes. I'm still a few years away from that, myself. AFAICT the raison d'etre of the CSV code is "emit whatever it takes to satisfy this, that, and the other broken Microsoft application". That's fine as an export tool, but as a dump/reload tool, nyet. If you put it in pg_dump you're just handing neophytes another foot-gun. regards, tom lane
Tom Lane said: > Andrew Dunstan <andrew@dunslane.net> writes: >> This is just nonsense. There is not the slightest reason that CSV data >> cannot be embedded in a text dump nor exist as the data members of a >> custom or tar dump with the corresponding COPY statements modified >> accordingly. > > Well, the really *core* question here is whether we trust the stability > of the CSV format definition (and code) enough to want to rely on it > for data dump/restore purposes. I'm still a few years away from that, > myself. AFAICT the raison d'etre of the CSV code is "emit whatever it > takes to satisfy this, that, and the other broken Microsoft > application". That's fine as an export tool, but as a dump/reload tool, > nyet. If you put it in pg_dump you're just handing neophytes another > foot-gun. > Well, I'm staggered. Really. The CSV definition and its lack of formality is a complete red herring in this, as are references to Microsoft. The real issue in what you say here is your suggestion that we might not be able to reproduce the input in some cases via a COPY CSV round trip. If that is so it's a bug of our (probably my) making, and must be fixed. I assert that it is not so. In fact all the tests I did during development and since were premised on recovering the input exactly. The only CSV option that breaks things in that respect is FORCE NOT NULL, and it is designed for data coming in from a non Postgres source, so it's not relevant here. Even FORCE QUOTE won't break things because it never quotes a null value, and the only semantic significance to us of quoting is the null-ness of the value. If the code is broken then it should be discoverable by test or code analysis. There is no need to refer to any other application or standard. So if you or anyone think there is a case that will not reproduce the data exactly when the same CSV options are used for output and input, I challenge you or them to provide a single example. You say you're a few years away from trusting the code. Well, it's not so huge that it's beyond analysis, and I'll be happy to explain anything that puzzles you. Perhaps more importantly, it has been in use now for 18 months. We discovered one problem with embedded line feeds very early in the 8.0 release cycle, and fixed it. After that I have not heard of a single problem. And I assure you this code is widely used. It probably isn't used much as a round trip mechanism, probably in part because we haven't provided it as a pg_dump option. So maybe we have a chicken/egg scenario here. We do have some round trip regression tests in the copy test, and those can be beefed up if necessary to increase your confidence level. I'm happy to debate details, but general assertions of "we can't trust this code" don't seem worth much to me. cheers andrew
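Andrew's round-trip assertion is mechanically checkable; a sketch of such a test (throwaway names, with values chosen to cover the awkward cases: quotes, delimiters, newlines, and NULL versus empty string):

    create temp table src (a text, b text);
    insert into src values (E'quote " comma , newline\nend', NULL);
    insert into src values ('', 'plain');
    \copy src to '/tmp/roundtrip.csv' csv
    create temp table dst (a text, b text);
    \copy dst from '/tmp/roundtrip.csv' csv
    -- both queries should return zero rows if the round trip is exact
    select * from src except all select * from dst;
    select * from dst except all select * from src;

The NULL/empty-string distinction survives because CSV output writes NULL as an unquoted empty field and the empty string as "", which is precisely the quoting semantics Andrew refers to.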
I agree with Andrew. If there are bugs in CSV, then we should fix them, not avoid giving users this usability improvement. I know I have analyzed the code and fixed a few problems with it in the past. As far as pg_dump goes, I am thinking the most flexible solution would be to allow an arbitrary WITH clause to be added to COPY, like

    pg_dump --copy-with "CSV"

This would allow not only CSV control, but specification of non-CSV options if desired. --------------------------------------------------------------------------- Andrew Dunstan wrote: > Tom Lane said: > > Andrew Dunstan <andrew@dunslane.net> writes: > >> This is just nonsense. There is not the slightest reason that CSV data > >> cannot be embedded in a text dump nor exist as the data members of a > >> custom or tar dump with the corresponding COPY statements modified > >> accordingly. > > > > Well, the really *core* question here is whether we trust the stability > > of the CSV format definition (and code) enough to want to rely on it > > for data dump/restore purposes. I'm still a few years away from that, > > myself. AFAICT the raison d'etre of the CSV code is "emit whatever it > > takes to satisfy this, that, and the other broken Microsoft > > application". That's fine as an export tool, but as a dump/reload tool, > > nyet. If you put it in pg_dump you're just handing neophytes another > > foot-gun. > > > > > Well, I'm staggered. Really. > > The CSV definition and its lack of formality is a complete red herring in > this, as are references to Microsoft. > > The real issue in what you say here is your suggestion that we might not be > able to reproduce the input in some cases via a COPY CSV round trip. If that > is so it's a bug of our (probably my) making, and must be fixed. I assert > that it is not so. In fact all the tests I did during development and since > were premised on recovering the input exactly. The only CSV option that > breaks things in that respect is FORCE NOT NULL, and it is designed for data > coming in from a non Postgres source, so it's not relevant here. Even FORCE > QUOTE won't break things because it never quotes a null value, and the only > semantic significance to us of quoting is the null-ness of the value. > > If the code is broken then it should be discoverable by test or code > analysis. There is no need to refer to any other application or standard. So > if you or anyone think there is a case that will not reproduce the data > exactly when the same CSV options are used for output and input, I challenge > you or them to provide a single example. > > You say you're a few years away from trusting the code. Well, it's not so > huge that it's beyond analysis, and I'll be happy to explain anything that > puzzles you. Perhaps more importantly, it has been in use now for 18 months. > We discovered one problem with embedded line feeds very early in the 8.0 > release cycle, and fixed it. After that I have not heard of a single > problem. And I assure you this code is widely used. > > It probably isn't used much as a round trip mechanism, probably in part > because we haven't provided it as a pg_dump option. So maybe we have a > chicken/egg scenario here. We do have some round trip regression tests in > the copy test, and those can be beefed up if necessary to increase your > confidence level. > > I'm happy to debate details, but general assertions of "we can't trust this > code" don't seem worth much to me.
> > cheers > > andrew -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
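In use, Bruce's suggestion would presumably look something like this (the --copy-with flag is hypothetical, sketching the proposal; the argument string would be appended verbatim to every COPY statement the dump issues):

    pg_dump --copy-with "CSV HEADER" mydb > mydb.dump
    pg_dump --copy-with "DELIMITER AS '|'" mydb > mydb.dump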
Bruce Momjian <pgman@candle.pha.pa.us> writes: > I agree with Andrew. If there are bugs in CSV, then we should fix them, > not avoid giving users this usability improvement. The case for it being a usability improvement seems very weak to me; no one has yet demonstrated an actual use-case where someone would pull CSV data out of pg_dump output instead of just dumping the table directly with COPY. Now the "anti" case is admittedly hypothetical: I'm supposing that we will eventually be bitten by portability problems with CSV-style dumps not being loadable into future versions. But given the weak nature of the "pro" case, I think we should be conservative and not take that risk. regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > I agree with Andrew. If there are bugs in CSV, then we should fix them, > > not avoid giving users this usability improvement. > > The case for it being a usability improvement seems very weak to me; > no one has yet demonstrated an actual use-case where someone would pull > CSV data out of pg_dump output instead of just dumping the table > directly with COPY. Now the "anti" case is admittedly hypothetical: > I'm supposing that we will eventually be bitten by portability problems > with CSV-style dumps not being loadable into future versions. But given > the weak nature of the "pro" case, I think we should be conservative > and not take that risk. Well, I saw little request for COPY CSV here, but IRC users were reporting a lot of interest, and feedback from the release that added it showed it was a major feature, so just because we haven't seen a use-case here doesn't mean it doesn't exist. As was stated before, the use-case for this is by people we don't normally have contact with. -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <pgman@candle.pha.pa.us> writes: > As was stated before, the use-case for this is by people we don't > normally have contact with. Or who don't exist. Once again: give me a plausible use-case for pulling this data out of a pg_dump output file. You'd need data-massaging code anyway just to extract the data, so why not expect that code to convert to CSV or whatever other format you might want? If you can think of use-cases like this, why do you think the destination format is necessarily CSV and not something else? If it is something else, adding a CSV option to pg_dump makes it *harder* not easier to write that massaging code, because now it's got to cope with N dump formats not one. regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > As was stated before, the use-case for this is by people we don't > > normally have contact with. > > Or who don't exist. > > Once again: give me a plausible use-case for pulling this data out of a > pg_dump output file. You'd need data-massaging code anyway just to > extract the data, so why not expect that code to convert to CSV or > whatever other format you might want? If you can think of use-cases like > this, why do you think the destination format is necessarily CSV and not > something else? If it is something else, adding a CSV option to pg_dump > makes it *harder* not easier to write that massaging code, because now > it's got to cope with N dump formats not one. I don't have to think of a use case. I trusted the people who said we needed CSV, so I trust them again if they say doing pg_dump with CSV would be a good idea. Also, my suggestion of --copy-with would allow CSV and other format modifications with minimal code and complexity. -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Tue, Jun 13, 2006 at 09:14:48AM -0700, Joshua D. Drake wrote:
> > I don't get it. If you can use psql then you already have csv support.
> >
> > psql -c 'COPY pg_class TO STDOUT WITH CSV' postgres > pg_class.csv
>
> If your data looks like this:
>
> foo<tab>barbaz<tab>bing
>
> You are o.k. You have three columns, tab delimited.
>
> However if your data looks like this (one of the tabs being part of the
> data, not a delimiter):
>
> foo<tab>bar<tab>baz<tab>bing
>
> You have a problem.

I'm not exactly sure what you're getting at here:

postgres=# create temp table x (a text, b text, c text);
CREATE TABLE
postgres=# insert into x values ('"blah"', 'hello world', 'hmmm
postgres'# ');
INSERT 0 1
postgres=# \copy x to stdout csv
"""blah""",hello world,"hmmm
"

> An alternative, although I don't know what kind of headaches it would
> cause, is to have a text delimiter as well as a field delimiter, e.g.:

PostgreSQL already delimits CSV fields as required, so I'm not sure what you're asking here...
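And if what you want is a non-default quote or field delimiter, CSV mode already takes both as options; something like this (syntax from memory, so check the COPY reference page):

postgres=# COPY x TO STDOUT WITH DELIMITER AS ';' CSV QUOTE AS '"';

which gives you the same data, delimited and quoted with the characters you chose.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.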
On Wed, Jun 14, 2006 at 10:55:04AM -0400, Bruce Momjian wrote:
> > Once again: give me a plausible use-case for pulling this data out of a
> > pg_dump output file. You'd need data-massaging code anyway just to
> > extract the data, so why not expect that code to convert to CSV or
> > whatever other format you might want? If you can think of use-cases like
> > this, why do you think the destination format is necessarily CSV and not
> > something else? If it is something else, adding a CSV option to pg_dump
> > makes it *harder* not easier to write that massaging code, because now
> > it's got to cope with N dump formats not one.
>
> I don't have to think of a use case. I trusted the people who said we
> needed CSV, so I trust them again if they say doing pg_dump with CSV
> would be a good idea.

I think the point is that it's hard to imagine a use case for CSV support in pg_dump that isn't already better served by using psql. If you add it to pg_dump, people have to write special code to extract it anyway, so why can't they handle the tab-delimited format themselves? OTOH, if you use psql you get straightforward CSV with no garbage at the beginning or end; just straight, raw CSV.

$ psql postgres -c '\copy pg_namespace to stdout csv'
pg_toast,10,
pg_temp_1,10,
pg_catalog,10,"{kleptog=UC/kleptog,=U/kleptog}"
public,10,"{kleptog=UC/kleptog,=UC/kleptog}"
information_schema,10,"{kleptog=UC/kleptog,=U/kleptog}"

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
"Andrew Dunstan" <andrew@dunslane.net> writes: > The CSV definition and its lack of formality is a complete red herring in > this, as are references to Microsoft. I think then you've missed the real issue. > The real issue in what you say here is your suggestion that we might not be > able to reproduce the input in some cases via a COPY CSV round trip. No that's not the suggestion. The question is: what do you do when the next version of Excel comes out and can't parse your CSV files any more? If you change your format to match then you won't be able to parse old pg_dump files any more. You'll end up with an option for each csv variant and then how does the user know which option to use to read a given pg_dump archive? And before you ask, yes, virtually every version of Excel has changed its csv file format. I think the problem here is that pg_dump is serving double duty as a postgres->postgres tool and some people using it as a kind of batch COPY. What benefit does it buy you over a script that runs COPY for each table if that's what you want? -- greg
Bruce Momjian wrote:
> As was stated before, the use-case for this is by people we don't
> normally have contact with.

I do think we need a use case for what we do.

The main use case seems to me to be where you are exporting a whole database, or most of it, with a very large number of tables, and it is convenient to have all the CSVs created for you rather than having to make them manually one at a time. You could get these out of, say, a tar format dump very easily.

I said near the beginning of this that a pgfoundry project to create a tool for this might be an alternative way to go. If that's the consensus then OK. I just bristle a bit at the suggestion that we might not get back what we started with from a CSV dump, because we can, AFAIK.
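For example (a rough sketch; the .dat member names below are made up, so list the archive first to see the real ones):

    pg_dump -Ft -f db.tar mydb    # tar-format dump, one data member per table
    tar -tf db.tar                # list members; table data is in files like 2591.dat
    tar -xf db.tar 2591.dat       # pull out the data for one table

With a CSV option those data members would be ready-made CSV files.

cheers andrew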
Andrew Dunstan wrote:
> Bruce Momjian wrote:
> > As was stated before, the use-case for this is by people we don't
> > normally have contact with.
>
> I do think we need a use case for what we do.
>
> The main use case seems to me to be where you are exporting a whole
> database, or most of it, with a very large number of tables, and it is
> convenient to have all the CSVs created for you rather than having to
> make them manually one at a time. You could get these out of, say, a
> tar format dump very easily.
>
> I said near the beginning of this that a pgfoundry project to create a
> tool for this might be an alternative way to go. If that's the consensus
> then OK. I just bristle a bit at the suggestion that we might not get
> back what we started with from a CSV dump, because we can, AFAIK.

For me, the use case would be: what format do I want a dump in if it is for long-term storage? Do I want it in a PostgreSQL-native format, or in a more universal format that can be loaded into PostgreSQL tomorrow, and perhaps loaded into some other database, with modification, ten years from now?

I just had that issue on my home system for file system backups, going from cpio to ustar (POSIX.1-1988 / IEEE Std 1003.2), but it seems that POSIX.1-2001 would be best if my operating system supported it. (Perhaps cpio was better?)

Anyway, I never thought there would be a large demand for COPY CSV, but obviously there is, so I have concluded that other people's environments and skills are different enough from my own that I am willing to accept the idea that there is a use case even when I don't understand it. I will let the people who work in those environments make that decision.

--
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Greg Stark wrote:
> And before you ask, yes, virtually every version of Excel has changed its
> CSV file format.

I'd be interested to hear of Excel variants we haven't catered for; our CSV parsing is some of the most flexible and complete around, IMNSHO.

BTW, I suspect we won't see too much movement on this front in future, given the moves on the XML front.

cheers andrew
I wrote:
> The main use case seems to me to be where you are exporting a whole
> database, or most of it, with a very large number of tables, and it is
> convenient to have all the CSVs created for you rather than having to
> make them manually one at a time. You could get these out of, say, a
> tar format dump very easily.

I just noticed that the data members all have \. and some blank lines at the end, so wash that out.
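The trailing terminator is easy enough to strip if you want to, though; something like this (untested sketch, with 2591.dat standing in for a real member name) deletes everything from the \. line onwards:

    sed '/^\\\.$/,$d' 2591.dat > mytable.csv

We now return you to normal -hacking.

cheers andrew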