Thread: COPY with column headings
Someone just asked about a COPY capability to supply the column headings as the first line of the copy statement. Do we want to support something like that? Does anyone else want such functionality? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Someone just asked about a COPY capability to supply the column headings > as the first line of the copy statement. Do we want to support > something like that? No. Tell him to use SELECT. (Such a capability would be a permanent hazard, because any time you tried to import with the opposite setting from what you'd exported, you'd either get a failure, or bad data, or silent loss of the first line of real data.) regards, tom lane
On Mon, Aug 16, 2004 at 10:53:36AM -0400, Bruce Momjian wrote: > Someone just asked about a COPY capability to supply the column headings > as the first line of the copy statement. Do we want to support > something like that? Does anyone else want such functionality? Wouldn't it be more logical, and more compatible, to keep this information as regular column information in the PGresult returned by the COPY? Or would that clash with the difference between "command" and "query" result handling? Jeroen
Bruce Momjian wrote: >Someone just asked about a COPY capability to supply the column headings >as the first line of the copy statement. Do we want to support >something like that? Does anyone else want such functionality? > > > I had it in mind all along as a possible option for CSV mode, but given the heat that was generated by the minimal implementation we did, I thought I'd let it rest. I certainly think it's worth considering, although I see Tom has objected :-) . Say we have an option called FIRSTLINELABELS, then on copy out it would write the headings on the first line, and on copy in it could just ignore the first line (so it could work symmetrically). cheers andrew
Andrew Dunstan wrote: > > > Bruce Momjian wrote: > > >Someone just asked about a COPY capability to supply the column headings > >as the first line of the copy statement. Do we want to support > >something like that? Does anyone else want such functionality? > > > > > > > > I had it in mind all along as a possible option for CSV mode, but given > the heat that was generated by the minimal implementation we did, I > thought I'd let it rest. > > I certainly think it's worth considering, although I see Tom has > objected :-) . Say we have an option called FIRSTLINELABELS, then on > copy out it would write the headings on the first line, and on copy in > it could just ignore the first line (so it could work symmetrically). Yes, that is what I was thinking. Is this a TODO? Sure it can be misused but most copy options can be similarly misused. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
"Jeroen T. Vermeulen" <jtv@xs4all.nl> writes: > Wouldn't it be more logical, and more compatible, to keep this information > as regular column information in the PGresult returned by the COPY? That would work but would require a protocol change, which this is surely not worth by itself. (We could put it on the to-do list for the next protocol rev though.) The bigger question is whether this would do anything to satisfy the requestor. He probably wants the headings to appear in the file resulting from COPY TO file (or the psql equivalent), which this would not do. Providing the info in the COPY PGresult would change nothing except the behavior of client applications specially rewritten to use it. regards, tom lane
On Mon, Aug 16, 2004 at 11:30:49AM -0400, Tom Lane wrote: > The bigger question is whether this would do anything to satisfy the > requestor. He probably wants the headings to appear in the file > resulting from COPY TO file (or the psql equivalent), which this would > not do. Providing the info in the COPY PGresult would change nothing > except the behavior of client applications specially rewritten to use it. True, and from a compatibility standpoint that would probably be good... Output post-processing (translation to CSV or XML output, XSLT transforms, what have you) IMHO should probably go on top of COPY anyway, rather than into it. Are people really having problems with select * from table where 1=0 ; copy table to stdout and first printing the column names found in the first PGresult, then dumping the COPY lines to the same fd? I think that should do the trick for most uses, although obviously I'm no expert. Jeroen
Bruce Momjian <pgman@candle.pha.pa.us> writes: >> I certainly think it's worth considering, although I see Tom has >> objected :-) . Say we have an option called FIRSTLINELABELS, then on >> copy out it would write the headings on the first line, and on copy in >> it could just ignore the first line (so it could work symmetrically). > Yes, that is what I was thinking. Is this a TODO? Sure it can be > misused but most copy options can be similarly misused. Most copy options will result in obvious failures if misused. I dislike this one because of the high risk of silent corruption of your data (loss of a row or insertion of a row that shouldn't be there). Considering no one has even made a positive case why we should add it (for instance, a use-case where it's necessary), I don't think it belongs in the to-do list. regards, tom lane
Tom Lane wrote: >Bruce Momjian <pgman@candle.pha.pa.us> writes: > > >>>I certainly think it's worth considering, although I see Tom has >>>objected :-) . Say we have an option called FIRSTLINELABELS, then on >>>copy out it would write the headings on the first line, and on copy in >>>it could just ignore the first line (so it could work symmetrically). >>> >>> > > > >>Yes, that is what I was thinking. Is this a TODO? Sure it can be >>misused but most copy options can be similarly misused. >> >> > >Most copy options will result in obvious failures if misused. I dislike >this one because of the high risk of silent corruption of your data >(loss of a row or insertion of a row that shouldn't be there). >Considering no one has even made a positive case why we should add it >(for instance, a use-case where it's necessary), I don't think it >belongs in the to-do list. > > > > The use case is that it fits in with the way spreadsheets usually do data tables, and many will only allow you to export a whole worksheet (including the heading row) to CSV, not a part of one. Conversely, working with imported data tables will be harder if they lack headings. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > The use case is that it fits in with the way spreadsheets usually do > data tables, and many will only allow you to export a whole worksheet > (including the heading row) to CSV, not a part of one. Conversely, > working with imported data tables will be harder if they lack headings. Well, if we wanted to *require* a heading row in CSV mode, it would be relatively foolproof. What I don't like is the proposal for an independent option; you've got a 50-50 chance of getting it wrong on import, and that's too high odds for me. Next question: are you imagining that the header row will actually have any semantic significance on input? Will we check the column names? Will we be willing to rearrange the columns if the header row claims the column order is different than the COPY command says? regards, tom lane
Tom Lane wrote: >Andrew Dunstan <andrew@dunslane.net> writes: > > >>The use case is that it fits in with the way spreadsheets usually do >>data tables, and many will only allow you to export a whole worksheet >>(including the heading row) to CSV, not a part of one. Conversely, >>working with imported data tables will be harder if they lack headings. >> >> > >Well, if we wanted to *require* a heading row in CSV mode, it would be >relatively foolproof. What I don't like is the proposal for an >independent option; you've got a 50-50 chance of getting it wrong on >import, and that's too high odds for me. > > No, we can't require it. Not all will have headers, and then we sure would skip a wanted row. And while my use case referred to spreadsheets, they are not the only sources/recipients of CSVs. Your 50-50 chance assumes the user knows nothing about the data and uses the switch at random. I think we're entitled to assume the user knows something about their data and uses the switch according to what they have/want. Right now, if you have a CSV with a header line you have to remove the line with something like 'sed 1d' before importing to PostgreSQL, and when importing from PostgreSQL you have to insert the headers manually after importing the CSV to your spreadsheet. That's got to be annoying and at least as error-prone as providing a switch to accomodate header lines. >Next question: are you imagining that the header row will actually have >any semantic significance on input? Will we check the column names? >Will we be willing to rearrange the columns if the header row claims the >column order is different than the COPY command says? > > > > I'm leaning towards not trying to interpret the line on copy in. Certainly that is the minimalist way to approach this. cheers andrew
Might be worthwhile to look at SQL*Server BCP format files: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/adminsq l/ad_impt_bcp_9yat.asp http://msdn.microsoft.com/library/default.asp?url=/library/en-us/tsqlref /ts_ba-bz_4fec.asp A TCL/TK front end would be especially nice. Of course, being a lazy slug, I am not volunteering to write any of this. Just a thought in case someone else is interested in doing it. > -----Original Message----- > From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Tom Lane > Sent: Monday, August 16, 2004 12:04 PM > To: Andrew Dunstan > Cc: Bruce Momjian; PostgreSQL-development > Subject: Re: [HACKERS] COPY with column headings > > > Andrew Dunstan <andrew@dunslane.net> writes: > > The use case is that it fits in with the way spreadsheets usually do > > data tables, and many will only allow you to export a whole > worksheet > > (including the heading row) to CSV, not a part of one. Conversely, > > working with imported data tables will be harder if they > lack headings. > > Well, if we wanted to *require* a heading row in CSV mode, it > would be relatively foolproof. What I don't like is the > proposal for an independent option; you've got a 50-50 chance > of getting it wrong on import, and that's too high odds for me. > > Next question: are you imagining that the header row will > actually have any semantic significance on input? Will we > check the column names? Will we be willing to rearrange the > columns if the header row claims the column order is > different than the COPY command says? > > regards, tom lane > > ---------------------------(end of > broadcast)--------------------------- > TIP 6: Have you searched our list archives? > http://archives.postgresql.org
Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: > > The use case is that it fits in with the way spreadsheets usually do > > data tables, and many will only allow you to export a whole worksheet > > (including the heading row) to CSV, not a part of one. Conversely, > > working with imported data tables will be harder if they lack headings. > > Well, if we wanted to *require* a heading row in CSV mode, it would be > relatively foolproof. What I don't like is the proposal for an > independent option; you've got a 50-50 chance of getting it wrong on > import, and that's too high odds for me. But we have other CVS options with even lower odds of success on a random guess. People have to know if their input file has headings, period. Telling people they can't have the option because they might get it wrong seems strange to me. > Next question: are you imagining that the header row will actually have > any semantic significance on input? Will we check the column names? > Will we be willing to rearrange the columns if the header row claims the > column order is different than the COPY command says? I assume no semantic significance to header row values on input. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
That'd likely be useful. Either specify the column names with COPY or \copy, or put them in the data file and pass an option to the command to look for them. The only time this could be a problem is if you forget to tell COPY to look for the field names in the file (AND specify the field names to COPY), and the values for the field names are such that they don't cause data errors (probably pretty slim chance). I'd also really like to see the ability to specify default values for fields not included in the data file (field defaults in the table definition is limited and doesn't fill the order when you need to specify various foreign key ids, specific times, etc.). CSN >>>>>>>>>>>>>>>>>>> Someone just asked about a COPY capability to supply the column headings as the first line of the copy statement. Do we want to support something like that? Does anyone else want such functionality? __________________________________ Do you Yahoo!? New and Improved Yahoo! Mail - Send 10MB messages! http://promotions.yahoo.com/new_mail
Added to TODO: > o Allow COPY to optionally include column headings as the first I know Tom didn't like it but there were others who did. --------------------------------------------------------------------------- Andrew Dunstan wrote: > > > Tom Lane wrote: > > >Bruce Momjian <pgman@candle.pha.pa.us> writes: > > > > > >>>I certainly think it's worth considering, although I see Tom has > >>>objected :-) . Say we have an option called FIRSTLINELABELS, then on > >>>copy out it would write the headings on the first line, and on copy in > >>>it could just ignore the first line (so it could work symmetrically). > >>> > >>> > > > > > > > >>Yes, that is what I was thinking. Is this a TODO? Sure it can be > >>misused but most copy options can be similarly misused. > >> > >> > > > >Most copy options will result in obvious failures if misused. I dislike > >this one because of the high risk of silent corruption of your data > >(loss of a row or insertion of a row that shouldn't be there). > >Considering no one has even made a positive case why we should add it > >(for instance, a use-case where it's necessary), I don't think it > >belongs in the to-do list. > > > > > > > > > > The use case is that it fits in with the way spreadsheets usually do > data tables, and many will only allow you to export a whole worksheet > (including the heading row) to CSV, not a part of one. Conversely, > working with imported data tables will be harder if they lack headings. > > cheers > > andrew > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073