Re: Allow COPY's 'text' format to output a header - Mailing list pgsql-hackers

From Isaac Morland
Subject Re: Allow COPY's 'text' format to output a header
Date
Msg-id CAMsGm5f_SMZHwja=a+R1ikdrWn3Nyp5_xaWpx_K3EdV-XkdQAg@mail.gmail.com
Whole thread Raw
In response to Re: Allow COPY's 'text' format to output a header  ("Daniel Verite" <daniel@manitou-mail.org>)
Responses Re: Allow COPY's 'text' format to output a header  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Allow COPY's 'text' format to output a header  ("Daniel Verite" <daniel@manitou-mail.org>)
List pgsql-hackers
On 15 May 2018 at 10:26, Daniel Verite <daniel@manitou-mail.org> wrote:
        Andrew Dunstan wrote:

> I'm not necessarily opposed to this, but I'm not certain about the use
> case either.

+1.
The downside is that it would create the need, when using COPY TO,
to know whether an input file was generated with or without header,
and a hazard on mistakes.
If you say it was and it wasn't, you quietly loose the first row of data.
If you say it wasn't and in fact it was, either there's a
datatype mismatch or you quietly get a spurious row of data.

Just to be clear, we're talking about my "header match" feature, not the basic idea of allowing a header in text format?

You already need to know whether or not there is a header, no matter what: there is no way to avoid needing to know the format of the data to be imported. And certainly if "header" is an option, one has to know whether or not to set it in any given situation.

The "header match" helps ensure the file is the right one by requiring the header contents to match the field names, rather than just being thrown away.

I don't view it as a way to avoid pre-defining the table. It just increases the chance that the wrong file won't load but will instead trigger an error condition immediately.

Note that this advantage includes what happens if you specify header but the file has no header: as long as you actually specified header match, the error will be caught unless the first row of actual data happens to match the field names, which is almost always highly unlikely and frequently impossible (e.g., a person with firstname "firstname", surname "surname", birthday "birthday" and so on).

One can imagine extensions of the idea: for example, the header could actually be used to identify the columns, so the column order in the file doesn't matter. There could also be an "AS" syntax to allow the target field names to be different from the field names in the header. I have occasionally found myself wanting to ignore certain columns of the file. But these are all significantly more complicated than just looking at the header and requiring it to match the target field names.

If one specifies no header but there actually is a header in the file, then loading will fail in many cases but it depends on what the header in the file looks like. This part is unaffected by my idea.
 
This complication should be balanced by some advantage.
What can we do with the header?
If you already have the table ready to COPY in, you don't
need that information. The only reason why COPY TO
needs to know about the header is to throw it away.
And if you don't have the table created yet, a header
with just the column names is hardly sufficient to create it,
isn't it?

pgsql-hackers by date:

Previous
From: Mark Dilger
Date:
Subject: Windows build broken starting at da9b580d89903fee871cf54845ffa2b26bda2e11
Next
From: Tom Lane
Date:
Subject: Re: Windows build broken starting at da9b580d89903fee871cf54845ffa2b26bda2e11