Thread: Allow COPY's 'text' format to output a header
This patch adds the capability to use the HEADER feature with the "text" format of the COPY command. The patch includes the related update to documentation and an additional regression test for this feature.
Currently you can only add a header line (which lists the column names) when exporting with COPY to the CSV format, but I much prefer using the default "text" format. This feature is also currently listed on the to-do list (https://wiki.postgresql.org/wiki/Todo#COPY) where it seems to have been requested some years ago.
Currently you can only add a header line (which lists the column names) when exporting with COPY to the CSV format, but I much prefer using the default "text" format. This feature is also currently listed on the to-do list (https://wiki.postgresql.org/wiki/Todo#COPY) where it seems to have been requested some years ago.
Hopefully I've done everything correctly and the patch is acceptable enough to be considered for application.
Simon Muller
Attachment
Hi Simon, On 5/13/18 6:18 PM, Simon Muller wrote: > This patch adds the capability to use the HEADER feature with the "text" > format of the COPY command. The patch includes the related update to > documentation and an additional regression test for this feature. > > Currently you can only add a header line (which lists the column names) > when exporting with COPY to the CSV format, but I much prefer using the > default "text" format. This feature is also currently listed on the > to-do list (https://wiki.postgresql.org/wiki/Todo#COPY) where it seems > to have been requested some years ago. > > Hopefully I've done everything correctly and the patch is acceptable > enough to be considered for application. This patch makes sense to me and looks reasonable. We're in the middle of a feature freeze that will last most of the summer, so be sure to enter your patch into the next commitfest so it can be considered when the freeze is over. https://commitfest.postgresql.org/18/ Regards, -- -David david@pgmasters.net
On Sun, May 13, 2018 at 07:01:00PM -0400, David Steele wrote: > This patch makes sense to me and looks reasonable. One "potential" problem is if a relation has a full set of column which allows the input of text-like data: if the header has been added with COPY TO, and that the user forgets to add again the header option with COPY FROM, then an extra row will be generated but there is the same problem with CSV format :) One comment I have about the patch is that there is no test for COPY FROM with an output file which has a header. In this case if HEADER is true then the file can be loaded. If HEADER is wrong, an error should normally be raised because of the format (well, let's discard the case of the relation with text-only columns). So the tests could be extended a bit even for CSV. > We're in the middle of a feature freeze that will last most of the > summer, so be sure to enter your patch into the next commitfest so it > can be considered when the freeze is over. > > https://commitfest.postgresql.org/18/ Yes, you will need to be patient a couple of months here. -- Michael
Attachment
Okay, I've added this to the next commitfest at https://commitfest.postgresql.org/18/1629/.
Thanks both Michael and David for the feedback so far.
Thanks both Michael and David for the feedback so far.
On 14 May 2018 at 02:37, Michael Paquier <michael@paquier.xyz> wrote:
On Sun, May 13, 2018 at 07:01:00PM -0400, David Steele wrote:
> This patch makes sense to me and looks reasonable.
One "potential" problem is if a relation has a full set of column which
allows the input of text-like data: if the header has been added with
COPY TO, and that the user forgets to add again the header option with
COPY FROM, then an extra row will be generated but there is the same
problem with CSV format :)
One comment I have about the patch is that there is no test for
COPY FROM with an output file which has a header. In this case if
HEADER is true then the file can be loaded. If HEADER is wrong, an
error should normally be raised because of the format (well, let's
discard the case of the relation with text-only columns). So the tests
could be extended a bit even for CSV.
> We're in the middle of a feature freeze that will last most of the
> summer, so be sure to enter your patch into the next commitfest so it
> can be considered when the freeze is over.
>
> https://commitfest.postgresql.org/18/
Yes, you will need to be patient a couple of months here.
--
Michael
On 05/14/2018 02:35 AM, Simon Muller wrote: > Okay, I've added this to the next commitfest at > https://commitfest.postgresql.org/18/1629/. > > Thanks both Michael and David for the feedback so far. > > (Please don't top-post on PostgreSQL lists.) I'm not necessarily opposed to this, but I'm not certain about the use case either. The original request seemed to stem from a false impression that CSV mode can't produce or consume tab-delimited files. But it can, and in fact it's saner for almost all uses than text format. Postgres' text format is really intended for Postgres' use. CSV format is more appropriate for dealing with external programs, whether the delimiter be a tab or a comma. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, May 14, 2018 at 09:37:07AM +0900, Michael Paquier wrote: > On Sun, May 13, 2018 at 07:01:00PM -0400, David Steele wrote: > > This patch makes sense to me and looks reasonable. > > One "potential" problem is if a relation has a full set of column which > allows the input of text-like data: if the header has been added with > COPY TO, and that the user forgets to add again the header option with > COPY FROM, then an extra row will be generated but there is the same > problem with CSV format :) Yeah, I wonder if that can be addressed. I wonder if there was a way to let COPY FROM detect or ignore headers as appropriate and rather than cause silently result in headers being added as data. Maybe a blank line after the header line could prevent this confusion? Garick
I wonder if there was a way to let COPY FROM detect or ignore headers
as appropriate and rather than cause silently result in headers being
added as data.
Not reliably
Maybe a blank line after the header line could prevent this confusion
No
+1 for allowing HEADER with FORMAT text. It doesn't interfere with COPY and even if I were to agree that CSV format is the better one this seems like an unnecessary area to impose preferences. If TSV with Header meets someone's need providing a minimal (and consistent with expectations) syntax to accomplish that goal seems reasonable, as does the patch.
David J.
While we're discussing COPY options, what do people think of an option for COPY FROM with header to require that the headers match the target column names? This would help to ensure that the file is actually the right one.
On 14 May 2018 at 14:55, David G. Johnston <david.g.johnston@gmail.com> wrote:
I wonder if there was a way to let COPY FROM detect or ignore headersas appropriate and rather than cause silently result in headers being
added as data. Not reliablyMaybe a blank line after the header line could prevent this confusionNo+1 for allowing HEADER with FORMAT text. It doesn't interfere with COPY and even if I were to agree that CSV format is the better one this seems like an unnecessary area to impose preferences. If TSV with Header meets someone's need providing a minimal (and consistent with expectations) syntax to accomplish that goal seems reasonable, as does the patch.David J.
On Mon, May 14, 2018 at 04:08:47PM -0400, Isaac Morland wrote: > While we're discussing COPY options, what do people think of an option for > COPY FROM with header to require that the headers match the target column > names? This would help to ensure that the file is actually the right one. I am personally not much into such sanity check logics in COPY FWIW if we can live without. -- Michael
Attachment
Andrew Dunstan wrote: > I'm not necessarily opposed to this, but I'm not certain about the use > case either. +1. The downside is that it would create the need, when using COPY TO, to know whether an input file was generated with or without header, and a hazard on mistakes. If you say it was and it wasn't, you quietly loose the first row of data. If you say it wasn't and in fact it was, either there's a datatype mismatch or you quietly get a spurious row of data. This complication should be balanced by some advantage. What can we do with the header? If you already have the table ready to COPY in, you don't need that information. The only reason why COPY TO needs to know about the header is to throw it away. And if you don't have the table created yet, a header with just the column names is hardly sufficient to create it, isn't it? Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
On 15 May 2018 at 10:26, Daniel Verite <daniel@manitou-mail.org> wrote:
Andrew Dunstan wrote:
> I'm not necessarily opposed to this, but I'm not certain about the use
> case either.
+1.
The downside is that it would create the need, when using COPY TO,
to know whether an input file was generated with or without header,
and a hazard on mistakes.
If you say it was and it wasn't, you quietly loose the first row of data.
If you say it wasn't and in fact it was, either there's a
datatype mismatch or you quietly get a spurious row of data.
Just to be clear, we're talking about my "header match" feature, not the basic idea of allowing a header in text format?
You already need to know whether or not there is a header, no matter what: there is no way to avoid needing to know the format of the data to be imported. And certainly if "header" is an option, one has to know whether or not to set it in any given situation.
The "header match" helps ensure the file is the right one by requiring the header contents to match the field names, rather than just being thrown away.
I don't view it as a way to avoid pre-defining the table. It just increases the chance that the wrong file won't load but will instead trigger an error condition immediately.
Note that this advantage includes what happens if you specify header but the file has no header: as long as you actually specified header match, the error will be caught unless the first row of actual data happens to match the field names, which is almost always highly unlikely and frequently impossible (e.g., a person with firstname "firstname", surname "surname", birthday "birthday" and so on).
One can imagine extensions of the idea: for example, the header could actually be used to identify the columns, so the column order in the file doesn't matter. There could also be an "AS" syntax to allow the target field names to be different from the field names in the header. I have occasionally found myself wanting to ignore certain columns of the file. But these are all significantly more complicated than just looking at the header and requiring it to match the target field names.
If one specifies no header but there actually is a header in the file, then loading will fail in many cases but it depends on what the header in the file looks like. This part is unaffected by my idea.
This complication should be balanced by some advantage.
What can we do with the header?
If you already have the table ready to COPY in, you don't
need that information. The only reason why COPY TO
needs to know about the header is to throw it away.
And if you don't have the table created yet, a header
with just the column names is hardly sufficient to create it,
isn't it?
Isaac Morland <isaac.morland@gmail.com> writes: > On 15 May 2018 at 10:26, Daniel Verite <daniel@manitou-mail.org> wrote: >> Andrew Dunstan wrote: >>> I'm not necessarily opposed to this, but I'm not certain about the use >>> case either. >> The downside is that it would create the need, when using COPY TO, >> to know whether an input file was generated with or without header, >> and a hazard on mistakes. >> If you say it was and it wasn't, you quietly loose the first row of data. >> If you say it wasn't and in fact it was, either there's a >> datatype mismatch or you quietly get a spurious row of data. > Just to be clear, we're talking about my "header match" feature, not the > basic idea of allowing a header in text format? AFAICS, Daniel's just reacting to the basic idea of a header line. I agree that by itself that's not worth much. However, if we added your proposed option to insist that the column names match during COPY IN, I think that that could have some value. It would allow forestalling one common type of pilot error, ie copying the wrong file entirely. (It'd also prevent copying in data that has the wrong column order, but I think that's a less common scenario. I might be wrong about that.) > One can imagine extensions of the idea: for example, the header could > actually be used to identify the columns, so the column order in the file > doesn't matter. There could also be an "AS" syntax to allow the target > field names to be different from the field names in the header. I have > occasionally found myself wanting to ignore certain columns of the file. > But these are all significantly more complicated than just looking at the > header and requiring it to match the target field names. Yeah, and every bit of flexibility you add raises the chance of an undetected error. COPY isn't intended as a general ETL facility, so I'd mostly be -1 on adding such things. But I can see the value of confirming that you're copying the right file, and a header match check would go a long way towards doing that. regards, tom lane
On Tuesday, May 15, 2018, Tom Lane <tgl@sss.pgh.pa.us> wrote:
AFAICS, Daniel's just reacting to the basic idea of a header line.
I agree that by itself that's not worth much. However, if we added
your proposed option to insist that the column names match during COPY
IN, I think that that could have some value.
I'm fine for adding it without the added matching behavior, though turning the boolean into an enum is appealing.
HEADER { true | false | match }
Though we'd need to accept all variants of Boolean for compatability...
I'm of the opinion that text and csv should be the same excepting their defaults for some of the options.
David J.
Isaac Morland wrote: > Just to be clear, we're talking about my "header match" feature, not the > basic idea of allowing a header in text format? For my reply it was on merely allowing it, as does the current patch at https://commitfest.postgresql.org/18/1629 Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
On Tue, May 15, 2018 at 12:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> One can imagine extensions of the idea: for example, the header could >> actually be used to identify the columns, so the column order in the file >> doesn't matter. There could also be an "AS" syntax to allow the target >> field names to be different from the field names in the header. I have >> occasionally found myself wanting to ignore certain columns of the file. >> But these are all significantly more complicated than just looking at the >> header and requiring it to match the target field names. > > Yeah, and every bit of flexibility you add raises the chance of an > undetected error. COPY isn't intended as a general ETL facility, > so I'd mostly be -1 on adding such things. But I can see the value > of confirming that you're copying the right file, and a header match > check would go a long way towards doing that. True. FWIW, I'm +1 on this idea. I think a header line is a pretty common need, and if you're exporting a large amount of data, it could be pretty annoying to have to first run COPY, and then do (echo blah,blah1,blah2; cat copyoutput.txt)>whatireallywant.txt There's a lot of value in being able to export from program A *exactly* what program B wants to import, rather than something that is close but has to be massaged. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 14 May 2018 at 08:35, Simon Muller <samullers@gmail.com> wrote:
Okay, I've added this to the next commitfest at https://commitfest.postgresql.org/18/1629/.
Thanks both Michael and David for the feedback so far.
I noticed through the patch tester link at http://commitfest.cputube.org/ that my patch caused a file_fdw test to fail (since I previously tested only with "make check" and not with "make check-world").
This v2 patch should fix that.
Attachment
On 4 July 2018 at 22:44, Simon Muller <samullers@gmail.com> wrote:
I noticed through the patch tester link at http://commitfest.cputube.org/ that my patch caused a file_fdw test to fail (since I previously tested only with "make check" and not with "make check-world").This v2 patch should fix that.
This patch just fixes a newline issue introduced in my previous patch.
Attachment
On 4 July 2018 at 22:44, Simon Muller <samullers@gmail.com> wrote:I noticed through the patch tester link at http://commitfest.cputube.org/ that my patch caused a file_fdw test to fail (since I previously tested only with "make check" and not with "make check-world").This v2 patch should fix that.This patch just fixes a newline issue introduced in my previous patch.
I've reviewed this patch and feel this patch addresses the original ask. I tested it manually trying to break it and, as mentioned previously, it's behavior is the same as the CSV copy with regards to it's shortcomings. However, I feel
1) a "copy from" test is needed and
2) the current "copy to" test is (along with a few others) in the wrong file.
With regards to #2, the copy.source tests are for things requiring replacement when running the tests. Given that these copy tests do not, I have moved the current last set of copy tests to the copy2.sql file and have provided an attached patch.
With regards to #1, the patch I have provided can then be used and the following added as the COPY TO/FROM tests (perhaps after line 426 of the attached copy2.sql file). Note that I moved the FROM test before the TO test and omitted the "(format text, header true)" in the FROM test since it is another way the command can be invoked.
copy copytest3 from stdin header;this is just a line full of junk that would error out if parsed11 a 122 b 2\.copy copytest3 to stdout with (format text, header true);
As for the matching check of the header in the discussion of this patch, I feel that is a separate patch that can be added later since it would affect the general functionality of the copy command, not just the ability to have a text header.
Best,
- Cynthia Shang
Attachment
On Wed, Jul 25, 2018 at 1:24 PM, Cynthia Shang <cynthia.shang@crunchydata.com> wrote: > With regards to #2, the copy.source tests are for things requiring > replacement when running the tests. Given that these copy tests do not, I > have moved the current last set of copy tests to the copy2.sql file and have > provided an attached patch. The patch appears in the RAW and in your email (hopefully) but it doesn't appear in the thread archive so I am reattaching from a different email client.
Attachment
On 25 July 2018 at 19:24, Cynthia Shang <cynthia.shang@crunchydata.com> wrote:
I've reviewed this patch and feel this patch addresses the original ask. I tested it manually trying to break it and, as mentioned previously, it's behavior is the same as the CSV copy with regards to it's shortcomings. However, I feel1) a "copy from" test is needed and2) the current "copy to" test is (along with a few others) in the wrong file.With regards to #2, the copy.source tests are for things requiring replacement when running the tests. Given that these copy tests do not, I have moved the current last set of copy tests to the copy2.sql file and have provided an attached patch.
Thanks for reviewing the patch.
I agree that moving those previous and these new tests out of the .source files seems to make more sense as they don't make use of the preprocessing/replacement feature.
With regards to #1, the patch I have provided can then be used and the following added as the COPY TO/FROM tests (perhaps after line 426 of the attached copy2.sql file). Note that I moved the FROM test before the TO test and omitted the "(format text, header true)" in the FROM test since it is another way the command can be invoked.copy copytest3 from stdin header;this is just a line full of junk that would error out if parsed11 a 122 b 2\.copy copytest3 to stdout with (format text, header true);
I've incorporated both your suggestions and included the patch you provided in the attached patch. Hope it's as expected.
As for the matching check of the header in the discussion of this patch, I feel that is a separate patch that can be added later since it would affect the general functionality of the copy command, not just the ability to have a text header.Best,- Cynthia Shang
P.S. I did receive the first attached patch, but on my Ubuntu I had to apply it using "git apply --ignore-space-change --ignore-whitespace", probably due to line ending differences.
--
Simon Muller
Attachment
> On Jul 25, 2018, at 6:09 PM, Simon Muller <samullers@gmail.com> wrote: > > I've incorporated both your suggestions and included the patch you provided in the attached patch. Hope it's as expected. > -- > Simon Muller > > <text_header_v4.patch> Reviewed and retested. Changing status to Ready for Committer.
Simon Muller wrote: > I've incorporated both your suggestions and included the patch you provided > in the attached patch. Hope it's as expected. Still unconvinced about the use case, since COPY's text format is only meant to be consumed by Postgres, and the only way that Postgres will consume this header is to discard it (at least as of the current patch). But anyway... /* Check header */ - if (!cstate->csv_mode && cstate->header_line) + if (cstate->binary && cstate->header_line) ereport(ERROR, - (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg("COPY HEADER available only in CSV mode"))); + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("cannot specify HEADER in BINARY mode"))); Why should ERRCODE_FEATURE_NOT_SUPPORTED become ERRCODE_SYNTAX_ERROR? Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
> On Aug 1, 2018, at 10:20 AM, Daniel Verite <daniel@manitou-mail.org> wrote: > > /* Check header */ > - if (!cstate->csv_mode && cstate->header_line) > + if (cstate->binary && cstate->header_line) > ereport(ERROR, > - (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > - errmsg("COPY HEADER available only in CSV mode"))); > + (errcode(ERRCODE_SYNTAX_ERROR), > + errmsg("cannot specify HEADER in BINARY mode"))); > > Why should ERRCODE_FEATURE_NOT_SUPPORTED become ERRCODE_SYNTAX_ERROR? > I agree; it should remain ERRCODE_FEATURE_NOT_SUPPORTED and I might also suggest the message read "COPY HEADER not availablein BINARY mode", although I'm pretty agnostic on the latter. Regards, -Cynthia Shang
On 1 August 2018 at 17:18, Cynthia Shang <cynthia.shang@crunchydata.com> wrote:
> On Aug 1, 2018, at 10:20 AM, Daniel Verite <daniel@manitou-mail.org> wrote:
>
> /* Check header */
> - if (!cstate->csv_mode && cstate->header_line)
> + if (cstate->binary && cstate->header_line)
> ereport(ERROR,
> - (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> - errmsg("COPY HEADER available only in CSV mode")));
> + (errcode(ERRCODE_SYNTAX_ERROR),
> + errmsg("cannot specify HEADER in BINARY mode")));
>
> Why should ERRCODE_FEATURE_NOT_SUPPORTED become ERRCODE_SYNTAX_ERROR?
>
I agree; it should remain ERRCODE_FEATURE_NOT_SUPPORTED and I might also suggest the message read "COPY HEADER not available in BINARY mode", although I'm pretty agnostic on the latter.
Regards,
-Cynthia Shang
I changed the error type and message for consistency with other similar errors in that file. Whenever options are combined that are incompatible, it looks like the convention is for a ERRCODE_SYNTAX_ERROR to be thrown.
For instance, in case you both specify a specific DELIMITER but also declare the format as BINARY, then there is this code in that same file:
if (cstate->binary && cstate->delim)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify DELIMITER in BINARY mode")));
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify DELIMITER in BINARY mode")));
HEADER seems very similar to me since, like DELIMITER, it makes sense for the textual formats such as CSV and TEXT, but doesn't make sense with the BINARY format.
ERRCODE_FEATURE_NOT_SUPPORTED previously made sense since the only reason TEXT and HEADER weren't compatible options was because the feature was not yet implemented, but now ERRCODE_SYNTAX_ERROR seems to make sense to me since I can't foresee a use case where BINARY and HEADER would ever be compatible options.
--
Simon Muller
Simon Muller wrote: > I changed the error type and message for consistency with other similar > errors in that file. Whenever options are combined that are incompatible, > it looks like the convention is for a ERRCODE_SYNTAX_ERROR to be thrown. That makes sense, thanks for elaborating, although there are also a fair number of ERRCODE_FEATURE_NOT_SUPPORTED in copy.c that are raised on forbidden/nonsensical combination of features, so the consistency argument could work both ways. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
> On Aug 2, 2018, at 8:11 AM, Daniel Verite <daniel@manitou-mail.org> wrote: > > That makes sense, thanks for elaborating, although there are also > a fair number of ERRCODE_FEATURE_NOT_SUPPORTED in copy.c > that are raised on forbidden/nonsensical combination of features, > so the consistency argument could work both ways. > If there is not a strong reason to change the error code, then I believe we should not. The error is the same as it was before,just narrower in scope. Best, -Cynthia
On 2 August 2018 at 17:07, Cynthia Shang <cynthia.shang@crunchydata.com> wrote:
> On Aug 2, 2018, at 8:11 AM, Daniel Verite <daniel@manitou-mail.org> wrote:
>
> That makes sense, thanks for elaborating, although there are also
> a fair number of ERRCODE_FEATURE_NOT_SUPPORTED in copy.c
> that are raised on forbidden/nonsensical combination of features,
> so the consistency argument could work both ways.
>
If there is not a strong reason to change the error code, then I believe we should not. The error is the same as it was before, just narrower in scope.
Best,
-Cynthia
Sure, thanks both for the feedback. Attached is a patch with the error kept as ERRCODE_FEATURE_NOT_SUPPORTED.
--
Simon Muller
Attachment
> On Aug 2, 2018, at 3:30 PM, Simon Muller <samullers@gmail.com> wrote: > > Sure, thanks both for the feedback. Attached is a patch with the error kept as ERRCODE_FEATURE_NOT_SUPPORTED. > I was able to apply the patch (after resolving a merge conflict which was expected given an update in master). All looksgood. -Cynthia
Greetings, * Cynthia Shang (cynthia.shang@crunchydata.com) wrote: > > On Aug 2, 2018, at 3:30 PM, Simon Muller <samullers@gmail.com> wrote: > > > > Sure, thanks both for the feedback. Attached is a patch with the error kept as ERRCODE_FEATURE_NOT_SUPPORTED. > > I was able to apply the patch (after resolving a merge conflict which was expected given an update in master). All looksgood. If there's a merge conflict against master, then it'd be good for an updated patch to be posted. Thanks! Stephen
Attachment
On 6 August 2018 at 16:34, Stephen Frost <sfrost@snowman.net> wrote:
Greetings,
* Cynthia Shang (cynthia.shang@crunchydata.com) wrote:
> I was able to apply the patch (after resolving a merge conflict which was expected given an update in master). All looks good.
If there's a merge conflict against master, then it'd be good for an
updated patch to be posted.
Thanks!
Stephen
Attached is an updated patch that should directly apply against current master.
--
Simon Muller
Attachment
On Aug 8, 2018, at 2:57 PM, Simon Muller <samullers@gmail.com> wrote:<text_header_v6.patch>
If there's a merge conflict against master, then it'd be good for an
updated patch to be posted.
Thanks!
StephenAttached is an updated patch that should directly apply against current master.--Simon Muller
This patch looks good. I realized I should have changed the status back while we were discussing all this. It is now (and still is) ready for committer.
Thanks,
-Cynthia
On Thu, Aug 09, 2018 at 10:37:28AM -0400, Cynthia Shang wrote: > This patch looks good. I realized I should have changed the status > back while we were discussing all this. It is now (and still is) ready > for committer. I have some comments. -ERROR: COPY HEADER available only in CSV mode +ERROR: cannot specify HEADER in BINARY mode This should read "COPY HEADER not available in BINARY mode" perhaps? +copy copytest3 from stdin csv header; +copy copytest3 to stdout csv header; It would be more interesting to first export the data into the file with a header, truncate the relation, and import it back with again header specified. The data of the original should match the new, for both text and csv format. CopyStateData defines header_line, which still assumes that only CSV is supported. Why are there no additional tests for file_fdw? The point about the header matching mentioned upthread is quite interesting as it could make the proposed feature way more useful, and it has not really been discussed. As far as I can see this adds more sanity checks in NextCopyFromRawFields(). I'd like to think that this should be a completely different option, say CHECK_HEADER, as CSV simply skips the header in COPY FROM if specified on HEAD. -- Michael
Attachment
On Fri, Aug 17, 2018 at 01:39:11PM +0900, Michael Paquier wrote: > The point about the header matching mentioned upthread is quite > interesting as it could make the proposed feature way more useful, and > it has not really been discussed. As far as I can see this adds more > sanity checks in NextCopyFromRawFields(). I'd like to think that this > should be a completely different option, say CHECK_HEADER, as CSV simply > skips the header in COPY FROM if specified on HEAD. It has been a couple of weeks since the last review, which has not been addressed, so I am marking the patch as returned with feedback. -- Michael
Attachment
[PATCH v1] Allow COPY "test" to output a header and add header matching mode to COPY FROM
From
"Rémi Lapeyre"
Date:
Hi, here's a new version of the patch with the header matching feature. I should apply cleanly on master, let me know if anything's wrong. --- contrib/file_fdw/input/file_fdw.source | 7 +- contrib/file_fdw/output/file_fdw.source | 13 ++-- doc/src/sgml/ref/copy.sgml | 9 ++- src/backend/commands/copy.c | 93 ++++++++++++++++++++++--- src/test/regress/input/copy.source | 71 ++++++++++++++----- src/test/regress/output/copy.source | 58 ++++++++++----- 6 files changed, 202 insertions(+), 49 deletions(-)
Attachment
Re: [PATCH v1] Allow COPY "text" to output a header and add headermatching mode to COPY FROM
From
Rémi Lapeyre
Date:
I created an entry for this patch in the new CommiFest but it seems that it is not finding it. Is there anything that I needto do?
Re: [PATCH v1] Allow COPY "text" to output a header and add headermatching mode to COPY FROM
From
Surafel Temesgen
Date:
On Mon, Mar 2, 2020 at 2:45 AM Rémi Lapeyre <remi.lapeyre@henki.fr> wrote:
I created an entry for this patch in the new CommiFest but it seems that it is not finding it. Is there anything that I need to do?
Is is added on next open commit fest which is https://commitfest.postgresql.org/28/ now
regards
Surafel
[PATCH v2] Allow COPY "text" to output a header and add header matching mode to COPY FROM
From
"Rémi Lapeyre"
Date:
Here's a rebased version that should apply cleanly on HEAD. --- contrib/file_fdw/input/file_fdw.source | 7 +- contrib/file_fdw/output/file_fdw.source | 13 ++-- doc/src/sgml/ref/copy.sgml | 9 ++- src/backend/commands/copy.c | 93 ++++++++++++++++++++++--- src/test/regress/input/copy.source | 71 ++++++++++++++----- src/test/regress/output/copy.source | 58 ++++++++++----- 6 files changed, 202 insertions(+), 49 deletions(-)
Attachment
Re: [PATCH v1] Allow COPY "text" to output a header and add header matching mode to COPY FROM
From
Daniel Gustafsson
Date:
> On 2 Mar 2020, at 00:45, Rémi Lapeyre <remi.lapeyre@henki.fr> wrote: > I created an entry for this patch in the new CommiFest but it seems that it is not finding it. Is there anything that Ineed to do? This patch no longer applies cleanly on HEAD, due to changes in the regress tests. Please submit a rebased version, I've marked this entry as Waiting on Author for now. cheers ./daniel
[PATCH v2] Allow COPY "text" to output a header and add header matching mode to COPY FROM
From
Rémi Lapeyre
Date:
Hi, here's a new version of the patch that should apply cleanly. I'll monitor the status on http://cfbot.cputube.org/ Rémi --- contrib/file_fdw/input/file_fdw.source | 7 +- contrib/file_fdw/output/file_fdw.source | 13 ++-- doc/src/sgml/ref/copy.sgml | 9 ++- src/backend/commands/copy.c | 93 ++++++++++++++++++++++--- src/test/regress/input/copy.source | 71 ++++++++++++++----- src/test/regress/output/copy.source | 58 ++++++++++----- 6 files changed, 202 insertions(+), 49 deletions(-)
Attachment
Re: [PATCH v2] Allow COPY "text" to output a header and add header matching mode to COPY FROM
From
Daniel Gustafsson
Date:
> On 8 Jul 2020, at 13:45, Rémi Lapeyre <remi.lapeyre@lenstra.fr> wrote: > > Hi, here's a new version of the patch that should apply cleanly. I'll monitor > the status on http://cfbot.cputube.org/ Please reply to the old thread about this, as that's the one connected to the Commitfest entry and thats where all the discussion has happened. While additional threads can be attached to a CF entry, it's for when multiple discussions are relevant to a patch, a single discussion should not be broken into multiple threads. cheers ./daniel
Re: [PATCH v2] Allow COPY "text" to output a header and add header matching mode to COPY FROM
From
Rémi Lapeyre
Date:
> > Please reply to the old thread about this, as that's the one connected to the > Commitfest entry and thats where all the discussion has happened. While > additional threads can be attached to a CF entry, it's for when multiple > discussions are relevant to a patch, a single discussion should not be broken > into multiple threads. Sorry about this, I thought setting the In-Reply-To like so: git send-email -v2 --to=pgsql-hackers@postgresql.org --in-reply-to=4E31E7AA-BFC6-47ED-90E1-3838E4D1F4FF@yesql.se HEAD^ There is some nice informations about how to write a good commit but I could not find exactly how to send it so I probablydid something wrong. > > cheers ./daniel
Re: [PATCH v2] Allow COPY "text" to output a header and add header matching mode to COPY FROM
From
Peter Eisentraut
Date:
On 2020-07-08 13:45, Rémi Lapeyre wrote: > Hi, here's a new version of the patch that should apply cleanly. I'll monitor > the status on http://cfbot.cputube.org/ It's hard to find an explanation what this patch actually does. I don't want to have to go through threads dating back 4 months to determine what was discussed and what was actually implemented. Since you're already using git format-patch, just add something to the commit message. It appears that these are really two separate features, so perhaps they should be two patches. Also, the new header matching mode could probably use more than one line of documentation. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> It's hard to find an explanation what this patch actually does. I don't > want to have to go through threads dating back 4 months to determine > what was discussed and what was actually implemented. Since you're > already using git format-patch, just add something to the commit message. > > > It appears that these are really two separate features, so perhaps they > should be two patches. Thanks for the feedback, I've split cleanly the two patches, simplified the tests and tried to explain the changes in the commit message. > Also, the new header matching mode could probably use more than one line > of documentation. I've improved the documentation, let's me know if it's better. It seems like cfbot is not happy with the way I'm sending my patches. The wiki has some good advices on how to write a patch but I couldn't find anything on how to send it. I've used git send-email -v3 --compose --to=... --in-reply-to=... HEAD^^ here but I'm not sure if it's correct. I will see if it works and will try to fix it if it's not but since it runs once a day it may take some time.
CSV format supports the HEADER option to output a header in the output, it is convenient when other programs need to consume the output. This patch adds the same option to the default text format. Discussion: https://www.postgresql.org/message-id/flat/CAF1-J-0PtCWMeLtswwGV2M70U26n4g33gpe1rcKQqe6wVQDrFA@mail.gmail.com --- contrib/file_fdw/input/file_fdw.source | 1 - contrib/file_fdw/output/file_fdw.source | 4 +--- doc/src/sgml/ref/copy.sgml | 3 ++- src/backend/commands/copy.c | 11 +++++++---- src/test/regress/input/copy.source | 12 ++++++++++++ src/test/regress/output/copy.source | 8 ++++++++ 6 files changed, 30 insertions(+), 9 deletions(-)
Attachment
COPY FROM supports the HEADER option to silently discard the header from a CSV or text file. It is possible to load by mistake a file that matches the expected format, for example if two text columns have been swapped, resulting in garbage in the database. This option adds the possibility to actually check the header to make sure it matches what is expected and exit immediatly if it does not. Discussion: https://www.postgresql.org/message-id/flat/CAF1-J-0PtCWMeLtswwGV2M70U26n4g33gpe1rcKQqe6wVQDrFA@mail.gmail.com --- contrib/file_fdw/input/file_fdw.source | 6 ++ contrib/file_fdw/output/file_fdw.source | 9 ++- doc/src/sgml/ref/copy.sgml | 8 ++- src/backend/commands/copy.c | 84 +++++++++++++++++++++++-- src/test/regress/input/copy.source | 25 ++++++++ src/test/regress/output/copy.source | 17 +++++ 6 files changed, 140 insertions(+), 9 deletions(-)
Attachment
On Fri, Jul 17, 2020 at 5:11 PM Rémi Lapeyre <remi.lapeyre@lenstra.fr> wrote:
> It's hard to find an explanation what this patch actually does. I don't
> want to have to go through threads dating back 4 months to determine
> what was discussed and what was actually implemented. Since you're
> already using git format-patch, just add something to the commit message.
>
>
> It appears that these are really two separate features, so perhaps they
> should be two patches.
Thanks for the feedback, I've split cleanly the two patches, simplified the
tests and tried to explain the changes in the commit message.
> Also, the new header matching mode could probably use more than one line
> of documentation.
I've improved the documentation, let's me know if it's better.
It seems like cfbot is not happy with the way I'm sending my patches. The wiki
has some good advices on how to write a patch but I couldn't find anything on
how to send it. I've used
git send-email -v3 --compose --to=... --in-reply-to=... HEAD^^
here but I'm not sure if it's correct. I will see if it works and will try to fix
it if it's not but since it runs once a day it may take some time.
If you have two patches that depend on each other, you should send them as two attachment to the same email. You now sent them as two separate emails, and cfbot will then pick up the latest one of them which is only patch 0002 (at least I'm fairly sure that's how it works).
I don't know how to do that with git-send-email, but you can certainly do it easy with git-format-patch and just attach them using your regular MUA.
(and while the cfbot and the archives have no problems dealing with the change in subject, it does break threading in some other MUAs, so I would recommend not doing that and sticking to the existing subject of the thread)
> > I don't know how to do that with git-send-email, but you can certainly do it easy with git-format-patch and just attachthem using your regular MUA. > > (and while the cfbot and the archives have no problems dealing with the change in subject, it does break threading in someother MUAs, so I would recommend not doing that and sticking to the existing subject of the thread) > Thanks, here are both patches attached so cfbot can read them.
Attachment
On Fri, Jul 17, 2020 at 10:18 PM Rémi Lapeyre <remi.lapeyre@lenstra.fr> wrote: > > > > > I don't know how to do that with git-send-email, but you can certainly do it easy with git-format-patch and just attachthem using your regular MUA. > > > > (and while the cfbot and the archives have no problems dealing with the change in subject, it does break threading insome other MUAs, so I would recommend not doing that and sticking to the existing subject of the thread) > > > > Thanks, here are both patches attached so cfbot can read them. Few comments: Few tests are failing because of hardcoded path: +-- test header matching +CREATE FOREIGN TABLE header_match ("1" int, foo text) SERVER file_server +OPTIONS (format 'csv', filename '/Users/remi/src/postgresql/contrib/file_fdw/data/list1.csv', delimiter ',', header 'match'); +CREATE FOREIGN TABLE header_dont_match (a int, foo text) SERVER file_server +OPTIONS (format 'csv', filename '/Users/remi/src/postgresql/contrib/file_fdw/data/list1.csv', delimiter ',', header 'match'); -- ERROR Generally path is not specified like this. file_fdw test of make check-world is failing because of this. There is one warning present in the changes: copy.c: In function ‘ProcessCopyOptions’: copy.c:1208:5: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement] char *sval = defGetString(defel); There is space before tab in indent in the below code, check for git diff --check. + if (fldct < list_length(cstate->attnumlist)) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("missing header"))); Regards, Vignesh EnterpriseDB: http://www.enterprisedb.com
Thanks for the feedback, > There is one warning present in the changes: > copy.c: In function ‘ProcessCopyOptions’: > copy.c:1208:5: warning: ISO C90 forbids mixed declarations and code > [-Wdeclaration-after-statement] > char *sval = defGetString(defel); > Weirdly this is not caught by clang, even with -Wdeclaration-after-statement. Here’s a new version that fix all the issues. Rémi
Attachment
Rémi Lapeyre wrote: > Here’s a new version that fix all the issues. Here's a review of v4. The patch improves COPY in two ways: - COPY TO and COPY FROM now accept "HEADER ON" for the TEXT format (previously it was only for CSV) - COPY FROM also accepts "HEADER match" to tell that there's a header and that its contents must match the columns of the destination table. This works for both the CSV and TEXT formats. The syntax for the columns is the same as for the data and the match is case-sensitive. The first improvement when submitted alone (in 2018 by Simon Muller) has been judged not useful enough or even hazardous without any "match" feature. It was returned with feedback in 2018-10 and resubmitted by Rémi in 2020-02 with the match feature. The patches apply cleanly, "make check" and "make check-world" pass. In my tests it works fine except for one crash that I can reproduce on a fresh build and default configuration with: $ cat >file.txt i 1 $ psql postgres=# create table x(i int); CREATE TABLE postgres=# \copy x(i) from file.txt with (header match) COPY 1 postgres=# \copy x(i) from file.txt with (header match) COPY 1 postgres=# \copy x(i) from file.txt with (header match) COPY 1 postgres=# \copy x(i) from file.txt with (header match) COPY 1 postgres=# \copy x(i) from file.txt with (header match) COPY 1 postgres=# \copy x(i) from file.txt with (header match) PANIC: ERRORDATA_STACK_SIZE exceeded server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. I suspect the reason is the way that PG_TRY/PG_CATCH is used, see below. Code comments: +/* + * Represents whether the header must be absent, present or present and match. + */ +typedef enum CopyHeader +{ + COPY_HEADER_ABSENT, + COPY_HEADER_PRESENT, + COPY_HEADER_MATCH +} CopyHeader; + /* * This struct contains all the state variables used throughout a COPY * operation. For simplicity, we use the same struct for all variants of COPY, @@ -136,7 +146,7 @@ typedef struct CopyStateData bool binary; /* binary format? */ bool freeze; /* freeze rows on loading? */ bool csv_mode; /* Comma Separated Value format? */ - bool header_line; /* CSV or text header line? */ + CopyHeader header_line; /* CSV or text header line? */ After the redefinition into this enum type, there are still a bunch of references to header_line that treat it like a boolean: 1190: if (cstate->header_line) 1398: if (cstate->binary && cstate->header_line) 2119: if (cstate->header_line) 3635: if (cstate->cur_lineno == 0 && cstate->header_line) It works fine since COPY_HEADER_ABSENT is 0 as the first value of the enum, but maybe it's not good style to count on that. + PG_TRY(); + { + if (defGetBoolean(defel)) + cstate->header_line = COPY_HEADER_PRESENT; + else + cstate->header_line = COPY_HEADER_ABSENT; + } + PG_CATCH(); + { + char *sval = defGetString(defel); + + if (!cstate->is_copy_from) + PG_RE_THROW(); + + if (pg_strcasecmp(sval, "match") == 0) + cstate->header_line = COPY_HEADER_MATCH; + else + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("header requires a boolean or \"match\""))); + } + PG_END_TRY(); It seems wrong to use a PG_CATCH block for this. I understand that it's because defGetBoolean() calls ereport() on non-booleans, but then it should be split into an error-throwing function and a non-error-throwing lexical analysis of the boolean, the above code calling the latter. Besides the comments in elog.h above PG_TRY say that "the error recovery code can either do PG_RE_THROW to propagate the error outwards, or do a (sub)transaction abort. Failure to do so may leave the system in an inconsistent state for further processing." Maybe this is what happens with the repeated uses of "match" eventually failing with ERRORDATA_STACK_SIZE exceeded. - HEADER [ <replaceable class="parameter">boolean</replaceable> ] + HEADER { <literal>match</literal> | <literal>true</literal> | <literal>false</literal> } This should be enclosed in square brackets because HEADER with no argument is still accepted. + names from the table. On input, the first line is discarded when set + to <literal>true</literal> or required to match the column names if set The elision of "header" as the subject might be misinterpreted as if it's the first line that is true. I'd suggest "when <literal>header>/literal> is set to ..." to avoid any confusion. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: https://www.manitou-mail.org Twitter: @DanielVerite
Thanks for your comments, Please find my thoughts inline. > In my tests it works fine except for one crash that I can reproduce > on a fresh build and default configuration with: > > $ cat >file.txt > i > 1 > > $ psql > postgres=# create table x(i int); > CREATE TABLE > postgres=# \copy x(i) from file.txt with (header match) > COPY 1 > postgres=# \copy x(i) from file.txt with (header match) > COPY 1 > postgres=# \copy x(i) from file.txt with (header match) > COPY 1 > postgres=# \copy x(i) from file.txt with (header match) > COPY 1 > postgres=# \copy x(i) from file.txt with (header match) > COPY 1 > postgres=# \copy x(i) from file.txt with (header match) > PANIC: ERRORDATA_STACK_SIZE exceeded > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > The connection to the server was lost. Attempting reset: Failed. > Fixed, replaced PG_TRY/PG_CATCH with strcmp logic to get the header option. > > Code comments: > > > +/* > + * Represents whether the header must be absent, present or present and > match. > + */ > +typedef enum CopyHeader > +{ > + COPY_HEADER_ABSENT, > + COPY_HEADER_PRESENT, > + COPY_HEADER_MATCH > +} CopyHeader; > + > /* > * This struct contains all the state variables used throughout a COPY > * operation. For simplicity, we use the same struct for all variants of > COPY, > @@ -136,7 +146,7 @@ typedef struct CopyStateData > bool binary; /* binary format? */ > bool freeze; /* freeze rows on loading? */ > bool csv_mode; /* Comma Separated Value > format? */ > - bool header_line; /* CSV or text header line? */ > + CopyHeader header_line; /* CSV or text header line? */ > > > After the redefinition into this enum type, there are still a > bunch of references to header_line that treat it like a boolean: > > 1190: if (cstate->header_line) > 1398: if (cstate->binary && cstate->header_line) > 2119: if (cstate->header_line) > 3635: if (cstate->cur_lineno == 0 && cstate->header_line) > > It works fine since COPY_HEADER_ABSENT is 0 as the first value of the enum, > but maybe it's not good style to count on that. Fixed. Changed it to cstate->header_line != COPY_HEADER_ABSENT. > > > > + PG_TRY(); > + { > + if (defGetBoolean(defel)) > + cstate->header_line = > COPY_HEADER_PRESENT; > + else > + cstate->header_line = > COPY_HEADER_ABSENT; > + } > + PG_CATCH(); > + { > + char *sval = defGetString(defel); > + > + if (!cstate->is_copy_from) > + PG_RE_THROW(); > + > + if (pg_strcasecmp(sval, "match") == 0) > + cstate->header_line = > COPY_HEADER_MATCH; > + else > + ereport(ERROR, > + > (errcode(ERRCODE_SYNTAX_ERROR), > + errmsg("header requires a > boolean or \"match\""))); > + } > + PG_END_TRY(); > > It seems wrong to use a PG_CATCH block for this. I understand that > it's because defGetBoolean() calls ereport() on non-booleans, but then > it should be split into an error-throwing function and a > non-error-throwing lexical analysis of the boolean, the above code > calling the latter. > Besides the comments in elog.h above PG_TRY say that > "the error recovery code > can either do PG_RE_THROW to propagate the error outwards, or do a > (sub)transaction abort. Failure to do so may leave the system in an > inconsistent state for further processing." > Maybe this is what happens with the repeated uses of "match" > eventually failing with ERRORDATA_STACK_SIZE exceeded. > Fixed, replaced PG_TRY/PG_CATCH with strcmp logic to get the header option. > > - HEADER [ <replaceable class="parameter">boolean</replaceable> ] > + HEADER { <literal>match</literal> | <literal>true</literal> | > <literal>false</literal> } > > This should be enclosed in square brackets because HEADER > with no argument is still accepted. > Fixed. > > > > + names from the table. On input, the first line is discarded when set > + to <literal>true</literal> or required to match the column names if > set > > The elision of "header" as the subject might be misinterpreted as if > it's the first line that is true. I'd suggest > "when <literal>header>/literal> is set to ..." to avoid any confusion. > Fixed. Attached v5 patch with the fixes of above comments. Thoughts? Regards, Vignesh EnterpriseDB: http://www.enterprisedb.com
Attachment
Thanks Daniel for the review and Vignesh for addressing the comments. I have two remarks with the state of the current patches: - DefGetCopyHeader() duplicates a lot of code from defGetBoolean(), should we refactor this so that they can share more oftheir internals? In the current implementation any change to defGetBoolean() should be made to DefGetCopyHeader() too ortheir behaviour will subtly differ. - It is possible to set the header option multiple time: \copy x(i) from file.txt with (format csv, header off, header on); In which case the last one is the one kept. I think this is a bug and it should be fixed, but this is already the behaviourin the current implementation so fixing it would not be backward compatible. Do you think users should not do thisand I can fix it or that keeping the current behaviour is better for backward compatibility? Regards, Rémi > Le 17 août 2020 à 14:49, vignesh C <vignesh21@gmail.com> a écrit : > > Thanks for your comments, Please find my thoughts inline. > >> In my tests it works fine except for one crash that I can reproduce >> on a fresh build and default configuration with: >> >> $ cat >file.txt >> i >> 1 >> >> $ psql >> postgres=# create table x(i int); >> CREATE TABLE >> postgres=# \copy x(i) from file.txt with (header match) >> COPY 1 >> postgres=# \copy x(i) from file.txt with (header match) >> COPY 1 >> postgres=# \copy x(i) from file.txt with (header match) >> COPY 1 >> postgres=# \copy x(i) from file.txt with (header match) >> COPY 1 >> postgres=# \copy x(i) from file.txt with (header match) >> COPY 1 >> postgres=# \copy x(i) from file.txt with (header match) >> PANIC: ERRORDATA_STACK_SIZE exceeded >> server closed the connection unexpectedly >> This probably means the server terminated abnormally >> before or while processing the request. >> The connection to the server was lost. Attempting reset: Failed. >> > > Fixed, replaced PG_TRY/PG_CATCH with strcmp logic to get the header option. > >> >> Code comments: >> >> >> +/* >> + * Represents whether the header must be absent, present or present and >> match. >> + */ >> +typedef enum CopyHeader >> +{ >> + COPY_HEADER_ABSENT, >> + COPY_HEADER_PRESENT, >> + COPY_HEADER_MATCH >> +} CopyHeader; >> + >> /* >> * This struct contains all the state variables used throughout a COPY >> * operation. For simplicity, we use the same struct for all variants of >> COPY, >> @@ -136,7 +146,7 @@ typedef struct CopyStateData >> bool binary; /* binary format? */ >> bool freeze; /* freeze rows on loading? */ >> bool csv_mode; /* Comma Separated Value >> format? */ >> - bool header_line; /* CSV or text header line? */ >> + CopyHeader header_line; /* CSV or text header line? */ >> >> >> After the redefinition into this enum type, there are still a >> bunch of references to header_line that treat it like a boolean: >> >> 1190: if (cstate->header_line) >> 1398: if (cstate->binary && cstate->header_line) >> 2119: if (cstate->header_line) >> 3635: if (cstate->cur_lineno == 0 && cstate->header_line) >> >> It works fine since COPY_HEADER_ABSENT is 0 as the first value of the enum, >> but maybe it's not good style to count on that. > > Fixed. Changed it to cstate->header_line != COPY_HEADER_ABSENT. > >> >> >> >> + PG_TRY(); >> + { >> + if (defGetBoolean(defel)) >> + cstate->header_line = >> COPY_HEADER_PRESENT; >> + else >> + cstate->header_line = >> COPY_HEADER_ABSENT; >> + } >> + PG_CATCH(); >> + { >> + char *sval = defGetString(defel); >> + >> + if (!cstate->is_copy_from) >> + PG_RE_THROW(); >> + >> + if (pg_strcasecmp(sval, "match") == 0) >> + cstate->header_line = >> COPY_HEADER_MATCH; >> + else >> + ereport(ERROR, >> + >> (errcode(ERRCODE_SYNTAX_ERROR), >> + errmsg("header requires a >> boolean or \"match\""))); >> + } >> + PG_END_TRY(); >> >> It seems wrong to use a PG_CATCH block for this. I understand that >> it's because defGetBoolean() calls ereport() on non-booleans, but then >> it should be split into an error-throwing function and a >> non-error-throwing lexical analysis of the boolean, the above code >> calling the latter. >> Besides the comments in elog.h above PG_TRY say that >> "the error recovery code >> can either do PG_RE_THROW to propagate the error outwards, or do a >> (sub)transaction abort. Failure to do so may leave the system in an >> inconsistent state for further processing." >> Maybe this is what happens with the repeated uses of "match" >> eventually failing with ERRORDATA_STACK_SIZE exceeded. >> > > Fixed, replaced PG_TRY/PG_CATCH with strcmp logic to get the header option. > >> >> - HEADER [ <replaceable class="parameter">boolean</replaceable> ] >> + HEADER { <literal>match</literal> | <literal>true</literal> | >> <literal>false</literal> } >> >> This should be enclosed in square brackets because HEADER >> with no argument is still accepted. >> > > Fixed. > >> >> >> >> + names from the table. On input, the first line is discarded when set >> + to <literal>true</literal> or required to match the column names if >> set >> >> The elision of "header" as the subject might be misinterpreted as if >> it's the first line that is true. I'd suggest >> "when <literal>header>/literal> is set to ..." to avoid any confusion. >> > > Fixed. > > Attached v5 patch with the fixes of above comments. > Thoughts? > > Regards, > Vignesh > EnterpriseDB: http://www.enterprisedb.com > <v5-0001-Add-header-support-to-COPY-TO-text-format.patch><v5-0002-Add-header-matching-mode-to-COPY-FROM.patch>
On Thu, Aug 27, 2020 at 04:53:11PM +0200, Rémi Lapeyre wrote: > I have two remarks with the state of the current patches: > - DefGetCopyHeader() duplicates a lot of code from defGetBoolean(), > should we refactor this so that they can share more of their > internals? In the current implementation any change to > defGetBoolean() should be made to DefGetCopyHeader() too or their > behaviour will subtly differ. The difference comes from the use of "match", and my take would be here that it is wrong to assume that header can be a boolean-like parameter with only one exception. It seems to me that we may actually be looking at having this stuff as an option different than "header" at the end to have clear semantics. > - It is possible to set the header option multiple time: > \copy x(i) from file.txt with (format csv, header off, header on); > In which case the last one is the one kept. I think this is a bug > and it should be fixed, but this is already the behaviour in the > current implementation so fixing it would not be backward > compatible. Do you think users should not do this and I can fix it > or that keeping the current behaviour is better for backward > compatibility? I would agree that this is a bug because we are failing to detect what's actually a redundant option here as the first option still causes the flag to be set to false, but that's not something worth a back-patch IMO. What we are looking here is something similar to what is done with "format", where we track if the option has been specified with format_specified. The same is actually true with the "freeze" option here, and it is true that we tend to prefer error-ing in such cases while there are exceptions like EXPLAIN. I think that it would be nicer to be at least consistent with the behavior that each command has chosen, and COPY is now a mixed bag. I have marked the patch as returned with feedback for now. -- Michael
Attachment
> I would agree that this is a bug because we are failing to detect > what's actually a redundant option here as the first option still > causes the flag to be set to false, but that's not something worth a > back-patch IMO. What we are looking here is something similar > to what is done with "format", where we track if the option has been > specified with format_specified. The same is actually true with the > "freeze" option here, and it is true that we tend to prefer error-ing > in such cases while there are exceptions like EXPLAIN. I think that > it would be nicer to be at least consistent with the behavior that > each command has chosen, and COPY is now a mixed bag. Here’s a new version of the patches that report an error when the options are set multiple time. Regards, Rémi
Attachment
On Sat, Oct 03, 2020 at 11:42:52PM +0200, Rémi Lapeyre wrote: > Here’s a new version of the patches that report an error when the options are set multiple time. Please note that I have applied a fix for the redundant option handling as of 10c5291, though I have missed that you sent a patch. Sorry about that. Looking at it, we have done the same thing byte-by-byte except that I have added tests for all option combinations. -- Michael
Attachment
Thanks Michael for taking care of that! Here’s the rebased patches with the last one dropped. Regards, Rémi > Le 5 oct. 2020 à 03:05, Michael Paquier <michael@paquier.xyz> a écrit : > > On Sat, Oct 03, 2020 at 11:42:52PM +0200, Rémi Lapeyre wrote: >> Here’s a new version of the patches that report an error when the options are set multiple time. > > Please note that I have applied a fix for the redundant option > handling as of 10c5291, though I have missed that you sent a patch. > Sorry about that. Looking at it, we have done the same thing > byte-by-byte except that I have added tests for all option > combinations. > -- > Michael
Attachment
It looks like this is not in the current commitfest and that Cabot does not find it. I’m not yet accustomed to the PostgreSQLworkflow, should I just create a new entry in the current commitfest? Regards, Rémi > Le 13 oct. 2020 à 14:49, Rémi Lapeyre <remi.lapeyre@lenstra.fr> a écrit : > > Thanks Michael for taking care of that! > > Here’s the rebased patches with the last one dropped. > > Regards, > Rémi > > > <v6-0001-Add-header-support-to-COPY-TO-text-format.patch><v6-0002-Add-header-matching-mode-to-COPY-FROM.patch> > >> Le 5 oct. 2020 à 03:05, Michael Paquier <michael@paquier.xyz> a écrit : >> >> On Sat, Oct 03, 2020 at 11:42:52PM +0200, Rémi Lapeyre wrote: >>> Here’s a new version of the patches that report an error when the options are set multiple time. >> >> Please note that I have applied a fix for the redundant option >> handling as of 10c5291, though I have missed that you sent a patch. >> Sorry about that. Looking at it, we have done the same thing >> byte-by-byte except that I have added tests for all option >> combinations. >> -- >> Michael >
Rémi Lapeyre wrote: > It looks like this is not in the current commitfest and that Cabot does not > find it. I’m not yet accustomed to the PostgreSQL workflow, should I just > create a new entry in the current commitfest? Yes. Because in the last CommitFest it was marked as "Returned with feedback" https://commitfest.postgresql.org/29/2504/ Best regards, -- Daniel Vérité PostgreSQL-powered mailer: https://www.manitou-mail.org Twitter: @DanielVerite
Hi, here’s a rebased version of the patch. Best regards, Rémi > On 21 Oct 2020, at 19:49, Daniel Verite <daniel@manitou-mail.org> wrote: > > Rémi Lapeyre wrote: > >> It looks like this is not in the current commitfest and that Cabot does not >> find it. I’m not yet accustomed to the PostgreSQL workflow, should I just >> create a new entry in the current commitfest? > > Yes. Because in the last CommitFest it was marked > as "Returned with feedback" > https://commitfest.postgresql.org/29/2504/ > > > Best regards, > -- > Daniel Vérité > PostgreSQL-powered mailer: https://www.manitou-mail.org > Twitter: @DanielVerite > > > >
Attachment
> > Michael, since the issue of duplicated options has been fixed do either of these patches look like they are ready for commit? > Here’s a rebased version of the patch. Cheers, Rémi > Regards, > -- > -David > david@pgmasters.net
Attachment
On Sat, Apr 10, 2021 at 4:17 PM Rémi Lapeyre <remi.lapeyre@lenstra.fr> wrote:
>
> Michael, since the issue of duplicated options has been fixed do either of these patches look like they are ready for commit?
>
Here’s a rebased version of the patch.
Cheers,
Rémi
> Regards,
> --
> -David
> david@pgmasters.net
Hi,
>> sure it matches what is expected and exit immediatly if it does not.
Typo: immediately
+CREATE FOREIGN TABLE header_dont_match (a int, foo text) SERVER file_server
nit: since header is singular, you can name the table header_doesnt_match
+ from the one expected, or the name or case do not match, the copy will
For 'the name or case do not match', either use plural for the subjects or change 'do' to doesn't
- opts_out->header_line = defGetBoolean(defel);
+ opts_out->header_line = DefGetCopyHeader(defel);
+ opts_out->header_line = DefGetCopyHeader(defel);
Existing method starts with lower case d, I wonder why the new method starts with upper case D.
+ if (fldct < list_length(cstate->attnumlist))
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("missing header")));
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("missing header")));
The message seems to be inaccurate: the header may be there - it just misses some fields.
+ * Represents whether the header must be absent, present or present and match.
present and match: it seems present is redundant - if header is absent, how can it match ?
Cheers
> > Hi, > >> sure it matches what is expected and exit immediatly if it does not. > > Typo: immediately > > +CREATE FOREIGN TABLE header_dont_match (a int, foo text) SERVER file_server > > nit: since header is singular, you can name the table header_doesnt_match > > + from the one expected, or the name or case do not match, the copy will > > For 'the name or case do not match', either use plural for the subjects or change 'do' to doesn't > Thanks, I fixed both typos. > - opts_out->header_line = defGetBoolean(defel); > + opts_out->header_line = DefGetCopyHeader(defel); > > Existing method starts with lower case d, I wonder why the new method starts with upper case D. > I don’t remember why I used DefGetCopyHeader, should I change it? > + if (fldct < list_length(cstate->attnumlist)) > + ereport(ERROR, > + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), > + errmsg("missing header"))); > > The message seems to be inaccurate: the header may be there - it just misses some fields. I changed the error messages, they now are: ERROR: incomplete header, expected 3 columns but got 2 ERROR: extra data after last expected header, expected 3 columns but got 4 > > + * Represents whether the header must be absent, present or present and match. > > present and match: it seems present is redundant - if header is absent, how can it match ? This now reads "Represents whether the header must be absent, present or match.”. Cheers, Rémi > > Cheers
Attachment
On Sun, Apr 11, 2021 at 4:01 AM Rémi Lapeyre <remi.lapeyre@lenstra.fr> wrote:
>
> Hi,
> >> sure it matches what is expected and exit immediatly if it does not.
>
> Typo: immediately
>
> +CREATE FOREIGN TABLE header_dont_match (a int, foo text) SERVER file_server
>
> nit: since header is singular, you can name the table header_doesnt_match
>
> + from the one expected, or the name or case do not match, the copy will
>
> For 'the name or case do not match', either use plural for the subjects or change 'do' to doesn't
>
Thanks, I fixed both typos.
> - opts_out->header_line = defGetBoolean(defel);
> + opts_out->header_line = DefGetCopyHeader(defel);
>
> Existing method starts with lower case d, I wonder why the new method starts with upper case D.
>
I don’t remember why I used DefGetCopyHeader, should I change it?
> + if (fldct < list_length(cstate->attnumlist))
> + ereport(ERROR,
> + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
> + errmsg("missing header")));
>
> The message seems to be inaccurate: the header may be there - it just misses some fields.
I changed the error messages, they now are:
ERROR: incomplete header, expected 3 columns but got 2
ERROR: extra data after last expected header, expected 3 columns but got 4
>
> + * Represents whether the header must be absent, present or present and match.
>
> present and match: it seems present is redundant - if header is absent, how can it match ?
This now reads "Represents whether the header must be absent, present or match.”.
Cheers,
Rémi
>
> Cheers
>> This now reads "Represents whether the header must be absent, present or match.”.
Since match shouldn't be preceded with be, I think we can say:
Represents whether the header must match, be absent or be present.
Cheers
> > >> This now reads "Represents whether the header must be absent, present or match.”. > > Since match shouldn't be preceded with be, I think we can say: > > Represents whether the header must match, be absent or be present. Thanks, here’s a v10 version of the patch that fixes this.
Attachment
Here’s an updated version of the patch that takes into account the changes in d1029bb5a2. The actual code is the same asv10 which was already marked as ready for committer. > On 11 Apr 2021, at 16:35, Rémi Lapeyre <remi.lapeyre@lenstra.fr> wrote: > >> >>>> This now reads "Represents whether the header must be absent, present or match.”. >> >> Since match shouldn't be preceded with be, I think we can say: >> >> Represents whether the header must match, be absent or be present. > > Thanks, here’s a v10 version of the patch that fixes this. > > <v10-0001-Add-header-support-to-COPY-TO-text-format.patch><v10-0002-Add-header-matching-mode-to-COPY-FROM.patch>
Attachment
On 31.12.21 18:36, Rémi Lapeyre wrote: > Here’s an updated version of the patch that takes into account the changes in d1029bb5a2. The actual code is the same asv10 which was already marked as ready for committer. I have committed the 0001 patch. I will work on the 0002 patch next. I notice in the 0002 patch that there is no test case for the error "wrong header for column \"%s\": got \"%s\"", which I think is really the core functionality of this patch. So please add that. I wonder whether the header matching should be a separate option from the HEADER option. The option parsing in this patch is quite complicated and could be simpler if there were two separate options. It appears this has been mentioned in the thread but not fully discussed.
> On 28 Jan 2022, at 09:57, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote: > > On 31.12.21 18:36, Rémi Lapeyre wrote: >> Here’s an updated version of the patch that takes into account the changes in d1029bb5a2. The actual code is the sameas v10 which was already marked as ready for committer. > > I have committed the 0001 patch. I will work on the 0002 patch next. > Thanks! > I notice in the 0002 patch that there is no test case for the error "wrong header for column \"%s\": got \"%s\"", whichI think is really the core functionality of this patch. So please add that. > I added a test for it in this new version of the patch. > I wonder whether the header matching should be a separate option from the HEADER option. The option parsing in this patchis quite complicated and could be simpler if there were two separate options. It appears this has been mentioned inthe thread but not fully discussed. I suppose a new option could be added but I’m not sure it would simplify things much with regard to the code and in my opinionit would be a bit weirder for users, right now it is just: copy my_table from stdin with (header match); with an additional option it could be: copy my_table from stdin with (header true, match); with potentially “header true” being implicit when “match” is given: copy my_table from stdin with (match); But I think we would still have to check for and return an error if the user inputs: copy my_table from stdin with (header off, match); Rather than complicating things, the current implementation seemed to be the best but I will update the patch if you thinkI should change it. Best regards, Rémi
Attachment
On 30.01.22 23:56, Rémi Lapeyre wrote: >> I notice in the 0002 patch that there is no test case for the error "wrong header for column \"%s\": got \"%s\"", whichI think is really the core functionality of this patch. So please add that. >> > > I added a test for it in this new version of the patch. The file_fdw.sql tests contain this +CREATE FOREIGN TABLE header_doesnt_match (a int, foo text) SERVER file_server +OPTIONS (format 'csv', filename :'filename', delimiter ',', header 'match'); -- ERROR but no actual error is generated. Please review the additions on the file_fdw tests to see that they make sense.
On 31.01.22 07:54, Peter Eisentraut wrote: > On 30.01.22 23:56, Rémi Lapeyre wrote: >>> I notice in the 0002 patch that there is no test case for the error >>> "wrong header for column \"%s\": got \"%s\"", which I think is really >>> the core functionality of this patch. So please add that. >>> >> >> I added a test for it in this new version of the patch. > > The file_fdw.sql tests contain this > > +CREATE FOREIGN TABLE header_doesnt_match (a int, foo text) SERVER > file_server > +OPTIONS (format 'csv', filename :'filename', delimiter ',', header > 'match'); -- ERROR > > but no actual error is generated. Please review the additions on the > file_fdw tests to see that they make sense. A few more comments on your latest patch: - The DefGetCopyHeader() function seems very bulky and might not be necessary. I think you can just check for the string "match" first and then use defGetBoolean() as before if it didn't match. - If you define COPY_HEADER_ABSENT = 0 in the enum, then most of the existing truth checks don't need to be changed. - In NextCopyFromRawFields(), it looks like you have code that replaces the null_print value if the supplied column name is null. I don't think we should allow null column values. Someone, this should be an error. (Do we use null_print on output and make the column name null if it matches?)
Great to see the first of the two patches committed. It looks like the second patch got some feedback from Peter in February and has been marked "Waiting on author" ever since. Remi, will you have a chance to look at this this month? Peter, are these comments blocking if Remi doesn't have a chance to work on it should I move it to the next release or could it be fixed up and committed?
Peter Eisentraut wrote: > - The DefGetCopyHeader() function seems very bulky and might not be > necessary. I think you can just check for the string "match" first and > then use defGetBoolean() as before if it didn't match. The problem is that defGetBoolean() ends like this in the non-matching case: ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), errmsg("%s requires a Boolean value", def->defname))); We don't want this error message when the string does not match since it's really a tri-state option, not a boolean. To avoid duplicating the code that recognizes true/false/on/off/0/1, probably defGetBoolean()'s guts should be moved into another function that does the matching and report to the caller instead of throwing an error. Then DefGetCopyHeader() could call that non-throwing function. > - If you define COPY_HEADER_ABSENT = 0 in the enum, then most of the > existing truth checks don't need to be changed. It's already 0 by default. Not changing some truth checks does work, but then we get some code that treat CopyFromState.header_line like a boolean and some other code like an enum, which I found not ideal wrt clarity in an earlier round of review [1] It's not a major issue though, as it concerns only 3 lines of code in the v12 patch: - if (opts_out->binary && opts_out->header_line) + if (opts_out->binary && opts_out->header_line != COPY_HEADER_ABSENT) + /* on input check that the header line is correct if needed */ + if (cstate->cur_lineno == 0 && cstate->opts.header_line != COPY_HEADER_ABSENT) - if (cstate->opts.header_line) + if (cstate->opts.header_line != COPY_HEADER_ABSENT) > - In NextCopyFromRawFields(), it looks like you have code that replaces > the null_print value if the supplied column name is null. I don't think > we should allow null column values. Someone, this should be an error. +1 [1] https://www.postgresql.org/message-id/80a9b594-01d6-4ee4-a612-9ae9fd3c1194%40manitou-mail.org Best regards, -- Daniel Vérité https://postgresql.verite.pro/ Twitter: @DanielVerite
On 25.03.22 05:21, Greg Stark wrote: > Great to see the first of the two patches committed. > > It looks like the second patch got some feedback from Peter in > February and has been marked "Waiting on author" ever since. > > Remi, will you have a chance to look at this this month? > > Peter, are these comments blocking if Remi doesn't have a chance to > work on it should I move it to the next release or could it be fixed > up and committed? I will try to finish up this patch.
On 29.03.22 17:02, Peter Eisentraut wrote: > On 25.03.22 05:21, Greg Stark wrote: >> Great to see the first of the two patches committed. >> >> It looks like the second patch got some feedback from Peter in >> February and has been marked "Waiting on author" ever since. >> >> Remi, will you have a chance to look at this this month? >> >> Peter, are these comments blocking if Remi doesn't have a chance to >> work on it should I move it to the next release or could it be fixed >> up and committed? > > I will try to finish up this patch. Committed, after some further refinements as discussed.
Hi, On Wed, Mar 30, 2022 at 09:11:09AM +0200, Peter Eisentraut wrote: > > Committed, after some further refinements as discussed. While working on nearby code, I found some problems with this feature. First, probably nitpicking, the HEADER MATCH is allowed for COPY TO, is that expected? The documentation isn't really explicit about it, but there's nothing to match when exporting data it's a bit surprising. I'm not opposed to have HEADER MATCH means HEADER ON for COPY TO, as as-is one can easily reuse the commands history, but maybe it should be clearly documented? Then, apparently HEADER MATCH doesn't let you do sanity checks against a custom column list. This one looks like a clear oversight, as something like that should be entirely valid IMHO: CREATE TABLE tbl(col1 int, col2 int); COPY tbl (col2, col1) TO '/path/to/file' WITH (HEADER MATCH); COPY tbl (col2, col1) FROM '/path/to/file' WITH (HEADER MATCH); but right now it errors out with: ERROR: column name mismatch in header line field 1: got "col1", expected "col2" Note that the error message is bogus if you specify attributes in a different order from the relation, as the code is mixing access to the tuple desc and access to the raw fields with the same offset. This also means that it will actually fail to detect a mismatch in the provided column list and let you import data in the wrong position as long as the datatypes are compatible and the column header in the file are in the correct order. For instance: CREATE TABLE abc (a text, b text, c text); INSERT INTO abc SELECT 'a', 'b', 'c'; COPY abc TO '/path/to/file' WITH (HEADER MATCH); You can then import the data with any of those: COPY abc(c, b, a) TO '/path/to/file' WITH (HEADER MATCH); COPY abc(c, a, b) TO '/path/to/file' WITH (HEADER MATCH); [...] SELECT * FROM abc; Even worse, if you try to do a COPY ... FROM ... WITH (HEADER ON) on a table that has some dropped attribute(s). The current code will access random memory as there's no exact attnum / raw field mapping anymore. I can work on a fix if needed (with some additional regression test to cover those cases), but I'm still not sure that having a user provided column list is supposed to be accepted or not for the HEADER MATCH. In the meantime I will add an open item.
On 2022-06-07 Tu 11:47, Julien Rouhaud wrote: > Hi, > > On Wed, Mar 30, 2022 at 09:11:09AM +0200, Peter Eisentraut wrote: >> Committed, after some further refinements as discussed. > While working on nearby code, I found some problems with this feature. > > First, probably nitpicking, the HEADER MATCH is allowed for COPY TO, is that > expected? The documentation isn't really explicit about it, but there's > nothing to match when exporting data it's a bit surprising. I'm not opposed to > have HEADER MATCH means HEADER ON for COPY TO, as as-is one can easily reuse > the commands history, but maybe it should be clearly documented? I think it makes more sense to have a sanity check to prevent HEADER MATCH with COPY TO. > > Then, apparently HEADER MATCH doesn't let you do sanity checks against a custom > column list. This one looks like a clear oversight, as something like that > should be entirely valid IMHO: > > CREATE TABLE tbl(col1 int, col2 int); > COPY tbl (col2, col1) TO '/path/to/file' WITH (HEADER MATCH); > COPY tbl (col2, col1) FROM '/path/to/file' WITH (HEADER MATCH); > > but right now it errors out with: > > ERROR: column name mismatch in header line field 1: got "col1", expected "col2" > > Note that the error message is bogus if you specify attributes in a > different order from the relation, as the code is mixing access to the tuple > desc and access to the raw fields with the same offset. > > This also means that it will actually fail to detect a mismatch in the provided > column list and let you import data in the wrong position as long as the > datatypes are compatible and the column header in the file are in the correct > order. For instance: > > CREATE TABLE abc (a text, b text, c text); > INSERT INTO abc SELECT 'a', 'b', 'c'; > COPY abc TO '/path/to/file' WITH (HEADER MATCH); > > You can then import the data with any of those: > COPY abc(c, b, a) TO '/path/to/file' WITH (HEADER MATCH); > COPY abc(c, a, b) TO '/path/to/file' WITH (HEADER MATCH); > [...] > SELECT * FROM abc; > > Even worse, if you try to do a COPY ... FROM ... WITH (HEADER ON) on a table > that has some dropped attribute(s). The current code will access random memory > as there's no exact attnum / raw field mapping anymore. Ouch! That certainly needs to be fixed. > > I can work on a fix if needed (with some additional regression test to cover > those cases), but I'm still not sure that having a user provided column list is > supposed to be accepted or not for the HEADER MATCH. In the meantime I will > add an open item. > > I think it should, but a temporary alternative would be to forbid HEADER MATCH with explicit column lists until we can make it work right. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Hi, On Sun, Jun 12, 2022 at 09:36:13AM -0400, Andrew Dunstan wrote: > > On 2022-06-07 Tu 11:47, Julien Rouhaud wrote: > > > > First, probably nitpicking, the HEADER MATCH is allowed for COPY TO, is that > > expected? The documentation isn't really explicit about it, but there's > > nothing to match when exporting data it's a bit surprising. I'm not opposed to > > have HEADER MATCH means HEADER ON for COPY TO, as as-is one can easily reuse > > the commands history, but maybe it should be clearly documented? > > > I think it makes more sense to have a sanity check to prevent HEADER > MATCH with COPY TO. I'm fine with it. I added such a check and mentioned it in the documentation. > > Then, apparently HEADER MATCH doesn't let you do sanity checks against a custom > > column list. This one looks like a clear oversight, as something like that > > should be entirely valid IMHO: > > > > CREATE TABLE tbl(col1 int, col2 int); > > COPY tbl (col2, col1) TO '/path/to/file' WITH (HEADER MATCH); > > COPY tbl (col2, col1) FROM '/path/to/file' WITH (HEADER MATCH); > > > > but right now it errors out with: > > > > ERROR: column name mismatch in header line field 1: got "col1", expected "col2" > > > > Note that the error message is bogus if you specify attributes in a > > different order from the relation, as the code is mixing access to the tuple > > desc and access to the raw fields with the same offset. > > [...] > I think it should, but a temporary alternative would be to forbid HEADER > MATCH with explicit column lists until we can make it work right. I think it would still be problematic if the target table has dropped columns. Fortunately, as I initially thought the problem is only due to a thinko in the original commit which used a wrong variable for the raw_fields offset. Once fixed (attached v1) I didn't see any other problem in the rest of the logic and all the added regression tests work as expected.
Attachment
On 13.06.22 04:32, Julien Rouhaud wrote: >> I think it makes more sense to have a sanity check to prevent HEADER >> MATCH with COPY TO. > > I'm fine with it. I added such a check and mentioned it in the documentation. > I think it would still be problematic if the target table has dropped columns. > Fortunately, as I initially thought the problem is only due to a thinko in the > original commit which used a wrong variable for the raw_fields offset. Once > fixed (attached v1) I didn't see any other problem in the rest of the logic and > all the added regression tests work as expected. Thanks for this patch. I'll check it in detail in a bit. It looks good to me at first glance.
On Mon, Jun 13, 2022 at 10:32:13AM +0800, Julien Rouhaud wrote: > On Sun, Jun 12, 2022 at 09:36:13AM -0400, Andrew Dunstan wrote: > I'm fine with it. I added such a check and mentioned it in the documentation. An error looks like the right call at this stage of the game. I am not sure what the combination of MATCH with COPY TO would mean, actually. And with the concept of SELECT queries on top of it, the whole idea gets blurrier. > I think it would still be problematic if the target table has dropped columns. > Fortunately, as I initially thought the problem is only due to a thinko in the > original commit which used a wrong variable for the raw_fields offset. Once > fixed (attached v1) I didn't see any other problem in the rest of the logic and > all the added regression tests work as expected. Interesting catch. One thing that I've always found useful when it comes to tests that stress dropped columns is to have tests where we reduce the number of total columns that still exist. An extra thing is to look after ........pg.dropped.N........ a bit, and I would put something in one of the headers. > if (pg_strcasecmp(sval, "match") == 0) > + { > + /* match is only valid for COPY FROM */ > + if (!is_from) > + ereport(ERROR, > + (errcode(ERRCODE_SYNTAX_ERROR), > + errmsg("%s match is only valid for COPY FROM", > + def->defname))); Some nits. I would suggest to reword this error message, like "cannot use \"match\" with HEADER in COPY TO". There is no need for the extra comment, and the error code should be ERRCODE_FEATURE_NOT_SUPPORTED. -- Michael
Attachment
On Mon, Jun 13, 2022 at 04:46:46PM +0900, Michael Paquier wrote: > > Some nits. I would suggest to reword this error message, like "cannot > use \"match\" with HEADER in COPY TO". Agreed. > There is no need for the extra > comment, and the error code should be ERRCODE_FEATURE_NOT_SUPPORTED. Is there any rule for what error code should be used? Maybe that's just me but I understand "not supported" as "this makes sense, but this is currently a limitation that might be lifted later". Here I don't think it can ever make to use MATCH for a COPY TO, apart from ignoring its meaning and accept it as an alias for HEADER ON. But if we don't allow this loose alias now it would just cause trouble to later allow it so having an invalid syntax or something like that sounds more suited.
On 14.06.22 11:13, Julien Rouhaud wrote: >> There is no need for the extra >> comment, and the error code should be ERRCODE_FEATURE_NOT_SUPPORTED. > Is there any rule for what error code should be used? > > Maybe that's just me but I understand "not supported" as "this makes sense, but > this is currently a limitation that might be lifted later". I tend to agree with that interpretation. Also, when you consider the way SQL rules and error codes are set up, errors that are detected during parse analysis should be a subclass of "syntax error or access rule violation".
Julien Rouhaud wrote: > Maybe that's just me but I understand "not supported" as "this makes > sense, but this is currently a limitation that might be lifted > later". Looking at ProcessCopyOptions(), there are quite a few invalid combinations of options that produce ERRCODE_FEATURE_NOT_SUPPORTED currently: - HEADER in binary mode - FORCE_QUOTE outside of csv - FORCE_QUOTE outside of COPY TO - FORCE_NOT_NULL outside of csv - FORCE_NOT_NULL outside of COPY FROM - ESCAPE outside of csv - delimiter appearing in the NULL specification - csv quote appearing in the NULL specification FORCE_QUOTE and FORCE_NOT_NULL are options that only make sense in one direction, so the errors when using these in the wrong direction are comparable to the "HEADER MATCH outside of COPY FROM" error that we want to add. In that sense, ERRCODE_FEATURE_NOT_SUPPORTED would be consistent. The other errors in the list above are more about the format itself, with options that make sense only for one format. So the way we're supposed to understand ERRCODE_FEATURE_NOT_SUPPORTED in these other cases is that such format does not support such feature, but without implying that it's a limitation. Best regards, -- Daniel Vérité https://postgresql.verite.pro/ Twitter: @DanielVerite
On 15.06.22 13:50, Daniel Verite wrote: > The other errors in the list above are more about the format itself, > with options that make sense only for one format. So the way we're > supposed to understand ERRCODE_FEATURE_NOT_SUPPORTED in these > other cases is that such format does not support such feature, > but without implying that it's a limitation. I don't feel very strongly about this. It makes sense to stay consistent with the existing COPY code.
On Thu, Jun 16, 2022 at 09:24:56AM +0200, Peter Eisentraut wrote: > I don't feel very strongly about this. It makes sense to stay consistent > with the existing COPY code. Yes, my previous argument is based on consistency with the surroundings. I am not saying that this could not be made better, it surely can, but I would recommend to tackle that in a separate patch, and apply that to more areas than this specific one. -- Michael
Attachment
On Mon, Jun 20, 2022 at 09:03:23AM +0900, Michael Paquier wrote: > On Thu, Jun 16, 2022 at 09:24:56AM +0200, Peter Eisentraut wrote: >> I don't feel very strongly about this. It makes sense to stay consistent >> with the existing COPY code. > > Yes, my previous argument is based on consistency with the > surroundings. I am not saying that this could not be made better, it > surely can, but I would recommend to tackle that in a separate patch, > and apply that to more areas than this specific one. Peter, beta2 is planned for next week. Do you think that you would be able to address this open item by the end of this week? If not, and I have already looked at the proposed patch, I can jump in and help. -- Michael
Attachment
On 22.06.22 01:34, Michael Paquier wrote: > On Mon, Jun 20, 2022 at 09:03:23AM +0900, Michael Paquier wrote: >> On Thu, Jun 16, 2022 at 09:24:56AM +0200, Peter Eisentraut wrote: >>> I don't feel very strongly about this. It makes sense to stay consistent >>> with the existing COPY code. >> >> Yes, my previous argument is based on consistency with the >> surroundings. I am not saying that this could not be made better, it >> surely can, but I would recommend to tackle that in a separate patch, >> and apply that to more areas than this specific one. > > Peter, beta2 is planned for next week. Do you think that you would be > able to address this open item by the end of this week? If not, and I > have already looked at the proposed patch, I can jump in and help. The latest patch was posted by you, so I was deferring to you to commit it. Would you like me to do it?
On Wed, Jun 22, 2022 at 12:22:01PM +0200, Peter Eisentraut wrote: > The latest patch was posted by you, so I was deferring to you to commit it. > Would you like me to do it? OK. As this is originally a feature you have committed, I originally thought that you would take care of it, even if I sent a patch. I'll handle that tomorrow then, if that's fine for you, of course. Happy to help. -- Michael
Attachment
On Wed, Jun 22, 2022 at 08:00:15PM +0900, Michael Paquier wrote: > OK. As this is originally a feature you have committed, I originally > thought that you would take care of it, even if I sent a patch. I'll > handle that tomorrow then, if that's fine for you, of course. Happy > to help. And done. Thanks. -- Michael
Attachment
On Thu, Jun 23, 2022 at 01:26:29PM +0900, Michael Paquier wrote: > On Wed, Jun 22, 2022 at 08:00:15PM +0900, Michael Paquier wrote: > > OK. As this is originally a feature you have committed, I originally > > thought that you would take care of it, even if I sent a patch. I'll > > handle that tomorrow then, if that's fine for you, of course. Happy > > to help. > > And done. Thanks. Thanks!