Re: Extend COPY FROM with HEADER to skip multiple lines - Mailing list pgsql-hackers
From | Shinya Kato |
---|---|
Subject | Re: Extend COPY FROM with HEADER |
Date | |
Msg-id | CAOzEurQhbN5Y8u-4Ert1UVBiLSyc1+Q-9AqQs032ZhfQyf-0SQ@mail.gmail.com Whole thread Raw |
In response to |
Extend COPY FROM with HEADER |
Responses |
Re: Extend COPY FROM with HEADER |
List | pgsql-hackers |
On Fri, Jun 27, 2025 at 12:03 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2025/06/26 19:12, Shinya Kato wrote:
> On Thu, Jun 26, 2025 at 4:36 PM Fujii Masao <masao.fujii@oss.nttdata.com <mailto:masao.fujii@oss.nttdata.com>> wrote:
>
> On 2025/06/26 14:35, Shinya Kato wrote:
> >
> > > > So it seems better for you to implement the patch at first and then
> > > > check the performance overhead etc if necessary.
> > >
> > > Thank you for your advice. I will create a patch.
> >
> > I created a patch.
>
> Thanks for the patch!
>
>
> Thank you for your review.
>
> When I compiled with the patch applied, I got the following warning:
>
> copyfromparse.c:834:20: error: ‘done’ may be used uninitialized [-Werror=maybe-uninitialized]
> 834 | if (done)
> | ^
>
>
> Oh, sorry. I missed it.
> I fixed it to initialize done = false.
>
> + lines are discarded. An integer value greater than 1 is only valid for
> + <command>COPY FROM</command> commands.
>
>
> This might be slightly misleading since 0 is also a valid value for COPY FROM.
>
> That's certainly true. I fixed it below:
> + lines are discarded. An integer value greater than 1 is not allowed for
> + <command>COPY TO</command> commands.
Thanks for fixing that! However, it's odd that the description for COPY TO appears
under the section starting with "On input." Shouldn't it be under the "On output"
section instead?
Also, I think the entire HEADER option section could use a clearer structure.
How about revising it like this?
---------------------
<listitem>
<para>
- Specifies that the file contains a header line with the names of each
- column in the file. On output, the first line contains the column
- names from the table. On input, the first line is discarded when this
- option is set to <literal>true</literal> (or equivalent Boolean value).
- If this option is set to <literal>MATCH</literal>, the number and names
- of the columns in the header line must match the actual column names of
- the table, in order; otherwise an error is raised.
+ On output, if this option is set to <literal>true</literal>
+ (or an equivalent Boolean value), the first line of the output will
+ contain the column names from the table.
+ Integer values <literal>0</literal> and <literal>1</literal> are
+ accepted as Boolean values, but other integers are not allowed for
+ <command>COPY TO</command> commands.
+ </para>
+ <para>
+ On input, if this option is set to <literal>true</literal>
+ (or an equivalent Boolean value), the first line of the input is
+ discarded. If it is set to a non-negative integer, that number of
+ lines are discarded. If the option is set to <literal>MATCH</literal>,
+ the number and names of the columns in the header line must exactly
+ match those of the table, in order; otherwise an error is raised.
+ The <literal>MATCH</literal> value is only valid for
+ <command>COPY FROM</command> commands.
+ </para>
+ <para>
This option is not allowed when using <literal>binary</literal> format.
- The <literal>MATCH</literal> option is only valid for <command>COPY
- FROM</command> commands.
</para>
</listitem>
---------------------
Yes, your documentation is clearer than mine.
Also, similar to the existing "boolean" type explanation in the COPY docs,
we should add a corresponding entry for integer, now that it's accepted by
the HEADER option. The VACUUM docs has a good example of how to phrase this.
Exactly.
> Original error message "%s requires a Boolean value, integer value, or \"match\"" should also be fixed to "non-negative integer"?
Yes, I think that the message
"%s requires a Boolean value, a non-negative integer, or the string \"match\""
is clearer and more accurate.
Thank you, you're right.
-typedef enum CopyHeaderChoice
+typedef struct CopyHeaderOption
{
- COPY_HEADER_FALSE = 0,
- COPY_HEADER_TRUE,
- COPY_HEADER_MATCH,
-} CopyHeaderChoice;
+ bool match; /* header line must match actual names? */
+ int skip_lines; /* number of lines to skip before data */
+} CopyHeaderOption;
<snip>
- CopyHeaderChoice header_line; /* header line? */
+ CopyHeaderOption header; /* header line? */
Wouldn't it be simpler to just use an integer instead of introducing
CopyHeaderOption? We could use 0 for false, 1 for true, n > 1 for skipping lines,
and -1 for MATCH in that integer.
This would simplify the logic in defGetCopyHeaderOption().
I've attached a patch that shows this idea in action. Thought?
I thought about the approach, and had a minor concern about the following inconsistency between the option and its internal implementation:
- When the option is set to "MATCH", header_line becomes -1.
- When the option is set to -1, it's an error.
However, your patch is clear, and it seems we don't need to worry about this concern.
Your patch looks good to me.
Your patch looks good to me.
Best regards,
Shinya Kato
NTT OSS Center
pgsql-hackers by date: