Re: Extend COPY FROM with HEADER to skip multiple lines - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: Extend COPY FROM with HEADER to skip multiple lines
Date
Msg-id 23fccda0-fd64-44bf-ad57-16a804996a68@oss.nttdata.com
Whole thread Raw
In response to Extend COPY FROM with HEADER to skip multiple lines  (Shinya Kato <shinya11.kato@gmail.com>)
Responses Re: Extend COPY FROM with HEADER to skip multiple lines
List pgsql-hackers

On 2025/06/26 14:35, Shinya Kato wrote:
>
>  > > So it seems better for you to implement the patch at first and then
>  > > check the performance overhead etc if necessary.
>  >
>  > Thank you for your advice. I will create a patch.
>
> I created a patch.

Thanks for the patch!

> As you can see from the patch, I believe the performance impact is negligible. The only changes were to modify how
theHEADER option is stored and to add a loop that skips the specified number of header lines when parsing the options. 
>
> The design is such that if a HEADER value larger than the number of lines in the file is specified, the command will
completewith zero rows loaded and will not return an error. This is the same behavior as specifying HEADER true for a
CSVfile that has zero rows. 

Sounds good to me.

> And I will add this patch for the next CF.
>
> Thoughts?

When I compiled with the patch applied, I got the following warning:

copyfromparse.c:834:20: error: ‘done’ may be used uninitialized [-Werror=maybe-uninitialized]
   834 |                 if (done)
       |                    ^


+      lines are discarded.  An integer value greater than 1 is only valid for
+      <command>COPY FROM</command> commands.


This might be slightly misleading since 0 is also a valid value for COPY FROM.


+        for (int i = 0; i < cstate->opts.header.skip_lines; i++)
+        {
+            cstate->cur_lineno++;
+            done = CopyReadLine(cstate, is_csv);
+        }

If "done" is true, shouldn't we break out of the loop immediately?
Otherwise, with a large HEADER value, we could end up wasting a lot of
cycles unnecessarily.


+defGetCopyHeaderOption(DefElem *def, bool is_from)
  {
+    CopyHeaderOption option;

To avoid repeating the same code like "option.match = false" in every case,
how about initializing the struct like "option = {false, 1}"?


+                            (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                             errmsg("HEADER must be non-negative integer")));

This message could be confusing since HEADER can also accept a Boolean or "match".
Maybe it's better to use the same message as "%s requires a Boolean value, integer value,
or \"match\"",? "integer value"? If so, it's also better to use "non-negative integer"
instead of just "integer".

Regards,

--
Fujii Masao
NTT DATA Japan Corporation




pgsql-hackers by date:

Previous
From: Tatsuo Ishii
Date:
Subject: PG18 protocol version
Next
From: Jelte Fennema-Nio
Date:
Subject: Re: PG18 protocol version