Re: Extend COPY FROM with HEADER to skip multiple lines - Mailing list pgsql-hackers

From	Fujii Masao
Subject	Re: Extend COPY FROM with HEADER to skip multiple lines
Date	June 9, 2025 11:27:19
Msg-id	608b08b4-b708-49c4-a186-2524b83246dc@oss.nttdata.com
In response to	Extend COPY FROM with HEADER to skip multiple lines (Shinya Kato <shinya11.kato@gmail.com>)
Responses	Re: Extend COPY FROM with HEADER to skip multiple lines Re: Extend COPY FROM with HEADER to skip multiple lines
List	pgsql-hackers


On 2025/06/09 16:10, Shinya Kato wrote:
> Hi hackers,
> 
> I'd like to propose a new feature for the COPY FROM command to allow
> skipping multiple header lines when loading data. This enhancement
> would enable files with multi-line headers to be loaded without any
> preprocessing, which would significantly improve usability.
> 
> In real-world scenarios, it's common for data files to contain
> multiple header lines, such as file descriptions or column
> explanations. Currently, the COPY command cannot load these files
> directly, which requires users to preprocess them with tools like sed
> or tail.
> 
> Although you can use "COPY t FROM PROGRAM 'tail -n +3 /path/to/file'",
> some environments do not have the tail command available.
> Additionally, this approach requires superuser privileges or
> membership in the pg_execute_server_program role.
> 
> This feature also has precedent in other major RDBMS:
> - MySQL: LOAD DATA ... IGNORE N LINES [1]
> - SQL Server: BULK INSERT … WITH (FIRSTROW = N) [2]
> - Oracle SQL*Loader: sqlldr … SKIP=N [3]
> 
> I have not yet created a patch, but I am willing to implement an
> extension for the HEADER option. I would like to discuss the
> specification first.
> 
> The specification I have in mind is as follows:
> - Command: COPY FROM
> - Formats: text and csv
> - Option syntax: HEADER [ boolean | integer | MATCH ] (Extend the
> HEADER option to accept an integer value in addition to the existing
> boolean and MATCH keywords.)
> - Behavior: Let N be the specified integer.
>    - If N < 0, raise an error.
>    - If N = 0 or 1, behave the same as the corresponding boolean (false or true).
>    - If N > 1, skip the first N rows.
> 
> Thoughts?
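
For reference, the behavior rules in the quoted specification could be
sketched roughly as below. This is only an illustration in Python, not
PostgreSQL code and not a patch: the names header_lines_to_skip and
skip_header are hypothetical, and MATCH is modelled as simply consuming
one header line, without the column-name check the real option performs.

```python
from typing import Iterable, List, Union

HeaderValue = Union[bool, int, str]


def header_lines_to_skip(value: HeaderValue) -> int:
    """Map a HEADER value (as proposed) to the number of leading lines to skip.

    Hypothetical helper for illustration only.
    """
    # bool must be tested before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return 1 if value else 0
    if isinstance(value, str):
        if value.lower() == "match":
            # Assumption: MATCH consumes exactly one header line.
            # Verifying the column names is not modelled here.
            return 1
        raise ValueError(f"unrecognized HEADER value: {value!r}")
    if value < 0:
        # Proposed rule: N < 0 raises an error.
        raise ValueError("HEADER line count must not be negative")
    # N = 0 and N = 1 coincide with false and true; N > 1 skips N lines.
    return value


def skip_header(lines: Iterable[str], value: HeaderValue) -> List[str]:
    """Return the data lines that remain after skipping the header lines."""
    n = header_lines_to_skip(value)
    return list(lines)[n:]


if __name__ == "__main__":
    sample = ["# exported file", "id,name", "1,alice", "2,bob"]
    print(skip_header(sample, 2))
```

With this sketch, HEADER 0 and HEADER false both leave the input
untouched, HEADER 1 and HEADER true both drop one line, and HEADER 2
drops the two leading lines of the sample.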

I generally like the idea.

However, a similar proposal was made earlier [1], and seemingly
some hackers weren't in favor of it. It's probably worth reading
that thread to understand the previous concerns.

Regards,


[1] https://postgr.es/m/CALAY4q8nGSXp0P5uf56vn-mD7reWqZP5k6PS1CGUm26X4FsYJA@mail.gmail.com

-- 
Fujii Masao
NTT DATA Japan Corporation
