Extend COPY FROM with HEADER to skip multiple lines - Mailing list pgsql-hackers

From Shinya Kato
Subject Extend COPY FROM with HEADER to skip multiple lines
Date
Msg-id CAOzEurRPxfzbxqeOPF_AGnAUOYf=Wk0we+1LQomPNUNtyZGBZw@mail.gmail.com
Whole thread Raw
Responses Re: Extend COPY FROM with HEADER to skip multiple lines
List pgsql-hackers
Hi hackers,

I'd like to propose a new feature for the COPY FROM command to allow
skipping multiple header lines when loading data. This enhancement
would enable files with multi-line headers to be loaded without any
preprocessing, which would significantly improve usability.

In real-world scenarios, it's common for data files to contain
multiple header lines, such as file descriptions or column
explanations. Currently, the COPY command cannot load these files
directly, which requires users to preprocess them with tools like sed
or tail.

Although you can use "COPY t FROM PROGRAM 'tail -n +3 /path/to/file'",
some environments do not have the tail command available.
Additionally, this approach requires superuser privileges or
membership in the pg_execute_server_program role.

This feature also has precedent in other major RDBMS:
- MySQL: LOAD DATA ... IGNORE N LINES [1]
- SQL Server: BULK INSERT … WITH (FIRST ROW=N) [2]
- Oracle SQL*Loader: sqlldr … SKIP=N [3]

I have not yet created a patch, but I am willing to implement an
extension for the HEADER option. I would like to discuss the
specification first.

The specification I have in mind is as follows:
- Command: COPY FROM
- Formats: text and csv
- Option syntax: HEADER [ boolean | integer | MATCH] (Extend the
HEADER option to accept an integer value in addition to the existing
boolean and MATCH keywords.)
- Behavior: Let N be the specified integer.
  - If N < 0, raise an error.
  - If N = 0 or 1, same behavior when boolean is specified.
  - If N > 1, skip the first N rows.

Thoughts?

[1] https://dev.mysql.com/doc/refman/8.4/en/load-data.html#load-data-field-line-handling
[2]
https://learn.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql?view=sql-server-ver17#firstrow--first_row
[3]
https://docs.oracle.com/en/database/oracle/oracle-database/23/sutil/oracle-sql-loader-commands.html#SUTIL-GUID-84244C46-6AFD-412D-9240-BEB0B5C2718B

--
Best regards,
Shinya Kato
NTT OSS Center



pgsql-hackers by date:

Previous
From: Steven Niu
Date:
Subject: Re: [PATCH] Refactor: Extract XLogRecord info
Next
From: "Hayato Kuroda (Fujitsu)"
Date:
Subject: RE: [PATCH] Proposal: Improvements to PDF stylesheet and table column widths