
From Shinya Kato
Subject Re: Add LIMIT option to COPY FROM
Msg-id CAOzEurRhdikO-pFnR+eQHT6Tdi05KHbSYMstPunTrfnrDLedew@mail.gmail.com
In response to Re: Add LIMIT option to COPY FROM  (Florents Tselai <florents.tselai@gmail.com>)
Responses Re: Add LIMIT option to COPY FROM
List pgsql-hackers
Thank you for the comments.

On Tue, Feb 3, 2026 at 11:12 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
>> Syntax example:
>> - COPY t FROM STDIN (LIMIT 100);
>>
>> This feature is useful for:
>> - Loading only the first N rows from a huge CSV file to verify data or
>> table definitions before a full import
>
>
> Would want it paired with offset for this use case.

COPY FROM already accepts HEADER <integer> to skip N leading lines, so
combining it with the proposed LIMIT gives OFFSET + LIMIT semantics
without adding a separate option:
=# COPY t FROM 'file.csv' (HEADER 100, LIMIT 50);
This skips the first 100 input lines and then inserts at most 50 rows.

>> Design:
>> - The LIMIT count applies after WHERE filtering and ON_ERROR skipping,
>> so it represents the actual number of rows inserted.
>
>
> Not sure about that choice.  I’d go with pre-eval or implement both and default to pre-eval.

It is consistent with SQL semantics — SELECT ... WHERE ... LIMIT N
counts rows that pass the filter, not rows scanned. Pre-eval behavior
is already achievable externally (head -n), while post-eval can only
be done server-side, which makes it the more valuable choice for a
built-in option.
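
To make the distinction concrete, here is a sketch of the proposed
post-eval behavior (table, file, and predicate are only illustrative;
LIMIT is the option proposed in this thread, used together with the
existing COPY ... WHERE clause):
=# COPY measurements FROM 'data.csv' (FORMAT csv, LIMIT 10) WHERE value > 0;
COPY 10
The server keeps reading input until 10 rows have passed the WHERE
clause and been inserted. With pre-eval semantics it would instead stop
after scanning 10 input lines, however many of those survive the filter.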


On Tue, Feb 3, 2026 at 11:41 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Shinya Kato <shinya11.kato@gmail.com> writes:
> > I'd like to propose adding a LIMIT option to COPY FROM, which limits
> > the number of rows to load.
>
> Do we really need this?  Each random feature we load onto COPY
> slows it down for everybody.

When LIMIT is not specified (the default), the only overhead is a
single if (limit > 0) branch per row on a struct field already in L1
cache — the same class of cost as the existing ON_ERROR and WHERE
checks.

This is not a novel feature either. Oracle SQL*Loader provides LOAD,
and SQL Server BULK INSERT provides LASTROW. The absence of LIMIT in
COPY FROM is arguably a gap relative to other RDBMSes.


On Wed, Feb 4, 2026 at 12:07 AM Florents Tselai
<florents.tselai@gmail.com> wrote:
> I work with such scenarios a lot and I can't see why COPY should worry itself about such filtering.
> IRL most of what goes into COPY'S STDIN  has already been filtered extensively,
> like ... | head -n 100 | COPY t FROM STDIN

head -n works for the STDIN-pipe workflow, but not for all COPY FROM scenarios:
- Server-side files via remote psql: COPY t FROM
'/server/path/file.csv' is read directly by the server process. A
client connected over the network has no way to interpose a pipe on
that I/O path.
- Interaction with WHERE / ON_ERROR: head -n 100 limits input lines,
but cannot guarantee a specific number of inserted rows when some rows
are filtered by WHERE or skipped by ON_ERROR. That control is only
possible server-side; see the sketch below.
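
As a sketch of both points together (the path reuses the example above,
FORMAT and ON_ERROR values are illustrative, and LIMIT is the proposed
option):
=# COPY t FROM '/server/path/file.csv' (FORMAT csv, ON_ERROR ignore, LIMIT 1000);
The server reads the file itself, skips malformed rows, and stops only
once 1000 rows have actually been inserted. A client-side head -n never
sees that file, and even on STDIN it cannot predict how many of its
lines will survive ON_ERROR.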

The same "do it outside" argument could be made against the WHERE
clause ("just use grep"), yet WHERE was accepted because server-side
filtering provides value that external tools cannot fully replicate.
LIMIT fills the same kind of gap.

--
Best regards,
Shinya Kato
NTT OSS Center


