Home > mailing lists

Re: postgres_fdw: Use COPY to speed up batch inserts - Mailing list pgsql-hackers

From	Matheus Alcantara
Subject	Re: postgres_fdw: Use COPY to speed up batch inserts
Date	January 29 17:02:45
Msg-id	DG14NFEZD6M8.QIUMWJP9OCA6@gmail.com Whole thread Raw
In response to	Re: postgres_fdw: Use COPY to speed up batch inserts (Masahiko Sawada <sawada.mshk@gmail.com>)
List	pgsql-hackers

Tree view

On Tue Jan 27, 2026 at 4:17 PM -03, Masahiko Sawada wrote:
> On Fri, Jan 2, 2026 at 12:33 PM Matheus Alcantara
> <matheusssilv97@gmail.com> wrote:
>>
>> On Fri Jan 2, 2026 at 5:15 PM -03, Masahiko Sawada wrote:
>> > +
>> > +   /*
>> > +    * Set batch_with_copy_threshold from foreign server/table options. We do
>> > +    * this outside of create_foreign_modify() because we only want to use
>> > +    * COPY as a remote SQL when a COPY FROM on a foreign table is executed or
>> > +    * an insert is being performed on a table partition. In both cases the
>> > +    * BeginForeignInsert fdw routine is called.
>> > +    */
>> > +   fmstate->batch_with_copy_threshold = get_batch_with_copy_threshold(rel);
>> >
>> > Does it mean that we could end up using the COPY method not only when
>> > executing COPY FROM but also when executing INSERT with tuple
>> > routings? If so, how does the EXPLAIN command show the remote SQL?
>> >
>> It meas that we could also use the COPY method to insert rows into a
>> specific table partition that is a foreign table.
>>
>> Let's say that an user execute an INSERT INTO on a partitioned table
>> that has partitions that are postgres_fdw tables, with this patch we
>> could use the COPY method to insert the rows on these partitions. On
>> this scenario we would not have issue with EXPLAIN output because
>> currently we do not show the remote SQL being executed on each partition
>> that is involved on the INSERT statement.
>>
>> If an user execute an INSERT directly into a postgres_fdw table we will
>> use the normal INSERT statement as we use today.
>
> I'm slightly concerned that it could be confusing for users if we use
> the COPY method for the same table based on not only
> batch_with_copy_threshold but also how to INSERT. For example, if we
> insert tuples directly to a leaf partition, we always use INSERT. On
> the other hand, if we insert tuples via its parent table, we would use
> either COPY or INSERT based on the number of tuples and
> batch_with_copy_threshold value. IIUC this behavior stems from FDW API
> design (BeginForeignInsert callback is called only in cases of COPY or
> tuple routing), which users would not be aware of in general. Also,
> inserting tuples directly to a leaf partition is faster in general
> than doing via the parent table, but the COPY method optimization is
> available only in the latter case.

Yeah, I agree that this patch ends up in a land that it could introduce
more confusing than improvements for the user.

> How about making use of COPY method only when users execute a COPY
> command? Which seems more intuitive and a good start. We can
> distinguish BeginForeignInsert called via COPY from called via INSERT
> (tuple routing) by adding a flag to ModifyTableState or by checking if
> the passed resultRelInfo == resultRelInfo->ri_RootResultRelInfo.

This sounds a good idea, it simplify the patch scope a lot. During my
tests I've noticed that ri_RootResultRelInfo is null when it's being
called by CopyFrom(), so on postgresBeginForeignInsert I've included a
check that if it's null it means that it's being executed by a COPY
command and then we could use the COPY command as remote SQL.

Note that using COPY as the remote SQL is not always feasible. If the
remote table has a trigger that modifies the row, and the local foreign
table also has an insert trigger, we cannot capture those changes. While
postgres_fdw typically relies on INSERT ... RETURNING * to synchronize
the TupleTableSlot with remote side effects, the COPY command does not
support a RETURNING clause. Without this synchronization, local triggers
would see the original data rather than the actual values inserted. This
limitation is why the ri_TrigDesc == NULL check is necessary; removing
it causes the "Test a combination of local and remote triggers"
regression test on postgres_fdw.sql to fail.

> Alternative idea (or an improvement) would be to use the COPY method
> whenever the number of buffered tuples exceeds the threshold. It would
> cover more cases. Regarding the issue with EXPLAIN output, we could
> output both queries (INSERT and COPY) with some contexts (e.g., the
> threshold for the COPY method etc).

We could implement this as a future improvement.

--
Matheus Alcantara
EDB: https://www.enterprisedb.com

Attachment

v10-0001-postgres_fdw-Use-COPY-as-remote-SQL-when-possibl.patch

pgsql-hackers by date:

From: Boris Mironov
Date: 29 January, 16:55:36
Subject: Re: Idea to enhance pgbench by more modes to generate data (multi-TXNs, UNNEST, COPY BINARY)

From: Robert Haas
Date: 29 January, 17:10:11
Subject: Re: pg_plan_advice

Re: postgres_fdw: Use COPY to speed up batch inserts - Mailing list pgsql-hackers

Attachment

Previous

Next