From: Masahiko Sawada
Subject: Re: postgres_fdw: Use COPY to speed up batch inserts
Msg-id: CAD21AoDFisd4HYcDspqkzoM9sMofba8aBMgB0RkG8uCqt-ZQUg@mail.gmail.com
In response to: Re: postgres_fdw: Use COPY to speed up batch inserts ("Matheus Alcantara" <matheusssilv97@gmail.com>)
List: pgsql-hackers
On Thu, Nov 6, 2025 at 3:49 PM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
>
> On Fri Oct 31, 2025 at 4:02 PM -03, I wrote:
> > It's proving a bit complicated to decide at runtime whether we
> > should use COPY or INSERT for batch inserts into a foreign table.
> > Perhaps we could add a new option on CREATE FOREIGN TABLE to enable
> > this usage or not? We could document the performance improvements
> > and the limitations so the user can decide whether to enable it.
> >
> Here is v5 that implements this idea.
>
> In this version I've introduced a foreign table and foreign server
> option, "use_copy_for_insert" (I'm open to a better name), that
> enables the use of COPY as the remote command to execute an INSERT
> into a foreign table. COPY can be used if the user enables this
> option on the foreign table or the foreign server, and the original
> INSERT statement doesn't have a RETURNING clause.
>
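> To illustrate the intended usage (a sketch; only "use_copy_for_insert"
> comes from this patch, while the server and table names are made up):
>
>     CREATE FOREIGN TABLE ft (a int) SERVER loopback
>         OPTIONS (use_copy_for_insert 'true', batch_size '100');
>
>     -- No RETURNING clause, so the remote command can be a COPY:
>     INSERT INTO ft SELECT i FROM generate_series(1, 1000) i;
>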
> See the benchmark results:
>
> pgbench -n -c 10 -j 10 -t 100 -f bench.sql postgres
>
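> (bench.sql is not reproduced here; a comparable script, with the
> table name assumed, would be something like
>
>     INSERT INTO ft SELECT i FROM generate_series(1, 1000) i;
>
> with the row count varied per scenario.)
>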
> Master (batch_size = 1 with a single row to insert):
> tps = 16000.768037
>
> Master (batch_size = 1 with 1000 rows to insert):
> tps = 133.451518
>
> Master (batch_size = 100 with 1000 rows to insert):
> tps = 1274.096347
>
> -----------------
>
> Patch (batch_size = 1, use_copy_for_insert = false with a single row
> to insert):
> tps = 15734.155705
>
> Patch (batch_size = 1, use_copy_for_insert = false with 1000 rows to
> insert):
> tps = 132.644801
>
> Patch (batch_size = 100, use_copy_for_insert = false with 1000 rows to
> insert):
> tps = 1245.514591
>
> -----------------
>
> Patch (batch_size = 1, use_copy_for_insert = true with a single row to
> insert):
> tps = 17604.394057
>
> Patch (batch_size = 1, use_copy_for_insert = true with 1000 rows to
> insert):
> tps = 88.998804
>
> Patch (batch_size = 100, use_copy_for_insert = true with 1000 rows to
> insert):
> tps = 2406.009249
>
> -----------------
>
> We can see that when batch inserting with batch_size configured
> properly we get a very significant performance improvement, and that
> when the "use_copy_for_insert" option is disabled the performance is
> close to master.
>
> The problem is when "batch_size" is 1 (the default) and
> "use_copy_for_insert" is enabled. This is because in this scenario we
> send multiple COPY commands, each with a single row, to the foreign
> server.
>
> One way to fix this would be to decide at runtime (in
> execute_foreign_modify()) whether COPY can be used, based on the
> number of rows being inserted. I don't think I like this option,
> because it would make the EXPLAIN output differ when the ANALYZE
> option is used: at planning time we don't know the number of rows
> being inserted, so a plain EXPLAIN (VERBOSE) would show INSERT as the
> remote SQL, while EXPLAIN (ANALYZE) with enough rows to enable COPY
> usage would show the COPY command instead.
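>
> To make that concrete (the Remote SQL lines here are illustrative,
> not the patch's exact output):
>
>     EXPLAIN (VERBOSE)
>         INSERT INTO ft VALUES (1);
>     -- Remote SQL: INSERT INTO public.ft(a) VALUES ($1)
>
>     EXPLAIN (ANALYZE, VERBOSE)
>         INSERT INTO ft SELECT i FROM generate_series(1, 1000) i;
>     -- Remote SQL: COPY public.ft(a) FROM STDIN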
>
> Since the new "use_copy_for_insert" option is disabled by default, I
> think we could document this limitation and mention the performance
> improvements when it is used correctly together with the batch_size
> option.
>
> Another option would be to use the COPY command only if
> "use_copy_for_insert" is true and "batch_size" is > 1. We would still
> have the performance issue if the user inserts a single row, but it
> would be confined to fewer scenarios. The attached 0002 implements
> this idea.
>
> Thoughts?

IIUC the performance regression occurs when users insert many rows
into a foreign table with batch_size = 1 and use_copy_for_insert =
true (tps: 133.451518 vs. 132.644801 vs. 88.998804). Since batch_size
defaults to 1, users might experience performance issues if they
enable use_copy_for_insert without adjusting the batch_size. I'm
worried that users could easily find themselves in this situation.

One possible solution would be to introduce a threshold, like
copy_min_row, which would specify the minimum number of rows needed
before switching to the COPY command. However, this would require
coordination with batch_size since having copy_min_row lower than
batch_size wouldn't make sense. Alternatively, when users are using
batch insertion (batch_size > 0), we could use the COPY command only
for full batches and fall back to INSERT for partial ones.
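
As a sketch of the copy_min_row idea (hypothetical; no such option
exists in the posted patches, and the server name is made up):

    ALTER SERVER loopback
        OPTIONS (ADD batch_size '100', ADD copy_min_row '100');
    -- Intended semantics: use COPY only when at least copy_min_row
    -- rows are sent to the remote server in one flush; otherwise
    -- fall back to INSERT.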

BTW I noticed that the use_copy_for_insert option doesn't work with the
COPY FROM command. I got the following error with
use_copy_for_insert=true and batch_size=3:

postgres(1:2546195)=# copy t from '/tmp/a.csv'; -- table 't' is a foreign table.
ERROR:  there is no parameter $1
CONTEXT:  remote SQL command: INSERT INTO public.t(c) VALUES ($1)
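
A self-contained reproduction would be something like this (the server
name and column type are assumptions; the option settings match the
session above):

    CREATE FOREIGN TABLE t (c int) SERVER loopback
        OPTIONS (use_copy_for_insert 'true', batch_size '3');
    COPY t FROM '/tmp/a.csv';
    -- ERROR:  there is no parameter $1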

Regards,


--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


