Re: postgres_fdw: Use COPY to speed up batch inserts - Mailing list pgsql-hackers
| From | Masahiko Sawada |
|---|---|
| Subject | Re: postgres_fdw: Use COPY to speed up batch inserts |
| Date | |
| Msg-id | CAD21AoDFisd4HYcDspqkzoM9sMofba8aBMgB0RkG8uCqt-ZQUg@mail.gmail.com |
| In response to | Re: postgres_fdw: Use COPY to speed up batch inserts ("Matheus Alcantara" <matheusssilv97@gmail.com>) |
| List | pgsql-hackers |
On Thu, Nov 6, 2025 at 3:49 PM Matheus Alcantara <matheusssilv97@gmail.com> wrote:
>
> On Fri Oct 31, 2025 at 4:02 PM -03, I wrote:
> > It's proving a bit complicated to decide at runtime whether we should
> > use COPY or INSERT for batch inserts into a foreign table. Perhaps we
> > could add a new option on CREATE FOREIGN TABLE to enable this usage or
> > not? We could document the performance improvements and the limitations
> > so the user can decide whether to enable it.
> >
> Here is v5, which implements this idea.
>
> In this version I've introduced a foreign table and foreign server
> option "use_copy_for_insert" (I'm open to a better name) that enables
> the use of COPY as the remote command to execute an INSERT into a
> foreign table. COPY can be used if the user enables this option on the
> foreign table or the foreign server and the original INSERT statement
> doesn't have a RETURNING clause.
>
> See the benchmark results:
>
> pgbench -n -c 10 -j 10 -t 100 -f bench.sql postgres
>
> Master (batch_size = 1 with a single row to insert):
> tps = 16000.768037
>
> Master (batch_size = 1 with 1000 rows to insert):
> tps = 133.451518
>
> Master (batch_size = 100 with 1000 rows to insert):
> tps = 1274.096347
>
> -----------------
>
> Patch (batch_size = 1, use_copy_for_insert = false with a single row to
> insert):
> tps = 15734.155705
>
> Patch (batch_size = 1, use_copy_for_insert = false with 1000 rows to
> insert):
> tps = 132.644801
>
> Patch (batch_size = 100, use_copy_for_insert = false with 1000 rows to
> insert):
> tps = 1245.514591
>
> -----------------
>
> Patch (batch_size = 1, use_copy_for_insert = true with a single row to
> insert):
> tps = 17604.394057
>
> Patch (batch_size = 1, use_copy_for_insert = true with 1000 rows to
> insert):
> tps = 88.998804
>
> Patch (batch_size = 100, use_copy_for_insert = true with 1000 rows to
> insert):
> tps = 2406.009249
>
> -----------------
>
> We can see that when batch inserting with batch_size configured properly
> we get a very significant performance improvement, and when the
> "use_copy_for_insert" option is disabled the performance is close to
> master.
>
> The problem is when "batch_size" is 1 (the default) and
> "use_copy_for_insert" is enabled. This is because in this scenario we
> are sending multiple COPY commands, each with a single row, to the
> foreign server.
>
> One way to fix this would be to decide at runtime (in
> execute_foreign_modify()) whether COPY can be used, based on the number
> of rows being inserted. I don't think I like this option because it
> would make the EXPLAIN output differ when the ANALYZE option is used:
> at planning time we don't know the number of rows being inserted, so if
> just EXPLAIN (VERBOSE) is executed we would show the INSERT as the
> remote SQL, whereas if ANALYZE is included and there are enough rows to
> enable the COPY usage, the remote SQL would show the COPY command.
>
> Since the new "use_copy_for_insert" option is disabled by default, I
> think we could document this limitation and mention the performance
> improvements when it is used correctly together with the batch_size
> option.
>
> Another option would be to use the COPY command only if
> "use_copy_for_insert" is true and "batch_size" is > 1. We would still
> have the performance issue if the user inserts a single row, but it
> would affect fewer scenarios. The attached 0002 implements this idea.
>
> Thoughts?
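(For illustration, with the v5 patch applied, enabling the proposed option and driving the benchmark might look like the sketch below. The server name "loopback", the table "t", and the bench.sql contents are assumptions for the example, not taken from the patch:

    -- Per server, with a batch size large enough to amortize each COPY;
    -- use_copy_for_insert exists only with the patch applied.
    ALTER SERVER loopback
        OPTIONS (ADD batch_size '100', ADD use_copy_for_insert 'true');

    -- Or per table, overriding the server-level setting:
    ALTER FOREIGN TABLE t
        OPTIONS (ADD use_copy_for_insert 'true');

    -- A bench.sql along these lines would exercise the 1000-row case:
    INSERT INTO t SELECT g FROM generate_series(1, 1000) AS g;

Per the description above, COPY would only be chosen when the statement has no RETURNING clause, so a workload using RETURNING would keep going through the INSERT path regardless of the option.)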
IIUC the performance regression occurs when users insert many rows into a
foreign table with batch_size = 1 and use_copy_for_insert = true (tps:
133.451518 vs. 132.644801 vs. 88.998804). Since batch_size defaults to 1,
users might experience performance issues if they enable
use_copy_for_insert without adjusting batch_size, and I'm worried that
users could easily find themselves in this situation.

One possible solution would be to introduce a threshold, like
copy_min_row, which would specify the minimum number of rows needed
before switching to the COPY command. However, this would require
coordination with batch_size, since having copy_min_row lower than
batch_size wouldn't make sense. Alternatively, when users are using batch
insertion (batch_size > 1), we could use the COPY command only for full
batches and fall back to INSERT for partial ones.

BTW, I noticed that the use_copy_for_insert option doesn't work with the
COPY FROM command. I got the following error with use_copy_for_insert =
true and batch_size = 3:

postgres(1:2546195)=# copy t from '/tmp/a.csv'; -- table 't' is a foreign table.
ERROR:  there is no parameter $1
CONTEXT:  remote SQL command: INSERT INTO public.t(c) VALUES ($1)
COPY t

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
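(For reference, a minimal sketch that should reproduce the COPY FROM failure reported above, assuming the v5 patch is applied; the "loopback" server and the single-int-column table "t" are hypothetical names chosen to match the error context:

    -- Foreign table with batching and the new option enabled:
    CREATE FOREIGN TABLE t (c int) SERVER loopback
        OPTIONS (batch_size '3', use_copy_for_insert 'true');

    -- /tmp/a.csv holds a few integer rows, one per line.
    COPY t FROM '/tmp/a.csv';
    -- reported result:
    --   ERROR:  there is no parameter $1
    --   CONTEXT:  remote SQL command: INSERT INTO public.t(c) VALUES ($1)
)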