Re: postgres_fdw: Use COPY to speed up batch inserts - Mailing list pgsql-hackers
| From | Matheus Alcantara |
|---|---|
| Subject | Re: postgres_fdw: Use COPY to speed up batch inserts |
| Date | |
| Msg-id | DEC5ZS2XP2NS.JQ6DH0KUWNL7@gmail.com |
| In response to | Re: postgres_fdw: Use COPY to speed up batch inserts (Masahiko Sawada <sawada.mshk@gmail.com>) |
| List | pgsql-hackers |
On Mon Nov 17, 2025 at 11:03 PM -03, Masahiko Sawada wrote:
> IIUC the performance regression occurs when users insert many rows
> into a foreign table with batch_size = 1 and use_copy_for_insert =
> true (tps: 133.451518 vs. 132.644801 vs. 88.998804). Since batch_size
> defaults to 1, users might experience performance issues if they
> enable use_copy_for_insert without adjusting the batch_size. I'm
> worried that users could easily find themselves in this situation.
>
Yes, you are correct. The 0002 patch aims to reduce this issue by using
the COPY command only if use_copy_for_insert = true and batch_size > 1,
which narrows the cases, but the regression can still happen if the user
sends a single row to insert into a foreign table. Inserting a single
row into a foreign table using COPY is a bit slower than using INSERT.
See the following pgbench results:
(Single row using INSERT)
tps = 19814.535944
(Single row using COPY)
tps = 16562.324025
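(For context, the workload in both runs is presumably along the lines of a
single-row insert per transaction against the foreign table; the table and
column names below are illustrative, not taken from the actual benchmark
script.)

```sql
-- Assumed shape of the pgbench custom script: one single-row insert
-- into the foreign table per transaction. "ft" is a placeholder name.
INSERT INTO ft (c) VALUES (1);
```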
I think that the documentation should mention that enabling
use_copy_for_insert without also increasing the batch_size option could
cause a performance regression.
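Concretely, the two options would be set together along these lines
(use_copy_for_insert is the option added by the patch under discussion;
the server and table names are illustrative):

```sql
-- Sketch: enable the COPY path on the server, assuming the
-- use_copy_for_insert option from the proposed patch.
ALTER SERVER loopback
    OPTIONS (ADD use_copy_for_insert 'true');

-- batch_size defaults to 1; without raising it, the COPY path would be
-- taken for single-row inserts and could regress as shown above.
ALTER FOREIGN TABLE ft
    OPTIONS (ADD batch_size '100');
```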
> One possible solution would be to introduce a threshold, like
> copy_min_row, which would specify the minimum number of rows needed
> before switching to the COPY command. However, this would require
> coordination with batch_size since having copy_min_row lower than
> batch_size wouldn't make sense.
>
The only problem that I see with this approach is that it would make the
remote SQL shown by EXPLAIN (VERBOSE) and EXPLAIN (ANALYZE, VERBOSE)
differ. With EXPLAIN (without ANALYZE) the user would never know whether
COPY will be used or not. Is this a problem, or am I being too
conservative?
I think that we can do such coordination in postgres_fdw_validator().
Also, if we decide to go with this idea, it seems to me that we would
have too many table options to configure to enable the COPY
optimization: we would need "copy_min_row", "batch_size" and
"use_copy_for_insert". What about deciding to use the COPY command if
use_copy_for_insert = true and the number of rows being inserted is >=
batch_size?
> Alternatively, when users are using batch insertion (batch_size > 0),
> we could use the COPY command only for full batches and fall back to
> INSERT for partial ones.
>
IIUC, in this case we would send both COPY and INSERT statements to the
foreign server within the same execution. For example, if batch_size =
100 and the user tries to insert 105 rows into the foreign table, we
would send a COPY statement with 100 rows and then an INSERT with the 5
remaining rows. If that's the case, which SQL should we show as the
Remote SQL in the EXPLAIN (ANALYZE, VERBOSE) output? I think that this
can cause some confusion.
> BTW I noticed that use_copy_for_insert option doesn't work with COPY
> FROM command. I got the following error with use_copy_for_insert=true
> and batch_size=3:
>
> postgres(1:2546195)=# copy t from '/tmp/a.csv'; -- table 't' is a foreign table.
> ERROR: there is no parameter $1
> CONTEXT: remote SQL command: INSERT INTO public.t(c) VALUES ($1)
> COPY t
>
Thanks for testing this case. The problem was that I was checking
whether COPY can be used inside create_foreign_modify(), which is called
by both BeginForeignInsert() and BeginForeignModify(), but COPY can only
be used on the foreign-modify path. To fix this issue I've moved the
check to postgresBeginForeignModify().
I'm attaching v6 with the following changes:
- I've squashed 0002 into 0001, so now COPY will only be used if
use_copy_for_insert = true and batch_size > 1
- A fix for the COPY FROM a foreign table bug
- A new test case for the COPY bug
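The new test for the COPY bug can be sketched roughly as follows (the
table name and option placement are illustrative, not necessarily what
the patch's regression test uses):

```sql
-- Sketch: COPY into a foreign table must not take the batched-INSERT
-- path when use_copy_for_insert is set. "ft" is a placeholder name.
ALTER FOREIGN TABLE ft
    OPTIONS (ADD use_copy_for_insert 'true', ADD batch_size '3');

-- Previously failed with: ERROR: there is no parameter $1
COPY ft FROM stdin;
1
2
3
4
5
\.
```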
--
Matheus Alcantara
EDB: http://www.enterprisedb.com