Re: postgres_fdw: Use COPY to speed up batch inserts - Mailing list pgsql-hackers
| From | Masahiko Sawada |
|---|---|
| Subject | Re: postgres_fdw: Use COPY to speed up batch inserts |
| Date | |
| Msg-id | CAD21AoBhVS1R8pzvMePsp4ngRTvwOeM0qTkqFmFOj_F9LGNFuw@mail.gmail.com |
| In response to | Re: postgres_fdw: Use COPY to speed up batch inserts ("Matheus Alcantara" <matheusssilv97@gmail.com>) |
| List | pgsql-hackers |
On Tue, Nov 18, 2025 at 2:13 PM Matheus Alcantara <matheusssilv97@gmail.com> wrote:
>
> On Mon Nov 17, 2025 at 11:03 PM -03, Masahiko Sawada wrote:
> > IIUC the performance regression occurs when users insert many rows
> > into a foreign table with batch_size = 1 and use_copy_for_insert =
> > true (tps: 133.451518 vs. 132.644801 vs. 88.998804). Since batch_size
> > defaults to 1, users might experience performance issues if they
> > enable use_copy_for_insert without adjusting the batch_size. I'm
> > worried that users could easily find themselves in this situation.
> >
> Yes, you are correct. The 0002 patch aims to reduce this issue by
> using the COPY command only if use_copy_for_insert = true and
> batch_size > 1, which reduces the affected cases, but the regression
> can still happen if the user sends a single row to insert into a
> foreign table.
>
> Inserting a single row into a foreign table using COPY is a bit
> slower than using INSERT. See the following pgbench results:
>
> (Single row using INSERT)
> tps = 19814.535944
>
> (Single row using COPY)
> tps = 16562.324025
>
> I think that the documentation should mention that changing
> use_copy_for_insert without also changing the batch_size option could
> cause a performance regression.
>
> > One possible solution would be to introduce a threshold, like
> > copy_min_row, which would specify the minimum number of rows needed
> > before switching to the COPY command. However, this would require
> > coordination with batch_size since having copy_min_row lower than
> > batch_size wouldn't make sense.
> >
> The only problem that I see with this approach is that it would make
> the EXPLAIN (VERBOSE) and EXPLAIN (ANALYZE, VERBOSE) remote SQL
> output differ. The user would never know from EXPLAIN (without
> ANALYZE) whether COPY will be used or not. Is this a problem, or am I
> being too conservative?

I think that's a valid concern.
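For readers following along, the options under discussion are set per foreign server or foreign table. A minimal sketch, assuming postgres_fdw is installed, a foreign server named "loopback" with a user mapping already exists, and the patched build that adds the proposed use_copy_for_insert option (not part of released PostgreSQL):

```sql
-- "loopback" and "t_remote" are assumed names; use_copy_for_insert
-- is the option proposed by this patch, not a released option.
CREATE FOREIGN TABLE t (c int)
    SERVER loopback
    OPTIONS (table_name 't_remote',
             batch_size '100',             -- defaults to 1 if omitted
             use_copy_for_insert 'true');
```

Leaving batch_size at its default of 1 while enabling use_copy_for_insert is the configuration where the regression discussed above shows up.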
Is it a good idea to show both queries with some additional
information (e.g., the threshold at which we switch to the COPY
command)?

> I think that we can do such coordination in postgres_fdw_validator().
>
> Also, if we decide to go with this idea, it seems to me that we would
> have too many table options to configure to enable the COPY
> optimization; we would need "copy_min_row", "batch_size" and
> "use_copy_for_insert". What about deciding to use the COPY command if
> use_copy_for_insert = true and the number of rows being inserted is
> >= batch_size?

Sounds like a reasonable idea.

> > Alternatively, when users are using batch insertion (batch_size > 0),
> > we could use the COPY command only for full batches and fall back to
> > INSERT for partial ones.
> >
> IIUC, in this case we would send COPY and INSERT statements to the
> foreign server for the same execution. For example, if batch_size =
> 100 and the user tries to insert 105 rows into the foreign table, we
> would send a COPY statement with 100 rows and then an INSERT with the
> 5 remaining rows? If that's the case, which SQL should we show as the
> Remote SQL in the EXPLAIN (ANALYZE, VERBOSE) output? I think that
> this can cause some confusion.
>
> > BTW I noticed that the use_copy_for_insert option doesn't work with
> > the COPY FROM command. I got the following error with
> > use_copy_for_insert=true and batch_size=3:
> >
> > postgres(1:2546195)=# copy t from '/tmp/a.csv'; -- table 't' is a foreign table.
> > ERROR:  there is no parameter $1
> > CONTEXT:  remote SQL command: INSERT INTO public.t(c) VALUES ($1)
> > COPY t
> >
> Thanks for testing this case. The problem was that I was checking
> whether COPY can be used inside create_foreign_modify(), which is
> called by BeginForeignInsert and also by BeginForeignModify(), while
> COPY can be used only by the foreign-modify path. To fix this issue
> I've moved the check to postgresBeginForeignModify().
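The EXPLAIN discrepancy discussed above can be illustrated with a sketch like the following, assuming the patched postgres_fdw and a row-count-based switch. The Remote SQL comments are illustrative of the concern, not verified output; "t" and "t_remote" are assumed names:

```sql
-- Plain EXPLAIN cannot know at plan time how many rows will arrive,
-- so with a threshold-based switch it could only ever show the
-- INSERT form of the remote query:
EXPLAIN (VERBOSE)
    INSERT INTO t SELECT g FROM generate_series(1, 1000) g;
-- Remote SQL: INSERT INTO public.t_remote(c) VALUES ($1)

-- ...while EXPLAIN (ANALYZE, VERBOSE) runs the statement, so the
-- executor may actually have switched to COPY for full batches:
EXPLAIN (ANALYZE, VERBOSE)
    INSERT INTO t SELECT g FROM generate_series(1, 1000) g;
-- Remote SQL: COPY public.t_remote (c) FROM STDIN
```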
>
> I'm attaching v6 with the following changes:
> - I've squashed 0002 into 0001, so now COPY will only be used if
>   use_copy_for_insert = true and batch_size > 1
> - Fix for the bug of COPY FROM into a foreign table
> - New test case for the COPY bug

Thank you for updating the patch!

I think one key point in the patch is whether or not it's okay to
switch to using COPY based on the actual number of tuples inserted.
While it should be okay from the performance perspective, it might be
an issue that the remote query shown in EXPLAIN (without ANALYZE)
might differ from the actual query sent. If there is a way to
distinguish the batch insertion between INSERT and COPY in
postgres_fdw, it might be a good idea to use the COPY command for the
remote query only when the COPY FROM comes.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com