Re: postgres_fdw: Use COPY to speed up batch inserts - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: postgres_fdw: Use COPY to speed up batch inserts
Msg-id: af99c828-ef24-4355-92d5-b5f631cda9c9@vondra.me
In response to: postgres_fdw: Use COPY to speed up batch inserts ("Matheus Alcantara" <matheusssilv97@gmail.com>)
Responses: Re: postgres_fdw: Use COPY to speed up batch inserts
List: pgsql-hackers
Hi Matheus,

Thanks for the patch. Please add it to the next commitfest (PG19-3) at
https://commitfest.postgresql.org/ so that we don't lose track of it.

On 10/15/25 17:02, Matheus Alcantara wrote:
> Hi all,
>
> Currently on postgres_fdw we use prepared statements to insert batches
> into foreign tables. Although this works fine for most use cases, the
> COPY command can also be used in some scenarios to speed up large batch
> inserts.
>

Makes sense.

> The attached patch implements this idea of using the COPY command for
> batch inserts on postgres_fdw foreign tables. I've performed some
> benchmarks using pgbench and the results seem good enough to consider
> this.
>

Thanks. The code looks sensible in general, I think; I have a couple of
minor comments below.

It'd be good to also update the documentation, and add some tests to
postgres_fdw.sql to exercise this new code. The sgml docs
(doc/src/sgml/postgres-fdw.sgml) mention batch_size and explain how it's
related to the number of parameters in the INSERT statement. That does
not matter when using COPY under the hood, so it should be
amended/clarified in some way. It doesn't need to be super-detailed,
though.

A couple of minor comments about the code:

1) buildCopySql

- Does this really need the "(FORMAT TEXT, DELIMITER ',')" part? AFAIK
  no, if we use the default copy format in convert_slot_to_copy_text.

- Shouldn't we cache the COPY SQL, similarly to how we keep the insert
  SQL, instead of rebuilding it over and over for every batch?

2) convert_slot_to_copy_text

- It's probably better not to hard-code the delimiters etc.

- I wonder if the formatting needs to do something more like what
  copyto.c does (through CopyToTextOneRow and CopyAttributeOutText).
  Maybe not, not sure.

3) execute_foreign_modify

- I think the new block of code is a bit confusing. It essentially does
  something similar to the original code, but not entirely. I suggest we
  move it to a new function and call that from execute_foreign_modify.

- I agree it's probably better to use COPYBUFSIZ, or something like
  that, to send the data in smaller (but not tiny) chunks.
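To illustrate the kind of chunked sending I have in mind, here is a
rough sketch. The helper name, the chunk size constant, and the reduced
error handling are made up for illustration, not taken from the patch;
the value mirrors the 8kB COPYBUFSIZ used in psql's copy.c:

#include <stdbool.h>
#include "libpq-fe.h"

/* Illustrative chunk size; pick whatever COPYBUFSIZ-like value fits. */
#define FDW_COPY_BUFSIZ 8192

/*
 * Hypothetical helper: stream an already-formatted COPY payload to the
 * remote server in moderate chunks instead of one huge PQputCopyData()
 * call per batch.  Assumes COPY ... FROM STDIN has already been started
 * on "conn"; error handling is reduced to a bool for brevity.
 */
static bool
send_copy_payload(PGconn *conn, const char *data, size_t len)
{
    size_t      offset = 0;
    PGresult   *res;
    bool        ok;

    while (offset < len)
    {
        size_t      chunk = len - offset;

        if (chunk > FDW_COPY_BUFSIZ)
            chunk = FDW_COPY_BUFSIZ;

        if (PQputCopyData(conn, data + offset, (int) chunk) != 1)
            return false;

        offset += chunk;
    }

    /* Terminate the COPY and check the command result. */
    if (PQputCopyEnd(conn, NULL) != 1)
        return false;

    res = PQgetResult(conn);
    ok = (PQresultStatus(res) == PGRES_COMMAND_OK);
    PQclear(res);

    return ok;
}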
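And for convert_slot_to_copy_text, something along these lines is what I
mean by following copyto.c - a minimal sketch loosely modeled on
CopyAttributeOutText() (the function name and signature are invented
here, not from the patch). The caller would still emit the default tab
delimiter between attributes and "\N" for NULLs:

#include "postgres.h"
#include "lib/stringinfo.h"

/*
 * Escape one attribute value for COPY text format (default options),
 * so the remote server parses it back unchanged.
 */
static void
append_copy_text_value(StringInfo buf, const char *value)
{
    const char *p;

    for (p = value; *p; p++)
    {
        switch (*p)
        {
            case '\b':
                appendStringInfoString(buf, "\\b");
                break;
            case '\f':
                appendStringInfoString(buf, "\\f");
                break;
            case '\n':
                appendStringInfoString(buf, "\\n");
                break;
            case '\r':
                appendStringInfoString(buf, "\\r");
                break;
            case '\t':
                appendStringInfoString(buf, "\\t");
                break;
            case '\v':
                appendStringInfoString(buf, "\\v");
                break;
            case '\\':
                appendStringInfoString(buf, "\\\\");
                break;
            default:
                appendStringInfoChar(buf, *p);
                break;
        }
    }
}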
> I've performed the benchmark using different batch_size values to see
> when this optimization could be useful. The following results are the
> best tps of 3 runs.
>
> Command: pgbench -n -c 10 -j 10 -t 100 -f bench.sql postgres
>
> batch_size: 10
> master tps: 76.360406
> patch tps:  68.917109
>
> batch_size: 100
> master tps: 123.427731
> patch tps:  243.737055
>
> batch_size: 1000
> master tps: 132.500506
> patch tps:  239.295132
>
> It seems that with a batch_size greater than 100 we can get a
> considerable speedup for batch inserts.
>

I did a bunch of benchmarks too, and I see similar speedups (depending
on the batch and data size). Attached is the script I used to run this.
It runs COPY into a foreign table pointing to a second instance on the
same machine, and it does that for a range of data sizes, batch sizes,
client counts, and logged/unlogged tables.

The attached PDF summarizes the results, comparing the "copy" build
(with this patch) to "master". There are also two "resourceowner"
builds, with the patch from [1] - that helps COPY in general a lot, and
it improves the case with batch_size=1000.

I think the results are good, at least with larger batch sizes. The
regressions with batch_size=10 on an UNLOGGED table are not great,
though. I have results from another machine, and there it affects even
a LOGGED table. I'm not sure if this is inherent, or something the
patch can fix. In a way, it's not surprising - batching with tiny
batches adds overhead without much benefit. I don't think that kills
the patch.

> The attached patch uses the COPY command whenever we have *numSlots >
> 1, but the tests show that maybe we should have a GUC to enable this?
>

I can imagine having a GUC for testing, but it's not strictly necessary.

> I also think that we can have a better patch by removing the
> duplicated code introduced in this first version, especially in the
> cleanup phase, but I tried to keep things simpler in this initial
> phase to make the review easier and also just to test the idea.
>
> Lastly, I don't know if we should change the EXPLAIN (ANALYZE, VERBOSE)
> output for batch inserts that use COPY to mention that we are sending
> the COPY command to the remote server. I guess so?
>

Good point. We definitely should not show the SQL for an INSERT when
we're actually running a COPY.

regards

[1] https://www.postgresql.org/message-id/84f20db7-7d57-4dc0-8144-7e38e0bbe75d%40vondra.me

--
Tomas Vondra