Re: postgres_fdw: Use COPY to speed up batch inserts - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: postgres_fdw: Use COPY to speed up batch inserts
Date
Msg-id af99c828-ef24-4355-92d5-b5f631cda9c9@vondra.me
In response to postgres_fdw: Use COPY to speed up batch inserts  ("Matheus Alcantara" <matheusssilv97@gmail.com>)
Responses Re: postgres_fdw: Use COPY to speed up batch inserts
List pgsql-hackers
Hi Matheus,

Thanks for the patch. Please add it to the next commitfest (PG19-3) at

  https://commitfest.postgresql.org/

so that we don't lose track of the patch.

On 10/15/25 17:02, Matheus Alcantara wrote:
> Hi all,
> 
> Currently, postgres_fdw uses prepared statements to insert batches
> into foreign tables. Although this works fine for most use cases, the
> COPY command can also be used in some scenarios to speed up large batch
> inserts.
> 

Makes sense.

> The attached patch implements this idea of using the COPY command for
> batch inserts on postgres_fdw foreign tables. I've performed some
> benchmarks using pgbench, and the results seem good enough to consider it.
> 

Thanks. The code looks sensible in general, I think; I have a couple of
minor comments below. It'd be good to also update the documentation and
add some tests to postgres_fdw.sql to exercise this new code.

The sgml docs (doc/src/sgml/postgres-fdw.sgml) mention batch_size, and
explain how it's related to the number of parameters in the INSERT
statement. That limit does not matter when using COPY under the hood, so
this should be amended/clarified in some way. It doesn't need to be
super-detailed, though.

A couple minor comments about the code:

1) buildCopySql

- Does this really need the "(FORMAT TEXT, DELIMITER ',')" part? AFAIK
no, if we use the default copy format in convert_slot_to_copy_text.

- Shouldn't we cache the COPY SQL, similarly to how we keep the INSERT
SQL, instead of rebuilding it over and over for every batch?
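
To illustrate both points, here's a rough (untested) sketch of what I
have in mind - the copy_query field and the function name are made up,
not something from the patch, and it'd live in postgres_fdw.c next to
the existing INSERT handling:

static void
build_and_cache_copy_sql(PgFdwModifyState *fmstate, Relation rel,
                         List *targetAttrs)
{
    StringInfoData sql;
    ListCell   *lc;
    bool        first = true;

    /* build it only once, like fmstate->query for INSERT */
    if (fmstate->copy_query != NULL)
        return;

    initStringInfo(&sql);
    appendStringInfoString(&sql, "COPY ");
    deparseRelation(&sql, rel);     /* schema-qualified remote name */
    appendStringInfoString(&sql, " (");

    foreach(lc, targetAttrs)
    {
        int         attnum = lfirst_int(lc);
        Form_pg_attribute attr = TupleDescAttr(RelationGetDescr(rel),
                                               attnum - 1);

        if (!first)
            appendStringInfoString(&sql, ", ");
        first = false;

        /* the real thing should honor the column_name option, like INSERT */
        appendStringInfoString(&sql, quote_identifier(NameStr(attr->attname)));
    }

    /* default text format, so no FORMAT/DELIMITER options needed */
    appendStringInfoString(&sql, ") FROM STDIN");

    fmstate->copy_query = sql.data;
}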

2) convert_slot_to_copy_text

- It's probably better to not hard-code the delimiters etc.

- I wonder if the formatting needs to do something more like what
copyto.c does (through CopyToTextOneRow and CopyAttributeOutText). Maybe
not, not sure.
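
Just to give an idea of the minimum it'd involve - this is roughly the
escaping CopyAttributeOutText() does for the default text format, minus
the \b/\f/\v cases, the encoding handling and so on (the function name
is made up):

static void
append_copy_text_value(StringInfo buf, const char *value, bool isnull)
{
    /* NULLs are sent as \N in the text format */
    if (isnull)
    {
        appendStringInfoString(buf, "\\N");
        return;
    }

    for (const char *p = value; *p != '\0'; p++)
    {
        switch (*p)
        {
            case '\\':
                appendStringInfoString(buf, "\\\\");
                break;
            case '\t':          /* default text-format delimiter */
                appendStringInfoString(buf, "\\t");
                break;
            case '\n':
                appendStringInfoString(buf, "\\n");
                break;
            case '\r':
                appendStringInfoString(buf, "\\r");
                break;
            default:
                appendStringInfoChar(buf, *p);
                break;
        }
    }
}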

3) execute_foreign_modify

- I think the new block of code is a bit confusing. It essentially does
something similar to the original code, but not entirely. I suggest we
move it to a new function, and call that from execute_foreign_modify.

- I agree it's probably better to use COPYBUFSIZ, or something like
that, to send the data in smaller (but not tiny) chunks.
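
FWIW this is roughly what I'd imagine - just a sketch, with the error
handling reduced to a bare ereport(); COPYBUFSIZ here is the 8kB value
psql uses for \copy, but any reasonable chunk size would do:

#define COPYBUFSIZ  8192

static void
send_copy_data_chunked(PGconn *conn, const char *data, size_t len)
{
    size_t      offset = 0;

    while (offset < len)
    {
        size_t      chunk = Min(len - offset, (size_t) COPYBUFSIZ);

        /* returns 0 only for non-blocking connections; treat it as error */
        if (PQputCopyData(conn, data + offset, (int) chunk) <= 0)
            ereport(ERROR,
                    (errmsg("could not send COPY data to remote server: %s",
                            PQerrorMessage(conn))));
        offset += chunk;
    }

    if (PQputCopyEnd(conn, NULL) <= 0)
        ereport(ERROR,
                (errmsg("could not end COPY on remote server: %s",
                        PQerrorMessage(conn))));

    /* the caller still needs to consume the PGresult for the COPY */
}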



> I've performed the benchmark using different batch_size values to see
> when this optimization could be useful. The following results are the
> best tps of 3 runs.
> 
> Command: pgbench -n -c 10 -j 10 -t 100 -f bench.sql postgres
> 
> batch_size: 10
>     master tps: 76.360406
>     patch tps: 68.917109
> 
> batch_size: 100
>     master tps: 123.427731
>     patch tps: 243.737055
> 
> batch_size: 1000
>     master tps: 132.500506
>     patch tps: 239.295132
> 
> It seems that with a batch_size greater than 100 we can get a
> considerable speedup for batch inserts.
> 

I did a bunch of benchmarks too, and I see similar speedups (depending
on the batch and data size). Attached is the script I used to run this.
It runs COPY into a foreign table, pointing to a second instance on the
same machine, and it does that for a range of data sizes, batch sizes,
client counts, and logged/unlogged tables.

The attached PDF summarizes the results, comparing the "copy" build
(with this patch) to "master". There are also two "resourceowner" builds
with the patch from [1], which helps COPY in general a lot and improves
the case with batch_size=1000.

I think the results are good, at least with larger batch sizes. The
regressions with batch_size=10 on the UNLOGGED table are not great,
though. I have results from another machine, and there it affects even
the LOGGED table. I'm not sure if this is inherent, or something the
patch can fix.

In a way, it's not surprising - batching with tiny batches adds the
overhead without much benefit. I don't think that kills the patch.


> The attached patch uses the COPY command whenever we have a *numSlots >
> 1, but the tests show that maybe we should have a GUC to enable this?
> 

I can imagine having a GUC for testing, but it's not strictly necessary.
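
If we do add one, a plain boolean GUC registered from the existing
_PG_init() in option.c (next to postgres_fdw.application_name) would be
enough - the name and variable below are made up, and a per-server
option would work just as well:

static bool use_copy_for_batch_insert = true;

    /* ... in _PG_init() ... */
    DefineCustomBoolVariable("postgres_fdw.use_copy_for_batch_insert",
                             "Use COPY instead of INSERT for batched inserts.",
                             NULL,
                             &use_copy_for_batch_insert,
                             true,
                             PGC_USERSET,
                             0,
                             NULL, NULL, NULL);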

> I also think that we can have a better patch by removing the duplicated
> code introduced in this first version, especially in the cleanup phase,
> but I tried to keep things simple in this initial version to make the
> review easier and also just to test the idea.
> 
> Lastly, I don't know if we should change the EXPLAIN (ANALYZE, VERBOSE)
> output for batch inserts that use COPY to mention that we are
> sending a COPY command to the remote server. I guess so?
> 

Good point. We definitely should not show the INSERT SQL when we're
actually running a COPY.


regards

[1]
https://www.postgresql.org/message-id/84f20db7-7d57-4dc0-8144-7e38e0bbe75d%40vondra.me

-- 
Tomas Vondra

