Re: postgres_fdw: Use COPY to speed up batch inserts - Mailing list pgsql-hackers

From Matheus Alcantara
Subject Re: postgres_fdw: Use COPY to speed up batch inserts
Msg-id DDO2JEC1LWVS.3SASY07SXE7Y2@gmail.com
In response to Re: postgres_fdw: Use COPY to speed up batch inserts  (Jakub Wartak <jakub.wartak@enterprisedb.com>)
List pgsql-hackers
Thanks for testing and for the comments!

On Fri Oct 17, 2025 at 6:28 AM -03, Jakub Wartak wrote:
> On Thu, Oct 16, 2025 at 10:42 PM Tomas Vondra <tomas@vondra.me> wrote:
>> Thanks for the patch. Please add it to the next commitfest (PG19-3) at
>
> Hi Matheus! same here - thanks for the patch!
>
>> > The attached patch uses the COPY command whenever we have a *numSlots >
>> > 1 but the tests show that maybe we should have a GUC to enable this?
>> >
>>
>> I can imagine having a GUC for testing, but it's not strictly necessary.
>
> Just note, I've played maybe like 20mins with this patch and it works,
> however if we would like to have yet another GUCs then we would need
> to enable two of those? (enable batch_size and this hypothetical
> `batch_use_copy`?)
>
I was thinking of a GUC that enables the COPY command only when the
number of rows being ingested during the batch insert is greater than
the configured value, something like batch_size_for_copy. But now I'm
starting to think that perhaps we could use COPY whenever batch_size is
configured to a value > 1 (or make this configurable?). That would make
it easier to show on EXPLAIN that we will send a COPY command to the
remote server instead of an INSERT. The current patch relies on the
number of rows being sent to the foreign server to decide whether to
use COPY, and IIUC we don't have this information during EXPLAIN. I
wrote more about this in my previous reply [1].
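
Just to illustrate the idea (reusing the table names from your test;
"loopback" is only a placeholder server name, and the COPY line in the
EXPLAIN output below is what I imagine the patched version could print,
not what it prints today):

    ALTER SERVER loopback OPTIONS (ADD batch_size '100');

    EXPLAIN (VERBOSE, COSTS OFF)
        INSERT INTO local_t1 (id, t)
        SELECT s, md5(s::text) FROM generate_series(1, 1000) s;

    -- today, roughly:
    --   Remote SQL: INSERT INTO public.t1(id, t) VALUES ($1, $2)
    --   Batch Size: 100
    -- with batch_size > 1 the idea would be to report instead:
    --   Remote SQL: COPY public.t1(id, t) FROM STDIN (FORMAT TEXT, DELIMITER ',')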

> Some other stuff I've tried to cover:
>...
> 4. I've found that big rows don't play well with the COPY feature
> without a memory limitation, so probably some special handling should
> be done here, it's nonsense, but:
>
>     postgres@postgres:1236 : 15836 # INSERT INTO local_t1 (id, t)
> SELECT s, repeat(md5(s::text), 10000000) from generate_series(100,
> 103) s;
>     2025-10-17 11:17:08.742 CEST [15836] LOG:  statement: INSERT INTO
> local_t1 (id, t) SELECT s, repeat(md5(s::text), 10000000) from
> generate_series(100, 103) s;
>     2025-10-17 11:17:08.743 CEST [15838] LOG:  statement: START
> TRANSACTION ISOLATION LEVEL REPEATABLE READ
>     2025-10-17 11:17:38.302 CEST [15838] LOG:  statement: COPY
> public.t1(id, t, counter) FROM STDIN (FORMAT TEXT, DELIMITER ',')
>     ERROR:  string buffer exceeds maximum allowed length (1073741823 bytes)
>     DETAIL:  Cannot enlarge string buffer containing 960000028 bytes
> by 320000000 more bytes.
>     2025-10-17 11:17:40.213 CEST [15836] ERROR:  string buffer exceeds
> maximum allowed length (1073741823 bytes)
>     2025-10-17 11:17:40.213 CEST [15836] DETAIL:  Cannot enlarge
> string buffer containing 960000028 bytes by 320000000 more bytes.
>     2025-10-17 11:17:40.213 CEST [15836] STATEMENT:  INSERT INTO
> local_t1 (id, t) SELECT s, repeat(md5(s::text), 10000000) from
> generate_series(100, 103) s;
>
>     but then it never wants to finish that backend (constant loop in
> PQCleanup() or somewhere close to that), the server becomes unstable.
>     Without batch_size set the very same INSERT behaves OK.
>
The last version that I sent in [1] introduces a buffer size limit, and
testing this INSERT statement with the latest version seems to fix the
issue. Could you please check this too?
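
For reference, this is the statement I re-ran against the latest patch,
the same one from your log, just reformatted so it is easy to copy
(each row carries ~320 MB of text, so a single 4-row batch goes past
the 1 GB string buffer limit unless the COPY data is flushed earlier):

    INSERT INTO local_t1 (id, t)
        SELECT s, repeat(md5(s::text), 10000000)
        FROM generate_series(100, 103) s;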

[1] https://www.postgresql.org/message-id/CAFY6G8ePwjT8GiJX1AK5FDMhfq-sOnny6optgTPg98HQw7oJ0g%40mail.gmail.com

--
Matheus Alcantara


