Re: postgres_fdw: Use COPY to speed up batch inserts - Mailing list pgsql-hackers
From | Matheus Alcantara
---|---
Subject | Re: postgres_fdw: Use COPY to speed up batch inserts
Date | 
Msg-id | DDO2JEC1LWVS.3SASY07SXE7Y2@gmail.com
In response to | Re: postgres_fdw: Use COPY to speed up batch inserts (Jakub Wartak <jakub.wartak@enterprisedb.com>)
List | pgsql-hackers
Thanks for testing and for the comments!

On Fri Oct 17, 2025 at 6:28 AM -03, Jakub Wartak wrote:
> On Thu, Oct 16, 2025 at 10:42 PM Tomas Vondra <tomas@vondra.me> wrote:
>> Thanks for the patch. Please add it to the next committfest (PG19-3) at
>
> Hi Matheus! same here - thanks for the patch!
>
>> > The attached patch uses the COPY command whenever we have a *numSlots >
>> > 1 but the tests show that maybe we should have a GUC to enable this?
>> >
>>
>> I can imagine having a GUC for testing, but it's not strictly necessary.
>
> Just note, I've played maybe like 20mins with this patch and it works,
> however if we would like to have yet another GUCs then we would need
> to enable two of those? (enable batch_size and this hypothetical
> `batch_use_copy`?)
>

I was thinking of a GUC that would enable the use of the COPY command when
the number of rows being ingested during the batch insert is greater than
the value configured in that GUC, something like batch_size_for_copy. But
for now I'm starting to think that perhaps we could just use COPY whenever
batch_size is configured to a number > 1 (or make this configurable?).
That would also make it easier to show in EXPLAIN that we will send a COPY
command to the remote server instead of an INSERT. The current patch
decides whether to use COPY based on the number of rows being sent to the
foreign server, and IIUC we don't have this information during EXPLAIN. I
wrote more about this in my previous reply [1].

> Some other stuff I've tried to cover:
>...
> 4. I've found that big rows don't play with COPY feature without
> memory limitation, so probably some special handling should be done
> here, it's nonsense , but:
>
> postgres@postgres:1236 : 15836 # INSERT INTO local_t1 (id, t)
> SELECT s, repeat(md5(s::text), 10000000) from generate_series(100,
> 103) s;
> 2025-10-17 11:17:08.742 CEST [15836] LOG: statement: INSERT INTO
> local_t1 (id, t) SELECT s, repeat(md5(s::text), 10000000) from
> generate_series(100, 103) s;
> 2025-10-17 11:17:08.743 CEST [15838] LOG: statement: START
> TRANSACTION ISOLATION LEVEL REPEATABLE READ
> 2025-10-17 11:17:38.302 CEST [15838] LOG: statement: COPY
> public.t1(id, t, counter) FROM STDIN (FORMAT TEXT, DELIMITER ',')
> ERROR: string buffer exceeds maximum allowed length (1073741823 bytes)
> DETAIL: Cannot enlarge string buffer containing 960000028 bytes
> by 320000000 more bytes.
> 2025-10-17 11:17:40.213 CEST [15836] ERROR: string buffer exceeds
> maximum allowed length (1073741823 bytes)
> 2025-10-17 11:17:40.213 CEST [15836] DETAIL: Cannot enlarge
> string buffer containing 960000028 bytes by 320000000 more bytes.
> 2025-10-17 11:17:40.213 CEST [15836] STATEMENT: INSERT INTO
> local_t1 (id, t) SELECT s, repeat(md5(s::text), 10000000) from
> generate_series(100, 103) s;
>
> but then it never wants to finish that backend (constant loop[ in
> PQCleanup() or somewhere close to that), server behaves unstable.
> Without batch_size set the very same INSERT behaves OK.
>

The latest version that I sent in [1] introduces a buffer size limit (a
rough sketch of that kind of buffering follows below), and testing this
INSERT statement with that version seems to fix the issue. Could you
please check this too?

[1] https://www.postgresql.org/message-id/CAFY6G8ePwjT8GiJX1AK5FDMhfq-sOnny6optgTPg98HQw7oJ0g%40mail.gmail.com

--
Matheus Alcantara
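As a rough, hypothetical illustration of the kind of buffering discussed above (not the actual patch): the sketch streams a batch of rows to the remote server with COPY while capping how much data is buffered locally before each send, so no single string buffer has to hold the whole batch. Only the libpq calls (PQexec, PQputCopyData, PQputCopyEnd, PQgetResult) are real API; the function name, the 64 kB cap, and the table/column names are assumptions, and each entry of "rows" is assumed to already be a newline-terminated line in COPY text format.

/*
 * Hypothetical sketch: COPY-based batch insert with a cap on locally
 * buffered data.  Names and the threshold are illustrative only.
 */
#include <string.h>
#include <libpq-fe.h>

#define COPY_BUFFER_LIMIT (64 * 1024)	/* assumed flush threshold */

static int
copy_batch_insert(PGconn *conn, const char **rows, int nrows)
{
	PGresult   *res;
	char		buf[COPY_BUFFER_LIMIT];
	size_t		used = 0;

	/* Put the remote session into COPY IN mode. */
	res = PQexec(conn, "COPY public.t1(id, t) FROM STDIN (FORMAT text)");
	if (PQresultStatus(res) != PGRES_COPY_IN)
	{
		PQclear(res);
		return -1;
	}
	PQclear(res);

	for (int i = 0; i < nrows; i++)
	{
		size_t		len = strlen(rows[i]);

		/* Flush before the local buffer would grow past the cap. */
		if (used > 0 && used + len > COPY_BUFFER_LIMIT)
		{
			if (PQputCopyData(conn, buf, (int) used) != 1)
				return -1;
			used = 0;
		}

		/* A single row larger than the cap is streamed on its own. */
		if (len > COPY_BUFFER_LIMIT)
		{
			if (PQputCopyData(conn, rows[i], (int) len) != 1)
				return -1;
			continue;
		}

		memcpy(buf + used, rows[i], len);
		used += len;
	}

	/* Send whatever is left, then finish the COPY and check the result. */
	if (used > 0 && PQputCopyData(conn, buf, (int) used) != 1)
		return -1;
	if (PQputCopyEnd(conn, NULL) != 1)
		return -1;

	res = PQgetResult(conn);
	if (PQresultStatus(res) != PGRES_COMMAND_OK)
	{
		PQclear(res);
		return -1;
	}
	PQclear(res);
	return 0;
}

A real implementation inside postgres_fdw would build each row with the existing output functions and go through the connection machinery in connection.c; the point of the sketch is only the flush-before-the-cap loop, which avoids accumulating an unbounded string buffer like the one that hit the 1 GB limit in the log above.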