Re: postgres_fdw: Use COPY to speed up batch inserts - Mailing list pgsql-hackers
| From | solaimurugan vellaipandiyan |
|---|---|
| Subject | Re: postgres_fdw: Use COPY to speed up batch inserts |
| Date | |
| Msg-id | CAHEL7KSWJD5AiDDi+06eDVNiv8fA3z+wa393nBt002TPOe=awg@mail.gmail.com Whole thread |
| In response to | Re: postgres_fdw: Use COPY to speed up batch inserts ("Matheus Alcantara" <matheusssilv97@gmail.com>) |
| List | pgsql-hackers |
Hi all, Thank you for the updated Patch. On Thu, May 7, 2026 at 4:44 PM Matheus Alcantara <matheusssilv97@gmail.com> wrote: > > On Wed Apr 1, 2026 at 12:50 PM -03, Matheus Alcantara wrote: > > On Mon Mar 30, 2026 at 4:14 PM -03, Masahiko Sawada wrote: > >>> Please see the new attached version. > >> > >> I've reviewed the v12 patch, and here are review comments: > >> > >> The COPY data sent via postgres_fdw should properly escape the input > >> data. The bug I found can be reproduced with the following scenario: > >> > >> -- on local server > >> create server remote foreign data wrapper postgres_fdw; > >> create foreign table t (a int, b text) server remote; > >> create user mapping for public server remote; > >> > >> -- on remote server > >> create table t (a int, b text); > >> > >> -- on local server > >> copy t(a, d) from stdin; > >> Enter data to be copied followed by a newline. > >> End with a backslash and a period on a line by itself, or an EOF signal. > >>>> 1 hello\nworld > >>>> \. > >> ERROR: invalid input syntax for type integer: "world" > >> CONTEXT: COPY t, line 2, column a: "world" > >> remote SQL command: COPY public.t(a, d) FROM STDIN (FORMAT TEXT) > >> > > > > I think that we need something like CopyAttributeOutText() here. > > > > To fix this I've added appendStringInfoText() which is a similar version > > of CopyAttributeOutText() that works with a StringInfo. I did not find > > any function that I could reuse here, if such function exists please let > > me know. > > > > I'm wondering if we should have this similar function or try to combine > > both to avoid duplicated logic, although it looks complicated to me at > > first look to combine these both usages. > > > > Another option is to use the BINARY format, but it is less portable > > compared to the TEXT format across machines architectures and > > PostgreSQL versions [1]. For CSV we can just wrap the string into ' but > > I think that we can have a performance issue. What do you think? > > > > I've also quickly benchmarked this change and I've got very similar > > execution time, with and without this change. > > > > I've played with changing the format from TEXT to CSV to avoid this > duplicated code and I'm attaching a new version with the results. We > still need a special function to handle the escape but I think that it's > less complicated compared with TEXT. > > I was a bit concerned about the performance so I've executed a benchmark > using pgbench that run a COPY FROM with 100 rows on a foreign table and > I've got the following results: > > Command: pgbench -n -c 10 -j 10 -t 100 -f bench.sql postgres > > batch_size: 10 > patch tps: 5588.402946 > master tps: 4691.619829 > > batch_size: 100 > patch tps: 11834.459579 > master tps: 5578.925053 > > batch_size: 1000 > patch tps: 11181.554907 > master tps: 6452.945124 > > The results looks good, so I think that CSV is a valid format option. > > The patch applied successfully in my branch tree and tested the patch by creating 2 clusters (Local cluster - as the foreign table side & Remote cluster - as the remote table side). I first reproduced the current behavior on an unpatched tree and observed that: COPY ft_emp FROM stdin; resulted in row-by-row remote INSERT execution. After applying the patch and rebuilding along with postgres_fdw, I repeated the same test and observed the following in the remote cluster logs as: COPY public.emp(id, name) FROM STDIN (FORMAT CSV) that indicates the remote COPY optimization path is being used successfully. Local Cluster: postgres=# COPY ft_emp FROM stdin; Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself, or an EOF signal. >> 200 A >> 201 B >> 202 C >> \. COPY 3 Remote Cluster: postgres=# 2026-05-07 16:30:07.800 IST [210202] LOG: statement: START TRANSACTION ISOLATION LEVEL REPEATABLE READ 2026-05-07 16:30:42.973 IST [210202] LOG: statement: COPY public.emp(id, name) FROM STDIN (FORMAT CSV) 2026-05-07 16:30:42.976 IST [210202] LOG: statement: COPY public.emp(id, name) FROM STDIN (FORMAT CSV) 2026-05-07 16:30:42.977 IST [210202] LOG: statement: COPY public.emp(id, name) FROM STDIN (FORMAT CSV) 2026-05-07 16:30:42.977 IST [210202] LOG: statement: COMMIT TRANSACTION Also I tested the behavior with a BEFORE INSERT FOR EACH ROW trigger defined on the remote table. In my testing, one thing I observed is that the remote COPY path was still chosen even with a BEFORE INSERT FOR EACH ROW trigger defined on the remote table and the trigger fired correctly during COPY execution. I wanted to confirm whether this is the expected behavior/design for the optimization path. Otherwise the patch looks good to me from my side. Looking forward to more feedback on this. Regards, Solaimurugan V
pgsql-hackers by date: