Re: postgres_fdw: Use COPY to speed up batch inserts - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: postgres_fdw: Use COPY to speed up batch inserts
Date
Msg-id CAD21AoCS4nBcJARX4RtaMFQSSTLGGndEB2kho_yPCvJnRwHFdg@mail.gmail.com
Whole thread Raw
In response to Re: postgres_fdw: Use COPY to speed up batch inserts  (Matheus Alcantara <matheusssilv97@gmail.com>)
Responses Re: postgres_fdw: Use COPY to speed up batch inserts
List pgsql-hackers
On Wed, Mar 4, 2026 at 4:17 AM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
>
> Thank you for the review.
>
> On 03/03/26 16:47, Masahiko Sawada wrote:
> > Thank you for updating the patch! Here are some review comments:
> >
> > +                Form_pg_attribute attr = TupleDescAttr(tupdesc, attnum - 1);
> > +
> > +                if (attr->attgenerated)
> > +                        continue;
> > +
> > +                if (!first)
> > +                        appendStringInfoString(buf, ", ");
> > +
> > +                first = false;
> > +
> > +                appendStringInfoString(buf,
> > quote_identifier(NameStr(attr->attname)));
> >
> > We need to take care of the 'column_name' option here. If it's set we
> > should not use attr->attname as it is.
> >
>
> Fixed
>
> > --
> > +        }
> > +        if (nattrs > 0)
> > +                appendStringInfoString(buf, ") FROM STDIN");
> > +        else
> > +                appendStringInfoString(buf, " FROM STDIN");
> > +}
> >
> > It might be better to explicitly specify the format 'text'.
> >
>
> Fixed
>
> > ---
> > +        if (useCopy)
> > +        {
> > +                deparseCopySql(&sql, rel, targetAttrs);
> > +                values_end_len = 0;            /* Keep compiler quiet */
> > +        }
> > +        else
> > +                deparseInsertSql(&sql, rte, resultRelation, rel,
> > targetAttrs, doNothing,
> > +
> > resultRelInfo->ri_WithCheckOptions,
> > +
> > resultRelInfo->ri_returningList,
> > +                                                 &retrieved_attrs,
> > &values_end_len);
> >
> > I think we should consider whether it's okay to use the COPY command
> > even if resultRelInfo->ri_WithCheckOptions is non-NULL. As far as I
> > researched, it's okay as we currently don't support COPY to a view but
> > please consider it as well. We might want to explain it too in the
> > comment.
> >
>
> Good point, fixed.
>
> > How about initializing values_end_len with 0 at its declaration?
> >
>
> Fixed
>
> > ---
> > +-- test that fdw also use COPY FROM as a remote sql
> > +set client_min_messages to 'log';
> > +
> > +create function insert_or_copy() returns trigger as $$
> > +declare query text;
> > +begin
> > +    query := current_query();
> > +    if query ~* '^COPY' then
> > +        raise notice 'COPY command';
> > +    elsif query ~* '^INSERT' then
> > +        raise notice 'INSERT command';
> > +    end if;
> > +return new;
> > +end;
> > +$$ language plpgsql;
> >
> > On second thoughts, it might be okay to output the current_query() as it is.
> >
>
> Fixed
>
> > ---
> > +copy grem1 from stdin;
> > +3
> > +\.
> >
> > I think it would be good to have more tests, for example, checking if
> > the COPY command method can work properly with batch_size and
> > column_name options.
> >
>
> I've added new test cases for these cases. To test the batch_size case
> I've added an elog(DEBUG1) because using the trigger with
> current_query() log an entry for each row that we send for the foreign
> server, with the elog(DEBUG1) we can expect one log message for each
> batch operation. Please let me know if there is better ways to do this.
>
> Please see the new attached version.

I've reviewed the v12 patch, and here are review comments:

The COPY data sent via postgres_fdw should properly escape the input
data. The bug I found can be reproduced with the following scenario:

-- on local server
create server remote foreign data wrapper postgres_fdw;
create foreign table t (a int, b text) server remote;
create user mapping for public server remote;

-- on remote server
create table t (a int, b text);

-- on local server
copy t(a, d) from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 1    hello\nworld
>> \.
ERROR:  invalid input syntax for type integer: "world"
CONTEXT:  COPY t, line 2, column a: "world"
remote SQL command: COPY public.t(a, d) FROM STDIN (FORMAT TEXT)

---
I think the patch misses calling set_transmission_modes() and
reset_transmission_modes() when converting the tuple slots. A
problematic scenario is:

-- on local server
create server remote foreign data wrapper postgres_fdw;
create foreign table f (a float) server remote;
create user mapping for public server remote;

-- on remote server
create table f (a float);

-- on local server
set extra_float_digits = 0;
copy test_float from stdin;
1.0000000000000002
\.
select * from f;
 a
----
  1
(1 row)

---
+
+   appendStringInfo(buf, "COPY ");
+   deparseRelation(buf, rel);
+   if (nattrs > 0)
+       appendStringInfo(buf, "(");

appendStringInfoString() or appendStringInfoChar() should be used instead.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Melanie Plageman
Date:
Subject: Re: Don't synchronously wait for already-in-progress IO in read stream
Next
From: Sami Imseih
Date:
Subject: Re: Refactor query normalization into core query jumbling