Re: [POC] Fast COPY FROM command for the table with foreign partitions - Mailing list pgsql-hackers

From Amit Langote
Subject Re: [POC] Fast COPY FROM command for the table with foreign partitions
Date
Msg-id CA+HiwqHRqG8X9W6-v-E6d0nw1k6aC7QSMk-SOvv4puOjWsQMmA@mail.gmail.com
Whole thread Raw
In response to Re: [POC] Fast COPY FROM command for the table with foreign partitions  ("Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>)
Responses Re: [POC] Fast COPY FROM command for the table with foreign partitions
List pgsql-hackers
Hi Andrey,

Thanks for updating the patch.  I will try to take a look later.

On Wed, Jul 22, 2020 at 6:09 PM Andrey V. Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
> On 7/16/20 2:14 PM, Amit Langote wrote:
> > * Why the "In" in these API names?
> >
> > +   /* COPY a bulk of tuples into a foreign relation */
> > +   BeginForeignCopyIn_function BeginForeignCopyIn;
> > +   EndForeignCopyIn_function EndForeignCopyIn;
> > +   ExecForeignCopyIn_function ExecForeignCopyIn;
>
> I used an analogy from copy.c.

Hmm, if we were going to also need *ForeignCopyOut APIs, maybe it
makes sense to have "In" here, but maybe we don't, so how about
leaving out the "In" for clarity?

> > * I see that the remote copy is performed from scratch on every call
> > of postgresExecForeignCopyIn(), but wouldn't it be more efficient to
> > send the `COPY remote_table FROM STDIN` in
> > postgresBeginForeignCopyIn() and end it in postgresEndForeignCopyIn()
> > when there are no errors during the copy?
>
> It is not possible. FDW share one connection between all foreign
> relations from a server. If two or more partitions will be placed at one
> foreign server you will have problems with concurrent COPY command.

Ah, you're right.  I didn't consider multiple foreign partitions
pointing to the same server.  Indeed, we would need separate
connections to a given server to COPY to multiple remote relations on
that server in parallel.

> May be we can create new connection for each partition?

Yeah, perhaps, although it sounds like something that might be more
generally useful and so we should work on that separately if at all.

> > I tried implementing these two changes -- pgfdw_copy_data_dest_cb()
> > and sending `COPY remote_table FROM STDIN` only once instead of on
> > every flush -- and I see significant speedup.  Please check the
> > attached patch that applies on top of yours.
>
> I integrated first change and rejected the second by the reason as above.

Thanks.

Will send more comments after reading the v5 patch.

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Konstantin Knizhnik
Date:
Subject: Re: [Patch] Optimize dropping of relation buffers using dlist
Next
From: Michael Paquier
Date:
Subject: Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?