Re: [POC] Fast COPY FROM command for the table with foreign partitions - Mailing list pgsql-hackers

From Amit Langote
Subject Re: [POC] Fast COPY FROM command for the table with foreign partitions
Date
Msg-id CA+HiwqFxJksYQtGhje-VozoC3GYxyOtv5QxNb3fJ5O1FHOB-+g@mail.gmail.com
Whole thread Raw
In response to RE: [POC] Fast COPY FROM command for the table with foreign partitions  ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
Responses RE: [POC] Fast COPY FROM command for the table with foreign partitions  ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
List pgsql-hackers
On Thu, Nov 26, 2020 at 11:42 AM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
> From: Amit Langote <amitlangote09@gmail.com>
> > Anyway, one thing we could do is rename
> > ExecRelationAllowsMultiInsert() to ExecSetRelationUsesMultiInsert(),
> > that is, to make it actually set ri_usesMultiInsert and have places
> > like CopyFrom() call it if (and only if) its local logic allows
> > multi-insert to be used.  So, ri_usesMultiInsert starts out set to
> > false and if a module wants to use multi-insert for a given target
> > relation, it calls ExecSetRelationUsesMultiInsert() to turn the flag
> > on.  Also, given the confusion regarding how execPartition.c
>
> I think separating the setting and inspection of the property into different functions will be good, at least.
>
> > manipulates the flag, maybe change ExecFindPartition() to accept a
> > Boolean parameter multi_insert, which it will pass down to
> > ExecInitPartitionInfo(), which in turn will call
> > ExecSetRelationUsesMultiInsert() for a given partition.  Of course, if
> > the logic in ExecSetRelationUsesMultiInsert() determines that
> > multi-insert can't be used, for the reasons listed in the function,
> > then the caller will have to live with that decision.
>
> I can't say for sure, but it looks strange to me, because I can't find a good description of multi_insert argument
forExecFindPartition().  If we add multi_insert, I'm afraid we may want to add further arguments for other properties
inthe future like "Hey, get me the partition that has triggers.", "Next, pass me a partition that uses a foreign
table.",etc.  I think the current ExecFindPartition() is good -- "Get me a partition that accepts this row." 
>
> I wonder if ri_usesMultiInsert is really necessary.  Would it cut down enough costs in the intended use case(s), say
theheavyweight COPY FROM? 

Thinking on this more, I think I'm starting to agree with you on this.
I skimmed the CopyFrom()'s main loop again today and indeed it doesn't
seem that the cost of checking the individual conditions for whether
or not to buffer the current tuple for the given target relation is
all that big to save with ri_usesMultiInsert.  So my argument that it
is good for performance is perhaps not that strong.

Andrey's original patch had the flag to, as I understand it, make the
partitioning case work correctly.  When inserting into a
non-partitioned table, there's only one relation to care about.  In
that case, CopyFrom() can use either the new COPY interface or the
INSERT interface for the entire operation when talking to a foreign
target relation's FDW driver.  With partitions, that has to be
considered separately for each partition.  What complicates the matter
further is that while the original target relation (the root
partitioned table in the partitioning case) is fully initialized in
CopyFrom(), partitions are lazily initialized by ExecFindPartition().
Note that the initialization of a given target relation can also
optionally involve calling the FDW to perform any pre-COPY
initializations.  So if a given partition is a foreign table, whether
the copy operation was initialized using the COPY interface or the
INSERT interface is determined away from CopyFrom().  Andrey created
ri_usesMultiInsert to remember which was used so that CopyFrom() can
use the correct interface during the subsequent interactions with the
partition's driver.

Now, it does not seem outright impossible to do this without the flag,
but maybe Andrey thinks it is good for readability?  If it is
confusing from a modularity standpoint, maybe we should rethink that.
That said, I still think that there should be a way for CopyFrom() to
tell ExecFindPartition() which FDW interface to initialize a given
foreign table partition's copy operation with -- COPY if the copy
allows multi-insert, INSERT if not.  Maybe the multi_insert parameter
I mentioned earlier would serve that purpose.

--
Amit Langote
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Seino Yuki
Date:
Subject: Re: Feature improvement for pg_stat_statements
Next
From: Li Japin
Date:
Subject: Re: Printing LSN made easy