Re: [POC] Fast COPY FROM command for the table with foreign partitions - Mailing list pgsql-hackers

From Amit Langote
Subject Re: [POC] Fast COPY FROM command for the table with foreign partitions
Date
Msg-id CA+HiwqEZG__hJp2Zp1o5u-brPnF985nKnTqs9ox3PKHy5U=OMw@mail.gmail.com
Whole thread Raw
In response to RE: [POC] Fast COPY FROM command for the table with foreign partitions  ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
Responses RE: [POC] Fast COPY FROM command for the table with foreign partitions  ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
List pgsql-hackers
Hi,

On Tue, Oct 20, 2020 at 11:31 AM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
> > > (2)
> > > +   Assert(rri->ri_usesMultiInsert == false);
> > >
> > > As the above assertion represents, I'm afraid the semantics of
> > ExecRelationAllowsMultiInsert() and ResultRelInfo->ri_usesMultiInsert are
> > unclear.  In CopyFrom(), ri_usesMultiInsert is set by also considering the
> > COPY-specific conditions:
> > >
> > > +   if (!cstate->volatile_defexprs &&
> > > +           !contain_volatile_functions(cstate->whereClause) &&
> > > +           ExecRelationAllowsMultiInsert(target_resultRelInfo, NULL))
> > > +           target_resultRelInfo->ri_usesMultiInsert = true;
> > >
> > > On the other hand, in below ExecInitPartitionInfo(), ri_usesMultiInsert is set
> > purely based on the relation's characteristics.
> > >
> > > +   leaf_part_rri->ri_usesMultiInsert =
> > > +           ExecRelationAllowsMultiInsert(leaf_part_rri,
> > rootResultRelInfo);
> > >
> > > In addition to these differences, I think it's a bit confusing that the function
> > itself doesn't record the check result in ri_usesMultiInsert.
> > >
> > > It's probably easy to understand to not add ri_usesMultiInsert, and the
> > function just encapsulates the check logic based solely on the relation
> > characteristics and returns the result.  So, the argument is just one
> > ResultRelInfo.  The caller (e.g. COPY) combines the function result with other
> > specific conditions.
>
> > I can't fully agreed with this suggestion. We do so because in the future anyone
> > can call this code from another subsystem for another purposes. And we want
> > all the relation-related restrictions contains in one routine. CopyState-related
> > restrictions used in copy.c only and taken out of this function.
>
> I'm sorry if I'm misinterpreting you, but I think the following simply serves its role sufficiently and cleanly
withoutusing ri_usesMultiInsert. 
>
> bool
> ExecRelationAllowsMultiInsert(RelationRelInfo *rri)
> {
>         check if the relation allows multiinsert based on its characteristics;
>         return true or false;
> }
>
> I'm concerned that if one subsystem sets ri_usesMultiInsert to true based on its additional specific conditions, it
mightlead to another subsystem's misjudgment.  For example, when subsystem A and B want to do different things
respectively:
>
> [Subsystem A]
> if (ExecRelationAllowsMultiInsert(rri) && {A's conditions})
>         rri->ri_usesMultiInsert = true;
> ...
> if (rri->ri_usesMultiInsert)
>         do A's business;
>
> [Subsystem B]
> if (rri->ri_usesMultiInsert)
>         do B's business;
>
> Here, what if subsystem A and B don't want each other's specific conditions to hold true?  That is, A wants to do A's
businessonly if B's specific conditions don't hold true.  If A sets rri->ri_usesMultiInsert to true and passes rri to
B,then B wrongly does B's business despite that A's specific conditions are true. 
>
> (I think this is due to some form of violation of encapsulation.)

Sorry about chiming in late, but I think Tsunakawa-san raises some
valid concerns.

First, IIUC, is whether we need the ri_usesMultiInsert flag at all.  I
think yes, because computing that information repeatedly for every row
seems wasteful, especially for a bulk operation, and even more so if
we're going to call a function when doing so.

Second is whether the interface for setting ri_usesMultiInsert
encourages situations where different modules could possibly engage in
conflicting behaviors.  I can't think of a real-life example of that
with the current implementation, but maybe the interface provided in
the patch makes it harder to ensure that that remains true in the
future.  Tsunakawa-san, have you encountered an example of this, maybe
when trying to integrate this patch with some other?

Anyway, one thing we could do is rename
ExecRelationAllowsMultiInsert() to ExecSetRelationUsesMultiInsert(),
that is, to make it actually set ri_usesMultiInsert and have places
like CopyFrom() call it if (and only if) its local logic allows
multi-insert to be used.  So, ri_usesMultiInsert starts out set to
false and if a module wants to use multi-insert for a given target
relation, it calls ExecSetRelationUsesMultiInsert() to turn the flag
on.  Also, given the confusion regarding how execPartition.c
manipulates the flag, maybe change ExecFindPartition() to accept a
Boolean parameter multi_insert, which it will pass down to
ExecInitPartitionInfo(), which in turn will call
ExecSetRelationUsesMultiInsert() for a given partition.  Of course, if
the logic in ExecSetRelationUsesMultiInsert() determines that
multi-insert can't be used, for the reasons listed in the function,
then the caller will have to live with that decision.

Any other ideas on how to make this work and look better?

--
Amit Langote
EDB: http://www.enterprisedb.com

[1] https://www.postgresql.org/message-id/d3fbf3bc93b7bcd99ff7fa9ee41e0e20%40postgrespro.ru



pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: [patch] CLUSTER blocks scanned progress reporting
Next
From: Fujii Masao
Date:
Subject: Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module