Re: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers

From Greg Nancarrow
Subject Re: Parallel INSERT (INTO ... SELECT ...)
Date
Msg-id CAJcOf-dEYwef0J5N+bdXQDwzTxWLU-josfxbNEUW43kQ-5u42g@mail.gmail.com
Whole thread Raw
In response to Re: Parallel INSERT (INTO ... SELECT ...)  (Antonin Houska <ah@cybertec.at>)
List pgsql-hackers
On Wed, Jan 6, 2021 at 7:39 PM Antonin Houska <ah@cybertec.at> wrote:
>
>
> @@ -252,6 +252,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
>         PlannerGlobal *glob = root->glob;
>         int                     rtoffset = list_length(glob->finalrtable);
>         ListCell   *lc;
> +       Plan       *finalPlan;
>
>         /*
>          * Add all the query's RTEs to the flattened rangetable.  The live ones
> @@ -302,7 +303,17 @@ set_plan_references(PlannerInfo *root, Plan *plan)
>         }
>
>         /* Now fix the Plan tree */
> -       return set_plan_refs(root, plan, rtoffset);
> +       finalPlan = set_plan_refs(root, plan, rtoffset);
> +       if (finalPlan != NULL && IsA(finalPlan, Gather))
> +       {
> +               Plan       *subplan = outerPlan(finalPlan);
> +
> +               if (IsA(subplan, ModifyTable) && castNode(ModifyTable, subplan)->returningLists != NULL)
> +               {
> +                       finalPlan->targetlist = copyObject(subplan->targetlist);
> +               }
> +       }
> +       return finalPlan;
>  }
>
> I'm not sure if the problem of missing targetlist should be handled here (BTW,
> NIL is the constant for an empty list, not NULL). Obviously this is a
> consequence of the fact that the ModifyTable node has no regular targetlist.
>
> Actually I don't quite understand why (in the current master branch) the
> targetlist initialized in set_plan_refs()
>
>         /*
>          * Set up the visible plan targetlist as being the same as
>          * the first RETURNING list. This is for the use of
>          * EXPLAIN; the executor won't pay any attention to the
>          * targetlist.  We postpone this step until here so that
>          * we don't have to do set_returning_clause_references()
>          * twice on identical targetlists.
>          */
>         splan->plan.targetlist = copyObject(linitial(newRL));
>
> is not used. Instead, ExecInitModifyTable() picks the first returning list
> again:
>
>         /*
>          * Initialize result tuple slot and assign its rowtype using the first
>          * RETURNING list.  We assume the rest will look the same.
>          */
>         mtstate->ps.plan->targetlist = (List *) linitial(node->returningLists);
>
> So if you set the targetlist in create_modifytable_plan() (according to
> best_path->returningLists), or even in create_modifytable_path(), and ensure
> that it gets propagated to the Gather node (generate_gather_pahs currently
> uses rel->reltarget), then you should no longer need to tweak
> setrefs.c. Moreover, ExecInitModifyTable() would no longer need to set the
> targetlist. However I don't guarantee that this is the best approach - some
> planner expert should speak up.
>


I've had a bit closer look at this particular issue.
I can see what you mean about the ModifyTable targetlist (that is set
in set_plan_refs()) getting overwritten by ExecInitModifyTable() -
which also contradicts the comment in set_plan_refs() that claims the
targetlist being set is used by EXPLAIN (which it is not). So the
current Postgres master branch does seem to be broken in that respect.

I did try your suggestion (and also remove my tweak), but I found that
in the T_Gather case of set_plan_refs() it does some processing (see
set_upper_references()) of the current Gather targetlist and subplan's
targetlist (and will then overwrite the Gather targetlist after that),
but in doing that processing it produces the error:
ERROR:  variable not found in subplan target list
I think that one of the fundamental problems is that, up to now,
ModifyTable has always been the top node in a (non-parallel) plan, but
now for Parallel INSERT we have a Gather node with ModifyTable in its
subplan. So the expected order of processing and node configuration
has changed.
For the moment (until perhaps a planner expert DOES speak up) I've
parked my temporary "fix" at the bottom of set_plan_refs(), simply
pointing the Gather node's targetlist to subplan's ModifyTable
targetlist.

    if (nodeTag(plan) == T_Gather)
    {
        Plan       *subplan = plan->lefttree;

        if (IsA(subplan, ModifyTable) &&
                        castNode(ModifyTable, subplan)->returningLists != NIL)
        {
            plan->targetlist = subplan->targetlist;
        }
    }

Regards,
Greg Nancarrow
Fujitsu Australia



pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Next
From: John Naylor
Date:
Subject: Re: [POC] verifying UTF-8 using SIMD instructions