Re: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers

From Greg Nancarrow
Subject Re: Parallel INSERT (INTO ... SELECT ...)
Date
Msg-id CAJcOf-e_tzVzF+kp7HDM+UMNrfM+DUL0Ck3boOdp-DDygt5Jww@mail.gmail.com
Whole thread Raw
In response to RE: Parallel INSERT (INTO ... SELECT ...)  ("Hou, Zhijie" <houzj.fnst@cn.fujitsu.com>)
List pgsql-hackers
On Mon, Feb 1, 2021 at 8:19 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
>
> When developing the reloption patch, I noticed some issues in the patch.
>
> 1).
> > - Reduce Insert parallel-safety checks required for some SQL, by noting
> > that the subquery must operate on a relation (check for RTE_RELATION in
> > subquery range-table)
>
> +                       foreach(lcSub, rte->subquery->rtable)
> +                       {
> +                               rteSub = lfirst_node(RangeTblEntry, lcSub);
> +                               if (rteSub->rtekind == RTE_RELATION)
> +                               {
> +                                       hasSubQueryOnRelation = true;
> +                                       break;
> +                               }
> +                       }
> It seems we can not only search RTE_RELATION in rtable,
> because RTE_RELATION may exist in other place like:
>
> ---
> --** explain insert into target select (select * from test);
>         Subplan's subplan
>
> --** with cte as (select * from test) insert into target select * from cte;
>         In query's ctelist.
> ---
>
> May be we should use a walker function [1] to
> search the subquery and ctelist.
>

Yes, the current checks are too simple, as you point out, there seem
to be more complex cases that it doesn't pick up. Unfortunately
expanding the testing for them does detract from the original
intention of this code (which was to avoid extra parallel-safety check
processing on code which can't be run in parallel). I guess the
relation walker function should additionally check for SELECT queries
only (commandType == CMD_SELECT), and exclude SELECT FOR UPDATE/SHARE
(rowMarks != NIL) too. I'll need to look further into it, but will
certainly update the code for the next version of the patch.

> 2).
>
> +--
> +-- Test INSERT into temporary table with underlying query.
> +-- (should not use a parallel plan)
> +--
>
> May be the comment here need some change since
> we currently support parallel plan for temp table.
>

Thanks, it should say something like "should create the plan with
INSERT + parallel SELECT".

> 3)
> Do you think we can add a testcase for foreign-table ?
> To test parallel query with serial insert on foreign table.
>

I have intended to do it, but as a lower-priority task.

>
> [1]
> static bool
> relation_walker(Node *node)
> {
>         if (node == NULL)
>                 return false;
>
>         else if (IsA(node, RangeTblEntry))
>         {
>                 RangeTblEntry *rte = (RangeTblEntry *) node;
>                 if (rte->rtekind == RTE_RELATION)
>                         return true;
>
>                 return false;
>         }
>
>         else if (IsA(node, Query))
>         {
>                 Query      *query = (Query *) node;
>
>                 /* Recurse into subselects */
>                 return query_tree_walker(query, relation_walker,
>                                                                  NULL, QTW_EXAMINE_RTES_BEFORE);
>         }
>
>         /* Recurse to check arguments */
>         return expression_tree_walker(node,
>                                                                   relation_walker,
>                                                                   NULL);
> }
>

Regards,
Greg Nancarrow
Fujitsu Australia



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Single transaction in the tablesync worker?
Next
From: Yugo NAGATA
Date:
Subject: Re: [PATCH] Add extra statistics to explain for Nested Loop