Re: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers

From Greg Nancarrow
Subject Re: Parallel INSERT (INTO ... SELECT ...)
Date
Msg-id CAJcOf-ecOEtieLSSBZAKRuXc84kYTkG1kWPb6=xawC4MiWG8xA@mail.gmail.com
Whole thread Raw
In response to RE: Parallel INSERT (INTO ... SELECT ...)  ("Hou, Zhijie" <houzj.fnst@cn.fujitsu.com>)
Responses RE: Parallel INSERT (INTO ... SELECT ...)
List pgsql-hackers
On Mon, Feb 1, 2021 at 8:19 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> Hi,
>
> When developing the reloption patch, I noticed some issues in the patch.
>
> 1).
> > - Reduce Insert parallel-safety checks required for some SQL, by noting
> > that the subquery must operate on a relation (check for RTE_RELATION in
> > subquery range-table)
>
> +                       foreach(lcSub, rte->subquery->rtable)
> +                       {
> +                               rteSub = lfirst_node(RangeTblEntry, lcSub);
> +                               if (rteSub->rtekind == RTE_RELATION)
> +                               {
> +                                       hasSubQueryOnRelation = true;
> +                                       break;
> +                               }
> +                       }
> It seems we can not only search RTE_RELATION in rtable,
> because RTE_RELATION may exist in other place like:
>
> ---
> --** explain insert into target select (select * from test);
>         Subplan's subplan
>
> --** with cte as (select * from test) insert into target select * from cte;
>         In query's ctelist.
> ---
>
> May be we should use a walker function [1] to
> search the subquery and ctelist.
>
>
>
> [1]
> static bool
> relation_walker(Node *node)
> {
>         if (node == NULL)
>                 return false;
>
>         else if (IsA(node, RangeTblEntry))
>         {
>                 RangeTblEntry *rte = (RangeTblEntry *) node;
>                 if (rte->rtekind == RTE_RELATION)
>                         return true;
>
>                 return false;
>         }
>
>         else if (IsA(node, Query))
>         {
>                 Query      *query = (Query *) node;
>
>                 /* Recurse into subselects */
>                 return query_tree_walker(query, relation_walker,
>                                                                  NULL, QTW_EXAMINE_RTES_BEFORE);
>         }
>
>         /* Recurse to check arguments */
>         return expression_tree_walker(node,
>                                                                   relation_walker,
>                                                                   NULL);
> }
>

I've had a further look at this, and this walker function is doing a
lot of work recursing the parse tree, and I'm not sure that it
reliably retrieves the information that we;re looking for, for all
cases of different SQL queries. Unless it can be made much more
efficient and specific to our needs, I think we should not try to do
this optimization, because there's too much overhead. Also, keep in
mind that for the current parallel SELECT functionality in Postgres, I
don't see any similar optimization being attempted (and such
optimization should be attempted at the SELECT level). So I don't
think we should be attempting such optimization in this patch (but
could be attempted in a separate patch, just related to current
parallel SELECT functionality).

Regards,
Greg Nancarrow
Fujitsu Australia



pgsql-hackers by date:

Previous
From: "Joel Jacobson"
Date:
Subject: Re: Recording foreign key relationships for the system catalogs
Next
From: "Joel Jacobson"
Date:
Subject: Re: [PATCH] Doc: improve documentation of oid columns that can be zero. (correct version)