Re: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers
From | Greg Nancarrow |
---|---|
Subject | Re: Parallel INSERT (INTO ... SELECT ...) |
Date | |
Msg-id | CAJcOf-ecOEtieLSSBZAKRuXc84kYTkG1kWPb6=xawC4MiWG8xA@mail.gmail.com Whole thread Raw |
In response to | RE: Parallel INSERT (INTO ... SELECT ...) ("Hou, Zhijie" <houzj.fnst@cn.fujitsu.com>) |
Responses |
RE: Parallel INSERT (INTO ... SELECT ...)
|
List | pgsql-hackers |
On Mon, Feb 1, 2021 at 8:19 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote: > > Hi, > > When developing the reloption patch, I noticed some issues in the patch. > > 1). > > - Reduce Insert parallel-safety checks required for some SQL, by noting > > that the subquery must operate on a relation (check for RTE_RELATION in > > subquery range-table) > > + foreach(lcSub, rte->subquery->rtable) > + { > + rteSub = lfirst_node(RangeTblEntry, lcSub); > + if (rteSub->rtekind == RTE_RELATION) > + { > + hasSubQueryOnRelation = true; > + break; > + } > + } > It seems we can not only search RTE_RELATION in rtable, > because RTE_RELATION may exist in other place like: > > --- > --** explain insert into target select (select * from test); > Subplan's subplan > > --** with cte as (select * from test) insert into target select * from cte; > In query's ctelist. > --- > > May be we should use a walker function [1] to > search the subquery and ctelist. > > > > [1] > static bool > relation_walker(Node *node) > { > if (node == NULL) > return false; > > else if (IsA(node, RangeTblEntry)) > { > RangeTblEntry *rte = (RangeTblEntry *) node; > if (rte->rtekind == RTE_RELATION) > return true; > > return false; > } > > else if (IsA(node, Query)) > { > Query *query = (Query *) node; > > /* Recurse into subselects */ > return query_tree_walker(query, relation_walker, > NULL, QTW_EXAMINE_RTES_BEFORE); > } > > /* Recurse to check arguments */ > return expression_tree_walker(node, > relation_walker, > NULL); > } > I've had a further look at this, and this walker function is doing a lot of work recursing the parse tree, and I'm not sure that it reliably retrieves the information that we;re looking for, for all cases of different SQL queries. Unless it can be made much more efficient and specific to our needs, I think we should not try to do this optimization, because there's too much overhead. Also, keep in mind that for the current parallel SELECT functionality in Postgres, I don't see any similar optimization being attempted (and such optimization should be attempted at the SELECT level). So I don't think we should be attempting such optimization in this patch (but could be attempted in a separate patch, just related to current parallel SELECT functionality). Regards, Greg Nancarrow Fujitsu Australia
pgsql-hackers by date: