Re: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers

From Amit Langote
Subject Re: Parallel INSERT (INTO ... SELECT ...)
Date
Msg-id CA+HiwqGbOFjhf_Rv7pid2kWCCfXuZpnmbB1zp5Pd2VmXYzCMNg@mail.gmail.com
Whole thread Raw
In response to RE: Parallel INSERT (INTO ... SELECT ...)  ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
List pgsql-hackers
On Wed, Feb 10, 2021 at 5:52 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
> From: Amit Langote <amitlangote09@gmail.com>
> > Just to be clear, I'm not suggesting that we should put effort into
> > making INSERT ... VALUES run in parallel.  I'm just raising my concern
> > about embedding the assumption in max_parallel_hazard() that it will
> > never make sense to do so.
>
> I'm sorry I misunderstood your suggestion.  So, you're suggesting that it may be better to place the VALUES existence
checkoutside max_parallel_hazard().  (I may be a bit worried if I may misunderstanding in a different way.)
 

To add context to my comments, here's the block of code in the patch I
was commenting on:

+   /*
+    * If there is no underlying SELECT, a parallel table-modification
+    * operation is not possible (nor desirable).
+    */
+   hasSubQuery = false;
+   foreach(lc, parse->rtable)
+   {
+       rte = lfirst_node(RangeTblEntry, lc);
+       if (rte->rtekind == RTE_SUBQUERY)
+       {
+           hasSubQuery = true;
+           break;
+       }
+   }
+   if (!hasSubQuery)
+       return PROPARALLEL_UNSAFE;

For a modification query, this makes max_parallel_hazard() return that
it is unsafe to parallelize the query because it doesn't contain a
subquery RTE, or only contains a VALUES RTE.

I was trying to say that inside max_parallel_hazard() seems to be a
wrong place to reject parallelism for modification if only because
there are no subquery RTEs in the query.  Although now I'm thinking
that maybe it's okay as long as it's appropriately placed.  I shared
one suggestion in my reply to Amit K's email.

> The description of max_parallel_hazard() gave me an impression that this is the right place to check VALUES, because
itsrole can be paraphrased in simpler words like "Tell you if the given Query is safe for parallel execution."
 
>
> In that regard, the standard_planner()'s if conditions that check Query->commandType and Query->hasModifyingCTE can
bemoved into max_parallel_hazard() too.
 
>
> But looking closer to the description, it says "Returns the worst function hazard."  Function hazard?  Should this
functiononly check functions?  Or do we want to modify this description and get max_parallel_hazard() to provide more
service?

Yeah, updating the description to be more general may make sense.

-- 
Amit Langote
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: "Hou, Zhijie"
Date:
Subject: RE: Parallel INSERT (INTO ... SELECT ...)
Next
From: Amit Kapila
Date:
Subject: Re: repeated decoding of prepared transactions