Re: [HACKERS] parallelize queries containing initplans - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: [HACKERS] parallelize queries containing initplans
Date
Msg-id CAA4eK1KpwhxYUe1iRi5Q-jWD_1kOpgSaP=zj35OnoWH2MoHVoA@mail.gmail.com
In response to Re: [HACKERS] parallelize queries containing initplans  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: [HACKERS] parallelize queries containing initplans
List pgsql-hackers
On Fri, Feb 10, 2017 at 4:34 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> I could see two possibilities to determine whether the plan (for which
> we are going to generate an initplan) contains a reference to a
> correlated var param node.  One is to write a plan or path walker to
> determine any such reference and the second is to keep the information
> about the correlated param in path node.   I think the drawback of the
> first approach is that traversing path tree during generation of
> initplan can be costly, so for now I have kept the information in path
> node to prohibit generating parallel initplans which contain a
> reference to correlated vars. I think we can go with first approach of
> using path walker if people feel that is better than maintaining a
> reference in path.  Attached patch
> prohibit_parallel_correl_params_v1.patch implements the second
> approach of keeping the correlated var param reference in path node
> and pq_pushdown_initplan_v2.patch uses that to generate parallel
> initplans.
>

Two weeks ago, when Robert was in Bangalore, we (Kuntal, Robert and I)
discussed this patch.  He mentioned that the idea used in this patch,
pulling up uncorrelated initplans to the Gather node and then
executing them and sharing the values with each worker, doesn't sound
appealing and risks bugs in some corner cases.  We discussed an idea
where the first worker to access the initplan evaluates it and then
shares the value with the other participating processes, but with
that we won't be able to use parallelism in the execution of the
initplan itself, because multiple levels of Gather nodes are not
allowed.  Another idea we discussed is to evaluate an initplan at the
Gather node if it is used as an external param (plan->extParam) at or
below the Gather node.

Based on that idea, I have modified the patch so that it computes the
set of initplan params that are required below the Gather node and
stores them as a bitmap of initplan params at the Gather node.
During set_plan_references, we find the intersection of the external
parameters required at the Gather node or nodes below it with the
initplans passed from the same or a higher query level.  Once the set
of initplan params is established, we evaluate those params (if they
are not already evaluated) before executing the Gather node and then
pass the computed values to each of the workers.  To identify whether
a particular param is parallel-safe, we check whether the paramid of
the param exists in the initplans at the same or a higher query level.
We don't allow generating a Gather path if there are initplans at some
query level below the current query level, as those plans could be
parallel-unsafe or indirectly correlated plans.

This approach restricts parallelism in some cases, such as when
initplans are below the Gather node, but the patch looks better this
way.  We can open up those cases in a separate patch if required.
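
To make the param bookkeeping described above concrete, here is a small
Python model of it.  It is only a sketch, not code from the patch: the
names InitPlan, gather_initplan_params, gather_path_allowed and
evaluate_before_gather are hypothetical, plain Python sets stand in for
the bitmap of params, and query levels are assumed to be numbered so
that 1 is the outermost query and deeper subqueries get larger numbers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class InitPlan:
    """Hypothetical stand-in for an initplan that sets one param."""
    param_id: int
    query_level: int              # assumption: 1 = outermost query
    compute: Callable[[], object]


def gather_initplan_params(ext_params: Set[int],
                           initplans: List[InitPlan],
                           gather_level: int) -> Set[int]:
    """Intersect the external params needed at or below the Gather node
    with the params of initplans at the same or a higher query level."""
    visible = {ip.param_id for ip in initplans
               if ip.query_level <= gather_level}
    return ext_params & visible


def gather_path_allowed(initplans: List[InitPlan],
                        gather_level: int) -> bool:
    """Refuse a Gather path if any initplan sits at a query level below
    (numerically greater than) the current one."""
    return all(ip.query_level <= gather_level for ip in initplans)


def evaluate_before_gather(param_ids: Set[int],
                           initplans: List[InitPlan],
                           param_values: Dict[int, object]) -> Dict[int, object]:
    """Evaluate the not-yet-evaluated params once, before the Gather
    node runs, and return the values to hand to every worker."""
    by_id = {ip.param_id: ip for ip in initplans}
    for pid in sorted(param_ids):
        if pid not in param_values:       # skip already-evaluated params
            param_values[pid] = by_id[pid].compute()
    return {pid: param_values[pid] for pid in param_ids}
```

In this model a caller would first check gather_path_allowed, then
compute the param set once with gather_initplan_params, and finally
call evaluate_before_gather just before launching workers, so that
every worker receives the same precomputed values.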

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


