Re: [HACKERS] parallelize queries containing initplans - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: [HACKERS] parallelize queries containing initplans |
Date | |
Msg-id | CAA4eK1KpwhxYUe1iRi5Q-jWD_1kOpgSaP=zj35OnoWH2MoHVoA@mail.gmail.com |
In response to | Re: [HACKERS] parallelize queries containing initplans (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: [HACKERS] parallelize queries containing initplans
|
List | pgsql-hackers |
On Fri, Feb 10, 2017 at 4:34 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> I could see two possibilities to determine whether the plan (for which
> we are going to generate an initplan) contains a reference to a
> correlated var param node. One is to write a plan or path walker to
> determine any such reference and the second is to keep the information
> about the correlated param in path node. I think the drawback of the
> first approach is that traversing path tree during generation of
> initplan can be costly, so for now I have kept the information in path
> node to prohibit generating parallel initplans which contain a
> reference to correlated vars. I think we can go with first approach of
> using path walker if people feel that is better than maintaining a
> reference in path. Attached patch
> prohibit_parallel_correl_params_v1.patch implements the second
> approach of keeping the correlated var param reference in path node
> and pq_pushdown_initplan_v2.patch uses that to generate parallel
> initplans.
>

Two weeks back, when Robert was in Bangalore, we (myself, Kuntal and Robert) had a discussion on this patch. He mentioned that the idea used in this patch, pulling up uncorrelated initplans to the Gather node (and then executing them and sharing the values with each worker), doesn't sound appealing and has a chance of bugs in some corner cases. We discussed an idea where the first worker to access the initplan evaluates it and then shares the value with the other participating processes, but with that, we won't be able to use parallelism in the execution of the initplan itself, due to the restriction on multiple levels of Gather nodes. Another idea we discussed is that we can evaluate the initplans at the Gather node if they are used as external params (plan->extParam) at or below the Gather node.
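To make the last idea concrete, here is a minimal sketch of the selection step as a set intersection. It is not code from the patch: `ParamSet`, `param_bit`, `gather_initplan_params` and `param_in_set` are hypothetical names, and a 64-bit mask stands in for PostgreSQL's Bitmapset so that the example is self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a set of param IDs; the real code would use Bitmapset. */
typedef uint64_t ParamSet;

/* Build a set holding only `paramid` (limited to 0..63 in this sketch). */
ParamSet
param_bit(int paramid)
{
    assert(paramid >= 0 && paramid < 64);
    return (ParamSet) 1 << paramid;
}

/*
 * Params the Gather node would evaluate before launching workers:
 * those required at or below Gather (ext_params) that are also produced
 * by initplans visible at that point (initplan_params).
 */
ParamSet
gather_initplan_params(ParamSet ext_params, ParamSet initplan_params)
{
    return ext_params & initplan_params;
}

/* Membership test: is `paramid` present in `set`? */
bool
param_in_set(ParamSet set, int paramid)
{
    return (set & param_bit(paramid)) != 0;
}
```

For example, if the nodes at or below Gather need params {0, 2, 5} and the visible initplans produce {2, 3, 5}, only params 2 and 5 would be evaluated at Gather and passed to the workers.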
Based on that idea, I have modified the patch such that it computes the set of initplan params that are required below the Gather node and stores them as a bitmap of initplan params at the Gather node. During set_plan_references, we find the intersection of the external parameters that are required at the Gather node or nodes below it with the initplans that are passed from the same or an upper query level. Once the set of initplan params is established, we evaluate them (if they are not already evaluated) before execution of the Gather node and then pass the computed values to each of the workers. To identify whether a particular param is parallel safe, we check whether the paramid of the param exists in the initplans at the same or an upper query level. We don't allow generating a Gather path if there are initplans at some query level below the current query level, as those plans could be parallel-unsafe or indirectly correlated. This restricts some cases for parallelism, such as when initplans are below the Gather node, but the patch looks better. We can open up those cases if required in a separate patch.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com