Re: [HACKERS] parallelize queries containing initplans - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: [HACKERS] parallelize queries containing initplans |
Date | |
Msg-id | CA+TgmoZkZzg3sCcs67c7MFQBKKLUp1dHgB-wciRpUizRqGHktA@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] parallelize queries containing initplans (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: [HACKERS] parallelize queries containing initplans
|
List | pgsql-hackers |
On Tue, Oct 3, 2017 at 7:33 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> - Why do we even need contains_parallel_unsafe_param() and >> is_initplan_is_below_current_query_level() in the first place, anyway? >> I think there's been some discussion of that on this thread, but I'm >> not sure I completely understand it, and the comments in the patch >> don't help me understand why we now need this restriction. > > This is to ensure that initplans are never below Gather node. If we > allow parallelism when initplan is below current query level, (a) then > we need a way to pass the result of initplan from worker to other > workers and to master backend (b) if initplan contains reference to > some parameter above the current query level (undirect correlated > plans), then we need a mechanism to pass such parameters (basically > allow correlated initplans work). Man, this stuff makes my hurt. Leaving aside the details of what this particular patch tries to handle, and with due regard for the possibility that I don't know what I'm talking about, I feel like there are four cases: (1) the InitPlan refers to one or more parameters whose value is computed below the Gather node, (2) the InitPlan refers to one or more parameters whose value is computed above the Gather node, (3) the InitPlan refers to both parameters computed above the Gather and parameters computed below the Gather, and (4) the InitPlan refers to no parameters at all. In cases (1) and (3), we cannot compute the value at the Gather node because some of the necessary information doesn't exist yet. it wouldn't make any sense to try to compute it at that point because each the value on which the InitPlan output depends can be different in each worker and can change at different times in different workers. However, in case (1), it's definitely OK to compute the value in the worker itself, and in case (3), it's OK provided that the necessary parameter values from above the Gather are passed down to the workers. In cases (2) and (4), we can compute the value at the Gather node. If we tried to compute it within individual workers, then we'd run into a few problems: (a) the plan might not be parallel-safe, (b) it would be inefficient to recompute the same value in every worker, and (c) if the expression is volatile, the workers might not all arrive at the same answer. Now, if the plan is parallel-safe, then we could also choose NOT to compute the value at the Gather node and instead let the first participant that needs the value compute it; while it's being computed, it could fence off other participants from embarking on the same computation, and once the computation is done, all participants get the same computed value. But AFAICS all of that is strictly optional: in any case where the InitPlan doesn't depend a value produced below the Gather, it is legal (implementation difficulties aside) to compute it at the Gather node. Having said all that, I think that this patch only wants to handle the subset of cases (2) and (4) where the relevant InitPlan is attached ABOVE the Gather node -- which seems very reasonable, because evaluating an InitPlan at a level of the plan tree above the level at which it is defined sounds like it might be complex. But I still don't quite see why we need these tests. I mean, if we only allow Param references from a set of safe parameter IDs, and the safe parameter IDs include only those IDs that can be generated in a worker, then won't other InitPlans in the workers anyway be ruled out?I assume there won't be an InitPlan whose output Paramis never referenced anywhere -- and if there were, it wouldn't really matter whether a worker thought it could compute it, because there'd be nothing to trigger that computation to actually happen. If I am all mixed up, please help straighten me out. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
pgsql-hackers by date: