Re: [HACKERS] parallelize queries containing initplans - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: [HACKERS] parallelize queries containing initplans
Date
Msg-id CAA4eK1KKhMVKjAQrsUx1Yjh9P1cBcYL0EygFb9-bCSpb-HrAqQ@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] parallelize queries containing initplans  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] parallelize queries containing initplans  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Tue, Nov 14, 2017 at 2:39 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Nov 11, 2017 at 7:19 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> I have tried to follow the practice we have used for param extern
>> params (SerializeParamList is in params.c) and most of the handling of
>> initplans is done in nodeSubplan.c, so I choose to keep the newly
>> added functions there.  However, I think keeping it in execParallel.c
>> is also okay as we do it for serialize plan.
>
> To me it feels like there is only a very loose coupling between this
> code and what's currently in nodeSubplan.c, but maybe I'm wrong.  I am
> also not sure execParallel.c is the perfect place either; we might
> eventually split execParallel.c into multiple files if it accumulates
> too many things that are too different from each other.
>

Another possibility is subselect.c, because it contains initplan
stuff, however, most of the things are related to planning, so not
sure.  I think nodeSubplan.c won't be a bad choice as it is mentioned
in the file header that it is concerned about initplan stuff.

>> I think it would need some work in execution as well because the size
>> won't be fixed in that case for varchar type of params.  We might end
>> up with something different as well.
>
> Hmm, true.  But don't we also have that problem with the patch as
> written?  I mean, didn't we determine at some point that the value of
> an InitPlan can change, if the initplan depends on a parameter that is
> correlated but not directly correlated?  The existence of
> ExecReScanSetParamPlan() seems to suggest that this is indeed
> possible, and that means that the parameter could be reevaluated and
> evaluate, on the second time through, to a value that requires more
> storage than before.  That would be a problem, because
> ExecParallelReInitializeDSM reuses the old DSM.
>

I think this would have been a matter of concern if we use initplans
below Gather/Gather Merge.  The patch uses initplans which are between
current query level and root.  So, I think we don't need to reevaluate
such parameters.  Let us try to see it via example:

                                      QUERY PLAN
----------------------------------------------------------------------------------------Aggregate  (cost=62.08..62.08
rows=1width=8)  InitPlan 1 (returns $1)    ->  Finalize Aggregate  (cost=20.64..20.65 rows=1 width=8)          ->
Gather (cost=20.63..20.64 rows=2 width=8)                Workers Planned: 2                ->  Partial Aggregate
(cost=20.63..20.64rows=1 width=8)                      ->  Parallel Seq Scan on t3  (cost=0.00..18.50
 
rows=850 width=4)  ->  Hash Join  (cost=20.75..41.42 rows=1 width=4)        Hash Cond: (t1.j = t2.j)        ->  Gather
(cost=0.00..20.63rows=10 width=12)              Workers Planned: 2              Params Evaluated: $1              ->
ParallelSeq Scan on t1  (cost=0.00..20.63 rows=4 width=12)                    Filter: (k = $1)        ->  Hash
(cost=20.63..20.63rows=10 width=8)              ->  Gather  (cost=0.00..20.63 rows=10 width=8)
WorkersPlanned: 2                    Params Evaluated: $1                    ->  Parallel Seq Scan on t2
(cost=0.00..20.63
rows=4 width=8)                          Filter: (k = $1)
(20 rows)


Now, here even if initplan would have been an undirect correlated
plan, it wouldn't have been a problem, because we don't need to
reevaluate it for Gather node below Hash.

Am I missing something?  Do you have some test or shape of the plan in
mind which can cause a problem?

>>>  I broke a lot of long lines in your version of
>>> the patch into multiple lines; please try to be attentive to this
>>> issue when writing patches in general, as it is a bit tedious to go
>>> through and insert line breaks in many places.
>>
>> Okay, but I sometimes rely on pgindent for such things as for few
>> things it becomes difficult to decide which way it will be better.
>
> pgindent doesn't in general break long lines.  There are a few cases
> where it does, like reflowing comments.  But mostly you have to do
> that manually.  For instance, if you write a =
> verylongfunctionname(longargument(thing), stuff(thing), foobar(thing,
> thing, thing)) it's going to keep all that on one line.  If you insert
> line breaks between the arguments it will keep them, though.  I think
> it's worth doing something like:
>
> git diff | awk 'length($0)>81'
>
> on your patches before you submit them.  If you see things in there
> that can be reformatted to make the long lines shorter, do it.
>

Okay, point noted.

>> I have fixed the first two in attached patch and left the last one as
>> I was not sure what you have in mind
>
> Oh, yeah, that should be changed too, something like "Serialize
> PARAM_EXEC parameters."  Sorry, I thought I caught all of the
> references to initplans specifically, but I guess I missed a few.
>

No issues, I can go through it once more to change the comments
wherever initplans is used after we settle on the above discussion.
If we decide that initplan and other PARAM_EXEC params need different
treatment, then we might want to retain initplans in the comments.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: [HACKERS] ginInsertCleanup called from vacuum could still misstuples to be deleted
Next
From: Andrea Adami
Date:
Subject: Re: [HACKERS] Row Level Security Bug ?