Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan) - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)
Date
Msg-id CAA4eK1L-Uo=s4=0jvvVA51pj06u5WdDvSQg673yuxJ_Ja+x86Q@mail.gmail.com
In response to Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Sat, May 7, 2016 at 6:37 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, May 6, 2016 at 8:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Andreas Seltenreich <seltenreich@gmx.de> writes:
> > > when fuzz testing master as of c1543a8, parallel workers trigger the
> > > following assertion in ExecInitSubPlan every couple hours.
> > >     TRAP: FailedAssertion("!(list != ((List *) ((void *)0)))", File: "list.c", Line: 390)
> > > Sample backtraces of a worker and leader below, plan of leader attached.
> > > The collected queries don't seem to reproduce it.
> >
> > Odd.  My understanding of the restrictions on parallel query is that
> > anything involving a SubPlan ought not be parallelized;
> >
>
> Subplan references are considered parallel-restricted, so a parallel plan can be generated if there are subplans in a query, but they shouldn't be pushed to workers.  I have tried a somewhat simpler example to see whether we push down anything parallel-restricted to a worker in the case of joins, and it turned out there are cases where that can happen.  Consider the example below:
>
>  
>
> From the above output it is clear that the parallel-restricted function is pushed down below the Gather node.  I found that although we have carefully avoided pushing the pathtarget below GatherPath in apply_projection_to_path() when the pathtarget contains any parallel-unsafe or parallel-restricted clause, we are separately also trying to apply the pathtarget to the partial path list, which doesn't seem to be the correct way even if it is required.  I think this was added during the parallel aggregate patch, and it seems to me that it is not required after the changes related to GatherPath in apply_projection_to_path().
>
> With the attached patch applied, parallel-restricted clauses are no longer added below the Gather path.
>
> Now back to the original bug: if you look at the plan file attached to the original bug report, the subplan is pushed below the Gather node in the target list, but not to the immediate join, rather one more level down to the SeqScan path.  I am still not sure how it has managed to push the restricted clauses down to that level.
>

On further analysis, I think I know what is going on in the original bug report.  During query_planner(), we add the Vars (build_base_rel_tlists()) and PlaceHolderVars (add_placeholders_to_base_rels()) to each relation's (RelOptInfo) target, and SubPlans are added as PlaceHolderVars in the target expressions.  However, when set_rel_consider_parallel() decides whether a particular rel can be considered for parallelism, it does not check the target expressions.  I think we can prohibit a relation from being considered for parallelism if its target expressions contain any parallel-restricted clause.  A fix along those lines is attached to this mail.
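
To make the idea concrete, here is a rough, self-contained sketch of the check in plain C.  It is only a toy model and not the attached patch: ToyExpr, ToyRel, ParallelHazard and both consider_parallel_* functions are hypothetical names invented for illustration, and the assumption that the existing check looks only at the restriction quals is mine.  It contrasts a check that ignores the target expressions with one that also rejects the rel when a target expression is not parallel-safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are NOT PostgreSQL types. */
typedef enum ParallelHazard
{
    HAZARD_SAFE,        /* may be evaluated in a worker */
    HAZARD_RESTRICTED,  /* must stay in the leader, e.g. a SubPlan reference */
    HAZARD_UNSAFE       /* rules out parallelism altogether */
} ParallelHazard;

typedef struct ToyExpr
{
    const char     *label;   /* human-readable description only */
    ParallelHazard  hazard;
} ToyExpr;

typedef struct ToyRel
{
    const ToyExpr *quals;    /* restriction clauses of the rel */
    size_t         nquals;
    const ToyExpr *target;   /* target expressions (Vars, PlaceHolderVars) */
    size_t         ntarget;
} ToyRel;

/* True only if every expression in the array is parallel-safe. */
bool
all_parallel_safe(const ToyExpr *exprs, size_t n)
{
    assert(exprs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        if (exprs[i].hazard != HAZARD_SAFE)
            return false;
    }
    return true;
}

/* Assumed current behaviour: the target expressions are never inspected. */
bool
consider_parallel_quals_only(const ToyRel *rel)
{
    return all_parallel_safe(rel->quals, rel->nquals);
}

/* Proposed behaviour: a restricted target expression also disqualifies the rel. */
bool
consider_parallel_with_target(const ToyRel *rel)
{
    return all_parallel_safe(rel->quals, rel->nquals) &&
           all_parallel_safe(rel->target, rel->ntarget);
}
```

For a rel whose quals are all safe but whose target holds a PlaceHolderVar wrapping a SubPlan (marked HAZARD_RESTRICTED here), the first function still returns true, which corresponds to the situation in the bug report, while the second returns false and keeps the rel out of any partial path.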

Thanks to Dilip Kumar for helping me in narrowing down this particular problem.  We were not able to generate the exact test, but I think the above theory is sufficient to prove that it can cause a problem as seen in the original bug report. 



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Attachment
