Re: SegFault on 9.6.14 - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: SegFault on 9.6.14
Date
Msg-id CAA4eK1Jke-evw+XrL_CYuDDpcn3nH3s_8CF=it0fE1td4jZ95g@mail.gmail.com
Whole thread Raw
In response to Re: SegFault on 9.6.14  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: SegFault on 9.6.14
List pgsql-hackers
On Tue, Jul 23, 2019 at 9:11 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Fri, Jul 19, 2019 at 3:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > I am thinking that why not we remove the part of destroying the
> > parallel context (and shared memory) from ExecShutdownGather (and
> > ExecShutdownGatherMerge) and then do it at the time of ExecEndGather
> > (and ExecEndGatherMerge)?   This should fix the bug in hand and seems
> > to be more consistent with our overall design principles.  I have not
> > tried to code it to see if there are any other repercussions of the
> > same but seems worth investigating.  What do you think?
>
> I tried moving ExecParallelCleanup() into ExecEndGather().  The first
> problem is that ExecutePlan() wraps execution in
> EnterParallelMode()/ExitParallelMode(), but ExitParallelMode() fails
> an assertion that no parallel context is active because
> ExecEndGather() hasn't run yet.  The enclosing
> ExecutorStart()/ExecutorEnd() calls are further down the call stack,
> in ProcessQuery().  So some more restructuring might be needed to exit
> parallel mode later, but then I feel like you might be getting way out
> of back-patchable territory, especially if it involves moving code to
> the other side of the executor hook boundary.  Is there an easier way?
>

If we have to follow the solution on these lines, then I don't see an
easier way.  One idea could be that we relax the assert in
ExitParallelMode so that it doesn't expect parallel context to be gone
by that time, but not sure if that is a good idea because it is used
in some other places as well.  I feel in general it is a good
assertion that before we leave parallel mode, the parallel context
should be gone as that ensures we won't do any parallel activity after
that.

> Another idea from the band-aid-solutions-that-are-easy-to-back-patch
> department: in ExecutePlan() where we call ExecShutdownNode(), we
> could write EXEC_FLAG_DONE into estate->es_top_eflags, and then have
> ExecGatherShutdown() only run ExecParallelCleanup() if it sees that
> flag.  That's not beautiful, but it's less churn that the 'bool final'
> argument we discussed before, and could be removed in master when we
> have a better way.
>

Right, that will be lesser code churn and it can also work.  However,
one thing that needs some thought is till now es_top_eflags is only
set in ExecutorStart and same is mentioned in comments where it is
declared and it seems we are going to change that with this idea.  How
about having a separate function ExecBlahShutdown which will clean up
resources as parallel context and can be called only from ExecutePlan
where we are calling ExecShutdownNode?  I think both these and the
other solution we have discussed are on similar lines and another idea
could be to relax the assert which again is not a superb idea.

> Stepping back a bit, it seems like we need two separate tree-walking
> calls: one to free resources not needed anymore by the current rescan
> (workers), and another to free resources not needed ever again
> (parallel context).  That could be spelled ExecShutdownNode(false) and
> ExecShutdownNode(true), or controlled with the EXEC_FLAG_DONE kluge,
> or a new additional ExecSomethingSomethingNode() function, or as you
> say, perhaps the second thing could be incorporated into
> ExecEndNode().  I suspect that the Shutdown callbacks for Hash, Hash
> Join, Custom Scan and Foreign Scan might not be needed anymore if we
> could keep the parallel context around until after the run
> ExecEndNode().
>

I think we need those to collect instrumentation information.  I guess
that has to be done before we call InstrStopNode, otherwise, we might
miss some instrumentation information.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: Re: [bug fix] Produce a crash dump before main() on Windows
Next
From: Binguo Bao
Date:
Subject: [GSoC] Second period status report for the de-TOAST iterator