Home > mailing lists

Re: SegFault on 9.6.14 - Mailing list pgsql-hackers

From	Thomas Munro
Subject	Re: SegFault on 9.6.14
Date	July 23, 2019 06:40:46
Msg-id	CA+hUKGLh1XKbqrq22UxEn3iJLoqV261OTKy9uqA8PYuxT6hHPA@mail.gmail.com Whole thread Raw
In response to	Re: SegFault on 9.6.14 (Amit Kapila <amit.kapila16@gmail.com>)
Responses	Re: SegFault on 9.6.14 (Amit Kapila <amit.kapila16@gmail.com>)
List	pgsql-hackers

Tree view

On Fri, Jul 19, 2019 at 3:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Jul 18, 2019 at 7:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Thomas Munro <thomas.munro@gmail.com> writes:
> > > Hmm, so something like a new argument "bool final" added to the
> > > ExecXXXShutdown() functions, which receives false in this case to tell
> > > it that there could be a rescan so keep the parallel context around.
> >
> > I think this is going in the wrong direction.  Nodes should *always*
> > assume that a rescan is possible until ExecEndNode is called.
>
> I am thinking that why not we remove the part of destroying the
> parallel context (and shared memory) from ExecShutdownGather (and
> ExecShutdownGatherMerge) and then do it at the time of ExecEndGather
> (and ExecEndGatherMerge)?   This should fix the bug in hand and seems
> to be more consistent with our overall design principles.  I have not
> tried to code it to see if there are any other repercussions of the
> same but seems worth investigating.  What do you think?

I tried moving ExecParallelCleanup() into ExecEndGather().  The first
problem is that ExecutePlan() wraps execution in
EnterParallelMode()/ExitParallelMode(), but ExitParallelMode() fails
an assertion that no parallel context is active because
ExecEndGather() hasn't run yet.  The enclosing
ExecutorStart()/ExecutorEnd() calls are further down the call stack,
in ProcessQuery().  So some more restructuring might be needed to exit
parallel mode later, but then I feel like you might be getting way out
of back-patchable territory, especially if it involves moving code to
the other side of the executor hook boundary.  Is there an easier way?

Another idea from the band-aid-solutions-that-are-easy-to-back-patch
department: in ExecutePlan() where we call ExecShutdownNode(), we
could write EXEC_FLAG_DONE into estate->es_top_eflags, and then have
ExecGatherShutdown() only run ExecParallelCleanup() if it sees that
flag.  That's not beautiful, but it's less churn that the 'bool final'
argument we discussed before, and could be removed in master when we
have a better way.

Stepping back a bit, it seems like we need two separate tree-walking
calls: one to free resources not needed anymore by the current rescan
(workers), and another to free resources not needed ever again
(parallel context).  That could be spelled ExecShutdownNode(false) and
ExecShutdownNode(true), or controlled with the EXEC_FLAG_DONE kluge,
or a new additional ExecSomethingSomethingNode() function, or as you
say, perhaps the second thing could be incorporated into
ExecEndNode().  I suspect that the Shutdown callbacks for Hash, Hash
Join, Custom Scan and Foreign Scan might not be needed anymore if we
could keep the parallel context around until after the run
ExecEndNode().

-- 
Thomas Munro
https://enterprisedb.com

pgsql-hackers by date:

From: Kyotaro Horiguchi
Date: 23 July 2019, 06:26:28
Subject: "Attachment not found" by CF-app

From: "Tsunakawa, Takayuki"
Date: 23 July 2019, 06:46:59
Subject: RE: Speed up transaction completion faster after many relations areaccessed in a transaction

Re: SegFault on 9.6.14 - Mailing list pgsql-hackers

Previous

Next