Re: SegFault on 9.6.14 - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: SegFault on 9.6.14
Date
Msg-id CAA4eK1+U9oecKb6VXSay1QtznnMgiwEnuUL3MwraWa=yuz1-1Q@mail.gmail.com
Whole thread Raw
In response to Re: SegFault on 9.6.14  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: SegFault on 9.6.14
List pgsql-hackers
On Mon, Sep 2, 2019 at 4:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Aug 9, 2019 at 6:29 PM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> >
> > But beyond that, the issue here is that the Limit node is shutting
> > down the Gather node too early, and the right fix must be to stop
> > doing that, not to change the definition of what it means to shut down
> > a node, as this patch does.  So maybe a possible approach here - which
> > I think is more or less what Tom is proposing - is:
> >
> > 1. Remove the code from ExecLimit() that calls ExecShutdownNode().
> >
>
> Attached patch does that.  I have also added one test as a separate
> patch so that later if we introduce shutting down resources in Limit
> node, we don't break anything.  As of now, I have kept it separate for
> easy verification, but if we decide to go with this approach and test
> appears fine, we can merge it along with the fix.
>

I have merged the code change and test case patch as I felt that it is
good to cover this case.  I have slightly changed the test case to
make its output predictable (made the inner scan ordered so that the
query always produces the same result).  One more thing I am not able
to come up with some predictable test case for 9.6 branches as it
doesn't support Gather Merge which is required for this particular
test to always produce predictable output.  There could be some better
way to write this test, so any input in that regards or otherwise is
welcome.  So, if we commit this patch the containing test case will be
for branches HEAD~10, but the code will be for HEAD~9.6.

> > 2. Adjust ExecutePlan() so that it ensures that ExecuteShutdownNode()
> > gets called at the very end of execution, at least when execute_once
> > is set, before exiting parallel mode.
> >
>
> I am not sure if I completely understand this point.  AFAICS, the
> ExecuteShutdownNode is called whenever we are done getting the tuples.
> One place where it is not there in that function is when we assume
> destination is closed, basically below code:
> ExecutePlan()
> {
> ..
> if (!dest->receiveSlot(slot, dest))
> break;
> ..
> }
>
> Do you expect this case to be also dealt or you have something else in
> mind?
>

It still appears problematic, but I couldn't come up with a test case
to reproduce the problem.  I'll try some more on this, but I think
this anyway can be done separately once we have a test to show the
problem.

>  The other possibility could be that we move the shutdown of the
> node at the end of the function when we exit parallel mode but doing
> that lead to some regression failure on my machine.  I will
> investigate the same.
>

This was failing because use_parallel_mode flag in function
ExecutePlan() won't be set for workers and hence they won't get a
chance to accumulate its stats.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: changing wal segsize with pg_resetwal
Next
From: Alvaro Herrera
Date:
Subject: Re: tableam vs. TOAST