Re: crashes due to setting max_parallel_workers=0 - Mailing list pgsql-hackers

From David Rowley
Subject Re: crashes due to setting max_parallel_workers=0
Date
Msg-id CAKJS1f_DJmH44o0ODDoJb=rAUeo5b39u9PTa_xFD6AuqyptBOA@mail.gmail.com
Whole thread Raw
In response to Re: crashes due to setting max_parallel_workers=0  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: crashes due to setting max_parallel_workers=0  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 28 March 2017 at 04:57, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Mar 25, 2017 at 12:18 PM, Rushabh Lathia
> <rushabh.lathia@gmail.com> wrote:
>> About the original issue reported by Tomas, I did more debugging and
>> found that - problem was gather_merge_clear_slots() was not returning
>> the clear slot when nreader is zero (means nworkers_launched = 0).
>> Due to the same scan was continue even all the tuple are exhausted,
>> and then end up with server crash at gather_merge_getnext(). In the patch
>> I also added the Assert into gather_merge_getnext(), about the index
>> should be less then the nreaders + 1 (leader).
>
> Well, you and David Rowley seem to disagree on what the fix is here.
> His patches posted upthread do A, and yours do B, and from a quick
> look those things are not just different ways of spelling the same
> underlying fix, but actually directly conflicting ideas about what the
> fix should be.  Any chance you can review his patches, and maybe he
> can review yours, and we could try to agree on a consensus position?
> :-)

When comparing Rushabh's to my minimal patch, both change
gather_merge_clear_slots() to clear the leader's slot. My fix
mistakenly changes things so it does ExecInitExtraTupleSlot() on the
leader's slot, but seems that's not required since
gather_merge_readnext() sets the leader's slot to the output of
ExecProcNode(outerPlan). I'd ignore my minimal fix because of that
mistake. Rushabh's patch sidesteps this by adding a conditional
pfree() to not free slot that we didn't allocate in the first place.

I do think the code could be improved a bit. I don't like the way
GatherMergeState's nreaders and nworkers_launched are always the same.
I think this all threw me off a bit and may have been the reason for
the bug in the first place.


-- David Rowley                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



pgsql-hackers by date:

Previous
From: Mike Palmiotto
Date:
Subject: Re: partitioned tables and contrib/sepgsql
Next
From: Kevin Grittner
Date:
Subject: Re: Guidelines for GSoC student proposals / EliminateO(N^2) scaling from rw-conflict tracking in serializable transactions