Re: crashes due to setting max_parallel_workers=0 - Mailing list pgsql-hackers

From Rushabh Lathia
Subject Re: crashes due to setting max_parallel_workers=0
Date
Msg-id CAGPqQf2=s2i578Lw5fao+EaN_FJXTXQ5MGdvJq8ECZofGOxveA@mail.gmail.com
In response to Re: crashes due to setting max_parallel_workers=0  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers


On Mon, Mar 27, 2017 at 9:52 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Mar 27, 2017 at 12:11 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 28 March 2017 at 04:57, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Sat, Mar 25, 2017 at 12:18 PM, Rushabh Lathia
>> <rushabh.lathia@gmail.com> wrote:
>>> About the original issue reported by Tomas, I did more debugging and
>>> found that the problem was that gather_merge_clear_slots() was not
>>> returning a cleared slot when nreaders is zero (i.e. nworkers_launched = 0).
>>> Because of that, the scan continued even after all the tuples were
>>> exhausted, and we ended up with a server crash in gather_merge_getnext().
>>> In the patch I also added an Assert into gather_merge_getnext() that the
>>> index should be less than nreaders + 1 (the leader).
>>
>> Well, you and David Rowley seem to disagree on what the fix is here.
>> His patches posted upthread do A, and yours do B, and from a quick
>> look those things are not just different ways of spelling the same
>> underlying fix, but actually directly conflicting ideas about what the
>> fix should be.  Any chance you can review his patches, and maybe he
>> can review yours, and we could try to agree on a consensus position?
>> :-)
>
> When comparing Rushabh's to my minimal patch, both change
> gather_merge_clear_slots() to clear the leader's slot. My fix
> mistakenly changes things so it does ExecInitExtraTupleSlot() on the
> leader's slot, but it seems that's not required since
> gather_merge_readnext() sets the leader's slot to the output of
> ExecProcNode(outerPlan). I'd ignore my minimal fix because of that
> mistake. Rushabh's patch sidesteps this by adding a conditional
> pfree() so as not to free a slot that we didn't allocate in the first place.
>
> I do think the code could be improved a bit. I don't like the way
> GatherMergeState's nreaders and nworkers_launched are always the same.
> I think this all threw me off a bit and may have been the reason for
> the bug in the first place.

Yeah, if those fields are always the same, then I think that they
should be merged.  That seems undeniably a good idea. 

Hmm, I agree that it's a good idea, and I will work on that as a separate patch.
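For illustration only (the struct name follows the PostgreSQL source, but this is a hedged sketch of the suggested simplification, not an actual patch): merging the two always-equal fields would leave GatherMergeState with a single count, from which any loop over all participants derives its bound:

```c
#include <assert.h>

/* Illustrative sketch: a trimmed-down GatherMergeState in which the
 * redundant nworkers_launched field is folded into nreaders, so there
 * is one source of truth for how many worker readers exist. */
typedef struct GatherMergeState
{
    int nreaders;   /* workers actually launched; the leader is extra */
    /* ... readers, gm_slots[nreaders + 1], binary heap, etc. ... */
} GatherMergeState;

/* With a single field, every loop over all participants has one
 * obvious bound instead of two interchangeable ones. */
static int total_participants(const GatherMergeState *gm_state)
{
    return gm_state->nreaders + 1;   /* workers plus the leader */
}
```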
 
Possibly we
should fix the crash bug first, though, and then do that afterwards.
What bugs me a little about Rushabh's fix is that it looks like magic.
You have to know that we're looping over two things and freeing them
up, but there's one more of one thing than the other thing.  I think
that at least needs some comments or something.


So in my second version of the patch I changed gather_merge_clear_slots() to
clear only the worker slots, plus some other cleanup. gather_merge_getnext()
now also returns NULL when all the queues and the heap are exhausted, where
earlier gather_merge_clear_slots() returned a cleared slot for that purpose.
This way we make sure that we don't run past the worker slots and free the
slot for the leader, and gather_merge_getnext() doesn't need to depend on
that cleared slot.
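To make the mechanics concrete, here is a minimal, self-contained C sketch of the shape of that second version. The types are mocks and the bodies are illustrative, not the actual executor code, though the function names match the real ones: clear only the worker slots, and have gather_merge_getnext() return NULL itself once everything is exhausted.

```c
#include <assert.h>
#include <stddef.h>

/* Mock stand-ins for PostgreSQL's TupleTableSlot and GatherMergeState;
 * this sketches the shape of the fix, not the real server code. */
typedef struct TupleTableSlot { int empty; } TupleTableSlot;

typedef struct GatherMergeState
{
    int nreaders;                 /* worker readers; may be 0 */
    TupleTableSlot *gm_slots[8];  /* [0] is the leader, [1..nreaders] workers */
} GatherMergeState;

static void ExecClearTuple(TupleTableSlot *slot) { slot->empty = 1; }

/* Clear only the worker slots: the leader's slot (gm_slots[0]) comes
 * from ExecProcNode(outerPlan) and is not freed or cleared here. */
static void gather_merge_clear_slots(GatherMergeState *gm_state)
{
    for (int i = 1; i <= gm_state->nreaders; i++)
        ExecClearTuple(gm_state->gm_slots[i]);
}

/* When the queues and heap are exhausted, return NULL directly instead
 * of depending on gather_merge_clear_slots() handing back a cleared
 * slot. This also behaves sanely when nreaders == 0, the
 * nworkers_launched = 0 case from the original report. */
static TupleTableSlot *gather_merge_getnext(GatherMergeState *gm_state,
                                            int heap_empty)
{
    if (heap_empty)
    {
        gather_merge_clear_slots(gm_state);
        return NULL;              /* end of scan */
    }
    /* ... normal path: pop the binary heap and return that tuple ... */
    return gm_state->gm_slots[0];
}
```

With this shape, the end-of-scan signal is the NULL return itself, so nothing downstream has to know how many slots were allocated versus how many participants there were.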


Thanks,
Rushabh Lathia
