Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation) - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Date
Msg-id CAA4eK1Kg5o4RoAyg9Qr90Owr6Aj_F=fxMTCLmyHDKukPYb1GSQ@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
List pgsql-hackers
On Thu, Jan 25, 2018 at 1:24 AM, Peter Geoghegan <pg@bowt.ie> wrote:
> On Tue, Jan 23, 2018 at 9:43 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> Right, but what if the worker dies due to something proc_exit(1) or
>> something like that before calling BarrierArriveAndWait.  I think this
>> is part of the problem we have solved in
>> WaitForParallelWorkersToFinish such that if the worker exits abruptly
>> at any point due to some reason, the system should not hang.
>
> I have used Thomas' chaos-monkey-fork-process.patch to verify:
>
> 1. The problem of fork failure causing nbtsort.c to wait forever is a
> real problem. Sure enough, the coding pattern within
> _bt_leader_heapscan() can cause us to wait forever even with commit
> 2badb5afb89cd569500ef7c3b23c7a9d11718f2f, more or less as a
> consequence of the patch not using tuple queues (it uses the new
> tuplesort sharing thing instead).
>
> 2. Simply adding a single call to WaitForParallelWorkersToFinish()
> within _bt_leader_heapscan() before waiting on our condition variable
> fixes the problem -- errors are reliably propagated, and we never end
> up waiting forever.
>
> 3. This short term fix works just as well with
> parallel_leader_participation=off.
>
> At this point, my preferred solution is for someone to go implement
> Amit's WaitForParallelWorkersToAttach() idea [1] (Amit himself seems
> like the logical person for the job).
>

I can implement it and share a prototype patch with you which you can
use to test parallel sort stuff.  I would like to highlight the
difference which you will see with WaitForParallelWorkersToAttach as
compare to WaitForParallelWorkersToFinish() is that the former will
give you how many of nworkers_launched workers are actually launched
whereas latter gives an error if any of the expected workers is not
launched.  I feel former is good and your proposed way of calling it
after the leader is done with its work has alleviated the minor
disadvantage of this API which is that we need for workers to startup.

However, now I see that you and Thomas are trying to find a different
way to overcome this problem differently, so not sure if I should go
ahead or not.  I have seen that you told you wanted to look at
Thomas's proposed stuff carefully tomorrow, so I will wait for you
guys to decide which way is appropriate.

> Once that's committed, I can
> post a new version of the patch that uses that new infrastructure --
> I'll add a call to the new function, without changing anything else.
> Failing that, we could actually just use
> WaitForParallelWorkersToFinish(). I still don't want to use a barrier,
> mostly because it complicates  parallel_leader_participation=off,
> something that Amit is in agreement with [2][3].
>

I think if we want we can use barrier API's to solve this problem, but
I kind of have a feeling that it doesn't seem to be the most
appropriate API, especially because existing API like
WaitForParallelWorkersToFinish() can serve the need in a similar way.

Just to conclude, following are proposed ways to solve this problem:

1. Implement a new API WaitForParallelWorkersToAttach and use that to
solve this problem.  Peter G. and Amit thinks, this is a good way to
solve this problem.
2. Use existing API WaitForParallelWorkersToFinish to solve this
problem.  Peter G. feels that if API mentioned in #1 is not available,
we can use this to solve the problem and I agree with that position.
Thomas is not against it.
3. Use Thomas's new way to detect such failures.  It is not clear to
me at this stage if any one of us have accepted it to be the way to
proceed, but Thomas and Peter G. want to investigate it further.
4. Use of  Barrier API to solve this problem.  Robert appears to be
strongly in favor of this approach.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Redefining inet_net_ntop
Next
From: Amit Kapila
Date:
Subject: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)