Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation) - Mailing list pgsql-hackers
From: Amit Kapila
Subject: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Date:
Msg-id: CAA4eK1KzvXTCFF8inhcEviUPxp4yWCS3rZuwjfqMttf75x2rvA@mail.gmail.com
In response to: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation) (Peter Geoghegan <pg@bowt.ie>)
Responses: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
List: pgsql-hackers
On Sat, Jan 20, 2018 at 7:03 AM, Peter Geoghegan <pg@bowt.ie> wrote:
> On Thu, Jan 18, 2018 at 5:53 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>> Attached patch details:
>
> * The patch synchronizes processes using the approach just described.
> Note that this allowed me to remove several #include statements within
> tuplesort.c.
>
> * The patch uses only a single condition variable for a single wait
> within nbtsort.c, for the leader. No barriers are used at all (and, as
> I said, tuplesort.c doesn't use condition variables anymore). Since
> things are now very simple, I can't imagine anyone still arguing for
> the use of barriers.
>
> Note that I'm still relying on nworkers_launched as the single source
> of truth on the number of participants that can be expected to
> eventually show up (even if they end up doing zero real work). This
> should be fine, because I think that it will end up being formally
> guaranteed to be reliable by the work currently underway from Robert
> and Amit. But even if I'm wrong about things going that way, and it
> turns out that the leader needs to decide how many putative launched
> workers don't "get lost" due to fork() failure (the approach which
> Amit apparently advocates), then there *still* isn't much that needs
> to change.
>
> Ultimately, the leader needs to have the exact number of workers that
> participated, because that's fundamental to the tuplesort approach to
> parallel sort.

I think I can see why this patch needs that. Is it mainly for the work
you are doing in _bt_leader_heapscan, where you are waiting for all the
workers to be finished?

> If necessary, the leader can just figure it out in
> whatever way it likes at one central point within nbtsort.c, before
> the leader calls its main spool's tuplesort_begin_index_btree() --
> that can happen fairly late in the process. Actually doing that (and
> not just using nworkers_launched) seems questionable to me, because it
> would be almost the first thing that the leader would do after
> starting parallel mode -- why not just have the parallel
> infrastructure do it for us, and for everyone else?

I don't think we have had any such requirement till now, but if it is a
must for this patch, then I don't think it is tough to do. We would need
to write an API WaitForParallelWorkerToAttach() and then call it for
each launched worker, or maybe WaitForParallelWorkersToAttach(), which
can wait for all workers to attach and report how many have successfully
attached. It would have the functionality of
WaitForBackgroundWorkerStartup, and additionally it would need to check
whether the worker is attached to the error queue. We already have a
similar API (WaitForReplicationWorkerAttach) for logical replication
workers. Note that it might have a slight impact on performance, because
with this you need to wait for the workers to start up before doing any
actual work, but I don't think it should be noticeable for large
operations, especially for operations like parallel create index.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com