Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation) - Mailing list pgsql-hackers
From: Amit Kapila
Subject: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Date:
Msg-id: CAA4eK1KzvXTCFF8inhcEviUPxp4yWCS3rZuwjfqMttf75x2rvA@mail.gmail.com
In response to: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation) (Peter Geoghegan <pg@bowt.ie>)
Responses: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
List: pgsql-hackers
On Sat, Jan 20, 2018 at 7:03 AM, Peter Geoghegan <pg@bowt.ie> wrote:
> On Thu, Jan 18, 2018 at 5:53 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>> Attached patch details:
>
> * The patch synchronizes processes using the approach just described.
> Note that this allowed me to remove several #include statements within
> tuplesort.c.
>
> * The patch uses only a single condition variable for a single wait
> within nbtsort.c, for the leader. No barriers are used at all (and, as
> I said, tuplesort.c doesn't use condition variables anymore). Since
> things are now very simple, I can't imagine anyone still arguing for
> the use of barriers.
>
> Note that I'm still relying on nworkers_launched as the single source
> of truth on the number of participants that can be expected to
> eventually show up (even if they end up doing zero real work). This
> should be fine, because I think that it will end up being formally
> guaranteed to be reliable by the work currently underway from Robert
> and Amit. But even if I'm wrong about things going that way, and it
> turns out that the leader needs to decide how many putative launched
> workers don't "get lost" due to fork() failure (the approach which
> Amit apparently advocates), then there *still* isn't much that needs
> to change.
>
> Ultimately, the leader needs to have the exact number of workers that
> participated, because that's fundamental to the tuplesort approach to
> parallel sort.

I think I can see why this patch needs that. Is it mainly for the work
you are doing in _bt_leader_heapscan, where you are waiting for all the
workers to be finished?

> If necessary, the leader can just figure it out in
> whatever way it likes at one central point within nbtsort.c, before
> the leader calls its main spool's tuplesort_begin_index_btree() --
> that can happen fairly late in the process. Actually doing that (and
> not just using nworkers_launched) seems questionable to me, because it
> would be almost the first thing that the leader would do after
> starting parallel mode -- why not just have the parallel
> infrastructure do it for us, and for everyone else?

I don't think we have had any such requirement till now, but if it is a
must for this patch, then I don't think it is tough to do. We would need
to write an API WaitForParallelWorkerToAttach() and then call it for
each launched worker, or maybe WaitForParallelWorkersToAttach(), which
can wait for all workers to attach and report how many have successfully
attached. It would have the functionality of
WaitForBackgroundWorkerStartup, and additionally it would need to check
whether the worker is attached to the error queue. We already have a
similar API (WaitForReplicationWorkerAttach) for logical replication
workers. Note that it might have a slight impact on performance, because
with this you need to wait for the workers to start up before doing any
actual work, but I don't think it should be noticeable for large
operations, especially for operations like parallel create index.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com