Re: Parallel tuplesort (for parallel B-Tree index creation) - Mailing list pgsql-hackers

From: Amit Kapila
Subject: Re: Parallel tuplesort (for parallel B-Tree index creation)
Date:
Msg-id: CAA4eK1J-=1mzxgGqQHnA-eQ1d-7adCErqd2GmOCc2oAnn6-Qwg@mail.gmail.com
In response to: Re: Parallel tuplesort (for parallel B-Tree index creation)  (Peter Geoghegan <pg@heroku.com>)
List: pgsql-hackers
On Thu, Oct 13, 2016 at 12:35 AM, Peter Geoghegan <pg@heroku.com> wrote:
> On Wed, Oct 12, 2016 at 11:09 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
>> On the flip side, what if anything can queries hope to get out of
>> parallel sort that they can't get out of Gather Merge?  One
>> possibility is that a parallel sort might end up being substantially
>> faster than Gather Merge-over-non-parallel sort.  In that case, we
>> obviously should prefer it.
>
> I must admit that I don't know enough about it to comment just yet.
> Offhand, it occurs to me that the Gather Merge sorted input could come
> from a number of different types of paths/nodes, whereas adopting what
> I've done here could only work more or less equivalently to "Gather
> Merge -> Sort -> Seq Scan" -- a special case, really.
>
>> For example, it's possible that you might want to have all
>> workers participate in sorting some data and then divide the result of
>> the sort into equal ranges that are again divided among the workers,
>> or that you might want all workers to sort and then each worker to
>> read a complete copy of the output data.  But these don't seem like
>> particularly mainstream needs, nor do they necessarily seem like
>> problems that parallel sort itself should be trying to solve.
>
> This project of mine is about parallelizing tuplesort.c, which isn't
> really what you want for parallel query -- you shouldn't try to scope
> the problem as "make the sort more scalable using parallelism" there.
> Rather, you want to scope it at "make the execution of the entire
> query more scalable using parallelism", which is really quite a
> different thing, which necessarily involves the executor having direct
> knowledge of partition boundaries.
>

Okay, but what is the proof, or why do you think the second is going
to be better than the first?  One thing that strikes me as a major
difference between your approach and Gather Merge is that in your
approach the leader has to wait till all the workers are done with
their sorting, whereas with Gather Merge, as soon as the first one is
done, the leader starts merging.  I could be wrong here, but if I
understood it correctly, then there is an argument that a Gather
Merge kind of approach can win in cases where some of the workers can
produce sorted output ahead of others, and I am not sure we can
dismiss such cases.
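
To make the contrast concrete, here is a toy, self-contained sketch
(not taken from your patch) of a merge over fully materialized worker
runs, which is roughly what the leader does in your approach: it
cannot begin until every run is complete.  Gather Merge, by contrast,
pulls tuples from the workers' queues incrementally, so its merging
can overlap with workers that are still producing sorted output:

/*
 * Toy illustration only (not from the patch): merge NWORKERS fully
 * sorted runs, the way the leader does once all workers have
 * finished.  No merging can start until every run is complete.
 */
#include <stdio.h>

#define NWORKERS    3
#define RUNLEN      4

int
main(void)
{
    /* Pretend each worker produced one fully sorted run. */
    int     runs[NWORKERS][RUNLEN] = {
        {1, 4, 7, 10},
        {2, 5, 8, 11},
        {3, 6, 9, 12}
    };
    int     pos[NWORKERS] = {0, 0, 0};

    for (int emitted = 0; emitted < NWORKERS * RUNLEN; emitted++)
    {
        int     best = -1;

        /* Pick the run whose next element is smallest. */
        for (int w = 0; w < NWORKERS; w++)
        {
            if (pos[w] < RUNLEN &&
                (best < 0 || runs[w][pos[w]] < runs[best][pos[best]]))
                best = w;
        }
        printf("%d ", runs[best][pos[best]++]);
    }
    printf("\n");
    return 0;
}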

+struct Sharedsort
+{
..
+ * Workers increment workersFinished to indicate having finished.  If
+ * this is equal to state.launched within the leader, leader is ready
+ * to merge runs.
+ *
+ * leaderDone indicates if leader is completely done (i.e., was
+ * tuplesort_end called against the state through which parallel output
+ * was consumed?)
+ */
+ int currentWorker;
+ int workersFinished;
..
}

Looking at the 'workersFinished' usage, it seems you have devised a
new way for the leader to know when the workers have finished, which
might be required for this patch.  However, have you investigated
whether the existing infrastructure that serves the same purpose
could be used here instead?
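
For example, something along these lines might be workable (a rough
sketch only, not tested against your patch; it assumes the caller
already has a ParallelContext for the sort and that waiting for
worker exit, rather than for run completion, is acceptable):

/*
 * Hypothetical sketch: rely on the existing parallel.c machinery to
 * learn that all workers are done, instead of a hand-rolled
 * workersFinished counter.  The trade-off is that this only reports
 * completion at worker exit, not at the earlier point where a
 * worker's sorted run becomes available.
 */
#include "postgres.h"

#include "access/parallel.h"

static void
leader_wait_for_worker_runs(ParallelContext *pcxt)
{
    /* Start the workers registered in the parallel context. */
    LaunchParallelWorkers(pcxt);

    /* Block until every launched worker has exited. */
    WaitForParallelWorkersToFinish(pcxt);
}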

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


