On Tue, Nov 24, 2015 at 7:53 PM, Robert Haas <robertmhaas@gmail.com> wrote: > > On Tue, Nov 24, 2015 at 7:59 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Nov 24, 2015 at 8:59 AM, Robert Haas <robertmhaas@gmail.com> wrote: > >> One idea about parallel sort is that perhaps if multiple workers feed > >> data into the sort, they can each just sort what they have and then > >> merge the results. > > > > Sounds like a good approach for parallel sorting, however small extension > > to it that could avoid merging the final results is that workers allocated > > for sort will perform range-based sorting. A simple example to sort integers > > from 1-100 will be, worker-1 will be responsible for sorting any integer > > between 1-50 and worker-2 will be responsible for sorting integers from > > 51-100 and then master backend just needs to ensure that it first returns > > the tuples from worker-1 and then from worker-2. I think it has some > > similarity to your idea-5 (use of repartition), but not exactly same. > > This is not so easy to accomplish for a couple of reasons. First, how > would you know where to partition the range?
I was thinking to form range map by referring histogram from stats.
> > That would work fine if > you had all the data in sorted order to begin with, but of course if > you had that you wouldn't be sorting it. Second, remember that the > data is probably arriving in separate streams in each worker - e.g. > the sort may be being fed by a parallel sequential scan.
True, at this moment I am not sure what is the best way to reduce that overhead, but may be some form of min tuple can be used for the same.
> > If you do > what I'm proposing, those workers don't need to communicate with each > other except for the final merge at the end; but to do what you're > proposing, you'd need to move each tuple from the worker that got it > originally to the correct worker. I would guess that would be at > least as expensive as the final merge pass you are hoping to avoid, > and maybe significantly moreso. >
I think we can evaluate pros and cons of each approach and then proceed with one which is more promising.