Re: parallelism and sorting - Mailing list pgsql-hackers

From Robert Haas
Subject Re: parallelism and sorting
Date
Msg-id CA+TgmoZ2B5_DA+N3oCuwN0F4LrbOPQx5xhrE5AdMbKETZmHAeg@mail.gmail.com
Whole thread Raw
In response to Re: parallelism and sorting  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: parallelism and sorting  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Tue, Nov 24, 2015 at 7:59 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, Nov 24, 2015 at 8:59 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> One idea about parallel sort is that perhaps if multiple workers feed
>> data into the sort, they can each just sort what they have and then
>> merge the results.
>
> Sounds like a good approach for parallel sorting, however small extension
> to it that could avoid merging the final results is that workers allocated
> for sort will perform range-based sorting. A simple example to sort integers
> from 1-100 will be, worker-1 will be responsible for sorting any integer
> between 1-50 and worker-2 will be responsible for sorting integers from
> 51-100 and then master backend just needs to ensure that it first returns
> the tuples from worker-1 and then from worker-2.  I think it has some
> similarity to your idea-5 (use of repartition), but not exactly same.

This is not so easy to accomplish for a couple of reasons.  First, how
would you know where to partition the range?  That would work fine if
you had all the data in sorted order to begin with, but of course if
you had that you wouldn't be sorting it.  Second, remember that the
data is probably arriving in separate streams in each worker - e.g.
the sort may be being fed by a parallel sequential scan.  If you do
what I'm proposing, those workers don't need to communicate with each
other except for the final merge at the end; but to do what you're
proposing, you'd need to move each tuple from the worker that got it
originally to the correct worker.  I would guess that would be at
least as expensive as the final merge pass you are hoping to avoid,
and maybe significantly moreso.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
Next
From: Chapman Flack
Date:
Subject: Re: problem with msvc linker - cannot build orafce