Re: Sorting Improvements for 8.4 - Mailing list pgsql-hackers

From Ron Mayer
Subject Re: Sorting Improvements for 8.4
Date
Msg-id 47696249.4090602@cheapcomplexdevices.com
Whole thread Raw
In response to Re: Sorting Improvements for 8.4  (Mark Mielke <mark@mark.mielke.cc>)
Responses Re: Sorting Improvements for 8.4  (Mark Mielke <mark@mark.mielke.cc>)
Re: Sorting Improvements for 8.4  (James Mansion <james@mansionfamily.plus.com>)
List pgsql-hackers
Mark Mielke wrote:
> I am curious - what algorithms exist to efficiently do a parallel sort?
> Do you mean if sorting 1 million items, it is possible to separate this
> into  2 sets of 500 thousand each, execute them in separate threads
> (with task administration and synchronization overhead) , combine the
> results, and complete the task in significantly less time than doing it
> in one thread? I am skeptical that this is possible...

The link in the beginning of the thread points to articles
that seem to describe one such algorithm; along with benchmarks.
(http://tinyurl.com/3bvu4u, http://tinyurl.com/32wg2m)
The improvements were pretty consistent from set sizes ranging
from very small sets (hundreds) to quite large ones (hundreds of K).

Interestingly, even multi-threading helped a lot.
  "Our tests correlate well with previous research that showed   Intel’s implementation of SMT (Hyper-Threading) to be
adept at hiding this latency [6, 20, 12].Table 4 shows that by   having two threads access memory at the same time,
performance  improved over 80% when compared to the singlethreaded version.
 

It uses both quicksort phases and merge phases; for the merge phase
using 2CPUs (no hyperthreading) apparently gave more than 2X speed
improvement; apparently because it could parallelize memory access
with CPU more.

> Or do you mean being able to perform parts of the query plan fully in
> parallel? If this, then one would need a lot more than ParallelSort...

I wouldn't recommend that - it seems like a Hard Problem.

My guess is that the best way to use multiple threads in one backend
would be to find specific algorithms like sorting that  would be
easier to isolate.


pgsql-hackers by date:

Previous
From: Jeff Davis
Date:
Subject: Re: Sorting Improvements for 8.4
Next
From: Andrew Dunstan
Date:
Subject: Re: Proposal for Null Bitmap Optimization(for TrailingNULLs)