Mark Mielke wrote:
> I am curious - what algorithms exist to efficiently do a parallel sort?
> Do you mean if sorting 1 million items, it is possible to separate this
> into 2 sets of 500 thousand each, execute them in separate threads
> (with task administration and synchronization overhead) , combine the
> results, and complete the task in significantly less time than doing it
> in one thread? I am skeptical that this is possible...
The link in the beginning of the thread points to articles
that seem to describe one such algorithm; along with benchmarks.
(http://tinyurl.com/3bvu4u, http://tinyurl.com/32wg2m)
The improvements were pretty consistent from set sizes ranging
from very small sets (hundreds) to quite large ones (hundreds of K).
Interestingly, even multi-threading helped a lot.
"Our tests correlate well with previous research that showed Intel’s implementation of SMT (Hyper-Threading) to be
adept at hiding this latency [6, 20, 12].Table 4 shows that by having two threads access memory at the same time,
performance improved over 80% when compared to the singlethreaded version.
It uses both quicksort phases and merge phases; for the merge phase
using 2CPUs (no hyperthreading) apparently gave more than 2X speed
improvement; apparently because it could parallelize memory access
with CPU more.
> Or do you mean being able to perform parts of the query plan fully in
> parallel? If this, then one would need a lot more than ParallelSort...
I wouldn't recommend that - it seems like a Hard Problem.
My guess is that the best way to use multiple threads in one backend
would be to find specific algorithms like sorting that would be
easier to isolate.