2007/12/19, Mark Mielke <mark@mark.mielke.cc>:
> Jeff Davis wrote:
> > My first thought would be that we would need a new executor node (e.g.
> > "ParallelSort") that would only be chosen when the cost of the sort is
> > large enough to outweigh other factors (such as process creation time,
> > dividing available work_mem, and any necessary IPC).
> >
> > It seems to me the simplest way to do it would be to allow each sub
> > process to allocate work_mem/P where P is the degree of parallelization.
> > However, that somewhat works against our schemes for dynamic run
> > handling and forecasting, and may lead to more random reading from disk.
> > Any other scheme I can think of would involve more IPC, which might make
> > the idea just too complex.
> >
> I am curious - what algorithms exist to efficiently do a parallel sort?
> Do you mean if sorting 1 million items, it is possible to separate this
> into 2 sets of 500 thousand each, execute them in separate threads
> (with task administration and synchronization overhead) , combine the
> results, and complete the task in significantly less time than doing it
> in one thread? I am skeptical that this is possible, and suspect that
> the overall efficiency of the system would go down even if the
> throughput of a single execution increases.
>
Ok - we want to sort table with quick sort and we want to do it on - N threads.
Every thread - gets begin and end of indices of the table. First step starts
at 0 and lasts with count -1. Single step: find medium value and move
lover before it and bigger after. In normal case - we use recursive call - so
the same procedure is being called for that two parts. In thread we can put
indices at side list - and use queue of threads to pick up data from the list.
We can use common table, access to side list with indices has to be serialized.
> Or do you mean being able to perform parts of the query plan fully in
> parallel? If this, then one would need a lot more than ParallelSort...
>
Nice to have, but rather for data warehouses. In other cases... IE - backend
for Internet - there are many requests and every processor / core works nice.
--
Regards, Michał Zaborowski (TeXXaS)