Re: Sorting Improvements for 8.4 - Mailing list pgsql-hackers

From Mark Mielke
Subject Re: Sorting Improvements for 8.4
Date
Msg-id 4768D4BE.6050700@mark.mielke.cc
Whole thread Raw
In response to Re: Sorting Improvements for 8.4  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Sorting Improvements for 8.4
Re: Sorting Improvements for 8.4
List pgsql-hackers
Jeff Davis wrote:
> My first thought would be that we would need a new executor node (e.g.
> "ParallelSort") that would only be chosen when the cost of the sort is
> large enough to outweigh other factors (such as process creation time,
> dividing available work_mem, and any necessary IPC).
>
> It seems to me the simplest way to do it would be to allow each sub
> process to allocate work_mem/P where P is the degree of parallelization.
> However, that somewhat works against our schemes for dynamic run
> handling and forecasting, and may lead to more random reading from disk.
> Any other scheme I can think of would involve more IPC, which might make
> the idea just too complex.
>   
I am curious - what algorithms exist to efficiently do a parallel sort? 
Do you mean if sorting 1 million items, it is possible to separate this 
into  2 sets of 500 thousand each, execute them in separate threads 
(with task administration and synchronization overhead) , combine the 
results, and complete the task in significantly less time than doing it 
in one thread? I am skeptical that this is possible, and suspect that 
the overall efficiency of the system would go down even if the 
throughput of a single execution increases.

Or do you mean being able to perform parts of the query plan fully in 
parallel? If this, then one would need a lot more than ParallelSort...

Cheers,
mark

-- 
Mark Mielke <mark@mielke.cc>


pgsql-hackers by date:

Previous
From: "Koichi Suzuki"
Date:
Subject: Benchmark for GiST index?
Next
From: "Michał Zaborowski"
Date:
Subject: Re: Sorting Improvements for 8.4