Re: Threaded Sorting - Mailing list pgsql-hackers

From Greg Copeland
Subject Re: Threaded Sorting
Msg-id 1033744949.19466.48.camel@mouse.copelandconsulting.net
In response to Threaded Sorting  (Hans-Jürgen Schönig <postgres@cybertec.at>)
List pgsql-hackers
On Fri, 2002-10-04 at 09:40, Hans-Jürgen Schönig wrote:
>
> I had a brief look at the code used for sorting. It is very well
> documented so maybe it is worth thinking about a parallel algorithm.
>
> When talking about threads: A pool of processes for sorting? Maybe this
> could be useful, but I doubt it is the best solution to avoid overhead.
> Somewhere in the TODO it says that there will be experiments with a
> threaded backend. This makes me think that threads are not a big no-no.
>
>     Hans

That was a fork, IIRC.  Threading is not used in baseline PostgreSQL, nor
are there any such plans that I'm aware of.  People ask from time to time
about threads for this or that and are always told what I'm telling
you.  The use of threads leads to portability issues, not to mention that
PostgreSQL is entirely built around the process model.

Tom is right to dismiss the notion of adding more CPUs to something
that is already I/O bound; however, the concept itself should not be
dismissed.  Applying multiple CPUs to a sort operation is a well-accepted
and well-understood technique.
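To make the "multiple CPUs on one sort" idea concrete, here is a minimal sketch of the usual split/sort/merge approach, assuming a generic Python process pool (`concurrent.futures.ProcessPoolExecutor`) stands in for a pool of sort helper processes. It is not PostgreSQL code; `parallel_sort` and its `make_pool` parameter are hypothetical names chosen for illustration. Each worker sorts one contiguous run of the input, and the calling process then does a k-way merge of the sorted runs.

```python
# Minimal sketch (not PostgreSQL code): split, sort runs in a process pool, merge.
import concurrent.futures as cf
from heapq import merge
from typing import Callable, List, Sequence


def parallel_sort(data: Sequence[int], workers: int,
                  make_pool: Callable[[int], cf.Executor] = cf.ProcessPoolExecutor
                  ) -> List[int]:
    """Hypothetical helper: sort `data` using up to `workers` pool processes."""
    if workers < 1:
        raise ValueError("workers must be >= 1")
    if not data:
        return []
    # Divide the input into roughly equal contiguous runs, one per worker.
    size = -(-len(data) // workers)  # ceiling division
    runs = [list(data[i:i + size]) for i in range(0, len(data), size)]
    # Each worker sorts one run independently of the others.
    with make_pool(workers) as pool:
        sorted_runs = list(pool.map(sorted, runs))
    # The calling process performs a k-way merge of the sorted runs.
    return list(merge(*sorted_runs))


if __name__ == "__main__":
    import random
    values = [random.randint(0, 10**6) for _ in range(100_000)]
    print(parallel_sort(values, workers=4) == sorted(values))  # True
```

Passing the builtin `sorted` to `pool.map` keeps the work function picklable under any multiprocessing start method. Note that the final merge runs serially in the calling process, so the achievable speedup is bounded by that step.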

At this point, perhaps Tom or one of the other core developers with
insight in this area would be willing to address how readily such a
mechanism could be put in place.

Also, don't be so quick to dismiss what the process model can do.  There
is no reason to believe that a process pool could not perform wonderful
things if implemented properly.  Basically, the notion would be that the
backend processing the query solicits assistance from the sort pool if
one or more pool processes are available.  At that point, several
methods could be employed to divide the work.  Some form of threshold
would also have to be established to keep the pool from being used when
a single backend can handle the sort on its own.  In short, the pool
assists only with large tuple counts, and even then only when system
resources are available and idle processes are available within the
pool.  By doing this, you avoid additional overhead for small sorts and
gain when it matters most.
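The admission rule described above (use the pool only for large sorts, and only when both the machine and the pool have capacity to spare) could be sketched as a small policy function. This is a hypothetical illustration: `SortPoolPolicy`, `helpers_to_request`, and the default numbers are invented for the example and are not PostgreSQL settings, and the "one helper per `tuples_per_helper` tuples" sizing rule is an assumption of the sketch.

```python
# Hypothetical sketch: none of these names or defaults are real PostgreSQL settings.
from dataclasses import dataclass


@dataclass(frozen=True)
class SortPoolPolicy:
    min_tuples: int = 100_000         # below this, the backend sorts alone
    tuples_per_helper: int = 100_000  # assumed amount of work that justifies one helper


def helpers_to_request(policy: SortPoolPolicy, tuple_count: int,
                       idle_helpers: int, system_has_headroom: bool) -> int:
    """Return how many pool processes to solicit; 0 means sort in the backend alone."""
    # Small sorts: coordinating with the pool would cost more than it saves.
    if tuple_count < policy.min_tuples:
        return 0
    # Machine-wide resources are already committed, so don't add load.
    if not system_has_headroom:
        return 0
    # One helper per chunk of work, capped by how many pool processes are idle now.
    wanted = tuple_count // policy.tuples_per_helper
    return max(0, min(wanted, idle_helpers))


policy = SortPoolPolicy()
print(helpers_to_request(policy, 5_000, idle_helpers=4, system_has_headroom=True))    # 0
print(helpers_to_request(policy, 250_000, idle_helpers=4, system_has_headroom=True))  # 2
print(helpers_to_request(policy, 900_000, idle_helpers=3, system_has_headroom=True))  # 3
```

A return value of 0 means the backend sorts by itself, so small sorts pay no coordination cost at all.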


Regards,
Greg

