Re: CUDA Sorting - Mailing list pgsql-hackers

From Dann Corbit
Subject Re: CUDA Sorting
Date
Msg-id 87F42982BF2B434F831FCEF4C45FC33E505D48EE@EXCHANGE.corporate.connx.com
Whole thread Raw
In response to Re: CUDA Sorting  (Gaetano Mendola <mendola@gmail.com>)
List pgsql-hackers
-----Original Message-----
From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Gaetano Mendola
Sent: Wednesday, February 15, 2012 2:54 PM
To: Peter Geoghegan; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] CUDA Sorting

On 15/02/2012 23:11, Peter Geoghegan wrote:
> On 15 February 2012 20:00, Gaetano Mendola<mendola@gmail.com>  wrote:
>> On 13/02/2012 19:48, Greg Stark wrote:
>>>
>>> I don't think we should be looking at either CUDA or OpenCL directly.
>>> We should be looking for a generic library that can target either
>>> and is well maintained and actively developed. Any GPU code we write
>>> ourselves would rapidly be overtaken by changes in the hardware and
>>> innovations in parallel algorithms. If we find a library that
>>> provides a sorting api and adapt our code to use it then we'll get
>>> the benefits of any new hardware feature as the library adds support for them.
>>>
>>
>> I think one option is to make the sort function pluggable with a
>> shared library/dll. I see several benefits from this:
>>
>>   - It could be in the interest of the hardware vendor to provide the
>> most powerful sort implementation (I'm sure for example that TBB sort
>> implementation is faster that pg_sort)
>>
>>   - It can permit people to "play" with it without being deep
>> involved in pg development and stuffs.
>
> Sorry, but I find it really hard to believe that the non-availability
> of pluggable sorting is what's holding people back here. Some vanguard
> needs to go and prove the idea by building a rough prototype before we
> can even really comment on what an API should look like. For example,
> I am given to understand that GPUs generally sort using radix sort -
> resolving the impedance mismatch that prevents someone from using a
> non-comparison based sort sure sounds like a lot of work for an
> entirely speculative reward.

AFAIK thrust library uses the radix sort if the keys you are sorting are POD data comparable with a "<" operator
otherwiseit does the comparison based sort using the operator provided. 

http://docs.thrust.googlecode.com/hg/modules.html

I'm not saying that the non-availability of pluggable sort completely holds people back, I'm saying that it will
simplifythe process now and int the future, of course that's my opinion. 

> Someone who cannot understand tuplesort, which is not all that
> complicated, has no business trying to build GPU sorting into
> Postgres.

That sounds a bit harsh. I'm one of those indeed, I haven't look in the details not having enough time for it. At work
wedo GPU computing (not the sort type stuff) and given the fact I'm a Postgres enthusiast I asked my self: "my server
isable to sort around 500 milions integer per seconds, if postgres was able to do that as well it would be very nice". 

What I have to say? Sorry for my thoughts.

> I had a patch committed a few hours ago that almost included the
> capability of assigning an alternative sorting function, but only one
> with the exact same signature as my variant of qsort_arg. pg_qsort
> isn't used to sort tuples at all, by the way.

Then I did look in the wrong direction. Thank you for point that out.

> Threading building blocks is not going to form the basis of any novel
> sorting implementation, because comparators in general are not thread
> safe, and it isn't available on all the platforms we support, and
> because of how longjmp interacts with C++ stack unwinding and so on
> and so on. Now, you could introduce some kind of parallelism into
> sorting integers and floats, but that's an awful lot of work for a
> marginal reward.

The TBB was just example that did come in my mind.
What do you mean with you could introduce some kind of parallelism?
As far as I know any algorithm using the divide and conquer can be parallelized.
>>
Radix sorting can be used for any data type, if you create a callback that provides the most significant bits in
"width"buckets.  At any rate, I can't imagine why anyone would want to complain about sorting 40 times faster than
before,considering the amount of time database spend in ordering data. 

I have a Cuda card in this machine (NVIDIA GeForce GTX 460) and I would not mind it a bit if my database "ORDER BY"
clausesuddenly started running ten times faster than before when I am dealing with a huge volume of data. 

There have been other experiments along these lines such as:
GPU-based Sorting in PostgreSQL Naju Mancheril, School of Computer Science - Carnegie Mellon University
www.cs.virginia.edu/~skadron/Papers/bakkum_sqlite_gpgpu10.pdf (This is for SQLite, but the grammar of SQLite is almost
apure subset of PostgreSQL, including things like vacuum...) 
http://wiki.postgresql.org/images/6/65/Pgopencl.pdf
http://dl.acm.org/citation.cfm?id=1807207
http://www.scribd.com/doc/51484335/PostgreSQL-OpenCL-Procedural-Language-pgEast-March-2011

See also
http://highscalability.com/scaling-postgresql-using-cuda


<<


pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: CUDA Sorting
Next
From: Kyotaro HORIGUCHI
Date:
Subject: Re: Speed dblink using alternate libpq tuple storage