Re: CUDA Sorting - Mailing list pgsql-hackers
From | Dann Corbit |
---|---|
Subject | Re: CUDA Sorting |
Date | |
Msg-id | 87F42982BF2B434F831FCEF4C45FC33E505D48EE@EXCHANGE.corporate.connx.com Whole thread Raw |
In response to | Re: CUDA Sorting (Gaetano Mendola <mendola@gmail.com>) |
List | pgsql-hackers |
-----Original Message----- From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Gaetano Mendola Sent: Wednesday, February 15, 2012 2:54 PM To: Peter Geoghegan; pgsql-hackers@postgresql.org Subject: Re: [HACKERS] CUDA Sorting On 15/02/2012 23:11, Peter Geoghegan wrote: > On 15 February 2012 20:00, Gaetano Mendola<mendola@gmail.com> wrote: >> On 13/02/2012 19:48, Greg Stark wrote: >>> >>> I don't think we should be looking at either CUDA or OpenCL directly. >>> We should be looking for a generic library that can target either >>> and is well maintained and actively developed. Any GPU code we write >>> ourselves would rapidly be overtaken by changes in the hardware and >>> innovations in parallel algorithms. If we find a library that >>> provides a sorting api and adapt our code to use it then we'll get >>> the benefits of any new hardware feature as the library adds support for them. >>> >> >> I think one option is to make the sort function pluggable with a >> shared library/dll. I see several benefits from this: >> >> - It could be in the interest of the hardware vendor to provide the >> most powerful sort implementation (I'm sure for example that TBB sort >> implementation is faster that pg_sort) >> >> - It can permit people to "play" with it without being deep >> involved in pg development and stuffs. > > Sorry, but I find it really hard to believe that the non-availability > of pluggable sorting is what's holding people back here. Some vanguard > needs to go and prove the idea by building a rough prototype before we > can even really comment on what an API should look like. For example, > I am given to understand that GPUs generally sort using radix sort - > resolving the impedance mismatch that prevents someone from using a > non-comparison based sort sure sounds like a lot of work for an > entirely speculative reward. AFAIK thrust library uses the radix sort if the keys you are sorting are POD data comparable with a "<" operator otherwiseit does the comparison based sort using the operator provided. http://docs.thrust.googlecode.com/hg/modules.html I'm not saying that the non-availability of pluggable sort completely holds people back, I'm saying that it will simplifythe process now and int the future, of course that's my opinion. > Someone who cannot understand tuplesort, which is not all that > complicated, has no business trying to build GPU sorting into > Postgres. That sounds a bit harsh. I'm one of those indeed, I haven't look in the details not having enough time for it. At work wedo GPU computing (not the sort type stuff) and given the fact I'm a Postgres enthusiast I asked my self: "my server isable to sort around 500 milions integer per seconds, if postgres was able to do that as well it would be very nice". What I have to say? Sorry for my thoughts. > I had a patch committed a few hours ago that almost included the > capability of assigning an alternative sorting function, but only one > with the exact same signature as my variant of qsort_arg. pg_qsort > isn't used to sort tuples at all, by the way. Then I did look in the wrong direction. Thank you for point that out. > Threading building blocks is not going to form the basis of any novel > sorting implementation, because comparators in general are not thread > safe, and it isn't available on all the platforms we support, and > because of how longjmp interacts with C++ stack unwinding and so on > and so on. Now, you could introduce some kind of parallelism into > sorting integers and floats, but that's an awful lot of work for a > marginal reward. The TBB was just example that did come in my mind. What do you mean with you could introduce some kind of parallelism? As far as I know any algorithm using the divide and conquer can be parallelized. >> Radix sorting can be used for any data type, if you create a callback that provides the most significant bits in "width"buckets. At any rate, I can't imagine why anyone would want to complain about sorting 40 times faster than before,considering the amount of time database spend in ordering data. I have a Cuda card in this machine (NVIDIA GeForce GTX 460) and I would not mind it a bit if my database "ORDER BY" clausesuddenly started running ten times faster than before when I am dealing with a huge volume of data. There have been other experiments along these lines such as: GPU-based Sorting in PostgreSQL Naju Mancheril, School of Computer Science - Carnegie Mellon University www.cs.virginia.edu/~skadron/Papers/bakkum_sqlite_gpgpu10.pdf (This is for SQLite, but the grammar of SQLite is almost apure subset of PostgreSQL, including things like vacuum...) http://wiki.postgresql.org/images/6/65/Pgopencl.pdf http://dl.acm.org/citation.cfm?id=1807207 http://www.scribd.com/doc/51484335/PostgreSQL-OpenCL-Procedural-Language-pgEast-March-2011 See also http://highscalability.com/scaling-postgresql-using-cuda <<
pgsql-hackers by date: