Thread: GPU Accelerated Sorting
Not sure if anyone else saw this, but it struck me as an interesting idea if it could be added to PostgreSQL. GPU accelerated database operations could be very... interesting. Of course, this could be difficult to do in a way that usefully increases performance of PostgreSQL, but I'll leave that up to you guys to figure out. http://code.google.com/p/back40computing/wiki/RadixSorting -- Eliot Gable "We do not inherit the Earth from our ancestors: we borrow it from our children." ~David Brower "I decided the words were too conservative for me. We're not borrowing from our children, we're stealing from them--and it's not even considered to be a crime." ~David Brower "Esse oportet ut vivas, non vivere ut edas." (Thou shouldst eat to live; not live to eat.) ~Marcus Tullius Cicero
Eliot Gable wrote: > Not sure if anyone else saw this, but it struck me as an interesting > idea if it could be added to PostgreSQL. GPU accelerated database > operations could be very... interesting. Of course, this could be > difficult to do in a way that usefully increases performance of > PostgreSQL, but I'll leave that up to you guys to figure out. > This comes up every year or so. The ability of GPU offloading to help with sorting has to overcome the additional latency that comes from copying everything over to it and then getting all the results back. If you look at the typical types of sorting people see in PostgreSQL, it's hard to find ones that are a) big enough to benefit from being offloaded to the GPU like that, while also being b) not so bottlenecked on disk I/O that speeding up the CPU part matters. And if you need to sort something in that category, you probably just put an index on it instead and call it a day. If you made me make a list of things I'd think would be worthwhile to spend effort improving in PostgreSQL, this would be on the research list, but unlikely to even make my personal top 100 things that are work fiddling with. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
Hello, In my humble opinion, while it can sound interesting from a theorical point of view to outloads some operations to the GPU, there is a huge pratical problem in current world : databases which are big enough to require such heavy optimization are usually runned on server hardware, which very rarely have powerful GPU. There may be a small target of computers having both GPU and heavy database, but that sounds very exceptional to me, so investing effort into it sounds a bit unjustified to me. Except as a matter of scientific research, which is it of course important for the future. Regards, -- Gaël Le Mignot - gael@pilotsystems.net Pilot Systems - 9, rue Desargues - 75011 Paris Tel : +33 1 44 53 05 55 - www.pilotsystems.net Gérez vos contacts et vos newsletters : www.cockpit-mailing.com
Well, from that perspective, it becomes a "chicken and egg" problem. Without the software support to use a GPU in a server for acceleration, nobody's going to build a server with a GPU. However, as previously stated, I can understand the challenges with determining whether the offloading would even be worthwhile (given disk performance constraints and GPU memory loading constraints vs payoff for acceleration), much less finding actual real-world queries that would benefit from it. I am sure there are probably hundreds of other performance improvements that could be made that would have much more wide-spread appeal, but it's an interesting thing to consider. This is not the first thing that's come up that a database could use a GPU for in order to improve performance, and it won't be the last. It's something to keep in mind, and if enough additional items come up where GPUs can do better than CPUs, it might be worthwhile to start implementing some of those things. Maybe if enough of them get implemented, there will be enough overall performance increase to make it worthwhile to put a GPU in a database server. Besides, if you can sort data on a GPU faster than a CPU, you can probably also search for data faster on the GPU than on the CPU under similar conditions. With memory increasing like crazy in servers, it might be worthwhile to keep indexes entirely in memory (but keep them sync'd to the disk, obviously) and for extremely large tables, dump them to the GPU for a massively parallel search. In fact, if you can get enough GPU memory that you could keep them entirely in the GPU memory and keep them updated there any time they change, you could see some real performance pay-offs. I'm not saying someone should just go out and do this right now, but it might be worthwhile to keep it in mind as code is re-written or updated in the future how it might be structured to more easily implement something like this in the future. On Mon, Aug 30, 2010 at 10:56 AM, Gaël Le Mignot <gael@pilotsystems.net> wrote: > Hello, > > In my humble opinion, while it can sound interesting from a theorical > point of view to outloads some operations to the GPU, there is a huge > pratical problem in current world : databases which are big enough to > require such heavy optimization are usually runned on server hardware, > which very rarely have powerful GPU. > > There may be a small target of computers having both GPU and heavy > database, but that sounds very exceptional to me, so investing effort > into it sounds a bit unjustified to me. > > Except as a matter of scientific research, which is it of course > important for the future. > > Regards, > > -- > Gaël Le Mignot - gael@pilotsystems.net > Pilot Systems - 9, rue Desargues - 75011 Paris > Tel : +33 1 44 53 05 55 - www.pilotsystems.net > Gérez vos contacts et vos newsletters : www.cockpit-mailing.com > -- Eliot Gable "We do not inherit the Earth from our ancestors: we borrow it from our children." ~David Brower "I decided the words were too conservative for me. We're not borrowing from our children, we're stealing from them--and it's not even considered to be a crime." ~David Brower "Esse oportet ut vivas, non vivere ut edas." (Thou shouldst eat to live; not live to eat.) ~Marcus Tullius Cicero
Greg Smith wrote: > This comes up every year or so. The ability of GPU offloading to help > with sorting has to overcome the additional latency that comes from > copying everything over to it and then getting all the results back. > If you look at the typical types of sorting people see in PostgreSQL, > it's hard to find ones that are a) big enough to benefit from being > offloaded to the GPU like that, while also being b) not so > bottlenecked on disk I/O that speeding up the CPU part matters. And > if you need to sort something in that category, you probably just put > an index on it instead and call it a day. > > If you made me make a list of things I'd think would be worthwhile to > spend effort improving in PostgreSQL, this would be on the research > list, but unlikely to even make my personal top 100 things that are > work fiddling with. Related is 'Parallelizing query optimization' (http://www.vldb.org/pvldb/1/1453882.pdf) in which they actually experiment with PostgreSQL. Note that their target platform is general purpose CPU, not a SIMD GPU processor. -- Yeb
On Mon, Aug 30, 2010 at 8:56 AM, Gaël Le Mignot <gael@pilotsystems.net> wrote: > Hello, > > In my humble opinion, while it can sound interesting from a theorical > point of view to outloads some operations to the GPU, there is a huge > pratical problem in current world : databases which are big enough to > require such heavy optimization are usually runned on server hardware, > which very rarely have powerful GPU. That's changed recently: http://www.aberdeeninc.com/abcatg/GPUservers.htm > There may be a small target of computers having both GPU and heavy > database, but that sounds very exceptional to me, so investing effort > into it sounds a bit unjustified to me. I tend to agree. OTOH, imagine using a 400 core GPU for offloading stuff that isn't just a sort, like travelling salesman type problems. The beauty of if is, that with pgsql support dozens of scripting languages, you wouldn't have to build anything into pg's backends, you could just write it in a pl langauge. -- To understand recursion, one must first understand recursion.
Feels like I fell through a worm hole in space/time, back to inmos in 1987, and a guy from marketing has just walked in the office going on about there's a customer who wants to use our massively parallel hardware to speed up databases...
On Mon, 2010-08-30 at 09:51 -0400, Eliot Gable wrote: > Not sure if anyone else saw this, but it struck me as an interesting > idea if it could be added to PostgreSQL. GPU accelerated database > operations could be very... interesting. Of course, this could be > difficult to do in a way that usefully increases performance of > PostgreSQL, but I'll leave that up to you guys to figure out. > > http://code.google.com/p/back40computing/wiki/RadixSorting > Radix sort is not a comparison sort. Comparison sorts work for any data type for which you define a total order; and any total order is allowed. Radix sort only works for some data types and some total orders. However, it would be very nice to use radix sorting where it does work. That would require some extensions to the type system, but it could be done. The GPU issue is orthogonal. Regards, Jeff Davis
david_list@boreham.org (David Boreham) writes: > Feels like I fell through a worm hole in space/time, back to inmos in > 1987, and a guy from marketing has just > walked in the office going on about there's a customer who wants to > use our massively parallel hardware to speed up databases... ... As long as you're willing to rewrite PostgreSQL in Occam 2... -- http://projects.cs.kent.ac.uk/projects/tock/trac/ The statistics on sanity are that one out of every four Americans is suffering from some form of mental illness. Think of your three best friends. If they're okay, then it's you. -- Rita Mae Brown
On 8/30/2010 3:18 PM, Chris Browne wrote: > ... As long as you're willing to rewrite PostgreSQL in Occam 2... Just re-write it in Google's new language 'Go' : it's close enough to Occam and they'd probably fund the project.. ;)
On a similar note, is Postgres' Quicksort a dual-pivot quicksort? This can be up to 2x as fast as a normal quicksort (25%fewer swap operations, and swap operations are more expensive than compares for most sorts). Just google 'dual pivot quicksort' for more info. And before anyone asks -- two pivots (3 partitions) is optimal. See http://mail.openjdk.java.net/pipermail/core-libs-dev/2009-September/002676.html On Aug 30, 2010, at 12:25 PM, Yeb Havinga wrote: > Greg Smith wrote: >> This comes up every year or so. The ability of GPU offloading to help >> with sorting has to overcome the additional latency that comes from >> copying everything over to it and then getting all the results back. >> If you look at the typical types of sorting people see in PostgreSQL, >> it's hard to find ones that are a) big enough to benefit from being >> offloaded to the GPU like that, while also being b) not so >> bottlenecked on disk I/O that speeding up the CPU part matters. And >> if you need to sort something in that category, you probably just put >> an index on it instead and call it a day. >> >> If you made me make a list of things I'd think would be worthwhile to >> spend effort improving in PostgreSQL, this would be on the research >> list, but unlikely to even make my personal top 100 things that are >> work fiddling with. > Related is 'Parallelizing query optimization' > (http://www.vldb.org/pvldb/1/1453882.pdf) in which they actually > experiment with PostgreSQL. Note that their target platform is general > purpose CPU, not a SIMD GPU processor. > > -- Yeb > > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance
Scott Carey <scott@richrelevance.com> writes: > On a similar note, is Postgres' Quicksort a dual-pivot quicksort? This can be up to 2x as fast as a normal quicksort (25%fewer swap operations, and swap operations are more expensive than compares for most sorts). In Postgres, the swaps are pretty much free compared to the comparisons. Sorry, but the above doesn't especially tempt me... regards, tom lane