Thread: Using the GPU
Does anyone think that PostgreSQL could benefit from using the video card as a parallel computing device? I'm working on a project using Nvidia's CUDA with an 8800 series video card to handle non-graphical algorithms. I'm curious if anyone thinks that this technology could be used to speed up a database. If so, which part of the database, and what kind of parallel algorithms would be used?
Thanks,
-- John Billings
John L. Billings
Principal Applications Developer
585.413.2219 Office
585.339.8580 Mobile
John.Billings@PAETEC.com
On 6/8/07, Billings, John <John.Billings@paetec.com> wrote:
> Does anyone think that PostgreSQL could benefit from using the video card
> as a parallel computing device? I'm working on a project using Nvidia's
> CUDA with an 8800 series video card to handle non-graphical algorithms.
> I'm curious if anyone thinks that this technology could be used to speed
> up a database.

Absolutely.

> If so, which part of the database, and what kind of parallel algorithms
> would be used?

GPUs are parallel vector processing pipelines, which as far as I can tell
do not lend themselves right away to the data structures that PostgreSQL
uses; they're optimized for processing high volumes of homogeneously typed
values in sequence.

From what I know about its internals, like most relational databases
PostgreSQL stores each tuple as a sequence of values (v1, v2, ..., vN).
Each tuple has a table of offsets into the tuple so that you can quickly
find a value based on an attribute; in other words, data is not
fixed-length or in fixed positions, so table scans need to process one
tuple at a time.

GPUs would be a lot easier to integrate with databases such as Monet, KDB
and C-Store, which partition tables vertically -- each column in a table
is stored separately as a vector of values.

Alexander.
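To make the layout contrast Alexander describes concrete, here is a rough
sketch in C (hypothetical struct and field names, not PostgreSQL's actual
tuple format):

    #include <stddef.h>

    /* Row store: each tuple is a self-describing record; values can be
     * variable-length, so a scan has to decode one tuple at a time.    */
    typedef struct {
        int   natts;     /* number of attributes              */
        int  *offsets;   /* byte offset of each value in data */
        char *data;      /* packed, possibly variable-length  */
    } RowTuple;

    /* Column store (Monet / C-Store style): each attribute is a dense,
     * homogeneously typed vector -- the shape GPUs stream efficiently. */
    typedef struct {
        size_t  nrows;
        float  *values;  /* one flat array per column */
    } FloatColumn;

    /* Scanning a column is a tight loop over contiguous memory, which
     * maps directly onto a data-parallel kernel.                       */
    static float column_sum(const FloatColumn *col)
    {
        float sum = 0.0f;
        for (size_t i = 0; i < col->nrows; i++)
            sum += col->values[i];
        return sum;
    }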
On 6/8/07, Billings, John <John.Billings@paetec.com> wrote:
> Does anyone think that PostgreSQL could benefit from using the video card
> as a parallel computing device? I'm working on a project using Nvidia's
> CUDA with an 8800 series video card to handle non-graphical algorithms.
> I'm curious if anyone thinks that this technology could be used to speed
> up a database. If so, which part of the database, and what kind of
> parallel algorithms would be used?

You might want to look at:
http://www.andrew.cmu.edu/user/ngm/15-823/project/Final.pdf

...haven't used it though...

Regards,
Dawid
Alexander Staubo wrote:
> On 6/8/07, Billings, John <John.Billings@paetec.com> wrote:
>> If so, which part of the database, and what kind of parallel
>> algorithms would be used?
>
> GPUs are parallel vector processing pipelines, which as far as I can
> tell do not lend themselves right away to the data structures that
> PostgreSQL uses; they're optimized for processing high volumes of
> homogeneously typed values in sequence.

But wouldn't vector calculations on database data be sped up? I'm thinking
of GIS data, joins across ranges like matching one (start, end) range with
another, etc. I realize these are rather specific calculations, but if
they're important to your application...

OTOH, modern PC GPUs are optimized for pushing textures; basically
transferring a lot of data in as short a time as possible. Maybe it'd be
possible to move result sets around that way? Do joins even, maybe? And
then there are the vertex and pixel shaders...

It'd be kind of odd though, to order a big-time database server with a
high-end gaming card in it :P

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.
T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //
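Purely to illustrate the kind of vector calculation Alban has in mind, a
data-parallel range-overlap test might look roughly like this in CUDA
(an illustrative sketch only; none of these names exist in PostgreSQL):

    // Flag every stored (start, end) range that overlaps the query
    // range [qstart, qend]. One thread per stored range.
    __global__ void range_overlap(const float *starts, const float *ends,
                                  int n, float qstart, float qend,
                                  unsigned char *match)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            match[i] = (starts[i] <= qend && ends[i] >= qstart) ? 1 : 0;
    }

    // Host side, after copying the two columns to device memory:
    //   range_overlap<<<(n + 255) / 256, 256>>>(d_starts, d_ends, n,
    //                                           qstart, qend, d_match);

The catch is exactly the one raised earlier in the thread: the (start, end)
values have to sit in dense, fixed-width device arrays before the kernel
can touch them.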
Billings, John wrote:
> Does anyone think that PostgreSQL could benefit from using the video
> card as a parallel computing device? I'm working on a project using
> Nvidia's CUDA with an 8800 series video card to handle non-graphical
> algorithms. I'm curious if anyone thinks that this technology could
> be used to speed up a database. If so, which part of the database, and
> what kind of parallel algorithms would be used?

Looking at Nvidia's CUDA homepage
(http://developer.nvidia.com/object/cuda.html), I see that parallel
bitonic sorting could be used instead of qsort/heapsort/mergesort (I don't
know which is used).

--
Alejandro Torras
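For reference, the compare-exchange step at the heart of the bitonic sort
Alejandro mentions can be sketched as a small CUDA kernel. This is an
illustrative sketch, not the code from Nvidia's SDK sample:

    // One compare-exchange pass of a bitonic sort; the host loops over
    // the (k, j) stages. Requires n to be a power of two, one thread
    // per element.
    __global__ void bitonic_step(float *d, int n, int j, int k)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n)
            return;
        int ixj = i ^ j;            // partner element for this pass
        if (ixj > i) {
            bool ascending = ((i & k) == 0);
            if ((ascending  && d[i] > d[ixj]) ||
                (!ascending && d[i] < d[ixj])) {
                float tmp = d[i];
                d[i]   = d[ixj];
                d[ixj] = tmp;
            }
        }
    }

    // Host-side driver loop:
    //   for (int k = 2; k <= n; k <<= 1)
    //       for (int j = k >> 1; j > 0; j >>= 1)
    //           bitonic_step<<<(n + 255) / 256, 256>>>(d_values, n, j, k);

Whether this beats a tuned in-memory qsort for realistic sort sizes, once
you pay for the transfer over the PCIe bus, is exactly the kind of question
that would need measuring.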
Alejandro Torras wrote:
> Billings, John wrote:
>> Does anyone think that PostgreSQL could benefit from using the video
>> card as a parallel computing device? I'm working on a project using
>> Nvidia's CUDA with an 8800 series video card to handle non-graphical
>> algorithms. I'm curious if anyone thinks that this technology could
>> be used to speed up a database. If so, which part of the database,
>> and what kind of parallel algorithms would be used?
>
> Looking at Nvidia's CUDA homepage
> (http://developer.nvidia.com/object/cuda.html), I see that parallel
> bitonic sorting could be used instead of qsort/heapsort/mergesort
> (I don't know which is used).

I think that the function cublasIsamax() explained at
http://developer.download.nvidia.com/compute/cuda/0_8/NVIDIA_CUBLAS_Library_0.8.pdf
can be used to find the maximum of a single-precision vector, but
according to a previous post by Alexander Staubo, this kind of function is
best suited to fixed-length tuple values.

But could the data be separated into two zones, one for variable-length
data and another for fixed-length data? With this approach, the
fixed-length data might lend itself to more and deeper optimizations, such
as parallel processing.

--
Alejandro Torras
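Going by the CUBLAS manual Alejandro links to, a call would look roughly
like the sketch below (legacy CUBLAS interface, error handling omitted).
One caveat worth noting: cublasIsamax() returns the 1-based index of the
element with the largest *absolute* value, not the arithmetic maximum.

    #include <stdio.h>
    #include <cublas.h>   /* legacy CUBLAS interface from the CUDA toolkit */

    int main(void)
    {
        float host[] = { 3.0f, -42.0f, 7.5f, 11.0f };
        int   n = 4;
        float *dev;

        cublasInit();
        cublasAlloc(n, sizeof(float), (void **)&dev);
        cublasSetVector(n, sizeof(float), host, 1, dev, 1);

        /* 1-based index of the largest-magnitude element. */
        int idx = cublasIsamax(n, dev, 1);
        printf("largest |x| at index %d: %f\n", idx, host[idx - 1]);

        cublasFree(dev);
        cublasShutdown();
        return 0;
    }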
On Jun 11, 2007, at 4:31 AM, Alban Hertroys wrote:
> Alexander Staubo wrote:
>> On 6/8/07, Billings, John <John.Billings@paetec.com> wrote:
>>> If so, which part of the database, and what kind of parallel
>>> algorithms would be used?
>>
>> GPUs are parallel vector processing pipelines, which as far as I can
>> tell do not lend themselves right away to the data structures that
>> PostgreSQL uses; they're optimized for processing high volumes of
>> homogeneously typed values in sequence.
>
> But wouldn't vector calculations on database data be sped up? I'm
> thinking of GIS data, joins across ranges like matching one (start, end)
> range with another, etc.
> I realize these are rather specific calculations, but if they're
> important to your application...
>
> OTOH, modern PC GPUs are optimized for pushing textures; basically
> transferring a lot of data in as short a time as possible. Maybe it'd be
> possible to move result sets around that way? Do joins even, maybe?

OTOH, databases might not be running on modern desktop PCs with the GPU
investment. Rather, they might be running on a "headless" machine that has
little consideration for the GPU.

It might make an interesting project, but I would be really depressed if I
had to go buy an NVidia card instead of investing in more RAM to optimize
my performance! <g>
On 6/16/07, Tom Allison <tom@tacocat.net> wrote:
> It might make an interesting project, but I would be really depressed
> if I had to go buy an NVidia card instead of investing in more RAM to
> optimize my performance! <g>

Why does it matter what kind of hardware you can (not "have to") buy to
give your database a performance boost? With a GPU, you would have one
more component that you could upgrade to improve performance; that's more
possibilities, not less. I only see a problem with a database that would
*require* a GPU to achieve adequate performance, or to function at all,
but that's not what this thread is about.

Alexander.
"Alexander Staubo" <alex@purefiction.net> writes: > On 6/16/07, Tom Allison <tom@tacocat.net> wrote: >> It might make an interesting project, but I would be really depressed >> if I had to go buy an NVidia card instead of investing in more RAM to >> optimize my performance! <g> > Why does it matter what kind of hardware you can (not "have to") buy > to give your database a performance boost? With a GPU, you would have > one more component that you could upgrade to improve performance; > that's more possibilities, not less. I only see a problem with a > database that would *require* a GPU to achieve adequate performance, > or to function at all, but that's not what this thread is about. Too often, arguments of this sort disregard the opportunity costs of development going in one direction vs another. If we make any significant effort to make Postgres use a GPU, that's development effort spent on that rather than some other optimization; and more effort, ongoing indefinitely, to maintain that code; and perhaps the code will preclude other possible optimizations or features because of assumptions wired into it. So you can't just claim that using a GPU might be interesting; you have to persuade people that it's more interesting than other places where we could spend our performance-improvement efforts. regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > So you can't just claim that using a GPU might be interesting; you have to > persuade people that it's more interesting than other places where we could > spend our performance-improvement efforts. I have a feeling something as sexy as that could attract new developers though. I think the hard part here is coming up with an abstract enough interface that it doesn't tie Postgres to a particular implementation. I would want to see a library that provided primitives that Postgres could use. Then that library could have drivers for GPUs, or perhaps also for various other kinds of coprocessors available in high end hardware. I wonder if it exists already though. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Tom Lane wrote:
> "Alexander Staubo" <alex@purefiction.net> writes:
>> On 6/16/07, Tom Allison <tom@tacocat.net> wrote:
>>> It might make an interesting project, but I would be really depressed
>>> if I had to go buy an NVidia card instead of investing in more RAM to
>>> optimize my performance! <g>
>
>> Why does it matter what kind of hardware you can (not "have to") buy
>> to give your database a performance boost? With a GPU, you would have
>> one more component that you could upgrade to improve performance;
>> that's more possibilities, not less. I only see a problem with a
>> database that would *require* a GPU to achieve adequate performance,
>> or to function at all, but that's not what this thread is about.
>
> Too often, arguments of this sort disregard the opportunity costs of
> development going in one direction vs another. If we make any
> significant effort to make Postgres use a GPU, that's development effort
> spent on that rather than some other optimization; and more effort,
> ongoing indefinitely, to maintain that code; and perhaps the code
> will preclude other possible optimizations or features because of
> assumptions wired into it. So you can't just claim that using a GPU
> might be interesting; you have to persuade people that it's more
> interesting than other places where we could spend our
> performance-improvement efforts.

You have a good point. I don't know enough about how/what people use
databases for in general to know what would be a good thing to work on.

I'm still trying to find out the particulars of PostgreSQL, which are
always sexy. I'm also trying to fill in the gaps between what I already
know in Oracle and how to implement something similar in PostgreSQL. But I
probably don't know enough about Oracle to do much there either.

I'm a believer in strong fundamentals over glamour.
On 6/16/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Alexander Staubo" <alex@purefiction.net> writes:
>> On 6/16/07, Tom Allison <tom@tacocat.net> wrote:
>>> It might make an interesting project, but I would be really depressed
>>> if I had to go buy an NVidia card instead of investing in more RAM to
>>> optimize my performance! <g>
>
>> Why does it matter what kind of hardware you can (not "have to") buy
>> to give your database a performance boost? With a GPU, you would have
>> one more component that you could upgrade to improve performance;
>> that's more possibilities, not less. I only see a problem with a
>> database that would *require* a GPU to achieve adequate performance,
>> or to function at all, but that's not what this thread is about.
>
> Too often, arguments of this sort disregard the opportunity costs of
> development going in one direction vs another. If we make any
> significant effort to make Postgres use a GPU, that's development effort
> spent on that rather than some other optimization [...]

I don't see how this goes against what I wrote. I was merely addressing
Tom Allison's comment, which seems to be an unnecessary fear. By analogy,
not everyone uses hardware RAID, for example, but PostgreSQL can benefit
greatly from it, so it does not make sense to worry about "having to buy"
it. Then again, Tom's comment may have been in jest.

Alexander.