On Wed, 2008-01-30 at 17:55 +0000, Christopher Browne wrote:
> 2008/1/30 Dann Corbit <DCorbit@connx.com>:
>
> > http://www.scientificcomputing.com/ShowPR~PUBCODE~030~ACCT~3000000100~ISSUE~0801~RELTYPE~HPCC~PRODCODE~00000000~PRODLETT~C.html
> >
> > http://www.nvidia.com/object/cuda_learn.html
> >
> > http://www.nvidia.com/object/cuda_get.html
>
> Someone at CMU has tried this, somewhat fruitfully.
>
> http://www.andrew.cmu.edu/user/ngm/15-823/project/Draft.pdf
> http://www.andrew.cmu.edu/user/ngm/15-823/project/Final.pdf
Well done that man! Excellent piece of research.
Clearly GPUsort is cool, but is it cool enough? Here are a few thoughts
and questions that still need answers:
The concept of CPU offload can be generalised to any specialised
hardware. Can we offload such tasks easily? If so, to what? Should it be
a GPU, or just another more general CPU? And is the cost and difficulty
of making the GPU work in a generalised way worth it, compared with
spending that money on other resources, e.g. memory?
Can the sorting network really be reused in the general case, or must we
realistically recreate it for each new sort set?
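For what it's worth, a bitonic network's compare-exchange schedule is
fixed by the input size alone, not by the data, so in principle the same
network is reusable for every sort set of a given (power-of-two) size.
A minimal CUDA sketch of that property (illustrative only; bitonic_step
and bitonic_sort are made-up names, not the paper's kernels):

__global__ void bitonic_step(float *d, unsigned j, unsigned k)
{
    unsigned i = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned ixj = i ^ j;                  /* compare-exchange partner */

    if (ixj > i)
    {
        if ((i & k) == 0)                  /* ascending half of network */
        {
            if (d[i] > d[ixj])
            { float t = d[i]; d[i] = d[ixj]; d[ixj] = t; }
        }
        else                               /* descending half */
        {
            if (d[i] < d[ixj])
            { float t = d[i]; d[i] = d[ixj]; d[ixj] = t; }
        }
    }
}

/*
 * Host driver: the (k, j) launch schedule below depends only on n, so
 * the same sequence of kernel launches sorts any array of n floats.
 * n must be a power of two and a multiple of the block size (256).
 */
void bitonic_sort(float *d, unsigned n)
{
    for (unsigned k = 2; k <= n; k <<= 1)
        for (unsigned j = k >> 1; j > 0; j >>= 1)
            bitonic_step<<<n / 256, 256>>>(d, j, k);
}

So the network itself looks reusable; the harder questions are padding
arbitrary input sizes up to a power of two and handling anything other
than fixed-width keys.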
Can we have multiple concurrent sorts on the GPU, or is it one user at a
time? Would we need multiple GPUs? Is such an architecture available?
I note that the comparison with HeapSort is a worst case, since the test
sorts 1 GB of data without ever increasing work_mem beyond 1MB.
There doesn't seem to be any discussion of how GPUsort would handle
sorts too large to fit within the GPU, so we would still need an
external sort mechanism. If qsort is better for smaller inputs and
external sorts are needed for larger ones, that leaves a narrow-ish
middle band of benefit.
Another thought would be to replace the external heap sort with an
external sort based around qsort or GPUsort, which would extend the
range of usefulness. But then we're back to redesigning external sorts.
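Roughly what I mean, as a sketch (illustrative names, plain host-side C,
ints standing in for tuples): form bounded runs with qsort(), or hand
each buffer to the GPU, then merge the runs much as we do now.

#include <stdio.h>
#include <stdlib.h>

static int
cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    return (x > y) - (x < y);
}

/*
 * Read up to run_size ints (bounded by work_mem), sort the buffer in
 * memory, and write it out as one sorted run.  Returns items written;
 * the caller loops until this returns 0, then merges the runs.
 */
static size_t
write_run(FILE *in, FILE *out, int *buf, size_t run_size)
{
    size_t n = fread(buf, sizeof(int), run_size, in);

    if (n > 0)
    {
        qsort(buf, n, sizeof(int), cmp_int);   /* or GPUsort the buffer */
        fwrite(buf, sizeof(int), n, out);
    }
    return n;
}

The catch: replacement selection's heap produces runs averaging about
twice work_mem in length, so fixed-size qsort/GPUsort runs mean more
runs and potentially more merge passes, which is exactly the
external-sort redesign work I mentioned.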
-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com