I did find this:
http://www.andrew.cmu.edu/user/ngm/15-823/project/Draft.pdf
But there are several reasons this seems to be a dead-end route for Postgres:
1) It's limited to in-memory sorts. Speeding up in-memory sorts by a linear
factor seems uninteresting. Any sort large enough for a modest linear
speedup to be interesting will be spilling to a disk sort anyway.
2) It's limited to one concurrent sort. There doesn't seem to be any facility
for managing the GPU as a shared resource.
3) It's limited to sorting on a single float key. Postgres has an extensible
type system, so the use case for sorting a plain list of floats is pretty
narrow (see the first sketch below the list). It's also limited to 32-bit
floats, and it isn't clear whether that's an implementation detail or a
hardware limitation of current GPUs.
4) It uses a hardware-specific driver for Nvidia GPUs. Ideally there would be
some kind of kernel driver that took care of managing the shared resource
(the way the kernel manages disk, network, memory, etc.), and either that
driver or a library layer would provide an abstract interface so that the
Postgres code would be hardware-independent (see the second sketch below).
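To illustrate the point about the extensible type system: this is a minimal
C sketch, not actual Postgres code (the Datum typedef and the comparator are
just illustrative). The sort core only ever sees opaque values plus a
per-type comparison function, so a GPU kernel hard-wired to 32-bit floats
has nothing generic to plug into.

#include <stdio.h>
#include <string.h>

typedef void *Datum;                        /* opaque value, as in Postgres */
typedef int (*datum_cmp) (Datum a, Datum b);

/* Comparator for a made-up user-defined text type: by length, then bytes. */
static int
mytext_cmp(Datum a, Datum b)
{
    size_t la = strlen((const char *) a);
    size_t lb = strlen((const char *) b);

    if (la != lb)
        return (la < lb) ? -1 : 1;
    return strcmp((const char *) a, (const char *) b);
}

/* The sort core knows nothing about the type except how to compare datums. */
static void
sort_datums(Datum *items, size_t n, datum_cmp cmp)
{
    /* simple insertion sort to keep the sketch self-contained */
    for (size_t i = 1; i < n; i++)
    {
        Datum   key = items[i];
        size_t  j = i;

        while (j > 0 && cmp(items[j - 1], key) > 0)
        {
            items[j] = items[j - 1];
            j--;
        }
        items[j] = key;
    }
}

int
main(void)
{
    Datum rows[] = {"banana", "fig", "apple", "kiwi"};

    sort_datums(rows, 4, mytext_cmp);
    for (int i = 0; i < 4; i++)
        printf("%s\n", (const char *) rows[i]);
    return 0;
}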
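And a hypothetical sketch of the abstraction layer described in point 4.
None of these names exist anywhere; they're invented purely to show the
shape of a hardware-independent interface with a CPU fallback.

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

typedef struct SortOffloadProvider
{
    const char *name;

    /* Can this provider handle nitems elements of item_size bytes right now? */
    bool        (*can_sort) (size_t nitems, size_t item_size);

    /* Sort in place; return false to make the caller fall back to the CPU. */
    bool        (*sort) (void *items, size_t nitems, size_t item_size,
                         int (*cmp) (const void *, const void *));
} SortOffloadProvider;

/* Backend-side entry point: use the provider if it can help, else qsort. */
static void
offload_or_cpu_sort(const SortOffloadProvider *provider,
                    void *items, size_t nitems, size_t item_size,
                    int (*cmp) (const void *, const void *))
{
    if (provider == NULL
        || !provider->can_sort(nitems, item_size)
        || !provider->sort(items, nitems, item_size, cmp))
        qsort(items, nitems, item_size, cmp);   /* hardware-independent fallback */
}

static int
int_cmp(const void *a, const void *b)
{
    int ia = *(const int *) a;
    int ib = *(const int *) b;

    return (ia > ib) - (ia < ib);
}

int
main(void)
{
    int vals[] = {42, 7, 19, 3};

    /* No GPU provider registered, so this takes the CPU fallback path. */
    offload_or_cpu_sort(NULL, vals, 4, sizeof(int), int_cmp);
    for (int i = 0; i < 4; i++)
        printf("%d\n", vals[i]);
    return 0;
}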
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com