Re: GPUSort project - Mailing list pgsql-hackers

From Mischa Sandberg
Subject Re: GPUSort project
Date
Msg-id 443D618A.7060604@activestate.com
Whole thread Raw
In response to Re: GPUSort project  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-hackers
[short]
This probably would be an uneasy fit into generic backend code.
Was hoping the GPUSort project might have fleeced/sorted out some issues.

[long]
Simon Riggs wrote:
> On Wed, 2006-04-12 at 10:00 -0700, Mischa Sandberg wrote:
...
>>Long answer: we're shipping a server (appliance) product built on stock 
>>rackmount hardware, that includes an ATI Rage (8MB) with nothing to do. Much of 
>>what the box does is a single cpu-bound process, sorting  maillog extracts. The 
>>GPU is an asset, even at 8MB; the headwork is in mapping/truncating sort keys 
>>down to dense ~32bit prefixes; and in making smooth judgements as to when to 
>>give the job to (a) the GPU (b) quicksort (c) a tiny bitonic sort in the SSE2 
>>registers.

> It sounds like its possible, but it would have to give incredible gains
> before its worth the effort to make it happen. 8MB of video RAM doesn't
> score much against 256MB of normal RAM, which is pretty cheap these
> days.

A better comparison is 8MB of video RAM vs 512K of L2 cache. GPU's (also) have 
faster access (>32GB/s) to RAM than the CPU, using AGP/PCI with no contention. 
Our product uses Xeons instead of Opterons; the 3GHz CPUs are just slogging, 
waiting >70% for RAM fetch.

> The hardware dependency would make this extremely sensitive to change,
> so effort in this area might not give lasting benefit. As it happens,
> I'm in favour of making code changes to exploit hardware, but this one
> is too far for me to encourage anybody to pursue it further.

Fair comment. I'm using OpenGL, and looking at Glift, so it's not as 
hardware-specific as you might think. Other projects at gpgpu.org seem to be 
able to switch among GPU's.

That being said, humbly admit that targetting specific hardware tends to give 
one tunnel vision. Coding "if all these conditions are true, use the fast 
algorithm, else do it the normal way" is also messier to extend than a nice 
clean interface layer :-(

>>Any of this would apply to postgres, if tuplesort.c can tolerate a preprocessing 
>>step that looks for special cases, and degrades gracefully into the standard 
>>case. 

> For other techniques, I think it can, depending upon the cost of the
> preprocessing step. But the overall improvement from improving small
> sorts could well be lost in the noise...so maybe not worth it.

Agreed. GPU setup makes sorts <1MB not worth it.

Small sorts get a boost from bitonic sort in SSE2, which wires into the bottom 
of a special-case quicksort, where any subrange of 9..16 elements gets done in 
xmm registers.

I think the preprocessing to test and format keys for such sorts
is useful anyway. I was trying to make radix sort usable, and that requires the 
same key prep. Even if the key prep hits its space limit and says,
the input is unsuitable for radix sort, it still makes the normal quicksort 
faster, since some key prefixes are shorter.

-- 
Engineers think that equations approximate reality.
Physicists think that reality approximates the equations.
Mathematicians never make the connection.


pgsql-hackers by date:

Previous
From: "Jim C. Nasby"
Date:
Subject: Re: Get explain output of postgresql in Tables
Next
From: Germán Poó Caamaño
Date:
Subject: Re: Get explain output of postgresql in Tables