Thread: Amazon now supporting GPU focused EC2 instances
I keep wondering if there's a role for GPUs in Postgres and haven't figured out how to integrate them yet, but the day when we'll be expected to exploit them seems to be getting nearer:

http://aws.typepad.com/aws/2010/11/new-ec2-instance-type-the-cluster-gpu-instance.html

-- greg
On 15 November 2010 11:26, Greg Stark <stark@mit.edu> wrote:
I keep wondering if there's a role for GPUs in Postgres and haven't
figured out how to integrate them yet but the day when we'll be
expected to exploit them seems to be getting nearer:
http://aws.typepad.com/aws/2010/11/new-ec2-instance-type-the-cluster-gpu-instance.html
Is this somewhere OpenCL is an option?
--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935
On Mon, Nov 15, 2010 at 11:37 AM, Thom Brown <thom@linux.com> wrote:
> Is this somewhere OpenCL is an option?

Sure. Personally I wonder whether the context switching is fast enough to handle a multi-user system. In the past, graphics cards have always been targeted at the idea of a single-user workstation playing games, where the game wants to talk directly to the graphics card. But increasingly that's not true, with things like OpenGL-based window managers and so on. Perhaps things have changed enough that it would be conceivable to have dozens of separate processes all downloading snippets of code and time-sharing the GPU now.

The other problem is that Postgres is really not architected in a way to make this easy. Since our data types are all flexible, pluggable sets of functions, it's unclear whether any of them match the data types that GPUs know about. The obvious algorithm to toss to the GPU would be sorting -- but that would only work for floats, and even then it's not clear to me that the float semantics on the GPU necessarily match those of the host processor.

I've seen papers on doing relational joins using GPUs and I'm sure there is other wonderful stuff we could do. But if it comes at the cost of being able to handle arbitrary join clauses it'll be a tough sacrifice to make.

-- greg
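[Editorial sketch] Greg's pluggable-data-types point can be pictured like this (a hypothetical Python illustration, not Postgres internals): a GPU sort wants a flat array of a known native type, while a Postgres sort only ever calls an opaque per-type comparison function, which a GPU kernel cannot call back into.

```python
from functools import cmp_to_key

# GPU-friendly case: a flat array of a native type can be shipped to
# the device and sorted with a fixed kernel (e.g. radix/bitonic sort).
flat = [3.5, 1.25, 2.75]
gpu_sortable = sorted(flat)

# Postgres-style case: each data type supplies its own comparator
# (think btree opclass support function, name hypothetical here). The
# sort can only invoke the comparator; it never sees a flat native
# representation, so there is nothing generic to hand off to a GPU.
def numeric_cmp(a, b):
    return (a > b) - (a < b)

pg_sorted = sorted(flat, key=cmp_to_key(numeric_cmp))
```

Both paths produce the same answer; the difference is that only the first has a memory layout a GPU kernel could consume directly.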
On 2010-11-15 12:37, Thom Brown wrote:
> Is this somewhere OpenCL is an option?

There is a talk about PgOpenCL coming up:

http://www.postgresql.eu/events/schedule/pgday2010/session/56-introduction-to-pgopencl-a-new-procedural-language-for-postgresql-unlocking-the-power-of-the-gpgpu/

Maybe I've sent this link earlier, but the VLDB 2008 paper "Parallelizing Query Optimization" (http://www.vldb.org/pvldb/1/1453882.pdf) might be interesting: not much IO between CPU and GPU (hmm, how much catalog access is necessary for cost estimation?). I suspect the biggest challenge is rewriting essential parts into a SIMD algorithm.
regards,
Yeb Havinga
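[Editorial sketch] The "rewriting into a SIMD algorithm" challenge usually means removing per-row branches so every lane executes the same instructions. A hypothetical Python illustration (cost constants made up):

```python
# Branchy scalar form: a per-row decision, natural on a CPU but hostile
# to SIMD lanes, which must all execute the same instruction stream.
def cost_scalar(rows):
    out = []
    for width, is_seq in rows:
        if is_seq:
            out.append(width * 1.0)   # sequential-access cost (made up)
        else:
            out.append(width * 4.0)   # random-access cost (made up)
    return out

# Branch-free data-parallel form: the condition becomes arithmetic on a
# 0/1 mask, the shape SIMD and GPU hardware want.
def cost_simd_style(rows):
    return [w * (1.0 * s + 4.0 * (1 - s)) for w, s in rows]
```

The two functions agree on every input; the second is the kind of reformulation the paper's approach requires.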
On 2010-11-15 18:49, Greg Stark wrote:
> I've seen papers on doing relational joins using GPUs and I'm sure
> there is other wonderful stuff we could do. But if it comes at the
> cost of being able to handle arbitrary join clauses it'll be a tough
> sacrifice to make.

Perhaps the coolest use of all is as an intermediate filtering stage for spatial joins, using collision detection. Draw your data and your search region (slightly enlarged) as objects and ask the graphics card if the search region collides with anything -- much like it might ask "is this player character bumping into any walls?"

IIRC it's been prototyped in Oracle with impressive results, using OpenGL, so quite portable between GPUs.

Jeroen
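[Editorial sketch] The collision-detection idea amounts to a cheap bounding-box prefilter followed by an exact check on the survivors. A minimal Python sketch of the coarse pass (function names and geometry hypothetical):

```python
def overlaps(a, b):
    """Axis-aligned bounding-box intersection: boxes are (xmin, ymin, xmax, ymax)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def spatial_prefilter(search_box, boxes, margin=0.5):
    # Slightly enlarge the search region, as suggested above, so the
    # coarse pass never drops a true match near the boundary.
    x0, y0, x1, y1 = search_box
    grown = (x0 - margin, y0 - margin, x1 + margin, y1 + margin)
    # On a GPU this test would run for every box in parallel; the exact
    # join clause is then evaluated only on the surviving candidates.
    return [i for i, b in enumerate(boxes) if overlaps(grown, b)]
```

Candidates may include false positives (that is what the exact second stage is for), but never false negatives within the margin.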
Jeroen Vermeulen wrote:
> On 2010-11-15 18:49, Greg Stark wrote:
>> I've seen papers on doing relational joins using GPUs and I'm sure
>> there is other wonderful stuff we could do. But if it comes at the
>> cost of being able to handle arbitrary join clauses it'll be a tough
>> sacrifice to make.
>
> Perhaps the coolest use of all is as an intermediate filtering stage
> for spatial joins, using collision detection. Draw your data and your
> search region (slightly enlarged) as objects and ask the graphics card
> if the search region collides with anything. Much like it might ask
> "is this player character bumping into any walls?"

Netezza has FPGAs (Field-Programmable Gate Arrays) for each disk drive that perform a similar function:

http://www.dbms2.com/2009/08/08/netezza-fpga/

    The longer answer is:

    * Projections
    * Restrictions/selections
    * Visibility, which for now seems to mean recognizing which rows
      are and aren't valid under Netezza's form of MVCC (MultiVersion
      Concurrency Control)
    * Compression and/or decompression (I'm a little confused as to
      which, but I imagine it's both)
    * Netezza's form of UDFs (User-Defined Functions)

More details:

http://www.theregister.co.uk/2006/10/03/netezza_annual_conference_roundup/

    Another area in which Netezza has been hiding its light under a
    bushel is in the matter of FPGAs (field programmable gate arrays).
    FPGAs are used to process data as it is streamed off disk. Note
    that this is important to understand. Most data warehouse
    appliances (and, indeed, conventional products) use a caching
    architecture whereby data is read from disk and then held in cache
    for processing. Netezza, on the other hand, uses an approach that
    queries the data as it comes off disk before passing the results on
    to memory. In other words it uses a streaming architecture in which
    the data is streamed through the queries (whose programs have been
    loaded into the FPGA) rather than being stored (even if in memory)
    and then queried.
--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
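[Editorial sketch] The streaming architecture quoted above can be caricatured in a few lines (a hypothetical Python illustration, not Netezza's actual pipeline): restriction and projection are applied to each row as it "comes off disk", so only survivors ever reach memory.

```python
def stream_rows():
    # Stand-in for rows arriving straight off disk (made-up data).
    yield from [(1, "alice", 30), (2, "bob", 17), (3, "carol", 45)]

def fpga_style_scan(rows, predicate, projection):
    # Filter (restriction) and project each row as it streams past,
    # instead of caching everything first and querying afterwards.
    for row in rows:
        if predicate(row):
            yield projection(row)

adults = list(fpga_style_scan(stream_rows(),
                              predicate=lambda r: r[2] >= 18,
                              projection=lambda r: r[1]))
```

Nothing but the projected survivors is ever materialized, which is the essence of the streaming-vs-caching distinction in the quoted article.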
Thom Brown-2 wrote:
> On 15 November 2010 11:26, Greg Stark <stark@mit.edu> wrote:
>> I keep wondering if there's a role for GPUs in Postgres and haven't
>> figured out how to integrate them yet but the day when we'll be
>> expected to exploit them seems to be getting nearer:
>>
>> http://aws.typepad.com/aws/2010/11/new-ec2-instance-type-the-cluster-gpu-instance.html
>
> Is this somewhere OpenCL is an option?

OpenCL is a good option as it's:

1) Open - supported by many vendors: Apple, AMD/ATI, Nvidia, IBM, Intel, ..

2) Able to run on CPUs and GPUs. There is definitely I/O overhead and other challenges to running code on a GPU; see my PGDay.EU 2010 prezzy: http://www.scribd.com/doc/44661593/PostgreSQL-OpenCL-Procedural-Language

3) Supports SIMD instructions and multi-threading on CPUs - no GPU required! This can boost an app by executing 4 floating-point ops concurrently per FPU.

There is an interesting class of chip due from AMD in mid 2011, with a combined CPU & GPU. Hopefully these chips will reduce the GPU I/O overhead and be more amenable to program concurrency. See http://sites.amd.com/us/fusion/apu/Pages/fusion.aspx.

For some applications, running SIMD instructions and using all the cores will provide a boost. Intel are pushing this point, saying that running SSE4 instructions on 4 or 8 cores can get up to a 16x increase. Of course you can hand-roll all this SSE4 code, but why not let an OpenCL compiler get you 80% of the increase?

My first phase was to implement some OpenCL code as user-defined functions. This comes with the overhead of creating the OpenCL run-time context and compiling OpenCL kernels. Phase two is OpenCL as a procedural language, moving run-time context creation and program compilation to PG_init() and the SQL CREATE FUNCTION statement.
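[Editorial sketch] The UDF execution model described here can be pictured as follows (a Python caricature, not pGOpenCL's actual API; all names hypothetical): a kernel is compiled once, paying the setup cost up front, then applied across a whole batch of values per call, which is where the SIMD/GPU win comes from.

```python
def compile_kernel(scalar_fn):
    # Stand-in for OpenCL kernel compilation: pay the setup cost once
    # (phase two moves this to PG_init()/CREATE FUNCTION time), then
    # reuse the compiled kernel for every batch of rows.
    def kernel(values):
        # On a real device each element would be handled by a separate
        # work-item in parallel; here we simply map the scalar function.
        return [scalar_fn(v) for v in values]
    return kernel

# A hypothetical saxpy-style kernel: y = 2x + 1 over a batch of floats.
saxpy = compile_kernel(lambda x: 2.0 * x + 1.0)
result = saxpy([0.0, 1.0, 2.0])
```

Amortizing compilation over many invocations is exactly the motivation given for moving context setup out of per-call UDF code.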
For floating-point numeric and image-processing apps, OpenCL and GPUs are pretty much a no-brainer: they are easier to parallelize and have been proven in many cases to benefit from GPUs. The big question is whether OpenCL can improve existing apps that sort and search, in parallel. We can find out more in Jan 2011 when I'll make a drop of the pGOpenCL code.

Regards

Tim