Thread: PG-Strom - A GPU optimized asynchronous executor module
Hi,

I tried to implement an FDW module that is designed to utilize GPU devices
to execute the qualifiers of sequential scans on foreign tables managed by
this module.

It is named PG-Strom, and the following wiki page gives a brief overview
of this module.
http://wiki.postgresql.org/wiki/PGStrom

In our measurements, it is about 10 times faster on sequential scans with
complex qualifiers; of course, this depends heavily on the type of workload.

Example) A query that counts the number of records whose (x,y) lies within
a particular range. A regular table 'rtbl' and a foreign table 'ftbl'
contain the same contents: 10 million records each.

  postgres=# SELECT count(*) FROM rtbl WHERE sqrt((x-25.6)^2 + (y-12.8)^2) < 51.2;
   count
  -------
   43134
  (1 row)

  Time: 10537.069 ms
  postgres=# SELECT count(*) FROM ftbl WHERE sqrt((x-25.6)^2 + (y-12.8)^2) < 51.2;
   count
  -------
   43134
  (1 row)

  Time: 744.252 ms

(*) See the "How to use" section of the wiki page to reproduce my test case.

It seems to me a quite good result. However, I wonder whether the sequential
scan on the regular table was tuned appropriately. Could you give me some
hints on tuning sequential scans on large tables? All I did for the test
case was to expand shared_buffers to 1024MB, which is enough to load the
whole of the example tables into memory.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
On Sun, Jan 22, 2012 at 10:48 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
> I tried to implement an FDW module that is designed to utilize GPU devices
> to execute the qualifiers of sequential scans on foreign tables managed by
> this module.
>
> It is named PG-Strom, and the following wiki page gives a brief overview
> of this module.
> http://wiki.postgresql.org/wiki/PGStrom
>
> In our measurements, it is about 10 times faster on sequential scans with
> complex qualifiers; of course, this depends heavily on the type of workload.

That's pretty neat. In terms of tuning the non-GPU based implementation,
have you done any profiling? Sometimes that leads to an "oh, woops" moment.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
2012/1/23 Robert Haas <robertmhaas@gmail.com>:
> On Sun, Jan 22, 2012 at 10:48 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
>> I tried to implement an FDW module that is designed to utilize GPU devices
>> to execute the qualifiers of sequential scans on foreign tables managed by
>> this module.
>>
>> It is named PG-Strom, and the following wiki page gives a brief overview
>> of this module.
>> http://wiki.postgresql.org/wiki/PGStrom
>>
>> In our measurements, it is about 10 times faster on sequential scans with
>> complex qualifiers; of course, this depends heavily on the type of workload.
>
> That's pretty neat. In terms of tuning the non-GPU based implementation,
> have you done any profiling? Sometimes that leads to an "oh, woops" moment.
>
Not yet, except for \timing.

What options are available to see the share of the workload consumed by
each component within a particular query? I tried googling some keywords,
but didn't find anything.

As an aside, I also tried modifying is_device_executable_qual() to always
return false, which disables pushing the qualifiers down. In this case,
2100 ms of the 7679 ms was consumed within this module, so I guess the
remaining 5500 ms or so was mostly consumed by ExecQual(), although that is
just an estimate...

  postgres=# SET pg_strom.exec_profile = on;
  SET
  Time: 1.075 ms
  postgres=# SELECT count(*) FROM ftbl WHERE sqrt((x-25.6)^2 + (y-12.8)^2) < 10;
  INFO: PG-Strom Exec Profile on "ftbl"
  INFO: Total PG-Strom consumed time: 2100.898 ms
  INFO: Time to JIT Compile GPU code: 0.000 ms
  INFO: Time to initialize devices: 0.000 ms
  INFO: Time to Load column-stores: 7.013 ms
  INFO: Time to Scan column-stores: 1219.746 ms
  INFO: Time to Fetch virtual tuples: 874.095 ms
  INFO: Time of GPU Synchronization: 0.000 ms
  INFO: Time of Async memcpy: 0.000 ms
  INFO: Time of Async kernel exec: 0.000 ms
   count
  -------
    3159
  (1 row)

  Time: 7679.342 ms

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
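The estimate in the last paragraph can be checked against the profile output. The three non-zero phases add up to the reported PG-Strom total (to within rounding), and subtracting that total from the overall query time gives the remainder that, by the unverified estimate above, is attributed mostly to ExecQual():

$$
\begin{aligned}
7.013 + 1219.746 + 874.095 &= 2100.854\ \text{ms} \approx 2100.898\ \text{ms (reported total)} \\
7679.342 - 2100.898 &= 5578.444\ \text{ms} \approx 72.6\%\ \text{of } 7679.342\ \text{ms}
\end{aligned}
$$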
On Mon, Jan 23, 2012 at 1:38 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
> What options are available to see the share of the workload consumed by
> each component within a particular query?

I usually use oprofile, though I'm given to understand it's been superseded
by a new tool called perf. I haven't had a chance to experiment with perf
yet, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
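Sampling profilers such as oprofile or perf attribute CPU time to individual functions across the whole backend without changing any code, which could confirm or refute the ExecQual() estimate above. A coarser complement, similar in spirit to the per-phase lines printed with pg_strom.exec_profile, is to accumulate elapsed time around each phase explicitly. Below is a minimal sketch, assuming a POSIX system with clock_gettime(CLOCK_MONOTONIC); PhaseTimer, phase_begin, phase_end and phase_report are hypothetical names, not taken from PG-Strom or PostgreSQL.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>

/* Hypothetical accumulator for one execution phase (e.g. "Scan column-stores"). */
typedef struct PhaseTimer
{
    const char     *name;      /* label used in the report */
    double          total_ms;  /* elapsed time accumulated so far */
    struct timespec start;     /* set by phase_begin() */
} PhaseTimer;

/* Record the start of one occurrence of the phase. */
void phase_begin(PhaseTimer *t)
{
    clock_gettime(CLOCK_MONOTONIC, &t->start);
}

/* Add the time since the matching phase_begin() to the running total. */
void phase_end(PhaseTimer *t)
{
    struct timespec now;

    clock_gettime(CLOCK_MONOTONIC, &now);
    t->total_ms += (now.tv_sec - t->start.tv_sec) * 1000.0
                 + (now.tv_nsec - t->start.tv_nsec) / 1.0e6;
}

/* Print one line per phase, e.g. "Time to Scan column-stores: 12.345 ms". */
void phase_report(const PhaseTimer *t)
{
    printf("Time to %s: %.3f ms\n", t->name, t->total_ms);
}
```

Each call site would be bracketed with phase_begin()/phase_end(), and phase_report() called once at the end of the scan. Note that this measures wall-clock time, so it includes waiting (for example on device synchronization), whereas a sampling profiler reports where CPU time goes.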
On Sun, Jan 22, 2012 at 3:48 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
> I tried to implement an FDW module that is designed to utilize GPU devices
> to execute the qualifiers of sequential scans on foreign tables managed by
> this module.
>
> It is named PG-Strom, and the following wiki page gives a brief overview
> of this module.
> http://wiki.postgresql.org/wiki/PGStrom
>
> In our measurements, it is about 10 times faster on sequential scans with
> complex qualifiers; of course, this depends heavily on the type of workload.

Very cool. Someone's been busy.

I see you've introduced 3 new features here at the same time:
* GPU access
* column store
* compiled WHERE clauses

It would be useful to see if we can determine which of those gives the most
benefit and whether other directions emerge.

Also, the query you mention is probably the best-performing query you can
come up with. It looks like a GIS query, yet isn't. Would it be possible to
run tests on the TPC-H suite and do a full comparison of
strengths/weaknesses, so we can understand the breadth of applicability of
the techniques?

This is a very interesting line of discussion, but please can we hold off
further posts about it until after the CF is over?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
2012/1/23 Simon Riggs <simon@2ndquadrant.com>:
> On Sun, Jan 22, 2012 at 3:48 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
>
>> I tried to implement an FDW module that is designed to utilize GPU devices
>> to execute the qualifiers of sequential scans on foreign tables managed by
>> this module.
>>
>> It is named PG-Strom, and the following wiki page gives a brief overview
>> of this module.
>> http://wiki.postgresql.org/wiki/PGStrom
>>
>> In our measurements, it is about 10 times faster on sequential scans with
>> complex qualifiers; of course, this depends heavily on the type of workload.
>
> Very cool. Someone's been busy.
>
> I see you've introduced 3 new features here at the same time:
> * GPU access
> * column store
> * compiled WHERE clauses
>
> It would be useful to see if we can determine which of those gives the most
> benefit and whether other directions emerge.
>
> Also, the query you mention is probably the best-performing query you can
> come up with. It looks like a GIS query, yet isn't. Would it be possible to
> run tests on the TPC-H suite and do a full comparison of
> strengths/weaknesses, so we can understand the breadth of applicability of
> the techniques?
>
DBT-2 is a good alternative, since TPC-H is expensive to run.

> This is a very interesting line of discussion, but please can we hold off
> further posts about it until after the CF is over?
>
Yep, I agree. We should handle the existing patches first, then the new
features for v9.3. I'll go back to reviewing pgsql_fdw.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
On Mon, Jan 23, 2012 at 2:49 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:

>> Also, the query you mention is probably the best-performing query you can
>> come up with. It looks like a GIS query, yet isn't. Would it be possible to
>> run tests on the TPC-H suite and do a full comparison of
>> strengths/weaknesses, so we can understand the breadth of applicability of
>> the techniques?
>>
> DBT-2 is a good alternative, since TPC-H is expensive to run.

DBT-2 is an OLTP test, not a DSS/DW test.

I'm not interested in the full TPC-H test, just a query-by-query comparison
of how well this stacks up. If there are other tests that are also
balanced/representative, I'd like to see those also. Just so we can see the
benefit envelope.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services