2012/1/23 Robert Haas <robertmhaas@gmail.com>:
> On Sun, Jan 22, 2012 at 10:48 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
>> I tried to implement a fdw module that is designed to utilize GPU devices
>> to execute qualifiers of sequential-scan on foreign tables managed by this
>> module.
>>
>> It was named PG-Strom, and the following wikipage gives a brief
>> overview of this module.
>> http://wiki.postgresql.org/wiki/PGStrom
>>
>> In our measurement, it achieves about x10 times faster on sequential-scan
>> with complex-qualifiers, of course, it quite depends on type of workloads.
>
> That's pretty neat. In terms of tuning the non-GPU based
> implementation, have you done any profiling? Sometimes that leads to
> an "oh, woops" moment.
>
Not yet, except for \timing.
What options are available to see how much of the time each component
consumes within a particular query?
I tried googling some keywords, but did not find anything useful.

As an aside, I also tried modifying is_device_executable_qual() to always
return false, which disables push-down of qualifiers.
In this case, 2100ms of the 7679ms total was consumed within this module, so
I guess the remaining 5580ms or so was mostly consumed by ExecQual(), although
that is just an estimate...

postgres=# SET pg_strom.exec_profile = on;
SET
Time: 1.075 ms
postgres=# SELECT count(*) FROM ftbl WHERE sqrt((x-25.6)^2 + (y-12.8)^2) < 10;
INFO: PG-Strom Exec Profile on "ftbl"
INFO: Total PG-Strom consumed time: 2100.898 ms
INFO: Time to JIT Compile GPU code: 0.000 ms
INFO: Time to initialize devices: 0.000 ms
INFO: Time to Load column-stores: 7.013 ms
INFO: Time to Scan column-stores: 1219.746 ms
INFO: Time to Fetch virtual tuples: 874.095 ms
INFO: Time of GPU Synchronization: 0.000 ms
INFO: Time of Async memcpy: 0.000 ms
INFO: Time of Async kernel exec: 0.000 ms
 count
-------
  3159
(1 row)

Time: 7679.342 ms
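
The split above can be double-checked with a few lines of Python. This is
only a sketch: the numbers are copied from the profile output above, the
variable names are made up for illustration, and attributing the remainder
to ExecQual() is still a guess, not a measurement.

```python
# Timings in milliseconds, copied from the psql output above.
total_query_ms = 7679.342   # "Time:" reported by \timing
pg_strom_ms = 2100.898      # "Total PG-Strom consumed time"

# Non-zero components reported by pg_strom.exec_profile.
components_ms = {
    "Load column-stores": 7.013,
    "Scan column-stores": 1219.746,
    "Fetch virtual tuples": 874.095,
}

# Time spent outside the module. Assumption: this is mostly ExecQual(),
# but nothing in the profile output confirms that.
remainder_ms = total_query_ms - pg_strom_ms

# Sum of the reported components, to see how much of the module's
# total the profile accounts for.
component_sum_ms = sum(components_ms.values())

print(f"outside PG-Strom: {remainder_ms:.3f} ms")
print(f"inside PG-Strom:  {pg_strom_ms:.3f} ms")
for name, ms in components_ms.items():
    print(f"  {name}: {ms:.3f} ms ({ms / pg_strom_ms:.1%} of module time)")
print(f"not itemized:     {pg_strom_ms - component_sum_ms:.3f} ms")
```

The remainder comes out at roughly 5578 ms, and the three listed components
add up to almost all of the module's own total.
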
Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>