Re: How to make ResourceOwnerForgetBuffer() O(1), instead of O(N^2) scale - Mailing list pgsql-hackers
From: Kouhei Kaigai
Subject: Re: How to make ResourceOwnerForgetBuffer() O(1), instead of O(N^2) scale
Msg-id: 9A28C8860F777E439AA12E8AEA7694F801056956@BPXM15GP.gisp.nec.co.jp
In response to: Re: How to make ResourceOwnerForgetBuffer() O(1), instead of O(N^2) scale (Kouhei Kaigai <kaigai@ak.jp.nec.com>)
List: pgsql-hackers
> > On 10/03/2014 07:08 AM, Kouhei Kaigai wrote:
> > > Hello,
> > >
> > > I recently got a trouble on development of my extension that
> > > utilizes the shared buffer when it released each buffer page.
> > >
> > > This extension transfers contents of the shared buffers to GPU
> > > device using DMA feature, then kicks a device kernel code.
> >
> > Wow, that sounds crazy.
> >
> > > Once backend/extension calls ReadBuffer(), resowner.c tracks which
> > > buffer was referenced by the current resource owner, to ensure these
> > > buffers get released at the end of the transaction.
> > > However, it seems to me the implementation of resowner.c didn't
> > > assume many buffers are referenced by a particular resource owner
> > > simultaneously.
> > > It manages the buffer index using an expandable array, then looks up
> > > the target buffer by a sequential walk, starting from the tail
> > > because a recently pinned buffer tends to be released first.
> > > This caused trouble in my case. My extension pinned several thousand
> > > buffers, so owner->buffers[] was enlarged and became expensive
> > > to walk.
> > > In my measurement, ResourceOwnerForgetBuffer() takes 36 seconds in
> > > total during hash-joining 2M rows, even though the hash-joining
> > > itself takes less than 16 seconds.
> > >
> > > What is the best way to solve the problem?
> >
> > How about creating a separate ResourceOwner for these buffer pins, and
> > doing a wholesale ResourceOwnerRelease() on it when you're done?
> >
> Let me clarify your idea:
>
> 1. Create a separate ResourceOwner under the CurrentResourceOwner.
> 2. Switch CurrentResourceOwner to the self-constructed resource owner
>    during ReadBuffer() on thousands of buffers.
> 3. Switch back to the original CurrentResourceOwner.
> 4. Kick DMA and run the GPU device kernel (actual job running...).
> 5. Switch CurrentResourceOwner to the self-constructed resource owner
>    again, during ReleaseBuffer() in reverse order.
> 6. Switch back to the original CurrentResourceOwner, then call
>    ResourceOwnerDelete().
>
> Hmm... at this moment, I cannot find anything here that would be hard
> to implement. I'd like to try this idea, then report back later.

Let me share the result. The above approach eliminated my problem.
An individual resource owner for each set of shared buffers, with
ReleaseBuffer() called in reverse order, reduced the time to release
pinned buffers from 36sec to 67.3msec when I run hash-joining on 20M
records; that involves a maximum of 59168 buffers pinned concurrently
(32 asynchronous requests, each request packing 1849 buffers).

Thanks for the suggestion!

postgres=# explain analyze select * from t0 natural join t1 natural join t2;
INFO:  time to release buffers: 67.289ms
                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Custom (GpuHashJoin)  (cost=3468.00..471640.11 rows=19740924 width=139) (actual time=193.086..5913.459 rows=20000000 loops=1)
   pseudo scan tlist: 1:(t0.bid), 2:(t0.aid), 3:<t0.id>, 4:<t0.cat>, 5:<t0.cid>, 6:<t0.did>, 7:<t0.x>, 8:<t0.y>, 9:<t0.z>, 11:<t2.btext>, 12:[t2.bid], 10:<t1.atext>, 13:[t1.aid]
   hash clause 1: (t0.bid = t2.bid)
   hash clause 2: (t0.aid = t1.aid)
   ->  Custom (GpuScan) on t0  (cost=500.00..467167.24 rows=20000024 width=73) (actual time=7.757..1056.426 rows=20000000 loops=1)
   ->  Custom (MultiHash)  (cost=734.00..734.00 rows=40000 width=37) (actual time=23.382..23.382 rows=40000 loops=1)
         hash keys: bid
         ->  Seq Scan on t2  (cost=0.00..734.00 rows=40000 width=37) (actual time=0.007..5.124 rows=40000 loops=1)
         ->  Custom (MultiHash)  (cost=734.00..734.00 rows=40000 width=37) (actual time=11.919..11.919 rows=40000 loops=1)
               hash keys: aid
               ->  Seq Scan on t1  (cost=0.00..734.00 rows=40000 width=37) (actual time=0.010..5.299 rows=40000 loops=1)
 Planning time: 0.904 ms
 Execution time: 6667.986 ms
(13 rows)

--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>