Re: How to make ResourceOwnerForgetBuffer() O(1), instead of O(N^2) scale - Mailing list pgsql-hackers

From Kouhei Kaigai
Subject Re: How to make ResourceOwnerForgetBuffer() O(1), instead of O(N^2) scale
Date
Msg-id 9A28C8860F777E439AA12E8AEA7694F801056956@BPXM15GP.gisp.nec.co.jp
Whole thread Raw
In response to Re: How to make ResourceOwnerForgetBuffer() O(1), instead of O(N^2) scale  (Kouhei Kaigai <kaigai@ak.jp.nec.com>)
List pgsql-hackers
> > On 10/03/2014 07:08 AM, Kouhei Kaigai wrote:
> > > Hello,
> > >
> > > I recently got a trouble on development of my extension that
> > > utilizes the shared buffer when it released each buffer page.
> > >
> > > This extension transfers contents of the shared buffers to GPU
> > > device using DMA feature, then kicks a device kernel code.
> >
> > Wow, that sounds crazy.
> >
> > > Once backend/extension calls ReadBuffer(), resowner.c tracks which
> > > buffer was referenced by the current resource owner, to ensure these
> > > buffers being released at end of the transaction.
> > > However, it seems to me implementation of resowner.c didn't assume
> > > many buffers are referenced by a particular resource owner
> simultaneously.
> > > It manages the buffer index using an expandable array, then looks up
> > > the target buffer by sequential walk but from the tail because
> > > recently pinned buffer tends to be released first.
> > > It made a trouble in my case. My extension pinned multiple thousands
> > > buffers, so owner->buffers[] were enlarged and takes expensive cost
> > > to walk on.
> > > In my measurement, ResourceOwnerForgetBuffer() takes 36 seconds in
> > > total during hash-joining 2M rows; even though hash-joining itself
> > > takes less than 16 seconds.
> > >
> > > What is the best way to solve the problem?
> >
> > How about creating a separate ResourceOwner for these buffer pins, and
> > doing a wholesale ResourceOwnerRelease() on it when you're done?
> >
> Let me clarify your idea.
>
> 1. Create a separate ResourceOwner under the CurrentResourceOwner.
> 2. Switch CurrentResourceOwner to the self-constructed resource owner
>    during ReadBuffer() on thousands buffers.
> 3. Switch back to the original CurrentResourceOwner.
> 4. Kick DMA and run GPU device kernel (actual job running...) 5. Switch
> CurrentResourceOwner to the self-constructed resource owner
>    again, during ReleaseBuffer() in reverse order.
> 6. Switch back to the original CurrentResourceOwner, then calls
>    ResourceOwnerDelete().
>
> Hmm... at this moment, I cannot find something not easy to implement.
> I'd like to try this idea, then report it later.
>
Let me share the result.

The above approach could eliminated my problem. Individual resource-owner
for each set of shared-buffer and ReleaseBuffer() in reverse order reduced
time to release pinned buffers from 36sec->67.3msec when I run hash-joining
on 20M records; that involves 59168 buffers are pinned concurrently in
maximum (32 asynchronous requests, a request packs 1849 buffers).

Thanks for the suggestion!


postgres=# explain analyze select * from t0 natural join t1 natural join t2;
INFO:  time to release buffers: 67.289ms
  QUERY PLAN 


-----------------------------------------------------------------------------------------------------------------------------------------------
Custom (GpuHashJoin)  (cost=3468.00..471640.11 rows=19740924 width=139) (actual time=193.086..5913.459 rows=20000000
loops=1) pseudo scan tlist: 1:(t0.bid), 2:(t0.aid), 3:<t0.id>, 4:<t0.cat>, 5:<t0.cid>, 6:<t0.did>, 7:<t0.x>, 8:<t0.y>,
9:<t0.z>,11:<t2.btext>, 12:[t2.bid], 10:<t1.atext>, 13:[t1.aid]  hash clause 1: (t0.bid = t2.bid)  hash clause 2:
(t0.aid= t1.aid)  ->  Custom (GpuScan) on t0  (cost=500.00..467167.24 rows=20000024 width=73) (actual
time=7.757..1056.426rows=20000000 loops=1)  ->  Custom (MultiHash)  (cost=734.00..734.00 rows=40000 width=37) (actual
time=23.382..23.382rows=40000 loops=1)        hash keys: bid        ->  Seq Scan on t2  (cost=0.00..734.00 rows=40000
width=37)(actual time=0.007..5.124 rows=40000 loops=1)        ->  Custom (MultiHash)  (cost=734.00..734.00 rows=40000
width=37)(actual time=11.919..11.919 rows=40000 loops=1)              hash keys: aid              ->  Seq Scan on t1
(cost=0.00..734.00rows=40000 width=37) (actual time=0.010..5.299 rows=40000 loops=1)Planning time: 0.904 msExecution
time:6667.986 ms 
(13 rows)

--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>




pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Proposal for better support of time-varying timezone abbreviations
Next
From: Tom Lane
Date:
Subject: Re: Locking for Rename To new_name works differently for different objects