Re: Way to check whether a particular block is on the shared_buffer? - Mailing list pgsql-hackers

From Kouhei Kaigai
Subject Re: Way to check whether a particular block is on the shared_buffer?
Msg-id 9A28C8860F777E439AA12E8AEA7694F8011A7600@BPXM15GP.gisp.nec.co.jp
In response to Re: Way to check whether a particular block is on the shared_buffer?  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
Responses Re: Way to check whether a particular block is on the shared_buffer?
List pgsql-hackers
> -----Original Message-----
> From: Jim Nasby [mailto:Jim.Nasby@BlueTreble.com]
> Sent: Friday, February 05, 2016 9:17 AM
> To: Kaigai Kouhei(海外 浩平); pgsql-hackers@postgresql.org; Robert Haas
> Cc: Amit Langote
> Subject: Re: [HACKERS] Way to check whether a particular block is on the
> shared_buffer?
>
> On 2/4/16 12:30 AM, Kouhei Kaigai wrote:
> >> 2. A feature to suspend i/o write-out towards particular blocks
> >>    that are registered by another concurrent backend, until they are
> >>    unregistered (usually, at the end of P2P DMA).
> >>    ==> to be discussed.
>
> I think there's still a race condition here though...
>
> A
> finds buffer not in shared buffers
>
> B
> reads buffer in
> modifies buffer
> starts writing buffer to OS
>
> A
> Makes call to block write, but write is already in process; thinks
> writes are now blocked
> Reads corrupted block
> Much hilarity ensues
>
> Or maybe you were just glossing over that part for brevity.
>
> ...
>
> > I tried to draft an enhancement that realizes the above i/o write-out
> > suspend/resume in as non-invasive a way as possible.
> >
> >    ASSUMPTION: I intend to implement this feature as part of an extension,
> >        because these i/o suspend/resume checks are pure overhead
> >        for the core features unless an extension makes use of them.
> >
> > Three functions shall be added:
> >
> > extern int    GetStorageMgrNumbers(void);
> > extern f_smgr GetStorageMgrHandlers(int smgr_which);
> > extern void   SetStorageMgrHandlers(int smgr_which, f_smgr smgr_handlers);
> >
> > As the name implies, GetStorageMgrNumbers() returns the number of storage
> > managers currently installed. It always returns 1 right now.
> > GetStorageMgrHandlers() returns the currently configured f_smgr table for
> > the supplied smgr_which. It lets extensions learn the current configuration
> > of the storage manager, even if another extension has already modified it.
> > SetStorageMgrHandlers() installs the supplied 'smgr_handlers' in place of
> > the current one.
> > If an extension wants to intercept 'smgr_write', it replaces 'smgr_write'
> > with its own function, which then calls the original function, most likely
> > mdwrite.
> >
> > In this case, call chain shall be:
> >
> >    FlushBuffer, and others...
> >     +-- smgrwrite(...)
> >          +-- (extension's own function)
> >               +-- mdwrite
>
> ISTR someone (Robert Haas?) complaining that this method of hooks is
> cumbersome to use and can be fragile if multiple hooks are being
> installed. So maybe we don't want to extend its usage...
>
> I'm also not sure whether this is better done with an smgr hook or a
> hook into shared buffer handling...
>
# Sorry, I overlooked the latter part of your reply.

I agree that smgr hooks should primarily be designed to make storage
systems pluggable, even if we could also use these hooks to suspend and
resume write i/o.
In addition, "pluggable storage" is a long-standing feature request, even
though it is not certain whether the existing smgr hooks are a good
starting point for it. It would be risky to build a large feature on top
of the hooks that falls outside their primary purpose.
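
For reference, the smgr-level interposition from the quoted proposal would
look roughly like the standalone sketch below. It is only an illustration:
f_smgr is reduced to a single member with a simplified signature,
mdwrite_stub and smgrwrite_stub stand in for the real mdwrite and smgrwrite,
and the Get/Set functions are the proposed ones, not existing PostgreSQL APIs.

```c
/* Standalone sketch of the proposed smgr handler interposition.
 * Nothing here is PostgreSQL source; types and stubs are assumptions. */
#include <assert.h>
#include <stddef.h>

typedef unsigned int BlockNumber;

/* Reduced stand-in for f_smgr: one write callback, simplified signature. */
typedef struct f_smgr
{
    void (*smgr_write)(BlockNumber blkno, const char *buffer);
} f_smgr;

static int md_writes = 0;   /* calls that reached the "md" layer */
static int ext_calls = 0;   /* calls seen by the extension wrapper */

/* Stand-in for mdwrite: only counts the call. */
static void mdwrite_stub(BlockNumber blkno, const char *buffer)
{
    (void) blkno;
    (void) buffer;
    md_writes++;
}

/* Handler table with a single storage manager, as in the proposal. */
static f_smgr smgrsw[] = { { mdwrite_stub } };

int GetStorageMgrNumbers(void)
{
    return (int) (sizeof(smgrsw) / sizeof(smgrsw[0]));
}

f_smgr GetStorageMgrHandlers(int smgr_which)
{
    assert(smgr_which >= 0 && smgr_which < GetStorageMgrNumbers());
    return smgrsw[smgr_which];
}

void SetStorageMgrHandlers(int smgr_which, f_smgr smgr_handlers)
{
    assert(smgr_which >= 0 && smgr_which < GetStorageMgrNumbers());
    smgrsw[smgr_which] = smgr_handlers;
}

/* Extension side: remember whatever was installed before us. */
static f_smgr saved_handlers;

static void ext_smgr_write(BlockNumber blkno, const char *buffer)
{
    ext_calls++;                               /* suspend/resume check would go here */
    saved_handlers.smgr_write(blkno, buffer);  /* chain to the previous handler */
}

void extension_install(void)
{
    f_smgr handlers = GetStorageMgrHandlers(0);

    saved_handlers = handlers;
    handlers.smgr_write = ext_smgr_write;
    SetStorageMgrHandlers(0, handlers);
}

/* Stand-in for smgrwrite: dispatch through the handler table. */
void smgrwrite_stub(int smgr_which, BlockNumber blkno, const char *buffer)
{
    smgrsw[smgr_which].smgr_write(blkno, buffer);
}
```

Each extension has to save and chain to the previous handler by itself, which
is where the fragility with multiple installed hooks comes from.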

So my preference is a mechanism that hooks buffer writes to implement this
feature. (Or perhaps a built-in write i/o suspend/resume facility, if it
has nearly zero cost when no extension activates it.)
One downside of this approach is the larger number of hook points.
At a minimum, we would have to place the hook next to the existing smgrwrite
calls in LocalBufferAlloc and FlushRelationBuffers, in addition to FlushBuffer.
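
A rough standalone sketch of such a buffer-write hook is below. All names
(buffer_write_hook, flush_block, suspend_block, resume_block, my_write_hook)
are hypothetical and fake_smgrwrite only stands in for smgrwrite. It is
single-threaded, so it shows the shape of the hook point only; it neither
waits for a suspended block nor closes the race Jim described.

```c
/* Standalone sketch of a buffer-write hook; not PostgreSQL source.
 * Every identifier here is a hypothetical name for illustration. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int BlockNumber;

/* Hook type: return false to tell the caller the write must not proceed now. */
typedef bool (*buffer_write_hook_type)(BlockNumber blkno);

/* NULL when no extension is loaded, so the check costs one pointer test. */
static buffer_write_hook_type buffer_write_hook = NULL;

static int writes_issued = 0;

/* Stand-in for smgrwrite: only counts the call. */
static void fake_smgrwrite(BlockNumber blkno)
{
    (void) blkno;
    writes_issued++;
}

/* Stand-in for a call site such as FlushBuffer: consult the hook first.
 * A real implementation would wait and retry instead of returning false. */
static bool flush_block(BlockNumber blkno)
{
    if (buffer_write_hook != NULL && !buffer_write_hook(blkno))
        return false;           /* write-out is suspended for this block */
    fake_smgrwrite(blkno);
    return true;
}

/* Extension side: small table of blocks registered for P2P DMA. */
#define MAX_SUSPENDED 8
static BlockNumber suspended[MAX_SUSPENDED];
static int n_suspended = 0;

static void suspend_block(BlockNumber blkno)
{
    assert(n_suspended < MAX_SUSPENDED);
    suspended[n_suspended++] = blkno;
}

static void resume_block(BlockNumber blkno)
{
    for (int i = 0; i < n_suspended; i++)
    {
        if (suspended[i] == blkno)
        {
            /* fill the gap with the last entry */
            suspended[i] = suspended[--n_suspended];
            return;
        }
    }
}

/* Hook body: allow the write unless the block is currently registered. */
static bool my_write_hook(BlockNumber blkno)
{
    for (int i = 0; i < n_suspended; i++)
    {
        if (suspended[i] == blkno)
            return false;
    }
    return true;
}
```

The same pointer test would have to be repeated at every call site, which is
the "larger number of hook points" cost mentioned above.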

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>



