Re: Way to check whether a particular block is on the shared_buffer? - Mailing list pgsql-hackers
From | Kouhei Kaigai |
---|---|
Subject | Re: Way to check whether a particular block is on the shared_buffer? |
Date | |
Msg-id | 9A28C8860F777E439AA12E8AEA7694F8011A6AA7@BPXM15GP.gisp.nec.co.jp |
In response to | Re: Way to check whether a particular block is on the shared_buffer? (Kouhei Kaigai <kaigai@ak.jp.nec.com>) |
Responses |
Re: Way to check whether a particular block is on the
shared_buffer?
|
List | pgsql-hackers |
> > KaiGai-san,
> >
> > On 2016/02/01 10:38, Kouhei Kaigai wrote:
> > > As an aside, background of my motivation is the slide below:
> > > http://www.slideshare.net/kaigai/sqlgpussd-english
> > > (LT slides in JPUG conference last Dec)
> > >
> > > I'm under investigation of SSD-to-GPU direct feature on top of
> > > the custom-scan interface. It intends to load a bunch of data
> > > blocks on NVMe-SSD to GPU RAM using P2P DMA, prior to the data
> > > loading onto CPU/RAM, to preprocess the data to be filtered out.
> > > It only makes sense if the target blocks are not loaded to the
> > > CPU/RAM yet, because SSD device is essentially slower than RAM.
> > > So, I like to have a reliable way to check the latest status of
> > > the shared buffer, to know whether a particular block is already
> > > loaded or not.
> >
> > Quite interesting stuff, thanks for sharing!
> >
> > I'm in no way expert on this but could this generally be attacked from the
> > smgr API perspective? Currently, we have only one implementation - md.c
> > (the hard-coded RelationData.smgr_which = 0). If we extended that and
> > provided end-to-end support so that there would be md.c alternatives to
> > storage operations, I guess that would open up opportunities for
> > extensions to specify smgr_which as an argument to ReadBufferExtended(),
> > provided there is already support in place to install md.c alternatives
> > (perhaps in .so). Of course, these are just musings and, perhaps does not
> > really concern the requirements of custom scan methods you have been
> > developing.
> >
> Thanks for your idea. Indeed, smgr hooks are good candidate to implement
> the feature, however, what I need is a thin intermediation layer rather
> than alternative storage engine.
>
> It becomes clear we need two features here.
> 1. A feature to check whether a particular block is already on the shared
>    buffer pool.
>    It is available.
>    BufTableLookup() under the BufMappingPartitionLock
>    gives us the information we want.
>
> 2. A feature to suspend i/o write-out towards particular blocks
>    that are registered by other concurrent backend, until they are
>    unregistered (usually, at the end of P2P DMA).
>    ==> to be discussed.
>
> When we call smgrwrite(), like FlushBuffer(), it fetches function pointer
> from the 'smgrsw' array, then calls smgr_write.
>
>   void
>   smgrwrite(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
>             char *buffer, bool skipFsync)
>   {
>       (*(smgrsw[reln->smgr_which].smgr_write)) (reln, forknum, blocknum,
>                                                 buffer, skipFsync);
>   }
>
> If extension would overwrite smgrsw[] array, then call the original
> function under the control by extension, it allows to suspend the call
> of the original smgr_write until completion of P2P DMA.
>
> It may be a minimum invasive way to implement, and portable to any
> further storage layers.
>
> How about your thought? Even though it is a bit different from your
> original proposition.
>
I tried to design a draft of an enhancement to realize the above i/o
write-out suspend/resume, in as minimally invasive a way as possible.

ASSUMPTION: I intend to implement this feature as a part of an extension,
because these i/o suspend/resume checks are pure overhead for the core
features unless an extension utilizes them.

Three functions shall be added:

extern int    GetStorageMgrNumbers(void);
extern f_smgr GetStorageMgrHandlers(int smgr_which);
extern void   SetStorageMgrHandlers(int smgr_which, f_smgr smgr_handlers);

As the name suggests, GetStorageMgrNumbers() returns the number of storage
managers currently installed. It always returns 1 right now.
GetStorageMgrHandlers() returns the currently configured f_smgr table for
the supplied smgr_which. It allows extensions to know the current
configuration of the storage manager, even if another extension has
already modified it.
SetStorageMgrHandlers() assigns the supplied 'smgr_handlers' in place of
the current one.
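The three accessors above could look roughly like the standalone sketch
below. It is a mock under stated assumptions, not PostgreSQL source: the
f_smgr struct here has a single simplified callback (the real one carries
many more, with different signatures), and mock_mdwrite and md_write_calls
are hypothetical stand-ins added only so the sketch compiles on its own.

```c
#include <assert.h>

/*
 * Simplified stand-in for PostgreSQL's f_smgr. The real struct holds many
 * more callbacks; only a reduced smgr_write is modeled here.
 */
typedef struct f_smgr
{
    void (*smgr_write) (int blocknum, const char *buffer);
} f_smgr;

/* Hypothetical counter so callers can observe that the handler ran. */
static int md_write_calls = 0;

/* Hypothetical stand-in for mdwrite(); it only counts invocations. */
static void
mock_mdwrite(int blocknum, const char *buffer)
{
    (void) blocknum;
    (void) buffer;
    md_write_calls++;
}

/* Handler table with a single storage manager in slot 0, like md.c today. */
static f_smgr smgrsw[] = {
    {mock_mdwrite}
};

static const int NSmgr = sizeof(smgrsw) / sizeof(smgrsw[0]);

/* Number of storage managers currently installed. */
int
GetStorageMgrNumbers(void)
{
    return NSmgr;
}

/* Return a copy of the handler table entry for smgr_which. */
f_smgr
GetStorageMgrHandlers(int smgr_which)
{
    assert(smgr_which >= 0 && smgr_which < NSmgr);
    return smgrsw[smgr_which];
}

/* Replace the handler table entry for smgr_which. */
void
SetStorageMgrHandlers(int smgr_which, f_smgr smgr_handlers)
{
    assert(smgr_which >= 0 && smgr_which < NSmgr);
    smgrsw[smgr_which] = smgr_handlers;
}
```

Because GetStorageMgrHandlers() returns a copy, an extension can keep the
previous table, change only the callbacks it cares about, and hand the
modified copy to SetStorageMgrHandlers().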
If an extension wants to intermediate 'smgr_write', it will replace
'smgr_write' with its own function, then call the original function,
likely mdwrite, from the alternative function.

In this case, the call chain shall be:

  FlushBuffer, and others...
   +-- smgrwrite(...)
        +-- (extension's own function)
             +-- mdwrite

Once the extension's own function blocks write i/o until the P2P DMA is
completed by the concurrent process, we don't need to care about partial
updates of the OS cache or storage device.
It is not difficult for extensions to implement a feature to track/untrack
a tuple of (relFileNode, forkNum, blockNum), automatic untracking according
to the resource-owner, and a mechanism to block the caller until P2P DMA
completion.

On the other hand, its flexibility seems to me a bit larger than necessary
(what I want to implement is just a blocker of buffer write i/o). And it
may give people the wrong impression that this is a pluggable storage
feature.

What do folks think?

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
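The interception chain described in the message (smgrwrite, then the
extension's own function, then mdwrite) might be sketched as below. This is
a single-process mock under loud assumptions: BlockTag, track_block,
untrack_block, is_tracked, wait_for_dma_completion, mock_mdwrite and
mock_smgrwrite are all hypothetical names, the handler slot is one function
pointer rather than the real smgrsw[] table, and "waiting" is simulated by
untracking the block immediately. A real extension would sleep until the
backend running the P2P DMA untracks the block.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical block identifier: (relFileNode, forkNum, blockNum). */
typedef struct BlockTag
{
    unsigned rel_node;
    int      fork_num;
    unsigned block_num;
} BlockTag;

typedef void (*smgr_write_fn) (BlockTag tag, const char *buffer);

#define MAX_TRACKED 16
static BlockTag tracked[MAX_TRACKED];
static int n_tracked = 0;
static int waits_performed = 0;
static int writes_forwarded = 0;

static bool
tag_equal(BlockTag a, BlockTag b)
{
    return a.rel_node == b.rel_node && a.fork_num == b.fork_num &&
           a.block_num == b.block_num;
}

bool
is_tracked(BlockTag tag)
{
    for (int i = 0; i < n_tracked; i++)
        if (tag_equal(tracked[i], tag))
            return true;
    return false;
}

/* Register a block as the target of an in-flight P2P DMA. */
bool
track_block(BlockTag tag)
{
    if (n_tracked >= MAX_TRACKED)
        return false;
    tracked[n_tracked++] = tag;
    return true;
}

/* Unregister a block, filling the hole with the last tracked entry. */
void
untrack_block(BlockTag tag)
{
    for (int i = 0; i < n_tracked; i++)
        if (tag_equal(tracked[i], tag))
        {
            tracked[i] = tracked[--n_tracked];
            return;
        }
}

/* Mock only: pretend the DMA finished and the block was untracked. */
static void
wait_for_dma_completion(BlockTag tag)
{
    waits_performed++;
    untrack_block(tag);
}

/* Hypothetical stand-in for mdwrite(). */
static void
mock_mdwrite(BlockTag tag, const char *buffer)
{
    (void) tag;
    (void) buffer;
    writes_forwarded++;
}

static smgr_write_fn original_smgr_write = mock_mdwrite;
static smgr_write_fn current_smgr_write = mock_mdwrite; /* the handler slot */

/* Extension's own function: hold the write while the block is tracked. */
void
extension_smgr_write(BlockTag tag, const char *buffer)
{
    while (is_tracked(tag))
        wait_for_dma_completion(tag);
    original_smgr_write(tag, buffer);
}

/* Save the current handler and install the blocker (idempotent). */
void
install_write_blocker(void)
{
    if (current_smgr_write == extension_smgr_write)
        return;
    original_smgr_write = current_smgr_write;
    current_smgr_write = extension_smgr_write;
}

/* Stand-in for smgrwrite(): dispatch through the handler slot. */
void
mock_smgrwrite(BlockTag tag, const char *buffer)
{
    current_smgr_write(tag, buffer);
}
```

Writes to untracked blocks pass straight through to the saved handler; only
a write to a tracked block pays for the wait, which matches the intent of
keeping the overhead inside the extension.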