Re: Way to check whether a particular block is on the shared_buffer? - Mailing list pgsql-hackers
From | Kouhei Kaigai |
---|---|
Subject | Re: Way to check whether a particular block is on the shared_buffer? |
Date | |
Msg-id | 9A28C8860F777E439AA12E8AEA7694F8011BA901@BPXM15GP.gisp.nec.co.jp |
In response to | Re: Way to check whether a particular block is on the shared_buffer? (Kouhei Kaigai <kaigai@ak.jp.nec.com>) |
Responses | Re: Way to check whether a particular block is on the shared_buffer? |
List | pgsql-hackers |
I found one more, though minor, problem in implementing the SSD-to-GPU direct data transfer feature on top of the PostgreSQL storage layer: an extension cannot know the raw file descriptor opened by smgr.

I expect the extension to issue an ioctl(2) on the special device file provided by a dedicated kernel driver in order to control the P2P DMA. This ioctl(2) packs the file descriptor of the DMA source together with various other information (base position, range, destination device pointer, ...). However, the raw file descriptor is wrapped inside fd.c behind the File handle, and is therefore not visible to extensions.

The attached patch provides a way to obtain the raw file descriptor (and the relevant flags) of a particular File virtual file descriptor in PostgreSQL. (Needless to say, an extension has to handle the raw descriptor carefully so as not to have an adverse effect on the storage manager.)

How about this tiny enhancement?
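Just to illustrate what I have in mind, here is a rough sketch of how an extension might use such an interface. Please note that FileGetRawDesc(), STROM_IOCTL_P2P_DMA, StromDmaArg, kick_p2p_dma() and strom_fd below are assumed names for the sake of the example only; they are not necessarily what the attached patch or the kernel driver actually define.

/*
 * Rough sketch only -- the function and ioctl names below are
 * assumptions for illustration, not what the patch necessarily provides.
 */
#include "postgres.h"
#include "storage/fd.h"

#include <sys/ioctl.h>

/* hypothetical argument understood by the modified NVMe SSD driver */
typedef struct
{
    int         file_desc;      /* raw descriptor of the DMA source */
    off_t       offset;         /* base position within the file */
    size_t      length;         /* number of bytes to transfer */
    uint64      dest_devptr;    /* destination GPU device pointer */
} StromDmaArg;

#define STROM_IOCTL_P2P_DMA     _IOW('S', 1, StromDmaArg)

/*
 * strom_fd is the descriptor of the driver's special device file,
 * opened beforehand with open(2); vfd is the File returned by fd.c.
 */
static void
kick_p2p_dma(int strom_fd, File vfd, off_t offset, size_t length,
             uint64 dest_devptr)
{
    StromDmaArg arg;

    /* obtain the raw descriptor that fd.c keeps behind the File handle */
    arg.file_desc   = FileGetRawDesc(vfd);
    arg.offset      = offset;
    arg.length      = length;
    arg.dest_devptr = dest_devptr;

    /* hand the request over to the kernel driver via its device file */
    if (ioctl(strom_fd, STROM_IOCTL_P2P_DMA, &arg) != 0)
        elog(ERROR, "P2P DMA request failed: %m");
}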
> > -----Original Message-----
> > From: pgsql-hackers-owner@postgresql.org
> > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas
> > Sent: Saturday, February 13, 2016 1:46 PM
> > To: Kaigai Kouhei(海外 浩平)
> > Cc: Jim Nasby; pgsql-hackers@postgresql.org; Amit Langote
> > Subject: Re: [HACKERS] Way to check whether a particular block is on the
> > shared_buffer?
> >
> > On Thu, Feb 11, 2016 at 9:05 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > > Hmm. In my experience, it is often not a productive discussion whether
> > > a feature is niche or commodity. So, let me change the viewpoint.
> > >
> > > We may utilize OS-level locking mechanism here.
> > >
> > > Even though it depends on filesystem implementation under the VFS,
> > > we may use inode->i_mutex lock that shall be acquired during the buffer
> > > copy from user to kernel, at least, on a few major filesystems; ext4,
> > > xfs and btrfs in my research. As well, the modified NVMe SSD driver can
> > > acquire the inode->i_mutex lock during P2P DMA transfer.
> > >
> > > Once we can consider the OS buffer is updated atomically by the lock,
> > > we don't need to worry about corrupted pages, but still needs to pay
> > > attention to the scenario when updated buffer page is moved to GPU.
> > >
> > > In this case, PD_ALL_VISIBLE may give us a hint. GPU side has no MVCC
> > > infrastructure, so I intend to move all-visible pages only.
> > > If someone updates the buffer concurrently, then write out the page
> > > including invisible tuples, PD_ALL_VISIBLE flag shall be cleared because
> > > updated tuples should not be visible to the transaction which issued
> > > P2P DMA.
> > >
> > > Once GPU met a page with !PD_ALL_VISIBLE, it can return an error status
> > > that indicates CPU to retry this page again. In this case, this page is
> > > likely loaded to the shared buffer already, so retry penalty is not so
> > > much.
> > >
> > > I'll try to investigate the implementation in this way.
> > > Please correct me, if I misunderstand something (especially, treatment
> > > of PD_ALL_VISIBLE).
> >
> > I suppose there's no theoretical reason why the buffer couldn't go
> > from all-visible to not-all-visible and back to all-visible again all
> > during the time you are copying it.
> >
> The backend process that is copying the data to GPU has a transaction
> in-progress (= not committed). Is it possible to get the updated buffer
> page back to the all-visible state again?
> I expect that in-progress transactions works as a blocker for backing
> to all-visible. Right?
>
> > Honestly, I think trying to access buffers without going through
> > shared_buffers is likely to be very hard to make correct and probably
> > a loser.
> >
> No challenge, no outcome. ;-)
>
> > Copying the data into shared_buffers and then to the GPU is,
> > doubtless, at least somewhat slower. But I kind of doubt that it's
> > enough slower to make up for all of the problems you're going to have
> > with the approach you've chosen.
> >
> Honestly, I'm still uncertain whether it works well as I expects.
> However, scan workload on the table larger than main memory is
> headache for PG-Strom, so I'd like to try ideas we can implement.
>
> Thanks,
> --
> NEC Business Creation Division / PG-Strom Project
> KaiGai Kohei <kaigai@ak.jp.nec.com>
>