Re: Extended Prefetching using Asynchronous IO - proposal and patch (mailing list pgsql-hackers)

From:           John Lumby
Subject:        Re: Extended Prefetching using Asynchronous IO - proposal and patch
Msg-id:         BAY175-W526880A96F1B36CD529D4BA31E0@phx.gbl
In response to: Re: Extended Prefetching using Asynchronous IO - proposal and patch (Heikki Linnakangas <hlinnakangas@vmware.com>)
List:           pgsql-hackers
Thanks Heikki,

----------------------------------------
> Date: Tue, 24 Jun 2014 17:02:38 +0300
> From: hlinnakangas@vmware.com
> To: johnlumby@hotmail.com; stark@mit.edu
> CC: klaussfreire@gmail.com; pgsql-hackers@postgresql.org
> Subject: Re: Extended Prefetching using Asynchronous IO - proposal and patch
>
> On 06/24/2014 04:29 PM, John Lumby wrote:
>>> On Mon, Jun 23, 2014 at 2:43 PM, John Lumby <johnlumby@hotmail.com> wrote:
>>>> It is when some *other* backend gets there first with the ReadBuffer that
>>>> things are a bit trickier. The current version of the patch did polling
>>>> for that case but that drew criticism, and so an imminent new version of
>>>> the patch uses the sigevent mechanism. And there are other ways still.
>>>
>>> I'm a bit puzzled by this though. Postgres *already* has code for this
>>> case. When you call ReadBuffer you set the bits on the buffer
>>
>> Good question. Let me explain.
>> Yes, postgresql has code for the case where a backend is inside a synchronous
>> read() or write(), performed from a ReadBuffer(), and some other backend
>> wants that buffer. Asynchronous IO is initiated not from ReadBuffer
>> but from PrefetchBuffer, and performs its aio_read into an allocated, pinned,
>> postgresql buffer. This is entirely different from the synchronous IO case.
>> Why? Because the issuer of the aio_read (the "originator") is unaware
>> of this buffer pinned on its behalf, and is then free to do any other
>> reading or writing it wishes, such as more prefetching or any other operation.
>> And furthermore, it may *never* issue a ReadBuffer for the block which it
>> prefetched.
>
> I still don't see the difference. Once an asynchronous read is initiated
> on the buffer, it can't be used for anything else until the read has
> finished. This is exactly the same situation as with a synchronous read:
> after read() is called, the buffer can't be used for anything else until
> the call finishes.

Ah, now I see what you and Greg are asking.
See my inline reply below.

> In particular, consider the situation from another backend's point of
> view. Looking from another backend (i.e. one that didn't initiate the
> read), there's no difference between a synchronous and asynchronous
> read. So why do we need a different IPC mechanism for the synchronous
> and asynchronous cases? We don't.
>
> I understand that *within the backend*, you need to somehow track the
> I/O, and you'll need to treat synchronous and asynchronous I/Os
> differently. But that's all within the same backend, and doesn't need to
> involve the flags or locks in shared memory at all. The inter-process
> communication doesn't need any changes.
>
>>> The problem with using the buffer's I/O in progress bit is that the I/O
>>> might complete while the other backend is busy doing stuff. As long as
>>> you can handle the I/O completion promptly -- either in callback or
>>> thread or signal handler -- then that wouldn't matter. But I'm not clear
>>> that any of those will work reliably.
>>
>> They both work reliably, but the criticism was that backend B polling
>> an aiocb of an aio issued by backend A is not documented as
>> being supported (although it happens to work), hence the proposed
>> change to use sigevent.
>
> You didn't understand what Greg meant. You need to handle the completion
> of the I/O in the same process that initiated it, by clearing the
> in-progress bit of the buffer and releasing the I/O in-progress lwlock
> on it. And you need to do that very quickly after the I/O has finished,
> because there might be another backend waiting for the buffer and you
> don't want him to wait longer than necessary.

I think I understand the question now. I didn't spell out the details
earlier, so let me explain a little more. With this patch, when a read is
issued, it is either a synchronous IO (as before) or an asynchronous
aio_read (new, represented by both BM_IO_IN_PROGRESS *and*
BM_AIO_IN_PROGRESS).
The way other backends wait on a synchronous IO in progress is unchanged.
But if BM_AIO_IN_PROGRESS is set, then *any* backend which requests
ReadBuffer on this block (including the originator) follows a new path
through BufCheckAsync() which, depending on various flags and context,
sends the backend down to FileCompleteaio to check and maybe wait. So
*all* backends which are waiting for a BM_AIO_IN_PROGRESS buffer will
wait in that way.

> The question is, if you receive the notification of the I/O completion
> using a signal or a thread, is it safe to release the lwlock from the
> signal handler or a separate thread?

In the forthcoming new version of the patch that uses sigevent, the
originator locks a LWLock associated with that BAaiocb exclusive and,
when signalled, in the signal handler it places that LWLock on a
process-local queue of LWLocks awaiting release. (No, it cannot be safely
released inside the signal handler or in a separate thread.) Whenever the
mainline passes a CHECK_FOR_INTERRUPTS macro, and at a few additional
points in bufmgr, the backend walks this process-local queue and releases
those LWLocks. This is also done if the originator itself issues a
ReadBuffer, which is the most frequent case in which it is released.
Meanwhile, any other backend will simply acquire Shared and release.

I think you are right that the existing io_in_progress_lock LWLock in the
buf_header could be used for this, because if there is an aio in progress
then that lock cannot be in use for synchronous IO. I chose not to use it
because I preferred to keep the wait/post for asynch IO separate, but they
could both use the same LWLock. However, the way the LWLock is acquired
and released would still be a bit different because of the need to have
the originator release it in its mainline.

>> By the way, on the "will it actually work though?" question which
>> several folks have raised, I should mention that this patch has been in
>> semi-production use for almost 2 years now in different stages of
>> completion, on all postgresql releases from 9.1.4 to 9.5 devel. I would
>> guess it has had around 500 hours of operation by now. I'm sure there
>> are bugs still to be found but I am confident it is fundamentally sound.
>
> Well, a committable version of this patch is going to look quite
> different from the first version that you posted, so I don't put much
> weight on how long you've tested the first version.

Yes, I am quite willing to change it, time permitting. I take the words
"committable version" as a positive sign ...

> - Heikki