Re: Extended Prefetching using Asynchronous IO - proposal and patch

From John Lumby
Subject Re: Extended Prefetching using Asynchronous IO - proposal and patch
Date
Msg-id BAY175-W412FF89303686022A9F16AA3190@phx.gbl
In response to Re: Extended Prefetching using Asynchronous IO - proposal and patch  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: Extended Prefetching using Asynchronous IO - proposal and patch  (John Lumby <johnlumby@hotmail.com>)
List pgsql-hackers
I am attaching the new version of the patch, with support for using sigevent
to report completion of asynchronous IO, and with the new LWLock on which
waiters wait until the originator releases it.

This patch is based on up-to-date git head, but the asynchronous IO design is
a bit behind the curve of recent discussions here.   Specifically,
sigevent is still optional (see the README for how to configure it),
and non-originators still refer to the originator's aiocb to check the
completion code  (but not to wait for it to complete).

All sigevent-related code is marked by
#ifdef USE_AIO_SIGEVENT
to make it easier to find.
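
For reference, here is a minimal standalone sketch of the sigevent wiring in
question (generic POSIX AIO, not the patch's actual code; the callback and
function names are my inventions):

    #include <aio.h>
    #include <signal.h>
    #include <string.h>
    #include <sys/types.h>

    #ifdef USE_AIO_SIGEVENT
    /* completion callback, run in a thread by the AIO implementation */
    static void
    aio_done_callback(union sigval sv)
    {
        struct aiocb *cb = (struct aiocb *) sv.sival_ptr;

        if (aio_error(cb) == 0)        /* 0 once the read completed OK */
        {
            /* record completion, wake waiters, ... */
        }
    }
    #endif

    static int
    start_async_read(int fd, void *buf, size_t len, off_t offset,
                     struct aiocb *cb)
    {
        memset(cb, 0, sizeof(*cb));
        cb->aio_fildes = fd;
        cb->aio_buf = buf;
        cb->aio_nbytes = len;
        cb->aio_offset = offset;
    #ifdef USE_AIO_SIGEVENT
        /* ask for a completion callback when the aio finishes */
        cb->aio_sigevent.sigev_notify = SIGEV_THREAD;
        cb->aio_sigevent.sigev_notify_function = aio_done_callback;
        cb->aio_sigevent.sigev_value.sival_ptr = cb;
    #else
        cb->aio_sigevent.sigev_notify = SIGEV_NONE;  /* poll with aio_error() */
    #endif
        return aio_read(cb);
    }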

Also, although it "works", it is not functionally complete
(search for FIXME in src/backend/storage/buffer/buf_async.c)
and it has not had nearly as much testing.

________________________________________________________________________

I will try to paste earlier points and embed comments:


Heikki wrote :

> Ok, doing the work in CHECK_FOR_INTERRUPTS sounds safe. But is that fast
> enough? We have never made any hard guarantees on how often
> CHECK_FOR_INTERRUPTS() is called. In particular, if you're running 3rd
> party C code or PL code, there might be no CHECK_FOR_INTERRUPTS() calls
> for many seconds, or even more. That's a long time to hold onto a buffer
> I/O lock. I don't think that's acceptable :-(.

True, but remember this case is the very infrequent one:  less than 0.2%
(in my benchmark).

>
>> I think you are right that the existing io_in_progress_lock LWlock in the
>> buf_header  could be used for this,  because if there is a aio in progress,
>> then that lock cannot be in use for synchronous IO.  I chose not to use it
>> because I preferred to keep the wait/post for asynch io separate,
>>   but they could both use the same LWlock.   However,   the way the LWlock
>> is acquired and released would still be a bit different because of the need
>> to have the originator release it in its mainline.
>
> It would be nice to use the same LWLock.

I think that will work.
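
Roughly, it might look like this (a sketch of the idea only, not the patch's
code; baiocb and its fields are assumptions):

    /* originator, at prefetch time: take the io lock before issuing the aio */
    LWLockAcquire(bufHdr->io_in_progress_lock, LW_EXCLUSIVE);
    if (aio_read(&baiocb->aiocb) != 0)
        LWLockRelease(bufHdr->io_in_progress_lock);    /* failed to start */

    /* originator, later, from its mainline (e.g. CHECK_FOR_INTERRUPTS): */
    if (aio_error(&baiocb->aiocb) != EINPROGRESS)
        LWLockRelease(bufHdr->io_in_progress_lock);

    /* a non-originator needing the buffer just waits on the same lock,
     * much as WaitIO does for synchronous IO: */
    LWLockAcquire(bufHdr->io_in_progress_lock, LW_SHARED);
    LWLockRelease(bufHdr->io_in_progress_lock);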

>
> However, if releasing a regular LWLock in a signal handler is not safe,
> and cannot be made safe, perhaps we should, after all, invent a whole
> new mechanism. One that would make it safe to release the lock in a
> signal handler.

It would take rewriting lwlock.c and would still be devilishly hard to test -
I would prefer not to.

>
>>
>>> Well, a committable version of this patch is going to look quite
>>> different from the first version that you posted, so I don't put much
>>> weight on how long you've tested the first version.
>>
>> Yes,  I am quite willing to change it,  time permitting.
>> I take the works "committable version" as a positive sign ...
>
> BTW, sorry if I sound negative, I'm actually quite excited about this
> feature. A patch like this take a lot of work, and usually several
> rewrites, until it's ready ;-). But I'm looking forward for it.

I am grateful you spent the time to study it.
Knowing the code, I am not surprised it made you a bit grumpy ...


________________________________________________________

Discussion of the difference between a synchronous read and an asynchronous
read, as far as a non-originator waiting on it is concerned:

I thought a bit more about this.   There are currently two differences,
one of which can easily be changed and one not so easy.

1)     The current code, even with sigevent, still makes the non-originator waiter
         call aio_error on the originator's aiocb to get the completion code.
         For the sigevent variation, this is easily changed to have the originator
         always call aio_error (from its CHECK_FOR_INTERRUPTS or from FileCompleteaio)
         and store the result in the BAiocb.
         My reason for not doing that was that, by having the non-originator check
         the aiocb, the waiter could proceed sooner.   But for a different reason it
         actually doesn't:  the non-originator must still wait for the LWlock release.

  2)   Buffer pinning and returning the BufferAiocb to the free list.
        With synchronous IO, each backend that calls ReadBuffer must pin the buffer
        early in the process.
        With asynchronous IO, initially only the originator gets the pin
        (and that is during PrefetchBuffer, not ReadBuffer).
        When the aio completes and some backend checks that completion,
        that backend has two responsibilities:

               .  pin the buffer if it did not already have one (from prefetch)
               .  if it was the last such backend to make that check
                  (amongst the cohort waiting on it),
                  return the BufferAiocb to the free list

        Those responsibilities (sketched below) differ from the synchronous IO case.
        I think code clarity alone may dictate keeping these separate.
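
        As a hedged sketch only (the BufferAiocb fields and helper names here
        are my inventions for illustration, not the patch's actual code),
        the completion check might look like:

            static void
            CompleteAsyncRead(BufferDesc *bufHdr, BufferAiocb *baiocb,
                              bool had_pin)
            {
                bool    last;

                /* wait for the originator to release the io lock */
                LWLockAcquire(bufHdr->io_in_progress_lock, LW_SHARED);
                LWLockRelease(bufHdr->io_in_progress_lock);

                /* pin the buffer if we did not already have one (from prefetch) */
                if (!had_pin)
                    PinCountIncrement(bufHdr);       /* hypothetical pin helper */

                /* last backend of the waiting cohort recycles the BufferAiocb */
                SpinLockAcquire(&baiocb->mutex);
                last = (--baiocb->waiter_count == 0);
                SpinLockRelease(&baiocb->mutex);
                if (last)
                    ReturnBufferAiocbToFreelist(baiocb);   /* hypothetical */
            }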

___________________________________________________________________
Various discussion of semaphores and LWlocks:


----------------------------------------
> Date: Wed, 25 Jun 2014 23:17:53 +0200
> From: andres@2ndquadrant.com
> To: hlinnakangas@vmware.com
> CC: klaussfreire@gmail.com; johnlumby@hotmail.com; stark@mit.edu; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch
>
> On 2014-06-26 00:08:48 +0300, Heikki Linnakangas wrote:
>> LWLocks are implemented with semaphores, so if you can increment a semaphore
>> in the signal handler / callback thread, then in theory you should be able
>> to release a LWLock.
>
> I don't think that's a convincing argument even if semop et al were
> signal safe. LWLocks also use spinlocks and it's a fairly bad idea to do
> anything with the same spinlock both inside and outside a signal
> handler.
> I don't think we're going to get lwlock.c/LWLockRelease to work
> reasonably from a signal handler without a lot of work.

I agree -  see earlier.
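
For the record, about the only thing the handler itself can safely do is set
a flag and/or post a semaphore (sem_post is on POSIX's async-signal-safe
list; LWLockRelease is not), with the LWLock released later from the
originator's mainline.  A sketch:

    #include <signal.h>
    #include <semaphore.h>

    static volatile sig_atomic_t aio_done_flag = 0;
    static sem_t aio_done_sem;

    static void
    aio_completion_handler(int signo, siginfo_t *info, void *ucontext)
    {
        aio_done_flag = 1;               /* safe: sig_atomic_t store */
        (void) sem_post(&aio_done_sem);  /* safe: async-signal-safe per POSIX */
        /* NOT safe here: LWLockRelease(), spinlocks, palloc(), elog() ... */
    }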




