Re: Extended Prefetching using Asynchronous IO - proposal and patch - Mailing list pgsql-hackers

From John Lumby
Subject Re: Extended Prefetching using Asynchronous IO - proposal and patch
Date
Msg-id BAY175-W364D73BCD54C4E4351AA63A31F0@phx.gbl
In response to Re: Extended Prefetching using Asynchronous IO - proposal and patch  (Claudio Freire <klaussfreire@gmail.com>)
Responses Re: Extended Prefetching using Asynchronous IO - proposal and patch  (Greg Stark <stark@mit.edu>)
List pgsql-hackers

----------------------------------------
> Date: Thu, 19 Jun 2014 15:43:44 -0300
> Subject: Re: Extended Prefetching using Asynchronous IO - proposal and patch
> From: klaussfreire@gmail.com
> To: stark@mit.edu
> CC: hlinnakangas@vmware.com; johnlumby@hotmail.com; pgsql-hackers@postgresql.org
>
> On Thu, Jun 19, 2014 at 2:49 PM, Greg Stark <stark@mit.edu> wrote:
>> I don't think the threaded implementation on Linux is the one to use
>> though.  [... ] The overhead of thread communication
>> will completely outweigh any advantage over posix_fadvise's partial
>> win.
>
> What overhead?
>
> The only communication is through a "done" bit and associated
> synchronization structure when *and only when* you want to wait on it.
>

Threads do cost some extra CPU, but provided the system has some
spare CPU capacity, performance improves because of reduced IO wait.
I quoted a measured improvement of 17% - 18% in the README,
along with some more explanation of when the async IO gives an improvement.

> Furthermore, posix_fadvise is braindead on this use case, been there,
> done that. What you win with threads is a better postgres-kernel
> interaction, even if you loose some CPU performance it's gonna beat
> posix_fadvise by a large margin.
>
> [...]
>
>> When I investigated this I found the buffer manager's I/O bits seemed
>> to already be able to represent the state we needed (i/o initiated on
>> this buffer but not completed). The problem was in ensuring that a
>> backend would process the i/o completion promptly when it might be in
>> the midst of handling other tasks and might even get an elog() stack
>> unwinding. The interface that actually fits Postgres best might be the
>> threaded interface (orthogonal to the threaded implementation
>> question) which is you give aio a callback which gets called on a
>> separate thread when the i/o completes. The alternative is you give
>> aio a list of operation control blocks and it tells you the state of
>> all the i/o operations. But it's not clear to me how you arrange to do
>> that regularly, promptly, and reliably.
>
> Indeed we've been musing about using librt's support of completion
> callbacks for that.

For the most common case, in which a backend issues a PrefetchBuffer
and then that *same* backend issues the ReadBuffer, the posix aio works
ideally: there is no need for any callback or completion signal,
since we simply check "is it complete" during the ReadBuffer.
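
To illustrate the same-backend case, here is a minimal sketch using plain POSIX aio calls (aio_read, aio_error, aio_suspend, aio_return). It is not code from the patch: PrefetchSlot, prefetch_start, prefetch_is_complete and prefetch_finish are hypothetical names, and the 8192-byte buffer is only an assumed block size.

```c
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the state a prefetch leaves behind. */
typedef struct PrefetchSlot {
    struct aiocb cb;          /* POSIX aio control block */
    char         data[8192];  /* assumed block size */
    int          started;     /* nonzero while a read is outstanding */
} PrefetchSlot;

/* Rough analogue of PrefetchBuffer: start the read, return at once. */
int prefetch_start(PrefetchSlot *slot, int fd, off_t offset)
{
    memset(&slot->cb, 0, sizeof(slot->cb));
    slot->cb.aio_fildes = fd;
    slot->cb.aio_buf = slot->data;
    slot->cb.aio_nbytes = sizeof(slot->data);
    slot->cb.aio_offset = offset;
    /* No callback and no signal: the originator asks for the state later. */
    slot->cb.aio_sigevent.sigev_notify = SIGEV_NONE;
    if (aio_read(&slot->cb) != 0)
        return -1;
    slot->started = 1;
    return 0;
}

/* The "is it complete" check: never blocks. */
int prefetch_is_complete(PrefetchSlot *slot)
{
    return aio_error(&slot->cb) != EINPROGRESS;
}

/* Rough analogue of the check in ReadBuffer: wait only if the read is
 * still in flight, then collect the byte count. */
ssize_t prefetch_finish(PrefetchSlot *slot)
{
    const struct aiocb *list[1] = { &slot->cb };
    int err;

    assert(slot->started);
    while ((err = aio_error(&slot->cb)) == EINPROGRESS) {
        if (aio_suspend(list, 1, NULL) != 0 && errno != EINTR)
            return -1;
    }
    slot->started = 0;
    if (err != 0) {
        (void) aio_return(&slot->cb);  /* release the request */
        errno = err;
        return -1;
    }
    return aio_return(&slot->cb);      /* bytes actually read */
}
```

If the read has already finished by the time prefetch_finish runs, the loop body is never entered, so the only cost is one aio_error call. On older glibc versions this needs to be linked with -lrt.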

It is when some *other* backend gets there first with the ReadBuffer that
things are a bit trickier. The current version of the patch polls in that case,
but that drew criticism, and so an imminent new version of the patch
uses the sigevent mechanism. And there are other ways still.

In an earlier posting I reported that, in my benchmark,
99.8% of [FileCompleteaio] calls are from the originator and only < 0.2% are not.
So, from a performance perspective, only the common case really matters.

