On Thu, May 29, 2014 at 5:39 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> On 05/29/2014 11:34 PM, Claudio Freire wrote:
>>
>> On Thu, May 29, 2014 at 5:23 PM, Heikki Linnakangas
>> <hlinnakangas@vmware.com> wrote:
>>>
>>> On 05/29/2014 04:12 PM, John Lumby wrote:
>>>>
>>>>
>>>>> On 05/28/2014 11:52 PM, John Lumby wrote:
>>>>>
>>>>> The patch seems to assume that you can put the aiocb struct in shared
>>>>> memory, initiate an asynchronous I/O request from one process, and wait
>>>>> for its completion from another process. I'm pretty surprised if that
>>>>> works on any platform.
>>>>
>>>>
>>>> It works on Linux. Actually, this ability allows the asyncio
>>>> implementation to reduce complexity in one respect (yes, I know it
>>>> looks complex enough): it makes waiting for completion of an
>>>> in-progress IO simpler than for the existing synchronous IO case,
>>>> since librt takes care of the waiting. Specifically, there is no need
>>>> for extra wait-for-io control blocks such as in bufmgr's WaitIO().
>>>
>>>
>>> [checks]. No, it doesn't work. See attached test program.
>>>
>>> It kinda seems to work sometimes, because of the way it's implemented in
>>> glibc. The aiocb struct has a field for the result value and errno, and
>>> when the I/O is finished, the worker thread fills them in. aio_error()
>>> and aio_return() just return the values of those fields, so calling
>>> aio_error() or aio_return() does in fact happen to work from a different
>>> process. aio_suspend(), however, is implemented by sleeping on a
>>> process-local mutex, which does not work from a different process.
>>>
>>> Even if it worked on Linux today, it would be a bad idea to rely on it
>>> from a portability point of view. No, the only sane way to make this
>>> work is that the process that initiates an I/O request is responsible
>>> for completing it. If another process needs to wait for an async I/O to
>>> complete, we must use some other means to do the waiting. Like the
>>> io_in_progress_lock that we already have, for the same purpose.
>>
>>
>> But calls to it are made with a 10us timeout, effectively turning the
>> thing into polling mode.
>
>
> We don't want polling... And even if we did, calling aio_suspend() in a way
> that's known to be broken, in a loop, is a pretty crappy way of polling.
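
For reference, here is a minimal sketch of the pattern described above, where
the process that initiates the request is also the one that waits for it and
reaps it. It is not code from the patch: the function name
read_async_same_process is made up for illustration, and the aiocb lives on
the caller's stack rather than in shared memory. On glibc older than 2.34 it
has to be linked with -lrt.

```c
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not from the patch): read up to len bytes at
 * offset from fd with POSIX AIO. The same process issues the request,
 * waits for it, and collects the result.
 * Returns the number of bytes read, or -1 with errno set.
 */
ssize_t
read_async_same_process(int fd, void *buf, size_t len, off_t offset)
{
    struct aiocb cb;
    const struct aiocb *list[1];
    int         err;

    assert(buf != NULL);

    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = offset;
    cb.aio_sigevent.sigev_notify = SIGEV_NONE;  /* no signal or thread callback */

    if (aio_read(&cb) != 0)
        return -1;              /* request was never queued */

    /*
     * Block (no timeout, so no polling) until the request is done.
     * If aio_suspend() is interrupted by a signal, just re-check.
     */
    list[0] = &cb;
    while ((err = aio_error(&cb)) == EINPROGRESS)
        (void) aio_suspend(list, 1, NULL);

    if (err != 0)
    {
        (void) aio_return(&cb); /* still reap the request */
        errno = err;
        return -1;
    }
    return aio_return(&cb);     /* reap exactly once; yields the byte count */
}
```

A caller would use it like a pread(): read_async_same_process(fd, buf,
sizeof(buf), 0). Any other backend that needs the same page would have to wait
through a separate mechanism, such as the io_in_progress_lock mentioned above,
rather than touching the aiocb.
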

Didn't fix the aio_suspend() polling, but the attached patch does fix the
regression tests when scanning over index types other than btree (it was
invoking elog when the index AM didn't have ampeeknexttuple).
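
To illustrate the kind of guard involved, here is a small self-contained
sketch. It is not the code in the attached patch: ToyIndexAm,
toy_try_peek_next and toy_btree_peek are hypothetical stand-ins, and only the
callback name ampeeknexttuple comes from the patch. The idea is that an AM
without the callback simply gets no lookahead instead of an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an index AM's callback table. */
typedef struct ToyIndexAm
{
    const char *amname;
    /* Optional lookahead callback; NULL when the AM does not support it. */
    bool        (*ampeeknexttuple) (void *scan_state, unsigned *next_block);
} ToyIndexAm;

/*
 * Try to peek at the next block the scan will visit.
 * Returns true and sets *next_block if the AM can look ahead.
 * Returns false, leaving *next_block untouched, if the AM lacks the
 * callback, so the caller just skips prefetching.
 */
bool
toy_try_peek_next(const ToyIndexAm *am, void *scan_state, unsigned *next_block)
{
    assert(am != NULL && next_block != NULL);

    if (am->ampeeknexttuple == NULL)
        return false;           /* e.g. a non-btree AM: no lookahead, no error */

    return am->ampeeknexttuple(scan_state, next_block);
}

/* Toy btree-like callback: scan_state points at the next block number. */
bool
toy_btree_peek(void *scan_state, unsigned *next_block)
{
    if (scan_state == NULL)
        return false;
    *next_block = *(unsigned *) scan_state;
    return true;
}
```

With a table such as {"btree", toy_btree_peek} the call reports the next
block; with {"hash", NULL} it returns false and the scan proceeds without
prefetch.
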