Re: Extended Prefetching using Asynchronous IO - proposal and patch - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Extended Prefetching using Asynchronous IO - proposal and patch
Date
Msg-id 53A984FE.1070309@vmware.com
In response to Re: Extended Prefetching using Asynchronous IO - proposal and patch  (John Lumby <johnlumby@hotmail.com>)
Responses Re: Extended Prefetching using Asynchronous IO - proposal and patch  (John Lumby <johnlumby@hotmail.com>)
List pgsql-hackers
On 06/24/2014 04:29 PM, John Lumby wrote:
>> On Mon, Jun 23, 2014 at 2:43 PM, John Lumby <johnlumby@hotmail.com> wrote:
>>> It is when some *other* backend gets there first with the ReadBuffer that
>>> things are a bit trickier. The current version of the patch did polling for that case
>>> but that drew criticism, and so an imminent new version of the patch
>>> uses the sigevent mechanism. And there are other ways still.
>>
>> I'm a bit puzzled by this though. Postgres *already* has code for this
>> case. When you call ReadBuffer you set the bits on the buffer
>
> Good question. Let me explain.
> Yes, postgresql has code for the case where a backend is inside a synchronous
> read() or write(), performed from a ReadBuffer(), and some other backend
> wants that buffer. Asynchronous I/O is initiated not from ReadBuffer
> but from PrefetchBuffer, and performs its aio_read into an allocated, pinned,
> postgresql buffer. This is entirely different from the synchronous I/O case.
> Why? Because the issuer of the aio_read (the "originator") is unaware
> of this buffer pinned on its behalf, and is then free to do any other
> reading or writing it wishes, such as more prefetching or any other operation.
> And furthermore, it may *never* issue a ReadBuffer for the block which it
> prefetched.

I still don't see the difference. Once an asynchronous read is initiated 
on the buffer, it can't be used for anything else until the read has 
finished. This is exactly the same situation as with a synchronous read: 
after read() is called, the buffer can't be used for anything else until 
the call finishes.

In particular, consider the situation from another backend's point of 
view. Looking from another backend (i.e. one that didn't initiate the 
read), there's no difference between a synchronous and asynchronous 
read. So why do we need a different IPC mechanism for the synchronous 
and asynchronous cases? We don't.

I understand that *within the backend*, you need to somehow track the 
I/O, and you'll need to treat synchronous and asynchronous I/Os 
differently. But that's all within the same backend, and doesn't need to 
involve the flags or locks in shared memory at all. The inter-process 
communication doesn't need any changes.

>> The problem with using the Buffers I/O in progress bit is that the I/O
>> might complete while the other backend is busy doing stuff. As long as
>> you can handle the I/O completion promptly -- either in callback or
>> thread or signal handler then that wouldn't matter. But I'm not clear
>> that any of those will work reliably.
>
> They both work reliably, but the criticism was that backend B polling
> an aiocb of an aio issued by backend A is not documented as
> being supported (although it happens to work), hence the proposed
> change to use sigevent.

You didn't understand what Greg meant. You need to handle the completion 
of the I/O in the same process that initiated it, by clearing the 
in-progress bit of the buffer and releasing the I/O in-progress lwlock 
on it. And you need to do that very quickly after the I/O has finished, 
because there might be another backend waiting for the buffer and you 
don't want him to wait longer than necessary.

The question is, if you receive the notification of the I/O completion 
using a signal or a thread, is it safe to release the lwlock from the 
signal handler or a separate thread?

> By the way, on the "will it actually work though?" question which several folks
> have raised, I should mention that this patch has been in semi-production
> use for almost 2 years now in different stages of completion on all postgresql
> releases from 9.1.4 to 9.5 devel. I would guess it has had around
> 500 hours of operation by now. I'm sure there are bugs still to be
> found but I am confident it is fundamentally sound.

Well, a committable version of this patch is going to look quite 
different from the first version that you posted, so I don't put much 
weight on how long you've tested the first version.

- Heikki


