Re: Extended Prefetching using Asynchronous IO - proposal and patch - Mailing list pgsql-hackers

From John Lumby
Subject Re: Extended Prefetching using Asynchronous IO - proposal and patch
Date
Msg-id BAY175-W27A559A507831A31D34CB6A31E0@phx.gbl
In response to Re: Extended Prefetching using Asynchronous IO - proposal and patch  (Greg Stark <stark@mit.edu>)
Responses Re: Extended Prefetching using Asynchronous IO - proposal and patch  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers

----------------------------------------
> From: stark@mit.edu
> Date: Mon, 23 Jun 2014 16:04:50 -0700
> Subject: Re: Extended Prefetching using Asynchronous IO - proposal and patch
> To: johnlumby@hotmail.com
> CC: klaussfreire@gmail.com; hlinnakangas@vmware.com; pgsql-hackers@postgresql.org
>
> On Mon, Jun 23, 2014 at 2:43 PM, John Lumby <johnlumby@hotmail.com> wrote:
>> It is when some *other* backend gets there first with the ReadBuffer that
>> things are a bit trickier. The current version of the patch did polling for that case
>> but that drew criticism, and so an imminent new version of the patch
>> uses the sigevent mechanism. And there are other ways still.
>
> I'm a bit puzzled by this though. Postgres *already* has code for this
> case. When you call ReadBuffer you set the bits on the buffer

Good question. Let me explain.
Yes, PostgreSQL has code for the case where a backend is inside a synchronous
read() or write(), performed from a ReadBuffer(), and some other backend
wants that buffer. Asynchronous IO, however, is initiated not from ReadBuffer
but from PrefetchBuffer, and performs its aio_read into an allocated, pinned
PostgreSQL buffer. This is entirely different from the synchronous IO case.
Why? Because the issuer of the aio_read (the "originator") is unaware
of this buffer pinned on its behalf, and is then free to do any other
reading or writing it wishes, such as more prefetching or any other operation.
Furthermore, it may *never* issue a ReadBuffer for the block which it
prefetched.

Therefore, asynchronous IO is different from synchronous IO, and
a new bit, BM_AIO_IN_PROGRESS, in the buf_header is required to
track this aio operation until completion.
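
To make the idea concrete, here is a minimal sketch of a flag bit that tracks an aio_read from initiation to completion. It is not code from the patch: SketchBuffer, sketch_prefetch, sketch_wait_for_aio, sketch_demo and the bit value chosen for BM_AIO_IN_PROGRESS are all hypothetical, and a real buffer header lives in shared memory with locking that is omitted here. Only the POSIX calls (aio_read, aio_error, aio_suspend, aio_return) are real.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define SKETCH_BLCKSZ 8192
/* Hypothetical bit value; the patch's actual definition may differ. */
#define BM_AIO_IN_PROGRESS (1u << 0)

/* Simplified stand-in for a buffer header; not PostgreSQL's BufferDesc. */
typedef struct SketchBuffer
{
    unsigned     flags;      /* state bits, including BM_AIO_IN_PROGRESS */
    int          pin_count;  /* pin taken on behalf of the originator */
    struct aiocb aiocb;      /* control block of the outstanding aio */
    char         data[SKETCH_BLCKSZ];
} SketchBuffer;

/* Start an asynchronous read into the buffer and return immediately. */
int
sketch_prefetch(SketchBuffer *buf, int fd, off_t offset)
{
    memset(&buf->aiocb, 0, sizeof(buf->aiocb));
    buf->aiocb.aio_fildes = fd;
    buf->aiocb.aio_buf = buf->data;
    buf->aiocb.aio_nbytes = SKETCH_BLCKSZ;
    buf->aiocb.aio_offset = offset;
    buf->aiocb.aio_sigevent.sigev_notify = SIGEV_NONE;

    buf->pin_count++;                  /* buffer stays pinned during the aio */
    buf->flags |= BM_AIO_IN_PROGRESS;  /* visible to anyone inspecting flags */
    if (aio_read(&buf->aiocb) != 0)
    {
        buf->flags &= ~BM_AIO_IN_PROGRESS;
        buf->pin_count--;
        return -1;
    }
    return 0;
}

/* Wait for the outstanding aio, clear the bit, return bytes read or -1. */
ssize_t
sketch_wait_for_aio(SketchBuffer *buf)
{
    const struct aiocb *list[1];
    ssize_t             nread;

    if ((buf->flags & BM_AIO_IN_PROGRESS) == 0)
        return -1;                     /* nothing outstanding */
    list[0] = &buf->aiocb;
    while (aio_error(&buf->aiocb) == EINPROGRESS)
        (void) aio_suspend(list, 1, NULL);  /* may return early on EINTR */
    nread = aio_return(&buf->aiocb);
    buf->flags &= ~BM_AIO_IN_PROGRESS;      /* pin is left for the caller */
    return nread;
}

/* Demo: prefetch one block of a temporary file; returns 0 on success. */
int
sketch_demo(void)
{
    static SketchBuffer buf;
    static char         page[SKETCH_BLCKSZ];
    char                path[] = "/tmp/aio_sketch_XXXXXX";
    int                 fd = mkstemp(path);
    int                 ok = 0;

    if (fd < 0)
        return -1;
    unlink(path);
    memset(&buf, 0, sizeof(buf));
    memset(page, 'x', sizeof(page));
    if (write(fd, page, sizeof(page)) == (ssize_t) sizeof(page) &&
        sketch_prefetch(&buf, fd, 0) == 0)
    {
        /* The originator could do unrelated reads or writes here. */
        ok = (buf.flags & BM_AIO_IN_PROGRESS) != 0 &&
             sketch_wait_for_aio(&buf) == SKETCH_BLCKSZ &&
             (buf.flags & BM_AIO_IN_PROGRESS) == 0 &&
             buf.data[0] == 'x' && buf.pin_count == 1;
    }
    close(fd);
    return ok ? 0 : -1;
}
```

The point of the sketch is that the bit is set in sketch_prefetch and cleared only in sketch_wait_for_aio, which the originator may call much later or not at all, so the state has to live in the buffer header rather than in the caller's stack. On older glibc versions the program may need to be linked with -lrt.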

I would encourage you to read the new
postgresql-prefetching-asyncio.README
in the patch file where this is explained in greater detail.

> indicating I/O is in progress. If another backend does ReadBuffer for
> the same block they'll get the same buffer and then wait until the
> first backend's I/O completes. ReadBuffer goes through some hoops to
> handle this (and all the corner cases such as the other backend's I/O
> completing and the buffer being reused for another block before the
> first backend reawakens). It would be a shame to reinvent the wheel.

No re-invention! Actually, some effort has been made to use the
existing functions in bufmgr.c as much as possible rather than
rewriting them.

>
> The problem with using the Buffers I/O in progress bit is that the I/O
> might complete while the other backend is busy doing stuff. As long as
> you can handle the I/O completion promptly -- either in callback or
> thread or signal handler then that wouldn't matter. But I'm not clear
> that any of those will work reliably.

They both work reliably, but the criticism was that backend B polling
an aiocb of an aio issued by backend A is not documented as
being supported (although it happens to work), hence the proposed
change to use sigevent.
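
For readers unfamiliar with the POSIX sigevent mechanism, here is a minimal single-process sketch of completion notification by signal instead of polling. It is an illustration under assumptions, not the patch's code: the function names, the choice of SIGUSR1, and the use of SIGEV_SIGNAL with sigwait() are all hypothetical, and it does not show the cross-backend case discussed above.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical choice of notification signal for this sketch. */
#define SKETCH_AIO_SIGNAL SIGUSR1

/*
 * Issue an aio_read whose completion is announced by a signal, then
 * collect the result. Returns bytes read, or -1 on failure.
 */
ssize_t
sketch_read_with_sigevent(int fd, char *dst, size_t len, off_t offset)
{
    struct aiocb cb;
    sigset_t     waitset, oldset;
    int          signo;
    ssize_t      nread = -1;

    /* Block the signal first so it stays pending until we ask for it. */
    sigemptyset(&waitset);
    sigaddset(&waitset, SKETCH_AIO_SIGNAL);
    if (sigprocmask(SIG_BLOCK, &waitset, &oldset) != 0)
        return -1;

    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf = dst;
    cb.aio_nbytes = len;
    cb.aio_offset = offset;
    cb.aio_sigevent.sigev_notify = SIGEV_SIGNAL;     /* notify by signal */
    cb.aio_sigevent.sigev_signo = SKETCH_AIO_SIGNAL;

    if (aio_read(&cb) == 0)
    {
        /* The issuer is free to do other work here. */
        do
            (void) sigwait(&waitset, &signo);  /* consume the notification */
        while (aio_error(&cb) == EINPROGRESS); /* ignore unrelated signals */
        nread = aio_return(&cb);
    }
    sigprocmask(SIG_SETMASK, &oldset, NULL);
    return nread;
}

/* Demo: read 4096 bytes of a temporary file; returns 0 on success. */
int
sketch_sigevent_demo(void)
{
    static char src[4096], dst[4096];
    char        path[] = "/tmp/aio_sigev_XXXXXX";
    int         fd = mkstemp(path);
    int         ok = 0;

    if (fd < 0)
        return -1;
    unlink(path);
    memset(src, 'y', sizeof(src));
    memset(dst, 0, sizeof(dst));
    if (write(fd, src, sizeof(src)) == (ssize_t) sizeof(src))
        ok = sketch_read_with_sigevent(fd, dst, sizeof(dst), 0) ==
                 (ssize_t) sizeof(dst) &&
             dst[0] == 'y' && dst[sizeof(dst) - 1] == 'y';
    close(fd);
    return ok ? 0 : -1;
}
```

The sketch waits synchronously with sigwait() only to keep the example deterministic; a backend would more plausibly install a handler and carry on working. It assumes a single-threaded caller, since sigprocmask() is unspecified in multithreaded programs.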

By the way, on the "will it actually work though?" question which several folks
have raised, I should mention that this patch has been in semi-production
use for almost 2 years now, in different stages of completion, on all PostgreSQL
releases from 9.1.4 to 9.5 devel. I would guess it has had around
500 hours of operation by now. I'm sure there are bugs still to be
found, but I am confident it is fundamentally sound.
 
>
> --
> greg

