Re: Extended Prefetching using Asynchronous IO - proposal and patch - Mailing list pgsql-hackers

From johnlumby
Subject Re: Extended Prefetching using Asynchronous IO - proposal and patch
Msg-id BLU436-SMTP6AB1FF61C05A13A944357A3210@phx.gbl
In response to Re: Extended Prefetching using Asynchronous IO - proposal and patch  (Claudio Freire <klaussfreire@gmail.com>)
List pgsql-hackers
On 05/30/14 09:36, Claudio Freire wrote:
> On Fri, May 30, 2014 at 4:15 AM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>> I see no such a rule in the man pages of any of the functions involved. And it wouldn't matter anyway; the behavior is exactly the same if you aio_error() first.

You are completely correct and I was wrong.
For the different-process case, it is not the order of the aio calls
that makes it work or not work; it is whether the caller polls with a
timeout.  I think I was confusing this with a rule about the order of
calling aio_return.


> We don't want polling... And even if we did, calling aio_suspend() in a way
> that's known to be broken, in a loop, is a pretty crappy way of polling.

I have changed the complete_aio functions to use polling *only* if the
caller is not the originator of the aio_read.  If it is the originator,
it now calls aio_suspend with no timeout (i.e. waits until complete).
The loop also now handles the EINTR case which someone pointed out.
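To illustrate the originator-side behaviour, here is a minimal hypothetical sketch (not the patch's actual code; the function name and structure are mine): wait with aio_suspend and a NULL timeout, restarting on EINTR, then collect the result.

```c
#include <aio.h>
#include <errno.h>
#include <stddef.h>

/* Minimal sketch: wait for an aio_read issued by *this* process.
 * aio_suspend() with a NULL timeout blocks until completion; EINTR
 * (signal interruption) just restarts the wait. */
static int
wait_for_own_aio(struct aiocb *cb)
{
    const struct aiocb *list[1];
    int err;

    list[0] = cb;
    for (;;)
    {
        err = aio_error(cb);
        if (err != EINPROGRESS)
            break;                  /* completed (0) or failed (errno code) */
        if (aio_suspend(list, 1, NULL) != 0 && errno != EINTR)
            return errno;           /* unexpected aio_suspend failure */
        /* EINTR: loop and re-check aio_error */
    }
    if (err != 0)
        return err;                 /* the read itself failed */
    return (aio_return(cb) >= 0) ? 0 : errno;
}
```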

In my test runs, with a debug message in FileCompleteaio telling me
whether the caller is or is not the originator of the aio_read,
I see > 99.8% of calls from the originator and < 0.2% from others,
e.g. (samples from two backends):

different : 10
same : 11726

different : 38
same : 12105

A new patch, based on today's (140531) git head, is attached.

This improves one of my benchmarks by about 10% throughput,
and now shows an overall 23% improvement relative to existing code with posix_fadvise.

So hopefully this addresses your performance concern.

If you look at the new patch, you'll see that for the different-pid case
I still call aio_suspend with a timeout.  As you or Claudio pointed out
earlier, it could just as well sleep for the same interval, but the
small advantage of aio_suspend is that it returns immediately if the io
completed just between aio_error returning EINPROGRESS and the
aio_suspend call.  It also keeps the code simpler.  In fact this change
is quite small, just a few lines in backend/storage/buffer/buf_async.c
and backend/storage/file/fd.c.
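The shape of that timed poll might look roughly like the following sketch (again hypothetical, names mine; whether these calls are valid on an aiocb owned by another process is exactly the portability question being debated):

```c
#include <aio.h>
#include <errno.h>
#include <time.h>
#include <stddef.h>

/* Minimal sketch of the non-originator path: poll aio_error(), using
 * aio_suspend() with a short timeout as the sleep, so we wake at once
 * if the I/O completes just after the aio_error() check. */
static int
poll_aio_with_timeout(struct aiocb *cb)
{
    const struct aiocb *list[1];
    struct timespec tick = { 0, 1000000 };  /* 1 ms per wait */
    int err;

    list[0] = cb;
    while ((err = aio_error(cb)) == EINPROGRESS)
    {
        if (aio_suspend(list, 1, &tick) != 0 &&
            errno != EAGAIN && errno != EINTR)
            return errno;   /* EAGAIN means the timeout expired: poll again */
    }
    return err;             /* 0 on success, else the read's error code */
}
```

(In the same process this degenerates to an ordinary wait, which is how the sketch can be exercised at all.)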

Based on this, I think it is not necessary to get rid of the polling
altogether (and in any case, as far as I can see, that would be very
difficult).


Well, as mentioned earlier, it is not broken.  Whether it is efficient
I am not sure.  I have looked at the mutex in aio_suspend that you
mentioned, and I am not quite convinced that, if the caller is not the
original aio_read process, it turns the suspend() into an instant
timeout.  I will see if I can verify that.

> I don't see the point of pursuing this design further. Surely we don't want
> to use polling here, and you're relying on undefined behavior anyway. I'm
> pretty sure aio_return/aio_error won't work from a different process on all
> platforms, even if it happens to work on Linux. Even on Linux, it might stop
> working if the underlying implementation changes from the glibc pthread
> emulation to something kernel-based.

Good point.  I have included the guts of your little test program
(modified to do polling) in the existing autoconf test program that
decides on the #define USE_AIO_ATOMIC_BUILTIN_COMP_SWAP;
see config/c-library.m4.
I hope this goes some way towards answering your concern about
robustness, since now, if the implementation changes in some way that
renders the polling ineffective, it will be caught at configure time.
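For readers unfamiliar with such checks, a configure-time run test of this sort might look roughly like the sketch below (illustrative only: the cache variable and the elided test body are mine, not the actual contents of config/c-library.m4):

```m4
# Illustrative sketch of a configure-time run test; not the real macro.
AC_CACHE_CHECK([whether aio completion can be polled as described],
  [pgac_cv_aio_polling_works],
  [AC_RUN_IFELSE(
    [AC_LANG_PROGRAM([[
#include <aio.h>
#include <errno.h>
]], [[
  /* body elided: issue an aio_read and poll aio_error()/aio_suspend()
     the same way the backend does, exiting 0 only on success */
  return 0;
]])],
    [pgac_cv_aio_polling_works=yes],
    [pgac_cv_aio_polling_works=no],
    [pgac_cv_aio_polling_works=no])])
if test "$pgac_cv_aio_polling_works" = yes; then
  AC_DEFINE([USE_AIO_ATOMIC_BUILTIN_COMP_SWAP], 1,
            [Define to enable asynchronous-I/O prefetching.])
fi
```

Note the third AC_RUN_IFELSE branch: a run test cannot execute when cross-compiling, so the safe default is to disable the feature.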


> I'll try to do some measuring of performance with:
> a) git head
> b) git head + patch as-is
> c) git head + patch without aio_suspend in foreign processes (just re-read)
> d) git head + patch with a lwlock (or whatever works) instead of aio_suspend
>
> a-c will be the fastest, d might take some while.
>
> I'll let you know of the results as I get them.


Claudio, I am not quite sure whether what I am submitting now
corresponds to any of your variants.  As I promised before, but have
not yet done, I will package one or two of my benchmarks and
send them in.



Attachment
