Re: Extended Prefetching using Asynchronous IO - proposal and patch - Mailing list pgsql-hackers

From johnlumby
Subject Re: Extended Prefetching using Asynchronous IO - proposal and patch
Date
Msg-id BLU436-SMTP167F40CBAC7CD274E669274A3DE0@phx.gbl
In response to Re: Extended Prefetching using Asynchronous IO - proposal and patch  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: Extended Prefetching using Asynchronous IO - proposal and patch
List pgsql-hackers
Thanks for the replies and thoughts.

On 08/19/14 18:27, Heikki Linnakangas wrote:
> On 08/20/2014 12:17 AM, John Lumby wrote:
>> I am attaching a new version of the patch for consideration in the
>> current commit fest.
>
> Thanks for working on this!
>
>> Relative to the one I submitted on 25 June in
>> BAY175-W412FF89303686022A9F16AA3190@phx.gbl
>> the method for handling aio completion using sigevent has been
>> re-written to use signals exclusively rather than a composite of
>> signals and LWlocks, and this has fixed the problem I mentioned
>> before with the LWlock method.
>
> ISTM the patch is still allocating stuff in shared memory that really
> doesn't belong there. Namely, the BufferAiocb structs. Or at least
> parts of it; there's also a waiter queue there which probably needs
> to live in shared memory, but the rest of it does not.

Actually, the reason the BufferAiocb (the postgresql block corresponding
to the aio's aiocb) must be located in shared memory is that, as you
said, it acts as the anchor for the waiter list. See further comment
below.
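For reference, the signal-based completion method described above looks
roughly like this in outline. This is only an illustrative sketch of
POSIX aio_read() with SIGEV_SIGNAL; the handler name, signal choice and
function names here are made up for the example, not taken from the
patch:

    #include <aio.h>
    #include <signal.h>
    #include <string.h>

    /* Signal number chosen for completion notification (illustrative). */
    #define PREFETCH_SIGNAL (SIGRTMIN + 1)

    /* Runs in the originating process when the kernel finishes the read.
     * sival_ptr carries back whatever the initiator put in sigev_value. */
    static void
    prefetch_complete(int signo, siginfo_t *info, void *ucontext)
    {
        struct aiocb *iocb = (struct aiocb *) info->si_value.sival_ptr;

        (void) iocb;   /* e.g. mark the buffer ready and wake any waiters */
    }

    static void
    install_completion_handler(void)
    {
        struct sigaction sa;

        memset(&sa, 0, sizeof(sa));
        sa.sa_flags = SA_SIGINFO;        /* deliver si_value to the handler */
        sa.sa_sigaction = prefetch_complete;
        sigemptyset(&sa.sa_mask);
        sigaction(PREFETCH_SIGNAL, &sa, NULL);
    }

    /* Start an asynchronous read of len bytes at offset into buf.
     * Returns 0 if queued; completion arrives later as a signal. */
    static int
    start_prefetch(struct aiocb *iocb, int fd, void *buf,
                   size_t len, off_t offset)
    {
        memset(iocb, 0, sizeof(*iocb));
        iocb->aio_fildes = fd;
        iocb->aio_buf = buf;
        iocb->aio_nbytes = len;
        iocb->aio_offset = offset;

        iocb->aio_sigevent.sigev_notify = SIGEV_SIGNAL;
        iocb->aio_sigevent.sigev_signo = PREFETCH_SIGNAL;
        iocb->aio_sigevent.sigev_value.sival_ptr = iocb;

        return aio_read(iocb);
    }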
> At least BufAWaitAioCompletion is still calling aio_error() on an AIO
> request that might've been initiated by another backend. That's not
> OK.

Yes, you are right, and I agree with this one -
I will add an aio_error_return_code field in the BufferAiocb
and only the originator will set this from the real aiocb.
cite="mid:53F3CF5B.4000607@vmware.com"type="cite"><br /> Please write the patch without atomic CAS operation. Just use
aspinlock. </blockquote><br /><font face="Nimbus Roman No9 L">Umm,   this is a new criticism I think.   I use CAS for
thingsother than locking,<br /> such as add/remove from shared queue.   I suppose maybe a spinlock on the entire
queue<br/> can be used equivalently,  but with more code (extra confusion) and worse performance<br /> (coarser
serialization).     What is your objection to using gcc's atomic ops?   Portability?</font><br /><br /><br
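By gcc's atomic ops I mean the __sync builtins. As an illustration of
the kind of non-locking use described above (not the patch's actual
queue code), pushing onto a shared LIFO list with compare-and-swap looks
like this:

    typedef struct QueueNode
    {
        struct QueueNode *next;
    } QueueNode;

    /* Push node onto a shared LIFO list without taking any lock:
     * retry until our snapshot of the head is still current. */
    static void
    queue_push(QueueNode *volatile *head, QueueNode *node)
    {
        QueueNode *old;

        do
        {
            old = *head;
            node->next = old;
        } while (!__sync_bool_compare_and_swap(head, old, node));
    }

The pop side needs more care (the classic ABA hazard), which is
admittedly one argument for the simpler, coarser spinlock.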
> There's a patch in the commitfest to add support for that,

Sorry, support for what? There are already spinlocks in postgresql;
you mean some new kind? Please point me at it with a hackers msgid or
something.

> but it's not committed yet, and all those
> USE_AIO_ATOMIC_BUILTIN_COMP_SWAP ifdefs make the patch more difficult
> to read. Same with all the other #ifdefs; please just pick a method
> that works.

Ok, yes, the ifdefs are unpleasant. I will do something about that.
Ideally they would be entirely confined to header files and only macro
functions in C files - maybe I can do that. And eventually, when the
dust has settled, eliminate obsolete ifdef blocks altogether.

> Also, please split prefetching of regular index scans into a separate
> patch. It's orthogonal to doing async I/O;

Actually not completely orthogonal - see next.

> we could prefetch regular index scans with posix_fadvise too, and AIO
> should be useful for prefetching other stuff.
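(For comparison, the posix_fadvise flavour of prefetching referred to
there is a pure hint with no completion to wait on - roughly this,
sketch only:)

    #include <fcntl.h>

    /* Hint the kernel to start reading the range; returns immediately.
     * There is no way to wait for or detect completion - the later
     * read() simply finds the page already in cache (or not). */
    static void
    hint_prefetch(int fd, off_t offset, off_t len)
    {
        (void) posix_fadvise(fd, offset, len, POSIX_FADV_WILLNEED);
    }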
class="moz-cite-prefix">On08/19/14 19:10, Claudio Freire wrote:<br /></div><blockquote
cite="mid:CAGTBQpbrHCJvkCmXmA8zHmfKNLLtqbgagLaYA+Z9E-F0oY2EtQ@mail.gmail.com"type="cite"><pre wrap="">On Tue, Aug 19,
2014at 7:27 PM, Heikki Linnakangas
 
<a class="moz-txt-link-rfc2396E" href="mailto:hlinnakangas@vmware.com"><hlinnakangas@vmware.com></a> wrote:
</pre><blockquote type="cite"><pre wrap="">Also, please split prefetching of regular index scans into a separate patch.
...
</pre></blockquote><pre wrap="">That patch already happened on the list, and it wasn't a win in many
cases. I'm not sure it should be proposed independently of this one.
Maybe a separate patch, but it should be considered dependent on this.

I don't have an archive link at hand atm, but I could produce one later.</pre></blockquote><font face="Nimbus Roman No9
L">Severalpeople have asked to split this patch into several smaller ones and I<br /> have thought about it.     It
wouldintroduce some awkward dependencies.<br /> E.g.  splitting the scanner code  (index,  relation-heap) into separate
patch<br/> from aio code would mean that the scanner patch becomes dependent<br /> on the aio patch.     They are not
quiteorthogonal.<br /><br /> The reason is that the scanners call a new function, DiscardBuffer(blockid)<br /> when aio
isin use.     We can get around it by providing a stub for that function<br /> in the scanner patch,   but then there
issome risk of someone getting the<br /> wrong version of that function in their build.     It just adds yet more
complexity<br/> and breakage opportunities.<br /></font><br /><blockquote cite="mid:53F3CF5B.4000607@vmware.com"
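The stub arrangement would look something like this (a hypothetical
sketch: DiscardBuffer's real signature is not shown in this mail, so
the argument type and the USE_AIO guard are guesses):

    #ifdef USE_AIO
    /* Real version, supplied by the aio patch: settle any outstanding
     * prefetch on this block and release the buffer. */
    extern void DiscardBuffer(BlockNumber blockid);
    #else
    /* No-op stub so the scanner patch builds without the aio patch. */
    #define DiscardBuffer(blockid) ((void) 0)
    #endif

The risk is exactly that a build picks up the stub where the real
version was intended.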
type="cite"><br/> - Heikki <br /><br /></blockquote><font face="Nimbus Roman No9 L">One further comment concerning
these</font><font face="Nimbus Roman No9 L"><font face="Nimbus Roman No9       L">BufferAiocb</font> and aiocb control
blocks<br/> being in shared memory :<br /><br /> I explained above that the </font><font face="Nimbus Roman No9
L"><fontface="Nimbus Roman No9 L">BufferAiocb</font> must be in shared memory for wait/post.<br /> The aiocb does not
necessarilyhave to be in shared memory,<br /> but since there is a one-to-one relationship between </font><font
face="NimbusRoman No9 L"><font face="Nimbus Roman No9 L">BufferAiocb</font> and aiocb,<br /> it makes the code much
simpler,  and the operation much more efficient,<br /> if the aiocb is embedded in the </font><font face="Nimbus Roman
  No9 L"><font face="Nimbus Roman No9 L">BufferAiocb</font> as I have it now.<br /> E.g. if the aiocb is in
private-processmemory,   then an additional allocation<br /> scheme is needed (fixed number?   palloc()in'g extra ones
asneeded?  ...)<br /> which adds complexity,   and probably some inefficiency since a shared pool is usually<br /> more
efficient(allows higher maximum per process etc),   and there would have to be<br /> some pointer de-referencing from
</font><fontface="Nimbus Roman     No9 L"><font face="Nimbus Roman No9 L">BufferAiocb</font> to aiocb adding some
(small)overhead.<br /><br /> I understood your objection to use of shared memory as being that you don't want<br /> a
non-originatorto access the originator's aiocb using aio_xxx calls,   and,     with the<br /> one change I've said I
willdo above (put the aio_error retcode in the </font><font face="Nimbus Roman No9 L"><font face="Nimbus       Roman
No9L">BufferAiocb</font>)<br /> I will have achieved that requirement.   I am hoping this answers your objections<br />
concerningshared memory.<br /></font><br /><br /><blockquote cite="mid:53F3CF5B.4000607@vmware.com" type="cite"><br
/><br/></blockquote><br /> 
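To make that trade-off concrete, the two layouts under discussion are
roughly as follows (sketch only; struct and field names invented for
the example):

    #include <aio.h>

    /* As in the current patch: the aiocb is embedded, so one shared
     * allocation serves both and no extra dereference is needed. */
    typedef struct BufferAiocbEmbedded
    {
        struct aiocb aiocb;        /* lives in shared memory with its anchor */
        /* waiter-list anchor, flags, ... */
    } BufferAiocbEmbedded;

    /* The alternative: only a pointer in shared memory.  The aiocb
     * itself sits in the originator's private memory, so it is valid
     * in that backend only, needs its own allocation scheme, and
     * costs a dereference on every access. */
    typedef struct BufferAiocbSplit
    {
        struct aiocb *aiocb_private;
        /* waiter-list anchor, flags, ... */
    } BufferAiocbSplit;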
