Re: Extended Prefetching using Asynchronous IO - proposal and patch - Mailing list pgsql-hackers
From | johnlumby
---|---
Subject | Re: Extended Prefetching using Asynchronous IO - proposal and patch
Date |
Msg-id | BLU436-SMTP167F40CBAC7CD274E669274A3DE0@phx.gbl
In response to | Re: Extended Prefetching using Asynchronous IO - proposal and patch (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses | Re: Extended Prefetching using Asynchronous IO - proposal and patch
List | pgsql-hackers
<div class="moz-cite-prefix"><font face="Nimbus Roman No9 L">Thanks for the replies and thoughts.</font><br /><br /> On 08/19/1418:27, Heikki Linnakangas wrote:<br /></div><blockquote cite="mid:53F3CF5B.4000607@vmware.com" type="cite">On 08/20/201412:17 AM, John Lumby wrote: <br /><blockquote type="cite">I am attaching a new version of the patch for considerationin the current commit fest. <br /></blockquote><br /> Thanks for working on this! <br /><br /><blockquote type="cite">Relativeto the one I submitted on 25 June in <a class="moz-txt-link-abbreviated" href="mailto:BAY175-W412FF89303686022A9F16AA3190@phx.gbl">BAY175-W412FF89303686022A9F16AA3190@phx.gbl</a><br/> the methodfor handling aio completion using sigevent has been re-written to use <br /> signals exclusively rather than a compositeof signals and LWlocks, <br /> and this has fixed the problem I mentioned before with the LWlock method. <br /></blockquote><br/> ISTM the patch is still allocating stuff in shared memory that really doesn't belong there. Namely,the BufferAiocb structs. Or at least parts of it; there's also a waiter queue there which probably needs to live inshared memory, but the rest of it does not. <br /></blockquote><br /><font face="Nimbus Roman No9 L">Actually the reasonthe BufferAiocb ( the postgresql block corresponding to the aio's aiocb)<br /> must be located in shared memory isthat, as you said, it acts as anchor for the waiter list.<br /> See further comment below.</font><br /><br /><blockquotecite="mid:53F3CF5B.4000607@vmware.com" type="cite"><br /> At least BufAWaitAioCompletion is still calling aio_error()on an AIO request that might've been initiated by another backend. That's not OK. <br /></blockquote><br /><fontface="Nimbus Roman No9 L">Yes, you are right, and I agree with this one -<br /> I will add a aio_error_return_codefield in the </font><font face="Nimbus Roman No9 L"><font face="Nimbus Roman No9 L">BufferAiocb</font><br/> and only the originator will set this from the real aiocb.</font><br /><br /><blockquote cite="mid:53F3CF5B.4000607@vmware.com"type="cite"><br /> Please write the patch without atomic CAS operation. Just use aspinlock. </blockquote><br /><font face="Nimbus Roman No9 L">Umm, this is a new criticism I think. I use CAS for thingsother than locking,<br /> such as add/remove from shared queue. I suppose maybe a spinlock on the entire queue<br/> can be used equivalently, but with more code (extra confusion) and worse performance<br /> (coarser serialization). What is your objection to using gcc's atomic ops? Portability?</font><br /><br /><br /><blockquotecite="mid:53F3CF5B.4000607@vmware.com" type="cite">There's a patch in the commitfest to add support for that,</blockquote><br/><font face="Nimbus Roman No9 L">sorry, support for what? There are already spinlocks in postgresql, <br /> you mean some new kind? please point me at it with hacker msgid or something.</font><br /><br /><blockquotecite="mid:53F3CF5B.4000607@vmware.com" type="cite"> but it's not committed yet, and all those USE_AIO_ATOMIC_BUILTIN_COMP_SWAPifdefs make the patch more difficult to read. Same with all the other #ifdefs; please justpick a method that works. <br /></blockquote><font face="Nimbus Roman No9 L"><br /> Ok, yes, the ifdefs are unpleasant. I will do something about that.<br /> Ideally they would be entirely confined to header files and only macrofunctions<br /> in C files - maybe I can do that. 
> There's a patch in the commitfest to add support for that,

Sorry, support for what?  There are already spinlocks in postgresql;
do you mean some new kind?  Please point me at it with a hackers msgid
or something.

> but it's not committed yet, and all those
> USE_AIO_ATOMIC_BUILTIN_COMP_SWAP ifdefs make the patch more difficult
> to read.  Same with all the other #ifdefs; please just pick a method
> that works.

Ok, yes, the ifdefs are unpleasant.  I will do something about that.
Ideally they would be entirely confined to header files and only macro
functions in C files - maybe I can do that.  And eventually, when the
dust has settled, eliminate obsolete ifdef blocks altogether.

> Also, please split prefetching of regular index scans into a separate
> patch.  It's orthogonal to doing async I/O;

Actually not completely orthogonal - see next.

> we could prefetch regular index scans with posix_fadvise too, and AIO
> should be useful for prefetching other stuff.

On 08/19/14 19:10, Claudio Freire wrote:
> On Tue, Aug 19, 2014 at 7:27 PM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>> Also, please split prefetching of regular index scans into a separate
>> patch. ...
>
> That patch already happened on the list, and it wasn't a win in many
> cases.  I'm not sure it should be proposed independently of this one.
> Maybe a separate patch, but it should be considered dependent on this.
>
> I don't have an archive link at hand atm, but I could produce one
> later.

Several people have asked to split this patch into several smaller ones
and I have thought about it.  It would introduce some awkward
dependencies.  E.g. splitting the scanner code (index, relation-heap)
into a separate patch from the aio code would mean that the scanner
patch becomes dependent on the aio patch.  They are not quite
orthogonal.

The reason is that the scanners call a new function,
DiscardBuffer(blockid), when aio is in use.  We can get around that by
providing a stub for that function in the scanner patch, but then there
is some risk of someone getting the wrong version of that function in
their build.  It just adds yet more complexity and breakage
opportunities.

> - Heikki

One further comment concerning these BufferAiocb and aiocb control
blocks being in shared memory:

I explained above that the BufferAiocb must be in shared memory for
wait/post.  The aiocb does not necessarily have to be in shared memory,
but since there is a one-to-one relationship between BufferAiocb and
aiocb, it makes the code much simpler, and the operation much more
efficient, if the aiocb is embedded in the BufferAiocb as I have it now.
E.g. if the aiocb were in private-process memory, an additional
allocation scheme would be needed (fixed number?  palloc()'ing extra
ones as needed?  ...), which adds complexity, and probably some
inefficiency, since a shared pool is usually more efficient (allows a
higher maximum per process etc.), and there would have to be some
pointer dereferencing from BufferAiocb to aiocb, adding some (small)
overhead.
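To make that concrete, the control block I have now is shaped roughly
like this (only a sketch - the field names are illustrative, not
necessarily what is in the patch), with the whole struct residing in
shared memory:

```c
#include <aio.h>

/* Sketch of the shared-memory control block (illustrative field names). */
typedef struct BufferAiocb
{
    struct aiocb    aiocb;                  /* the POSIX AIO control block,
                                             * embedded one-to-one: no extra
                                             * allocation, no pointer chase */
    int             aio_error_return_code;  /* completion code, stored here
                                             * by the originating backend only */
    int             waiter_list_head;       /* anchor of the list of backends
                                             * waiting on this I/O - this is
                                             * what forces shared memory */
    struct BufferAiocb *next;               /* link on the shared free list */
} BufferAiocb;
```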
I understood your objection to the use of shared memory as being that
you don't want a non-originator to access the originator's aiocb using
aio_xxx calls, and, with the one change I've said I will make above
(put the aio_error retcode in the BufferAiocb), I will have achieved
that requirement.  I am hoping this answers your objections concerning
shared memory.
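And, for clarity, here is roughly what that originator-only use of
aio_error() would look like, using the layout sketched above (again only
a sketch - the function names are invented for the example, and
memory-ordering details are ignored):

```c
#include <aio.h>

/* Called only by the backend that issued the aio_read() for this buffer,
 * once completion has been signalled. */
static void
BufferAiocbRecordCompletion(BufferAiocb *bacb)
{
    int         rc = aio_error(&bacb->aiocb);   /* legal: we own this aiocb */

    if (rc == 0)
        (void) aio_return(&bacb->aiocb);        /* reap the completed request */

    bacb->aio_error_return_code = rc;           /* publish for other backends */
}

/* Called by any backend, originator or not, that waited on the buffer. */
static int
BufferAiocbCompletionCode(const BufferAiocb *bacb)
{
    /* never touches the aiocb itself, so no cross-backend aio_error() */
    return bacb->aio_error_return_code;
}
```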