Thread: Inline MemoryContextSwitchTo?
Can anyone think of a reason we aren't inlining MemoryContextSwitchTo() in GCC builds, similarly to the way list_head() et al are handled?

It wouldn't be a huge gain, but I consistently see MemoryContextSwitchTo eating a percent or three of most profiles.

regards, tom lane
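For readers following along, the proposal could look roughly like the sketch below. It is only an illustration: the one-field MemoryContextData struct and the file-local CurrentMemoryContext are stand-ins so the snippet compiles on its own, and SKETCH_INLINE is a hypothetical macro name, not something from the PostgreSQL tree.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the real MemoryContextData, which has more fields. */
typedef struct MemoryContextData
{
    const char *name;
} MemoryContextData;

typedef MemoryContextData *MemoryContext;

/* Stand-in for the backend global of the same name. */
static MemoryContext CurrentMemoryContext = NULL;

/* Hypothetical helper macro: ask for inlining only when building with GCC. */
#ifdef __GNUC__
#define SKETCH_INLINE static __inline__
#else
#define SKETCH_INLINE static
#endif

/*
 * Make 'context' the current memory context and return the previous one,
 * so the caller can switch back later.
 */
SKETCH_INLINE MemoryContext
MemoryContextSwitchTo(MemoryContext context)
{
    MemoryContext old = CurrentMemoryContext;

    assert(context != NULL);    /* plain assert stands in for backend checks */
    CurrentMemoryContext = context;
    return old;
}
```

A caller would typically save the return value and pass it to a second MemoryContextSwitchTo() call to restore the previous context.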
On Sun, 2005-02-06 at 18:05 -0500, Tom Lane wrote:
> Can anyone think of a reason we aren't inlining MemoryContextSwitchTo()
> in GCC builds, similarly to the way list_head() et al are handled?
>
> It wouldn't be a huge gain, but I consistently see MemoryContextSwitchTo
> eating a percent or three of most profiles.

Sounds good.

I think we can inline all MemoryContext functions that only check the memory context header and call context->methods->...(). One example is MemoryContextAlloc(), which is also called very often.

Karel

--
Karel Zak <zakkr@zf.jcu.cz>
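The kind of function Karel describes (check the context header, then dispatch through the methods table) might be sketched as follows. The struct layouts, the SKETCH_CONTEXT_TAG value, and the malloc-backed SketchMallocAlloc are assumptions made up for this illustration, and any size validation the real function performs is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct MemoryContextData *MemoryContext;

/* Simplified methods table; only the allocation hook is shown. */
typedef struct MemoryContextMethods
{
    void *(*alloc) (MemoryContext context, size_t size);
} MemoryContextMethods;

/* Simplified context header: a type tag plus the methods pointer. */
typedef struct MemoryContextData
{
    int type;                               /* hypothetical validity tag */
    const MemoryContextMethods *methods;
} MemoryContextData;

#define SKETCH_CONTEXT_TAG 0x4d43           /* hypothetical tag value */

/*
 * Thin wrapper: validate the header, then call through the methods table.
 * Inlining it leaves only the indirect call at each call site.
 */
static inline void *
MemoryContextAlloc(MemoryContext context, size_t size)
{
    assert(context != NULL && context->type == SKETCH_CONTEXT_TAG);
    return context->methods->alloc(context, size);
}

/* Toy allocator backing the sketch; it simply forwards to malloc(). */
static void *
SketchMallocAlloc(MemoryContext context, size_t size)
{
    (void) context;
    return malloc(size);
}

static const MemoryContextMethods SketchMallocMethods = {SketchMallocAlloc};
```

Note that the wrapper still ends in an indirect call, so inlining it removes one call layer rather than the allocation cost itself.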
Karel Zak wrote:
> On Sun, 2005-02-06 at 18:05 -0500, Tom Lane wrote:
> > Can anyone think of a reason we aren't inlining MemoryContextSwitchTo()
> > in GCC builds, similarly to the way list_head() et al are handled?
> >
> > It wouldn't be a huge gain, but I consistently see MemoryContextSwitchTo
> > eating a percent or three of most profiles.
>
> Sounds good.
>
> I think we can inline all MemoryContext functions that only check the
> memory context header and call context->methods->...(). One example is
> MemoryContextAlloc(), which is also called very often.

Yes, that's good.

But why MemoryContextSwitchTo? It seems to come out much lower than MemoryContextAllocZeroAligned or MemoryContextAlloc in the profiles I've seen.

Best Regards, Simon Riggs
"Simon Riggs" <simon@2ndquadrant.com> writes: > But why MemoryContextSwitchTo ? Because (a) it's so small that inlining it will probably be a net code savings rather than expenditure, and (b) it does have noticeable cost. For example, in this gprof profile taken Saturday: % cumulative self self total time seconds seconds calls ms/call ms/call name 31.25 22.40 22.40 _mcount 3.31 24.77 2.37 704032 0.00 0.02 IndexNext2.82 26.79 2.02 2112850 0.00 0.00 AllocSetAlloc 2.48 28.57 1.78 2821112 0.00 0.00 LockBuffer 2.13 30.10 1.53 701932 0.00 0.01 heap_release_fetch 1.97 31.51 1.41 6310394 0.00 0.00 MemoryContextSwitchTo 1.97 32.92 1.41 699632 0.00 0.00 int8inc 1.66 34.11 1.19 1886388 0.00 0.00 LWLockAcquire 1.62 35.27 1.16 474244 0.00 0.00 hash_search 1.56 36.39 1.12 2109900 0.00 0.00 AllocSetReset 1.46 37.44 1.05 701901 0.00 0.00 _bt_restscan1.42 38.46 1.02 2109079 0.00 0.00 memset 1.39 39.46 1.00 701901 0.00 0.00 _bt_step 1.24 40.35 0.89 701833 0.00 0.00 ExecEvalExprSwitchContext 1.20 41.21 0.86 704143 0.00 0.00 _bt_checkkeys 1.17 42.05 0.84 1886388 0.00 0.00 LWLockRelease 1.17 42.89 0.84 701901 0.00 0.00 _bt_next 1.05 43.64 0.75 701833 0.00 0.00 HeapTupleSatisfiesSnapshot1.03 44.38 0.74 704144 0.00 0.01 btgettuple 1.03 45.12 0.74 $$dyncall 1.02 45.85 0.73 2110119 0.00 0.00 AllocSetCheck 0.91 46.50 0.65 706412 0.00 0.01 ReleaseAndReadBuffer (all else below 1%) the only thing I see in that list that looks reasonable to inline is MemoryContextSwitchTo. (This is ye olde test_setup/test_run case on a single processor, which is not very interesting lock-wise but I wanted to reconfirm that we weren't spending a large fraction of the runtime inside bufmgr.) regards, tom lane