Thread: Inline MemoryContextSwitchTo?

Inline MemoryContextSwitchTo?

From

Tom Lane

Date:

06 February 2005, 23:05:53

Can anyone think of a reason we aren't inlining MemoryContextSwitchTo()
in GCC builds, similarly to the way list_head() et al. are handled?

It wouldn't be a huge gain, but I consistently see MemoryContextSwitchTo
eating a percent or three of most profiles.
        regards, tom lane
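For reference, the list_head() arrangement Tom mentions gives GCC builds a `static __inline__` definition in the header, while other compilers call an ordinary out-of-line copy. Below is a minimal sketch of that pattern applied to MemoryContextSwitchTo. It assumes a stripped-down, hypothetical `MemoryContextData` with a single `name` field, and it is not the actual PostgreSQL header. In a real tree the non-GCC branch would be an extern declaration in the header, with the body in mcxt.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for PostgreSQL's MemoryContextData. */
typedef struct MemoryContextData
{
    const char *name;
} MemoryContextData;

typedef MemoryContextData *MemoryContext;

/* The "current context" global that palloc() and friends allocate from. */
MemoryContext CurrentMemoryContext = NULL;

#ifdef __GNUC__
/* GCC builds: the header supplies the body, so no call is emitted. */
static __inline__ MemoryContext
MemoryContextSwitchTo(MemoryContext context)
{
    MemoryContext old = CurrentMemoryContext;

    assert(context != NULL);    /* sanity check on the argument */
    CurrentMemoryContext = context;
    return old;
}
#else
/*
 * Other compilers: an ordinary function. In a real tree the header would
 * only declare it extern, and this definition would live in mcxt.c.
 */
MemoryContext
MemoryContextSwitchTo(MemoryContext context)
{
    MemoryContext old = CurrentMemoryContext;

    assert(context != NULL);    /* sanity check on the argument */
    CurrentMemoryContext = context;
    return old;
}
#endif
```

Callers do not change either way: the usual `old = MemoryContextSwitchTo(ctx); ... MemoryContextSwitchTo(old);` idiom compiles from the same source, and only GCC builds lose the function call.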

Re: Inline MemoryContextSwitchTo?

From

Karel Zak

Date:

07 February 2005, 08:06:59

On Sun, 2005-02-06 at 18:05 -0500, Tom Lane wrote:
> Can anyone think of a reason we aren't inlining MemoryContextSwitchTo()
> in GCC builds, similarly to the way list_head() et al. are handled?
> 
> It wouldn't be a huge gain, but I consistently see MemoryContextSwitchTo
> eating a percent or three of most profiles.

Sounds good. 

I think we can inline all the MemoryContext functions that only check the
memory context header and then call context->methods->...(). For example,
MemoryContextAlloc() is also called very often.
Karel

-- 
Karel Zak <zakkr@zf.jcu.cz>
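Karel's suggestion can be sketched the same way. The idea is that MemoryContextAlloc() does little more than sanity-check the context header and the request size, then jump through the context's method table. Everything below is a trimmed, hypothetical model for illustration: the `magic` header field, `CONTEXT_MAGIC`, the two-entry method table, and the malloc-backed `malloc_context` are invented here and do not match PostgreSQL's actual definitions. It assumes GCC or another compiler that accepts `__inline__`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct MemoryContextData *MemoryContext;

/* Hypothetical two-entry method table; a real one has more callbacks. */
typedef struct MemoryContextMethods
{
    void       *(*alloc) (MemoryContext context, size_t size);
    void        (*free_p) (MemoryContext context, void *pointer);
} MemoryContextMethods;

/* Hypothetical context header: a tag to validate, plus the method table. */
typedef struct MemoryContextData
{
    int         magic;
    const MemoryContextMethods *methods;
} MemoryContextData;

#define CONTEXT_MAGIC   0x4D43                  /* hypothetical tag value */
#define MAX_ALLOC_SIZE  ((size_t) 0x3fffffff)   /* 1 GB - 1, as in PostgreSQL */

/*
 * Inline wrapper in the style Karel describes: check the header, check
 * the request size, then dispatch through context->methods.
 */
static __inline__ void *
MemoryContextAlloc(MemoryContext context, size_t size)
{
    assert(context != NULL && context->magic == CONTEXT_MAGIC);
    if (size > MAX_ALLOC_SIZE)
        abort();            /* a real implementation would report an error */
    return context->methods->alloc(context, size);
}

/* Trivial malloc-backed methods so the sketch can actually run. */
static void *
malloc_alloc(MemoryContext context, size_t size)
{
    (void) context;
    return malloc(size);
}

static void
malloc_free(MemoryContext context, void *pointer)
{
    (void) context;
    free(pointer);
}

static const MemoryContextMethods malloc_methods = {malloc_alloc, malloc_free};

/* Hypothetical context instance for trying the wrapper out. */
MemoryContextData malloc_context = {CONTEXT_MAGIC, &malloc_methods};
```

A call such as `MemoryContextAlloc(&malloc_context, 16)` then becomes two checks plus one indirect call at the call site. Unlike MemoryContextSwitchTo, this body is larger than the call it replaces, so the trade-off depends on how many call sites there are.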

Re: Inline MemoryContextSwitchTo?

From

"Simon Riggs"

Date:

07 February 2005, 09:28:29

Karel Zak wrote:
> On Sun, 2005-02-06 at 18:05 -0500, Tom Lane wrote:
> > Can anyone think of a reason we aren't inlining MemoryContextSwitchTo()
> > in GCC builds, similarly to the way list_head() et al. are handled?
> >
> > It wouldn't be a huge gain, but I consistently see MemoryContextSwitchTo
> > eating a percent or three of most profiles.
>
> Sounds good.
>
> I think we can inline all the MemoryContext functions that only check the
> memory context header and then call context->methods->...(). For example,
> MemoryContextAlloc() is also called very often.

Yes, that's good.

But why MemoryContextSwitchTo? That seems to come out much lower than
MemoryContextAllocZeroAligned or MemoryContextAlloc in the profiles I've
seen.

Best Regards, Simon Riggs

Re: Inline MemoryContextSwitchTo?

From

Tom Lane

Date:

07 February 2005, 15:28:46

"Simon Riggs" <simon@2ndquadrant.com> writes:
> But why MemoryContextSwitchTo?

Because (a) it's so small that inlining it will probably be a net code
savings rather than expenditure, and (b) it does have noticeable cost.
For example, in this gprof profile taken Saturday:
    % cumulative      self               self    total
 time    seconds   seconds     calls  ms/call  ms/call  name
31.25      22.40     22.40                              _mcount
 3.31      24.77      2.37    704032     0.00     0.02  IndexNext
 2.82      26.79      2.02   2112850     0.00     0.00  AllocSetAlloc
 2.48      28.57      1.78   2821112     0.00     0.00  LockBuffer
 2.13      30.10      1.53    701932     0.00     0.01  heap_release_fetch
 1.97      31.51      1.41   6310394     0.00     0.00  MemoryContextSwitchTo
 1.97      32.92      1.41    699632     0.00     0.00  int8inc
 1.66      34.11      1.19   1886388     0.00     0.00  LWLockAcquire
 1.62      35.27      1.16    474244     0.00     0.00  hash_search
 1.56      36.39      1.12   2109900     0.00     0.00  AllocSetReset
 1.46      37.44      1.05    701901     0.00     0.00  _bt_restscan
 1.42      38.46      1.02   2109079     0.00     0.00  memset
 1.39      39.46      1.00    701901     0.00     0.00  _bt_step
 1.24      40.35      0.89    701833     0.00     0.00  ExecEvalExprSwitchContext
 1.20      41.21      0.86    704143     0.00     0.00  _bt_checkkeys
 1.17      42.05      0.84   1886388     0.00     0.00  LWLockRelease
 1.17      42.89      0.84    701901     0.00     0.00  _bt_next
 1.05      43.64      0.75    701833     0.00     0.00  HeapTupleSatisfiesSnapshot
 1.03      44.38      0.74    704144     0.00     0.01  btgettuple
 1.03      45.12      0.74                              $$dyncall
 1.02      45.85      0.73   2110119     0.00     0.00  AllocSetCheck
 0.91      46.50      0.65    706412     0.00     0.01  ReleaseAndReadBuffer

(all else below 1%)

The only thing I see in that list that looks reasonable to inline is
MemoryContextSwitchTo.  (This is ye olde test_setup/test_run case on
a single processor, which is not very interesting lock-wise but I wanted
to reconfirm that we weren't spending a large fraction of the runtime
inside bufmgr.)
        regards, tom lane
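The "noticeable cost" can be put in per-call terms using only the self-seconds and call counts from the profile above:

```latex
\[
\frac{1.41\ \text{s}}{6\,310\,394\ \text{calls}} \approx 2.23 \times 10^{-7}\ \text{s} \approx 0.22\ \mu\text{s per call}
\]
\[
\frac{6\,310\,394\ \text{MemoryContextSwitchTo calls}}{704\,032\ \text{IndexNext calls}} \approx 8.96
\]
```

About a fifth of a microsecond per call is tiny, so the 1.97% share comes from sheer call volume rather than from work inside the function. That is the case where removing call overhead pays off. The ratio means roughly nine switches per IndexNext call, but a flat profile alone does not show which callers make them. Both figures come from a gprof-instrumented build (note the 31% in _mcount), so they are only indicative of an unprofiled server.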