Re: [HACKERS] Profile of current backend - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: [HACKERS] Profile of current backend
Date
Msg-id 199804241503.LAA16668@candle.pha.pa.us
Whole thread Raw
In response to Re: [HACKERS] Profile of current backend  (Mattias Kregert <matti@algonet.se>)
List pgsql-hackers
>
> Bruce Momjian wrote:
> >
> > Interesting.  Nothing is jumping out at me.  Looks like we could try to
> > clean up heapgettup() to see if there is anything in there that can be
> > speeded up.
> >
> > None of the calls looks like it should be inlined.  Do you see any that
> > look good for inlining?
>
> ExecScan() seems to be the only func which calls SeqNext(), which in
> turn accounts for 60% of the calls to heap_getnext(), which does 80% of
> the calls to heapgettup().
>
> - Put SeqNext() into ExecScan() to lower function call overhead? [minimal optim.]
>
> - In heapgettup(), 50% is the func itself and 50% is called funcs.
>   Top four CPU consumers:
>     0.04    0.14    9924/9924        RelationGetBufferWithBuffer [148]
>     0.03    0.15    5642/5702        ReleaseAndReadBuffer [145]
>     0.10    0.00   26276/42896       nocachegetattr [158]
>     0.01    0.08    7111/9607        HeapTupleSatisfiesVisibility [185]
>
>   RelationGetBufferWithBuffer() seems to be called from here only. If so, inline.
>
> - Looking at RelationGetBufferWithBuffer():
>     0.00    0.10    4603/32354       ReadBuffer [55]
>   ReadBuffer() is the biggest cpu consumer called by RelationGetBufferWithBuffer(). (55%)
>
>   -> *** 97% of ReadBuffer() CPU time is in calling ReadBufferWithBufferLock()
>
>   -> 85% of ReadBufferWithBufferLock() CPU time is in calling BufferAlloc().
>   -> ReadBufferWithBufferLock() is the only func calling BufferAlloc().
>   -> Conclusion: INLINE BufferAlloc().
>
> - Looking at BufferAlloc():
>     0.04    0.25   37974/37974       BufTableLookup [114]
>     0.10    0.00   32340/151781      SpinAcquire [81]
>     0.10    0.00   37470/40585       PinBuffer [209]
>     0.08    0.00   38478/43799       RelationGetLRelId [234]
>     0.04    0.00   37974/151781      SpinRelease [175]
>
>   -> 40% of BufferAlloc() CPU time is in calling BufTableLookup().
>   -> BufferAlloc() is the only func calling BufTableLookup().
>   -> Conclusion: INLINE BufTableLookup().
>
> - Looking at BufTableLookup():
>   86% of CPU time is in calling hash_search(). The rest is own time.
>
> - Looking at hash_search():
>     0.13    0.41  179189/179189      call_hash [69]
>     0.00    0.00       6/6           bucket_alloc [1084]
>   -> Conclusion: INLINE call_hash() [and bucket_alloc()] into hash_search().
>
> - Looking at call_hash():
>     0.37    0.00  171345/171345      tag_hash [94]
>     0.04    0.00    7844/7844        string_hash [348]
>   -> Conclusion: INLINE tag_hash() [and string_hash()] into call_hash().
>   -> Perhaps disk_hash() could be used in some way? It is currently #ifdef'd away.
>   -> Could we use a lookup table instead of doing hash calculations? Would not that
>   ->  be much faster?
>
>
> It looks to me as if there are too many levels of function calls.
> Perhaps all functions which are called by only one other func should be inlined?
>
>
> Guesstimate:
>   This would speed up heapgettup() by 10% ???
>   Other functions would speed up too.

I have reviewed your findings, and have inlined 9 small functions that
are called for every row.  Unfortunately, several of the ones you
mention above are called via function pointers, so it was not possible
to inline them.

In my testing, I created a table with 160k rows, and did a SELECT that
would do a sequential scan of the table, and looked at the functions
that were called the most.

---------------------------------------------------------------------------

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 54.3       2.65     2.65                             mcount (472)
 16.2       3.44     0.79     2999     0.26     0.26  _read [18]
  3.5       3.61     0.17   160148     0.00     0.01  _heapgettup [15]
  2.5       3.73     0.12   174007     0.00     0.00  _SpinAcquire [36]
  2.3       3.84     0.11   981732     0.00     0.00  _BufferIsValid [37]
  2.3       3.95     0.11   160148     0.00     0.01  _heap_getnext [14]
  2.0       4.05     0.10   160096     0.00     0.00  _ExecEvalVar [38]
  2.0       4.15     0.10   160096     0.00     0.00  _ExecMakeFunctionResult <cycle 3> [26]
  1.4       4.22     0.07   174007     0.00     0.00  _SpinRelease [41]
  1.2       4.28     0.06    18176     0.00     0.00  _hash_search [40]
  1.0       4.33     0.05   480288     0.00     0.00  _ExecEvalExpr <cycle 3> [34]
  1.0       4.38     0.05   167313     0.00     0.00  _RelationGetLRelId [47]
  1.0       4.43     0.05   160096     0.00     0.00  _ExecEvalFuncArgs <cycle 3> [48]
  0.8       4.47     0.04   160097     0.00     0.01  _SeqNext [13]
  0.8       4.51     0.04   160096     0.00     0.00  _ExecQual [22]
  0.6       4.54     0.03   160096     0.00     0.00  _ExecEvalOper <cycle 3> [51]
  0.4       4.56     0.02   163015     0.00     0.00  _IncrBufferRefCount [61]
  0.4       4.58     0.02   161342     0.00     0.00  _fmgr_c <cycle 5> [29]
  0.4       4.60     0.02   160451     0.00     0.00  _TransactionIdEquals [62]
  0.4       4.62     0.02   160148     0.00     0.00  _RelationGetBufferWithBuffer [28]
  0.4       4.64     0.02   160098     0.00     0.00  _TupIsNull [63]
  0.4       4.66     0.02   160097     0.00     0.00  _ExecIncrSlotBufferRefcnt [45]
  0.4       4.68     0.02   160096     0.00     0.00  _ExecQualClause [23]
  0.4       4.70     0.02     6161     0.00     0.00  _strcpy [64]
  0.4       4.72     0.02       31     0.65     0.65  _open [65]
  0.2       4.73     0.01   164288     0.00     0.00  _ReleaseBuffer [81]
  0.2       4.74     0.01   164048     0.00     0.00  _PageGetMaxOffsetNumber [97]
  0.2       4.75     0.01   160450     0.00     0.00  _HeapTupleSatisfiesVisibility [50]
  0.2       4.76     0.01   160449     0.00     0.00  _HeapTupleSatisfiesNow [98]
  0.2       4.77     0.01   160100     0.00     0.00  _ExecClearTuple [49]
  0.2       4.78     0.01   160098     0.00     0.00  _ExecStoreTuple [46]
  0.2       4.79     0.01   160096     0.00     0.00  _bpchareq [102]
  0.2       4.80     0.01    15696     0.00     0.00  _call_hash [99]
  0.2       4.81     0.01    12187     0.00     0.00  _malloc [96]
  0.2       4.82     0.01     9182     0.00     0.00  _PortalHeapMemoryAlloc [73]
  0.2       4.83     0.01     7257     0.00     0.00  _newNode [54]
  0.2       4.84     0.01     2973     0.00     0.20  _ReleaseAndReadBuffer [25]
  0.2       4.85     0.01     1241     0.01     0.02  _LockAcquire [58]
  0.2       4.86     0.01      189     0.05     0.05  _IsBootstrapProcessingMode [103]
  0.2       4.87     0.01        3     3.33     3.33  _shmdt [101]
  0.2       4.88     0.01        1    10.00  2172.84  _ExecScan [11]
  0.0       4.88     0.00   166214     0.00     0.00  _BufferGetBlock [72]
  0.0       4.88     0.00   160449     0.00     0.00  _heapisoverride [477]
  0.0       4.88     0.00    15303     0.00     0.00  _tag_hash [478]
  0.0       4.88     0.00    11718     0.00     0.00  _AllocSetContains [479]
  0.0       4.88     0.00    11718     0.00     0.00  _OrderedSetContains [480]
  0.0       4.88     0.00    11366     0.00     0.00  _OrderedElemPush [481]
  0.0       4.88     0.00    11366     0.00     0.00  _OrderedElemPushHead [482]
  0.0       4.88     0.00    11366     0.00     0.00  _OrderedElemPushInto [483]
  0.0       4.88     0.00    11361     0.00     0.00  _AllocSetAlloc [104]
  0.0       4.88     0.00    11361     0.00     0.00  _MemoryContextAlloc [67]
  0.0       4.88     0.00    11339     0.00     0.00  _palloc [68]
  0.0       4.88     0.00    10515     0.00     0.00  _free [484]
  0.0       4.88     0.00    10488     0.00     0.00  _AllocSetFree [485]
  0.0       4.88     0.00    10488     0.00     0.00  _OrderedElemPop [486]
  0.0       4.88     0.00     8176     0.00     0.00  _get_expr [487]
  0.0       4.88     0.00     7953     0.00     0.00  _AllocSetGetFirst [488]
  0.0       4.88     0.00     7953     0.00     0.00  _OrderedSetGetHead [489]
  0.0       4.88     0.00     7952     0.00     0.00  _OrderedElemGetBase [490]
  0.0       4.88     0.00     7424     0.00     0.00  _memmove [491]


---------------------------------------------------------------------------

Of these, I inlined:

    #define TransactionIdEquals(id1, id2) \
    #define TupIsNull(slot) \
    #define ExecIncrSlotBufferRefcnt(slot) \
    #define BufferIsValid(bufnum) \
    #define IncrBufferRefCount(buffer) \
    #define BufferGetBlock(buffer) \
    #define PageGetMaxOffsetNumber(page) \
    #define HeapTupleSatisfiesVisibility(tuple, seeself) \
    #define    heapisoverride() \

When I say linined, I mean I made macros of them.  I did not inline the
code into other functions, because you loose the modularity of the
function name, and many are called multiple times.

These were all small functions, only a few lines of actual code.
BufferIsValid was called several times per row.

With that said, the execution times did not change at all.  What I did
see is that the mcount row in gprof output:

 54.3       2.65     2.65                             mcount (472)

decreased from 2.65 to 1.79, which means that less time was spent in
function calls.  Not sure why that time did not translate to better
overall performance, but the inlining is a clear win for these small
functions.  mcount is artificially high anyway because it is doing the
profiling.  The resulting gprof output shows a much smaller list of
functions called 160k times.

A patch will be posted to the patches list, for any of you who subscribe
to that.

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [HACKERS] Re: [INTERFACES] retrieving varchar size
Next
From: Byron Nikolaidis
Date:
Subject: Re: [HACKERS] Re: [INTERFACES] retrieving varchar size