Re: [HACKERS] Profile of current backend - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: [HACKERS] Profile of current backend
Date
Msg-id 199802061709.MAA11628@candle.pha.pa.us
Whole thread Raw
In response to Re: [HACKERS] Profile of current backend  (Mattias Kregert <matti@algonet.se>)
List pgsql-hackers
>
> Bruce Momjian wrote:
> >
> > Interesting.  Nothing is jumping out at me.  Looks like we could try to
> > clean up heapgettup() to see if there is anything in there that can be
> > speeded up.
> >
> > None of the calls looks like it should be inlined.  Do you see any that
> > look good for inlining?
>
> ExecScan() seems to be the only func which calls SeqNext(), which in
> turn accounts for 60% of the calls to heap_getnext(), which does 80% of
> the calls to heapgettup().

This is certainly the type of analysis that will help us.  I disagree on
inlining everything, because the code maintenance becomes a nightmare,
but we certainly could improve things.

Currently, if a function is declared static, is it only called from
within that file.  If there is only one call to that function in the
file, it could be inlined.  I suggest we only attack things like the
below where we have a large amount of function call overhead.  Also,
there are cases where inlining the 'top' of the function can be really
good.  For heap_getattr(), many cases were handled in the top that just
did a quick return.  In those cases, I took the top and made a macro out
of it, and only called the function if I needed real processing.

This is a great help, and any patches would be good.  The other nice
thing about this is that it is pretty compartementalized.  You just need
to check that the new code behaves like the old code, and you are safe.

This is great analysis of the problem, too.

>
> - Put SeqNext() into ExecScan() to lower function call overhead? [minimal optim.]

Sure.  Just comment it so it is clear for readers, so it is not pulled
out into a separate function some day.





>
> - In heapgettup(), 50% is the func itself and 50% is called funcs.
>   Top four CPU consumers:
>     0.04    0.14    9924/9924        RelationGetBufferWithBuffer [148]
>     0.03    0.15    5642/5702        ReleaseAndReadBuffer [145]
>     0.10    0.00   26276/42896       nocachegetattr [158]
>     0.01    0.08    7111/9607        HeapTupleSatisfiesVisibility [185]
>
>   RelationGetBufferWithBuffer() seems to be called from here only. If so, inline.

OK.

>
> - Looking at RelationGetBufferWithBuffer():
>     0.00    0.10    4603/32354       ReadBuffer [55]
>   ReadBuffer() is the biggest cpu consumer called by RelationGetBufferWithBuffer(). (55%)
>
>   -> *** 97% of ReadBuffer() CPU time is in calling ReadBufferWithBufferLock()
>
>   -> 85% of ReadBufferWithBufferLock() CPU time is in calling BufferAlloc().
>   -> ReadBufferWithBufferLock() is the only func calling BufferAlloc().
>   -> Conclusion: INLINE BufferAlloc().

My only warning is that I have never inlined a function, AND left
another copy of the function non-inlined.  The reason is that you are
left with two copies of the function, and that can get really confusing.
I don't know how others feel about that, but the code is already so
complicated, I hesitate to start duplicating blocks of code.  Of course,
if you make it a macro, you have it in one place, and the preprocessor
is duplicating it for you.

I have converted a function to a macro, and I have inlined
frequently-called functions.


>
> - Looking at BufferAlloc():
>     0.04    0.25   37974/37974       BufTableLookup [114]
>     0.10    0.00   32340/151781      SpinAcquire [81]
>     0.10    0.00   37470/40585       PinBuffer [209]
>     0.08    0.00   38478/43799       RelationGetLRelId [234]
>     0.04    0.00   37974/151781      SpinRelease [175]
>
>   -> 40% of BufferAlloc() CPU time is in calling BufTableLookup().
>   -> BufferAlloc() is the only func calling BufTableLookup().
>   -> Conclusion: INLINE BufTableLookup().
>
> - Looking at BufTableLookup():
>   86% of CPU time is in calling hash_search(). The rest is own time.
>
> - Looking at hash_search():
>     0.13    0.41  179189/179189      call_hash [69]
>     0.00    0.00       6/6           bucket_alloc [1084]
>   -> Conclusion: INLINE call_hash() [and bucket_alloc()] into hash_search().
>
> - Looking at call_hash():
>     0.37    0.00  171345/171345      tag_hash [94]
>     0.04    0.00    7844/7844        string_hash [348]
>   -> Conclusion: INLINE tag_hash() [and string_hash()] into call_hash().
>   -> Perhaps disk_hash() could be used in some way? It is currently #ifdef'd away.
>   -> Could we use a lookup table instead of doing hash calculations? Would not that
>   ->  be much faster?
>
>
> It looks to me as if there are too many levels of function calls.
> Perhaps all functions which are called by only one other func should be inlined?
>
>
> Guesstimate:
>   This would speed up heapgettup() by 10% ???
>   Other functions would speed up too.

Makes sense.  If the functions are in the same source file, and are
declared static, doesn't the compiler inline some of them.

See my heap_getattr() macro for a way perhaps to macro-ize some of
these.  The macro looks much clearer than the old style macros.

--
Bruce Momjian
maillist@candle.pha.pa.us

pgsql-hackers by date:

Previous
From: Tom I Helbekkmo
Date:
Subject: Re: [HACKERS] Bug?
Next
From: Bruce Momjian
Date:
Subject: Re: [HACKERS] Profile of current backend