Re: [HACKERS] Re: [PATCHES] Try again: S_LOCK reduced contentionh] - Mailing list pgsql-hackers

From dg@illustra.com (David Gould)
Subject Re: [HACKERS] Re: [PATCHES] Try again: S_LOCK reduced contentionh]
Msg-id 9805122021.AA16043@hawk.illustra.com
In response to Re: [PATCHES] Try again: S_LOCK reduced contentionh]  (Bruce Momjian <maillist@candle.pha.pa.us>)
Responses Re: [HACKERS] Re: [PATCHES] Try again: S_LOCK reduced contentionh]
List pgsql-hackers
Bruce Momjian:
> David Gould:
> > This is always how micro-optimization goes, 1% and 2% gains here and there.
> > I am very familiar with it.
>
> I said 0.01 seconds at a time, not 1% at a time.  At this point, a 0.01
> seconds savings is 12%, because the total test takes 0.08 seconds.

I think I may have been unclear here. I was merely attempting to agree that
optimization is a process of accumulating small gains.

> > Have you done call graph profile for this, or just flat profiling? I think
> > you may find the call graph (gprof) output revealing, although perhaps not
> > on this particular topic...
>
> I ran gprof.  I did not look at the call graph, just the total number of
> calls.  We have a very modular system, and the call overhead can get
> excessive.  gprof shows tas() getting called far more than any other
> function.  It shows it as 0.01 seconds, on a 0.08 second test!  Now, I
> realize that gprof measurement is not perfect, but it certainly shows
> tas as being called a lot.

I agree the system is sometimes excessively layered, resulting in many
trivial calls.

I agree tas() is called a lot. I am trying to understand whether the overhead
seen below is in the call itself, or in the actual work of synchronization.

>  %   cumulative   self              self     total
> time   seconds   seconds    calls  ms/call  ms/call  name
> 20.0       0.02     0.02                             mcount (463)
> 10.0       0.03     0.01     5288     0.00     0.00  _tas [31]
> 10.0       0.04     0.01     2368     0.00     0.00  _hash_search [30]
> 10.0       0.05     0.01     1631     0.01     0.02  _malloc [11]
> 10.0       0.06     0.01      101     0.10     0.10  _sbrk [35]
> 10.0       0.07     0.01       56     0.18     0.20  _heapgettup [25]
> 10.0       0.08     0.01        4     2.50     2.50  _write [32]
> 10.0       0.09     0.01        2     5.00     5.00  ___sysctl [33]
> 10.0       0.10     0.01        1    10.00    10.41  _cnfify [28]
>  0.0       0.10     0.00     1774     0.00     0.00  _call_hash [468]
>  0.0       0.10     0.00     1604     0.00     0.00  _tag_hash [469]
>  0.0       0.10     0.00     1380     0.00     0.00  _OrderedElemPush [470]
>  0.0       0.10     0.00     1380     0.00     0.00  _OrderedElemPushHead [471]
>  0.0       0.10     0.00     1380     0.00     0.00  _OrderedElemPushInto [472]
>  0.0       0.10     0.00     1375     0.00     0.02  _AllocSetAlloc [12]
>  0.0       0.10     0.00     1375     0.00     0.02  _MemoryContextAlloc [13]
>  0.0       0.10     0.00     1353     0.00     0.02  _palloc [14]
>  0.0       0.10     0.00     1322     0.00     0.01  _SpinAcquire [45]

I asked about the call graph for two reasons (the first of which is part
of another thread):

1) I would expect that the 1353 calls to palloc() are also responsible for:

     1375 _MemoryContextAlloc
     1375 _AllocSetAlloc
     1380 _OrderedElemPushInto
     1380 _OrderedElemPush
  for a total of (1353 + 1375 + 1375 + 1380 + 1380) = 6863 calls.
  (not including the further 1631 _malloc and 101 _sbrk calls).

  I am curious why these calls do not seem to show up on the cumulative time.

2) I wonder how fine the resolution of the profile is. I am assuming that all
   the overhead of tas() comes from either:
    - the call overhead
    - or the actual work done in tas().
   Given that, I wonder whether the call overhead is the major part, or
   whether the bus/cache synchronization is the real cost. As a function,
   tas() is easy to identify. As a macro it does not show up on the
   profile at all, and its contribution to the overhead is distributed among
   all the callers, which makes it less obvious. I was hoping the call
   graph would help identify which was the case.

In any case, I will test with it as a macro and as a function. It may also
be instructive to make a dummy tas() that does nothing, and see whether that
shows the overhead to be in the actual synchronization or in the calling. I
will test this too.

My intent here is not to be argumentative. My current mental model is that
excess calls are unfortunate but not especially harmful and not usually
worth changing (sometimes of course an inline sequence allows the optimizer
to do something clever and makes more difference than expected). If this view
is incorrect, I would like to know and understand that so that I can adjust
my theory accordingly.

> I don't understand your hesitation.  The code WAS inlined.  It was
> inlined because gprof showed it as being called a lot.  Most of them are
> ASM anyway, so what does it matter if it sits in a *.c or *.h file; an
> asm() call looks the same in a macro or in a function.

I do not feel strongly about this. I prefer the function on the grounds of
clarity and ease of maintenance and porting. But I would be happy to make it
a macro even if the performance difference is not significant.

> If it makes you feel better, put it in something called tas.h, and add
> it as an include in all the files that include s_lock.h, or have
> s_lock.h include tas.h.

I am fine with folding it all into s_lock.h. If you wish, I will do so.

> I am not looking around for 1% optimization.  I am using gprof output to
> improve things that gprof shows need improving.

Please, I am not intending _any_ criticism. I have no disagreement with what
you are doing. I am glad to see us working on performance.

-dg

David Gould            dg@illustra.com           510.628.3783 or 510.305.9468
Informix Software  (No, really)         300 Lakeside Drive  Oakland, CA 94612
"My life has been full of wonderful moments -
  it's only later that they become embarassing."
                                            -- Gerhard Berger
