Re: testing ProcArrayLock patches - Mailing list pgsql-hackers

From Andres Freund
Subject Re: testing ProcArrayLock patches
Date
Msg-id 201111182312.02527.andres@anarazel.de
Whole thread Raw
In response to Re: testing ProcArrayLock patches  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Responses Re: testing ProcArrayLock patches
Re: testing ProcArrayLock patches
List pgsql-hackers
On Friday, November 18, 2011 09:16:01 PM Kevin Grittner wrote:
> Andres Freund <andres@anarazel.de> wrote:
> > When doing line-level profiles I would suggest looking at the
> > instructions.
> What's the best way to do that?
I think opannotate -a -s produces output with instructions/code intermingled.

> > I don't think cache line contention is the most likely candidate
> > here.  Simple cache-misses seem far more likely. In combination
> > with pipeline stalls...
> > 
> > Newer cpus (nehalem+) can measure stalled cycles which can be
> > really useful when analyzing performance. I don't remember how to
> > do that with oprofile right now though as I use perf these days
> > (its -e stalled-cycles{frontend|backend} there}).
> 
> When I run oprofile, I still always go back to this post by Tom:
> http://archives.postgresql.org/pgsql-performance/2009-06/msg00154.php
Hrm. I am on the train and for unknown reasons the only sensible working
protocols are smtp + pop.... Waiting.... Waiting....
Sorry, too slow/high latency atm. I wrote everything below and another mail
and the page still hasn't loaded.

oprofile can produces graphes as well (--callgraph). for both tools you need
-fno-omit-frame-pointers to get usable graphs.

> Can anyone provide such a "cheat sheet" for perf?  I could give that
> a try if I knew how.
Unfortunately for sensible results the kernel needs to be rather new.
I would say > 2.6.28 or so (just guessed).

# to record activity
perf record [-g|--call-graph] program|-p pid 

# to view a summation
perf report

graph:
# Overhead   Command      Shared Object                                     Symbol
# ........  ........  .................  .........................................
#    4.09%  postgres  postgres           [.] slab_alloc_dyn           |           --- slab_alloc_dyn              |
                  |--18.52%-- new_list              |          |                        |          |--63.79%-- lappend
           |          |          |                        |          |          |--13.40%-- find_usable_indexes
    |          |          |          create_index_paths              |          |          |          set_rel_pathlist
           |          |          |          make_one_rel
 

flat:

# Overhead   Command      Shared Object                                     Symbol
# ........  ........  .................  .........................................
#    5.10%  postgres  [vdso]             [.] 0x7ffff3d8d770      4.26%  postgres  postgres           [.] base_yyparse
3.88%  postgres  postgres           [.] slab_alloc_dyn    2.82%  postgres  postgres           [.] core_yylex    2.37%
postgres postgres           [.] SearchCatCache    1.85%  postgres  libc-2.13.so       [.] __memcpy_ssse3    1.66%
postgres libc-2.13.so       [.] __GI___strcmp_ssse3    1.23%  postgres  postgres           [.] MemoryContextAlloc
 


# to view a line/source/instruction level view
perf annotate -l symbol

...        :        :              /*        :               * one-time startup overhead for each cache        :
      */        :              if (cache->cc_tupdesc == NULL)   0.35 :        6e81fd:       48 83 7f 28 00
cmpq  $0x0,0x28(%rdi)/home/andres/src/postgresql/build/optimize/../../src/backend/utils/cache/catcache.c:1070   4.15 :
     6e8202:       0f 84 54 04 00 00       je     6e865c <SearchCatCache+0x47c>        :      #endif        :        :
           /*        :               * initialize the search key information        :               */        :
    memcpy(cur_skey, cache->cc_skey, sizeof(cur_skey));   0.00 :        6e8208:       48 8d bd a0 fe ff ff    lea
-0x160(%rbp),%rdi  0.17 :        6e820f:       49 8d 77 70             lea    0x70(%r15),%rsi   0.00 :        6e8213:
   b9 24 00 00 00          mov
$0x24,%ecx/home/andres/src/postgresql/build/optimize/../../src/backend/utils/cache/catcache.c:1080 33.22 :
6e8218:      f3 48 a5                rep movsq %ds:(%rsi),%es:(%rdi)        :              cur_skey[0].sk_argument =
v1;/home/andres/src/postgresql/build/optimize/../../src/backend/utils/cache/catcache.c:1081  1.56 :        6e821b:
48 89 9d e0 fe ff ff    mov    %rbx,-0x120(%rbp)
 
...

# get heaps of stats from something
perf stat -ddd someprogram|-p pid
      1242.409965 task-clock                #    0.824 CPUs utilized           [100.00%]           14,572
context-switches         #    0.012 M/sec                   [100.00%]              264 CPU-migrations            #
0.000M/sec                   [100.00%]                0 page-faults               #    0.000 M/sec
2,854,775,135cycles                    #    2.298 GHz                     [26.28%]  <not supported>
stalled-cycles-frontend  <not supported> stalled-cycles-backend      2,024,997,785 instructions              #    0.71
insnsper cycle         [25.25%]      387,240,903 branches                  #  311.685 M/sec                   [26.51%]
    21,756,886 branch-misses             #    5.62% of all branches         [26.26%]      753,912,137 L1-dcache-loads
       #  606.814 M/sec                   [13.24%]       52,733,007 L1-dcache-load-misses     #    6.99% of all
L1-dcachehits   [14.72%]       35,006,406 LLC-loads                 #   28.176 M/sec                   [15.46%]
 26,673 LLC-load-misses           #    0.08% of all LL-cache hits    [13.38%]    1,855,654,347 L1-icache-loads
# 1493.593 M/sec                   [12.63%]       52,169,033 L1-icache-load-misses     #    2.81% of all L1-icache hits
 [12.88%]      761,475,250 dTLB-loads                #  612.902 M/sec                   [13.37%]        4,457,558
dTLB-load-misses         #    0.59% of all dTLB cache hits  [13.12%]    2,049,753,137 iTLB-loads                #
1649.820M/sec                   [20.09%]        4,139,394 iTLB-load-misses          #    0.20% of all iTLB cache hits
[19.31%]       3,705,429 L1-dcache-prefetches      #    2.982 M/sec                   [19.64%]  <not supported>
L1-dcache-prefetch-misses
      1.507855345 seconds time elapsed

-r can repeat a command and gives you the standard derivation...

# show whats the system executing overall
perf top -az

# get help
perf help (record|report|annotate|stat|...)

In new versions many commands (those that produce pageable text) take --stdio
and --tui to select between two interfaces. I personally fnd --tui unusable.

I am not really sure how good the results are compared to oprofile I
just prefer the ui by far... Also the overhead seems to be measurably 
smaller. Also its usable by every user, not just root...

Hope that suffices? I have no problem answering further questions, so ...

Andres


pgsql-hackers by date:

Previous
From: "Kevin Grittner"
Date:
Subject: Should a materialized view be based on a view?
Next
From: Andres Freund
Date:
Subject: Re: testing ProcArrayLock patches