Re: testing ProcArrayLock patches - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: testing ProcArrayLock patches |
Date | |
Msg-id | 201111182312.02527.andres@anarazel.de Whole thread Raw |
In response to | Re: testing ProcArrayLock patches ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>) |
Responses |
Re: testing ProcArrayLock patches
Re: testing ProcArrayLock patches |
List | pgsql-hackers |
On Friday, November 18, 2011 09:16:01 PM Kevin Grittner wrote: > Andres Freund <andres@anarazel.de> wrote: > > When doing line-level profiles I would suggest looking at the > > instructions. > What's the best way to do that? I think opannotate -a -s produces output with instructions/code intermingled. > > I don't think cache line contention is the most likely candidate > > here. Simple cache-misses seem far more likely. In combination > > with pipeline stalls... > > > > Newer cpus (nehalem+) can measure stalled cycles which can be > > really useful when analyzing performance. I don't remember how to > > do that with oprofile right now though as I use perf these days > > (its -e stalled-cycles{frontend|backend} there}). > > When I run oprofile, I still always go back to this post by Tom: > http://archives.postgresql.org/pgsql-performance/2009-06/msg00154.php Hrm. I am on the train and for unknown reasons the only sensible working protocols are smtp + pop.... Waiting.... Waiting.... Sorry, too slow/high latency atm. I wrote everything below and another mail and the page still hasn't loaded. oprofile can produces graphes as well (--callgraph). for both tools you need -fno-omit-frame-pointers to get usable graphs. > Can anyone provide such a "cheat sheet" for perf? I could give that > a try if I knew how. Unfortunately for sensible results the kernel needs to be rather new. I would say > 2.6.28 or so (just guessed). # to record activity perf record [-g|--call-graph] program|-p pid # to view a summation perf report graph: # Overhead Command Shared Object Symbol # ........ ........ ................. ......................................... # 4.09% postgres postgres [.] slab_alloc_dyn | --- slab_alloc_dyn | |--18.52%-- new_list | | | |--63.79%-- lappend | | | | | |--13.40%-- find_usable_indexes | | | create_index_paths | | | set_rel_pathlist | | | make_one_rel flat: # Overhead Command Shared Object Symbol # ........ ........ ................. ......................................... # 5.10% postgres [vdso] [.] 0x7ffff3d8d770 4.26% postgres postgres [.] base_yyparse 3.88% postgres postgres [.] slab_alloc_dyn 2.82% postgres postgres [.] core_yylex 2.37% postgres postgres [.] SearchCatCache 1.85% postgres libc-2.13.so [.] __memcpy_ssse3 1.66% postgres libc-2.13.so [.] __GI___strcmp_ssse3 1.23% postgres postgres [.] MemoryContextAlloc # to view a line/source/instruction level view perf annotate -l symbol ... : : /* : * one-time startup overhead for each cache : */ : if (cache->cc_tupdesc == NULL) 0.35 : 6e81fd: 48 83 7f 28 00 cmpq $0x0,0x28(%rdi)/home/andres/src/postgresql/build/optimize/../../src/backend/utils/cache/catcache.c:1070 4.15 : 6e8202: 0f 84 54 04 00 00 je 6e865c <SearchCatCache+0x47c> : #endif : : /* : * initialize the search key information : */ : memcpy(cur_skey, cache->cc_skey, sizeof(cur_skey)); 0.00 : 6e8208: 48 8d bd a0 fe ff ff lea -0x160(%rbp),%rdi 0.17 : 6e820f: 49 8d 77 70 lea 0x70(%r15),%rsi 0.00 : 6e8213: b9 24 00 00 00 mov $0x24,%ecx/home/andres/src/postgresql/build/optimize/../../src/backend/utils/cache/catcache.c:1080 33.22 : 6e8218: f3 48 a5 rep movsq %ds:(%rsi),%es:(%rdi) : cur_skey[0].sk_argument = v1;/home/andres/src/postgresql/build/optimize/../../src/backend/utils/cache/catcache.c:1081 1.56 : 6e821b: 48 89 9d e0 fe ff ff mov %rbx,-0x120(%rbp) ... # get heaps of stats from something perf stat -ddd someprogram|-p pid 1242.409965 task-clock # 0.824 CPUs utilized [100.00%] 14,572 context-switches # 0.012 M/sec [100.00%] 264 CPU-migrations # 0.000M/sec [100.00%] 0 page-faults # 0.000 M/sec 2,854,775,135cycles # 2.298 GHz [26.28%] <not supported> stalled-cycles-frontend <not supported> stalled-cycles-backend 2,024,997,785 instructions # 0.71 insnsper cycle [25.25%] 387,240,903 branches # 311.685 M/sec [26.51%] 21,756,886 branch-misses # 5.62% of all branches [26.26%] 753,912,137 L1-dcache-loads # 606.814 M/sec [13.24%] 52,733,007 L1-dcache-load-misses # 6.99% of all L1-dcachehits [14.72%] 35,006,406 LLC-loads # 28.176 M/sec [15.46%] 26,673 LLC-load-misses # 0.08% of all LL-cache hits [13.38%] 1,855,654,347 L1-icache-loads # 1493.593 M/sec [12.63%] 52,169,033 L1-icache-load-misses # 2.81% of all L1-icache hits [12.88%] 761,475,250 dTLB-loads # 612.902 M/sec [13.37%] 4,457,558 dTLB-load-misses # 0.59% of all dTLB cache hits [13.12%] 2,049,753,137 iTLB-loads # 1649.820M/sec [20.09%] 4,139,394 iTLB-load-misses # 0.20% of all iTLB cache hits [19.31%] 3,705,429 L1-dcache-prefetches # 2.982 M/sec [19.64%] <not supported> L1-dcache-prefetch-misses 1.507855345 seconds time elapsed -r can repeat a command and gives you the standard derivation... # show whats the system executing overall perf top -az # get help perf help (record|report|annotate|stat|...) In new versions many commands (those that produce pageable text) take --stdio and --tui to select between two interfaces. I personally fnd --tui unusable. I am not really sure how good the results are compared to oprofile I just prefer the ui by far... Also the overhead seems to be measurably smaller. Also its usable by every user, not just root... Hope that suffices? I have no problem answering further questions, so ... Andres
pgsql-hackers by date: