Re: [PATCH] Identify LWLocks in tracepoints - Mailing list pgsql-hackers
From | Craig Ringer |
---|---|
Subject | Re: [PATCH] Identify LWLocks in tracepoints |
Date | |
Msg-id | CAGRY4nzwYY6+o0jawxv=jXry=ziJ3genVhUn7hA5BAjSWL7vvQ@mail.gmail.com |
In response to | Re: [PATCH] Identify LWLocks in tracepoints (Craig Ringer <craig.ringer@enterprisedb.com>) |
Responses |
Re: [PATCH] Identify LWLocks in tracepoints
|
List | pgsql-hackers |
On Tue, 13 Apr 2021 at 21:05, Craig Ringer <craig.ringer@enterprisedb.com> wrote:
> On Tue, 13 Apr 2021 at 11:06, Andres Freund <andres@anarazel.de> wrote:
> > IIRC those aren't really comparable - the kernel actually does modify
> > the executable code to replace the tracepoints with nops.
>
> Same with userspace static trace markers (USDTs).
>
> A followup mail will contain a testcase and samples to demonstrate this.

Demo follows, with source attached too. gcc 10.2 compiling with -O2, using dtrace and <sys/sdt.h> from systemtap 4.4.

Trivial empty function definition:

    __attribute__((noinline)) void
    no_args(void)
    {
        SDT_NOOP_NO_ARGS();
    }

Disassembly when SDT_NOOP_NO_ARGS is defined as

    #define SDT_NOOP_NO_ARGS()

is:

    <no_args>:
        retq

When built with a probes.d definition processed by the dtrace script instead, the disassembly becomes:

    <no_args>:
        nop
        retq

So ... yup, it's a nop.

Now, if we introduce semaphores that changes.

    __attribute__((noinline)) void
    no_args(void)
    {
        if (SDT_NOOP_NO_ARGS_ENABLED())
            SDT_NOOP_NO_ARGS();
    }

disassembles to:

    <no_args>:
        cmpw   $0x0,0x2ec4(%rip)        # <sdt_noop_no_args_semaphore>
        jne    <no_args+0x10>
        retq
        nopl   0x0(%rax,%rax,1)
        nop
        retq

so the semaphore test is actually quite harmful and wasteful in this case. That's not surprising since this SDT is a simple marker point. But what if we supply arguments to it? It turns out that the disassembly is the same if args are passed, whether locals or globals, including globals assigned based on program input that can't be determined at compile time. Still just a nop.

If I pass a function call as an argument expression to a probe, e.g.

    __attribute__((noinline)) static int
    compute_probe_argument(void)
    {
        return 100;
    }

    void
    with_computed_arg(void)
    {
        SDT_NOOP_WITH_COMPUTED_ARG(compute_probe_argument());
    }

then the disassembly with SDTs is:

    <with_computed_arg>:
        callq  <compute_probe_argument>
        nop
        retq

so the function call isn't elided even if it's unused. That's somewhat expected. The same will be true if the arguments to a probe require pointer chasing or non-trivial marshalling.

If a semaphore guard is added this becomes:

    <with_computed_arg>:
        cmpw   $0x0,0x2e2e(%rip)        # <sdt_noop_with_computed_arg_semaphore>
        jne    <with_computed_arg+0x10>
        retq
        nopl   0x0(%rax,%rax,1)
        callq  <compute_probe_argument>
        nop
        retq

so now the call to compute_probe_argument() is skipped unless the probe is enabled, but the function is longer and requires a test and jump.

If I dummy up a function that does some pointer chasing, without semaphores I get

    <with_pointer_chasing>:
        mov    (%rdi),%rax
        mov    (%rax),%rax
        mov    (%rax),%rax
        nop
        retq

so the arguments are marshalled then ignored.

With semaphores I get:

    <with_pointer_chasing>:
        cmpw   $0x0,0x2d90(%rip)        # <sdt_noop_with_pointer_chasing_semaphore>
        jne    <with_pointer_chasing+0x10>
        retq
        nopl   0x0(%rax,%rax,1)
        mov    (%rdi),%rax
        mov    (%rax),%rax
        mov    (%rax),%rax
        nop
        retq

so again the probe's argument marshalling is inline in the function body, but at the end, and skipped over.

Findings:

* A probe without arguments or with simple arguments is just a 'nop' instruction
* Probes that require function calls, pointer chasing, other expression evaluation etc may impose a fixed cost to collect up arguments even if the probe is disabled.
* SDT semaphores can avoid that cost but add a branch, so should probably be avoided unless preparing probe arguments is likely to be expensive.

Hideous but effective demo code attached.
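For readers without systemtap installed, the second and third findings can be sketched in plain C. This is not the attached demo code: PROBE_WITH_ARG, PROBE_ENABLED, probe_semaphore and the helper functions are hypothetical stand-ins, assumed here only to imitate how a dtrace-generated probe macro and its semaphore behave. The sketch counts how often the argument expression is evaluated at an unguarded and at a guarded probe site.

```c
#include <assert.h>

/*
 * Assumption: stand-in for an SDT semaphore. With real SDTs the tracing
 * tool increments it while a tracer is attached; here it is set by hand.
 */
static volatile unsigned short probe_semaphore = 0;

/* Hypothetical stand-in macros, not the ones generated from probes.d. */
#define PROBE_ENABLED()      (probe_semaphore != 0)
#define PROBE_WITH_ARG(arg)  ((void) (arg))  /* evaluate arg, then "nop" */

/* Counts how many times the probe argument was computed. */
static int argument_evaluations = 0;

static int
compute_probe_argument(void)
{
    argument_evaluations++;
    return 100;
}

/* Unguarded probe site: the argument is always computed. */
static void
unguarded(void)
{
    PROBE_WITH_ARG(compute_probe_argument());
}

/* Guarded probe site: the argument is computed only when enabled. */
static void
guarded(void)
{
    if (PROBE_ENABLED())
        PROBE_WITH_ARG(compute_probe_argument());
}

/* Run one probe site once and report how often its argument was computed. */
int
evaluations_for(void (*site)(void), unsigned short tracer_attached)
{
    probe_semaphore = tracer_attached;
    argument_evaluations = 0;
    site();
    return argument_evaluations;
}

/* Checks the behaviour described in the findings, in this model only. */
void
check_findings(void)
{
    assert(evaluations_for(unguarded, 0) == 1); /* cost paid while disabled */
    assert(evaluations_for(unguarded, 1) == 1);
    assert(evaluations_for(guarded, 0) == 0);   /* cost skipped while disabled */
    assert(evaluations_for(guarded, 1) == 1);
}
```

Calling check_findings() from a test program exercises all four cases. The model only captures when the argument expression runs; it says nothing about the extra compare-and-branch the guard adds, which is visible only in the disassembly above.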
Attachment