Thread: [HACKERS] Dynamic instrumentation of lwlock wait times (lwlock flamegraphs)
[HACKERS] Dynamic instrumentation of lwlock wait times (lwlock flamegraphs)
From
Andres Freund
Date:
Hi, At pgcon some people were talking about the difficulty of instrumenting the time actually spent waiting for lwlocks and related measurements. I'd mentioned that linux these days provides infrastructure to measure such things in unmodified binaries. Attached is a prototype of a script that measures the time spent inside PGSemaphoreLock(), aggregates that inside the kernel, grouped by pid and stacktrace. That allows one to generate nice flame graphs showing which part of the code waits how long for lwlocks. The attached script clearly needs improvements, but I thought I'd show some of the results it can get. To run it you need the the python library of the 'bcc' project [1], and a sufficiently new kernel. Some distributions, e.g. newer debian versions, package this as python-bpfcc and similar. The output of the script can be turned into a flamegraph with https://github.com/brendangregg/FlameGraph 's flamegraph.pl. Here's a few examples of a pgbench run. The number is the number of clients, sync/async indicates synchronous_commit on/off. All the numbers here were generated with the libpq & pgbench batch patches applied and in use, but that's just because that was the state of my working tree. http://anarazel.de/t/2017-06-22/pgsemwait_8_sync.svg http://anarazel.de/t/2017-06-22/pgsemwait_8_async.svg http://anarazel.de/t/2017-06-22/pgsemwait_64_sync.svg http://anarazel.de/t/2017-06-22/pgsemwait_64_async.svg A bonus, not that representative one is the wait for a pgbench readonly run after the above, with autovacuum previously disabled: http://anarazel.de/t/2017-06-22/pgsemwait_64_select.svg interesting to see how the backends themselves never end up having to flush WAL themselves, or at least not in a manner triggering contention. I plan to write a few more of these, because they're hugely useful to understand actual locking behaviour. Among them: - time beteen Acquire/Release of lwlocks, grouped by lwlock - time beteen Acquire/Release of lwlocks, grouped by stack - callstack of acquirer and waker of lwlocks, grouped by caller stack, waiter stack - ... I think it might be interesting to collect a few of these somewhere centrally once halfway mature. Maybe in src/tools or such. Greetings, Andres Freund [1] https://github.com/iovisor/bcc -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Re: [HACKERS] Dynamic instrumentation of lwlock wait times (lwlockflamegraphs)
From
Stas Kelvich
Date:
> On 23 Jun 2017, at 00:08, Andres Freund <andres@anarazel.de> wrote: > > Hi, > > At pgcon some people were talking about the difficulty of instrumenting > the time actually spent waiting for lwlocks and related measurements. > I'd mentioned that linux these days provides infrastructure to measure > such things in unmodified binaries. > > Attached is a prototype of a script that measures the time spent inside > PGSemaphoreLock(), aggregates that inside the kernel, grouped by pid and > stacktrace. That allows one to generate nice flame graphs showing which > part of the code waits how long for lwlocks. > > The attached script clearly needs improvements, but I thought I'd show > some of the results it can get. To run it you need the the python > library of the 'bcc' project [1], and a sufficiently new kernel. Some > distributions, e.g. newer debian versions, package this as python-bpfcc > and similar. > > The output of the script can be turned into a flamegraph with > https://github.com/brendangregg/FlameGraph 's flamegraph.pl. > > Here's a few examples of a pgbench run. The number is the number of > clients, sync/async indicates synchronous_commit on/off. All the > numbers here were generated with the libpq & pgbench batch patches > applied and in use, but that's just because that was the state of my > working tree. > > http://anarazel.de/t/2017-06-22/pgsemwait_8_sync.svg > http://anarazel.de/t/2017-06-22/pgsemwait_8_async.svg > http://anarazel.de/t/2017-06-22/pgsemwait_64_sync.svg > http://anarazel.de/t/2017-06-22/pgsemwait_64_async.svg > > A bonus, not that representative one is the wait for a pgbench readonly > run after the above, with autovacuum previously disabled: > http://anarazel.de/t/2017-06-22/pgsemwait_64_select.svg > interesting to see how the backends themselves never end up having to > flush WAL themselves, or at least not in a manner triggering contention. > > I plan to write a few more of these, because they're hugely useful to > understand actual locking behaviour. Among them: > - time beteen Acquire/Release of lwlocks, grouped by lwlock > - time beteen Acquire/Release of lwlocks, grouped by stack > - callstack of acquirer and waker of lwlocks, grouped by caller stack, waiter stack > - ... > > I think it might be interesting to collect a few of these somewhere > centrally once halfway mature. Maybe in src/tools or such. Wow, that’s extremely helpful, thanks a lot. Stas Kelvich Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Dynamic instrumentation of lwlock wait times (lwlock flamegraphs)
From
Andres Freund
Date:
Hi, On 2017-06-22 14:08:45 -0700, Andres Freund wrote: > At pgcon some people were talking about the difficulty of instrumenting > the time actually spent waiting for lwlocks and related measurements. > I'd mentioned that linux these days provides infrastructure to measure > such things in unmodified binaries. > > Attached is a prototype of a script that measures the time spent inside > PGSemaphoreLock(), aggregates that inside the kernel, grouped by pid and > stacktrace. That allows one to generate nice flame graphs showing which > part of the code waits how long for lwlocks. > > The attached script clearly needs improvements, but I thought I'd show > some of the results it can get. To run it you need the the python > library of the 'bcc' project [1], and a sufficiently new kernel. Some > distributions, e.g. newer debian versions, package this as python-bpfcc > and similar. > > The output of the script can be turned into a flamegraph with > https://github.com/brendangregg/FlameGraph 's flamegraph.pl. The script has bitrot slightly, due to python3 and postgres changes (the move to posix semaphores). Updated version attached. Based on the discussion in https://www.postgresql.org/message-id/20200813003934.yrm4qqngfygr6ii7%40alap3.anarazel.de Greetings, Andres Freund