Re: Reinitialize stack base after fork (for the benefit of rr)? - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Reinitialize stack base after fork (for the benefit of rr)? |
Date | |
Msg-id | CAH2-WzmtkJDWBUqgEyWfWFo6wj2Tk9q_FFAGF7mvv+eB0sQWVg@mail.gmail.com Whole thread Raw |
In response to | Re: Reinitialize stack base after fork (for the benefit of rr)? (Andres Freund <andres@anarazel.de>) |
Responses |
Re: Reinitialize stack base after fork (for the benefit of rr)?
|
List | pgsql-hackers |
On Sun, Apr 5, 2020 at 6:54 PM Andres Freund <andres@anarazel.de> wrote: > I just pushed that. Great! I have found that it's useful to use rr to debug Postgres by following certain recipes. I'll share some of the details now, in case anybody else wants to start using rr and isn't sure where to start. I have a script that records a postgres session using rr with these options: rr record -M /code/postgresql/$BRANCH/install/bin/postgres \ -D /code/postgresql/$BRANCH/data \ --log_line_prefix="%m %p " \ --autovacuum=off \ --effective_cache_size=1GB \ --random_page_cost=4.0 \ --work_mem=4MB \ --maintenance_work_mem=64MB \ --fsync=off \ --log_statement=all \ --log_min_messages=DEBUG5 \ --max_connections=50 \ --shared_buffers=32MB Most of these settings were taken from a similar script that I use to run Valgrind, so the particulars may not matter much -- though it's useful to make the server logs as verbose as possible (you'll see why in a minute). I find it quite practical to run "make installcheck" against the server, recording the entire execution. I find that it's not that much slower than just running the tests against a regular debug build of Postgres. It's still much faster than Valgrind, for example. (Replaying the recording seems to be where having a high end machine helps a lot.) Once the tests are done, I stop Postgres in the usual way (Ctrl + C). The recording is saved to the $HOME/.local/share/rr/ directory on my Linux distro -- rr creates a directory for each distinct recording in this parent directory. rr also maintains a symlink (latest-trace) that points to the latest recording directory, which I rely on most of the time when replaying a recording (it's the default). I am careful to not leave too many recordings around, since they're large enough that that could become a concern. The record/Postgres terminal has output that looks like this: [rr 1786705 1241867]2020-04-04 21:55:05.018 PDT 1786705 DEBUG: CommitTransaction(1) name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 63992/1/2 [rr 1786705 1241898]2020-04-04 21:55:05.019 PDT 1786705 DEBUG: StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0 [rr 1786705 1241902]2020-04-04 21:55:05.019 PDT 1786705 LOG: statement: CREATE TYPE test_type_empty AS (); [rr 1786705 1241906]2020-04-04 21:55:05.020 PDT 1786705 DEBUG: CommitTransaction(1) name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 63993/1/1 [rr 1786705 1241936]2020-04-04 21:55:05.020 PDT 1786705 DEBUG: StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0 [rr 1786705 1241940]2020-04-04 21:55:05.020 PDT 1786705 LOG: statement: DROP TYPE test_type_empty; [rr 1786705 1241944]2020-04-04 21:55:05.021 PDT 1786705 DEBUG: drop auto-cascades to composite type test_type_empty [rr 1786705 1241948]2020-04-04 21:55:05.021 PDT 1786705 DEBUG: drop auto-cascades to type test_type_empty[] [rr 1786705 1241952]2020-04-04 21:55:05.021 PDT 1786705 DEBUG: MultiXact: setting OldestMember[2] = 9 [rr 1786705 1241956]2020-04-04 21:55:05.021 PDT 1786705 DEBUG: CommitTransaction(1) name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 63994/1/3 The part of each log line in square brackets comes from rr (since we used -M when recording) -- the first number is a PID, the second an event number. I usually don't care about the PIDs, though, since the event number alone unambiguously identifies a particular "event" in a particular backend (rr recordings are single threaded, even when there are multiple threads or processes). Suppose I want to get to the "CREATE TYPE test_type_empty AS ();" query -- I can get to the end of the query by replaying the recording with this option: $ rr replay -M -g 1241902 Replaying the recording like this takes me to the point where the Postgres backend prints the log message at the end of executing the query I mentioned -- I get a familiar gdb debug server (rr implements a gdb backend). This isn't precisely the point of execution that interests me, but it's close enough. I can easily set a breakpoint to the precise function I'm interested in, and then "reverse-continue" to get there by going backwards. I can also find the point where a particular backend starts by using the fork option instead. So for the PID 1786705, that would look like: $ rr replay -M -f 1786705 (Don't try to use the similar -p option, since that starts a debug server when the pid has been exec'd.) rr really shines when debugging things like tap tests, where there is complex scaffolding that may run multiple Postgres servers. You can run an entire "rr record make check", without having to worry about how that scaffolding works. Once you have useful event numbers to work off of, it doesn't take too long to get an interactive debugging session in the backend of interest by applying the same techniques. Note that saving the output of a recording using standard tools like "tee" seems to have some issues [1]. I've found it helpful to get log output (complete with these event numbers) by doing an "autopilot" replay, like this: $ rr replay -M -a &> rr.log This may actually be required when running "make installcheck" or something, since there might be megabytes of log output. I usually don't need to bother to generate logs in this way, though. It might take a few minutes to do an autopilot replay, since rr will replay everything that was recorded in sub realtime. One last tip: rr pack can be used to save a recording in a fairly stable format -- it copies the needed files into the trace: $ rr pack I haven't used this one yet. It seems like it would be useful if I wanted to save a recording for more than a day or two. Because every single detail of the recording (e.g. pointers, PIDs) is stable, it seems possible to treat a recording as a totally self contained thing. Other resources: https://github.com/mozilla/rr/wiki/Usage https://github.com/mozilla/rr/wiki/Debugging-protips [1] https://github.com/mozilla/rr/issues/91 -- Peter Geoghegan
pgsql-hackers by date: