Re: Potential G2-item cycles under serializable isolation - Mailing list pgsql-bugs
From | Peter Geoghegan |
---|---|
Subject | Re: Potential G2-item cycles under serializable isolation |
Date | |
Msg-id | CAH2-WznTb6-0fjW4WPzNQh4mFvBH86J7bqZpNqteVUzo8p=6Hg@mail.gmail.com Whole thread Raw |
In response to | Re: Potential G2-item cycles under serializable isolation (Peter Geoghegan <pg@bowt.ie>) |
Responses |
Re: Potential G2-item cycles under serializable isolation
|
List | pgsql-bugs |
On Fri, Jun 5, 2020 at 12:43 PM Peter Geoghegan <pg@bowt.ie> wrote: > I suspect that I should try a high fidelity way of capturing the > original execution order. I'm thinking of something like > test_decoding I had a better idea -- use rr debugging recorder to get a recording of an entire execution: https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD#Recording_Postgres_using_rr_Record_and_Replay_Framework Attached results.edn + G2-item.txt files show the anomalies that were reported during an execution I recorded a little earlier. We can now use this as a point of reference for further debugging. One less obvious benefit of doing this is that rr makes the execution of the entire Postgres instance single threaded, so we can be sure that the log faithfully represents the order of operations during execution, within and across PostgreSQL backends -- no more uncertainty about the exact order, when each xact starts and ends, etc. (Note that I had to use rr's "chaos mode" feature to reproduce the race condition problem in the first place -- it was easy once I remembered about that option.) This useful property is pretty much self-evident from the logs, since rr includes an event number. For example, the log starts like this: [rr 316370 892] 1591388133.699-316370--0--5edaa7e5.4d3d2- LOG: starting PostgreSQL 13beta1-dc on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-10ubuntu2) 9.3.0, 64-bit [rr 316370 1056] 1591388133.706-316370--0--5edaa7e5.4d3d2- LOG: listening on IPv6 address "::1", port 5432 [rr 316370 1068] 1591388133.706-316370--0--5edaa7e5.4d3d2- LOG: listening on IPv4 address "127.0.0.1", port 5432 [rr 316370 1093] 1591388133.707-316370--0--5edaa7e5.4d3d2- LOG: listening on Unix socket "/tmp/.s.PGSQL.5432" [rr 316371 1303] 1591388133.716-316371--0--5edaa7e5.4d3d3- LOG: database system was shut down at 2020-06-05 13:15:18 PDT [rr 316370 1726] 1591388133.735-316370--0--5edaa7e5.4d3d2- LOG: database system is ready to accept connections [rr 316448 4190] 1591388153.592-316448--0--5edaa7f9.4d420-[unknown] LOG: connection received: host=127.0.0.1 port=41078 The second number in each log line represents an rr event number -- notice that these are strictly monotonically increasing. I don't even need a PID to be able to start an interactive debugging session during execution of a query of interest, though -- all I need is an event number (such as 4190) to reliably get an interactive debugging session. Since the entire recorded execution is actually single threaded, and because of Postgres' multi-process architecture, that information alone unambiguously identifies a particular backend/connection and query execution. I also make available the entire log that I showed the start of just now: https://drive.google.com/file/d/1YFTVChfEtzZncdjakXKpp22HKDNFoWKo/view?usp=sharing This hasn't been filtered by me this time around, so anyone playing along at home can look through everything, including the queries pertaining to the two G2-Item issues Jepsen complains about. Hopefully this helps you, Kyle. Would you mind repeating the exercise from earlier with this new log file/execution? Thanks -- Peter Geoghegan
Attachment
pgsql-bugs by date: