Re: Potential G2-item cycles under serializable isolation - Mailing list pgsql-bugs
From | Peter Geoghegan |
---|---|
Subject | Re: Potential G2-item cycles under serializable isolation |
Date | |
Msg-id | CAH2-WznZO2kgJr+jR5mSp3bsSZKqbevztdPq-eZLcgGML_vBMw@mail.gmail.com |
In response to | Re: Potential G2-item cycles under serializable isolation (Kyle Kingsbury <aphyr@jepsen.io>) |
Responses | Re: Potential G2-item cycles under serializable isolation |
List | pgsql-bugs |
On Thu, Jun 4, 2020 at 1:35 PM Kyle Kingsbury <aphyr@jepsen.io> wrote:
> >> How can I determine what SQL each transaction generates from these values? It's
> >> not obvious to me which of the three tables (which of txn0, txn1, and txn2) are affected in each case.
>
> This is a good and obvious question which I don't yet have a good answer for.
> Reading the source gives you *some* idea of what SQL's being generated, but
> there's some stuff being done by next.jdbc and JDBC itself, so I don't know how
> to show you *exactly* what goes over the wire. A terrible way to do this is to
> look at the pcap traces in wireshark--you can correlate from the timestamps in
> jepsen.log, or search for the transactions which interacted with specific keys.

I'd appreciate it if you could provide this information, so I can be
confident I didn't get something wrong.

I don't really understand how Elle detects this G2-Item anomaly, nor
how it works in general. PostgreSQL doesn't really use 2PL, even to a
limited degree (unlike Oracle), so a lot of the definitions from the
"Generalized Isolation Level Definitions"/Adya paper are not
particularly intuitive to me.

That said, I find it easy to understand why the "G2-item: Item
Anti-dependency Cycles" example from the paper exhibits behavior that
would be wrong for Postgres -- even in repeatable read mode. If
Postgres exhibits this anomaly (in repeatable read mode or
serializable mode), that would be a case of a transaction reading data
that isn't visible to its original transaction snapshot. The paper
supposes that this could happen when another transaction (the one that
updated the sum-of-salaries from the example) committed.

Just being able to see the SQL executed by each of the two
transactions would be compelling evidence of a serious problem,
provided the details are equivalent to the sum-of-salaries example
from the paper.

> One option would be to add some sort of tracing thing to the test so that it
> records the SQL statements it generates as extra metadata on operations. I can
> look into doing that for you later on. :)

If each Jepsen worker has its own connection for the duration of the
test (which I guess must happen already), and each connection
specifies an informative and unique "application_name", it would be
possible to see Jepsen's string in the Postgres logs, next to the
SQL text. With prepared statements, you'd see the constants used,
though not in all log messages (iirc they don't appear in error
messages, but do appear in regular log messages).

See:

https://www.postgresql.org/docs/current/runtime-config-logging.html#GUC-LOG-LINE-PREFIX

If you can't figure out how to get JDBC to accept the application_name
you want to provide, then you could execute a "set application_name =
'my application name';" SQL statement within each Jepsen worker
instead. Do this at the start, I suppose. Whichever approach is
easiest and makes sense.

You might have something like this in postgresql.conf to see the
"application_name" string next to each statement in the log (this
will increase the log volume considerably, though it should still be
manageable):

log_line_prefix='%p %a %l'
log_statement=all

(I also suggest further customizing log_line_prefix in whatever way
seems most useful to you.)

--
Peter Geoghegan
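
To read the resulting log per worker, something along these lines
might help. This is only a minimal sketch in Python, not part of
Jepsen or Postgres. It assumes each line starts with the pid, the
application_name and the session line number, as produced by
log_line_prefix='%p %a %l', and that application names contain no
spaces. The worker names, pids and SQL in SAMPLE are hypothetical; the
exact line shape should be checked against a real log before relying
on the pattern.

```python
import re
from collections import defaultdict

# Assumed line shape for log_line_prefix='%p %a %l':
#   <pid> <application_name> <session line number><SEVERITY>:  <message>
# The prefix has no trailing space, so the severity may directly follow
# the line number; \s* also tolerates a prefix that ends with a space.
LINE_RE = re.compile(r"^(\d+) (\S+) (\d+)\s*([A-Z]+):\s+(.*)$")


def group_by_application(lines):
    """Group log entries by application_name, preserving log order.

    Returns a dict mapping application_name to a list of
    (pid, session_line_number, severity, message) tuples.
    Lines that do not match the assumed shape (continuation lines,
    sessions with an empty application_name) are skipped.
    """
    grouped = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match is None:
            continue
        pid, app, line_no, severity, message = match.groups()
        grouped[app].append((int(pid), int(line_no), severity, message))
    return dict(grouped)


# Hypothetical sample lines, for illustration only.
SAMPLE = [
    "4242 jepsen-worker-0 1LOG:  statement: BEGIN ISOLATION LEVEL SERIALIZABLE",
    "4243 jepsen-worker-1 1LOG:  statement: BEGIN ISOLATION LEVEL SERIALIZABLE",
    "4242 jepsen-worker-0 2LOG:  statement: COMMIT",
]

if __name__ == "__main__":
    for app, entries in group_by_application(SAMPLE).items():
        print(app)
        for pid, line_no, severity, message in entries:
            print(f"  [{pid} #{line_no}] {severity}: {message}")
```

To use it on a real log, pass an open file object instead of SAMPLE,
for example group_by_application(open(path)) with the path to the
server log.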