Re: Potential G2-item cycles under serializable isolation - Mailing list pgsql-bugs

From Kyle Kingsbury
Subject Re: Potential G2-item cycles under serializable isolation
Msg-id 11bb3199-c685-1cec-63bb-f92848edbe10@jepsen.io
In response to Re: Potential G2-item cycles under serializable isolation  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Potential G2-item cycles under serializable isolation
List pgsql-bugs

On 5/31/20 11:04 PM, Peter Geoghegan wrote:
> We generally like to produce tests for SSI, ON CONFLICT DO UPDATE, and
> anything else involving concurrent behavior using something called isolation
> tester: https://github.com/postgres/postgres/tree/master/src/test/isolation
> We may end up writing an isolation test for the issue you reported as part of
> an eventual fix. You might find it helpful to review some of the existing
> tests.

Ah, wonderful! I don't exactly know how to plug Elle's history analysis into 
this, but I think it... should be possible to write down some special cases 
based on the histories I've seen.

> Could you test Postgres 9.5? It would be nice to determine if this is
> a new issue, or a regression.

I'll look into that tomorrow morning! :)

I, uh, backed off to running these tests at read committed (which, uh... should 
be SI, right?) and found what appear to be scads of SI violations, including 
read skew and even *internal* consistency anomalies. Read-only transactions 
can... apparently... see changing values of a record? Here's a single 
transaction which read key 21, got [1], then read key 21 again, and saw [1 2 3]:

   [[:r 21 [1]] [:r 20 [1 2]] [:r 20 [1 2]] [:r 21 [1 2 3]]]
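
As a rough illustration of what "internal" consistency means for a transaction like this one, here is a minimal Python sketch. It is not how Elle performs its analysis; the function name `internal_read_anomalies` and the tuple encoding of micro-ops are assumptions made for this example, and the transaction is transcribed by hand from the EDN above. It flags any read that disagrees with the first read of the same key inside a single read-only transaction.

```python
from typing import Any, List, Tuple

# A micro-op is (function, key, value), e.g. ("r", 21, [1]).
# This encoding is an assumption for illustration, not Jepsen's own format.
MicroOp = Tuple[str, Any, Any]


def internal_read_anomalies(txn: List[MicroOp]) -> List[Tuple[Any, Any, Any]]:
    """Return (key, first_value, later_value) for every read that disagrees
    with the first read of the same key in this transaction.

    Only meaningful for read-only transactions: a transaction's own writes
    would legitimately change what it reads later.
    """
    first_seen = {}
    anomalies = []
    for f, key, value in txn:
        if f != "r":
            continue  # ignore anything that is not a read
        if key not in first_seen:
            first_seen[key] = value
        elif first_seen[key] != value:
            anomalies.append((key, first_seen[key], value))
    return anomalies


# The transaction quoted above, transcribed by hand from EDN.
txn = [("r", 21, [1]), ("r", 20, [1, 2]), ("r", 20, [1, 2]), ("r", 21, [1, 2, 3])]
print(internal_read_anomalies(txn))  # [(21, [1], [1, 2, 3])]
```

The two reads of key 20 agree, so only key 21 is reported. Under snapshot isolation, a read-only transaction would be expected to produce an empty list here.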

See 
http://jepsen.io.s3.amazonaws.com/analyses/postgresql-12.3/20200531T223558.000-0400.zip 
-- jepsen.log from 22:36:09,907 to 22:36:09,909:

   2020-05-31 22:36:09,907{GMT}    INFO    [jepsen worker 6] jepsen.util: 6
   :invoke :txn    [[:r 21 nil] [:r 20 nil] [:r 20 nil] [:r 21 nil]]

   ...

   2020-05-31 22:36:09,909{GMT}    INFO    [jepsen worker 6] jepsen.util: 6
   :ok     :txn    [[:r 21 [1]] [:r 20 [1 2]] [:r 20 [1 2]] [:r 21 [1 2 3]]]

You can fire up wireshark and point it at the pcap file in n1/ to 
double-check--try `tcp.stream eq 4`. The BEGIN statement for this transaction is 
at 22:36:09.908115. There are a bunch more anomalies called out in analysis.edn, 
if it's helpful.

This looks so weird that I assume I've *got* to be doing it wrong, but trawling 
through the source code and pcap trace, I can't see where the mistake is. Maybe 
I'll have fresher eyes in the morning. :)

Sincerely,

--Kyle



