Re: BUG #11732: Non-serializable outcomes under serializable isolation - Mailing list pgsql-bugs
From: Kevin Grittner
Subject: Re: BUG #11732: Non-serializable outcomes under serializable isolation
Date:
Msg-id: 2081625385.1952368.1420067509048.JavaMail.yahoo@jws100150.mail.ne1.yahoo.com
In response to: Re: BUG #11732: Non-serializable outcomes under serializable isolation (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses: Re: BUG #11732: Non-serializable outcomes under serializable isolation
List: pgsql-bugs
Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Kevin Grittner wrote:
>> Everything suggests a race condition.  I haven't been able to
>> pin down exactly where it happens, but that's just a matter of
>> time.  It is interesting that the data type of the column
>> changes the timing enough to have such a large effect on seeing
>> the failure.  I've set it aside for the moment but expect to get
>> back to it later this week.
>
> Any luck here?

Yes and no.  I have further evidence, and have narrowed the area of
investigation, but I haven't been able to pin down the cause yet.
I'll post what I have "for the record" and in case someone has an
idea that will help me find it.

I was able to create a bash script and a simple plpgsql function to
demonstrate the problem.  They're short enough to include in the
email, but I will attach them, too, in case the email formatting
mangles them.

Bash file "stress":

#!/bin/bash
pg_ctl -D Debug/data -m fast -w stop
rm -fr Debug/data/*
initdb Debug/data
echo "default_transaction_isolation = 'serializable'" >> Debug/data/postgresql.conf
echo "shared_buffers = 1GB" >> Debug/data/postgresql.conf
echo "max_connections = 200" >> Debug/data/postgresql.conf
pg_ctl -D Debug/data -l Debug/data/logfile -w start || ( cat Debug/data/logfile && exit 1 )
createdb test
psql -f ~/stress.sql test
psql -c "CREATE TABLE \"stress\" (\"id\" serial primary key, \"key\" text);" test
for i in {1..1000}
do
  echo $i
  for j in {1..128}
  do
    psql -c "select stress_fn('$i');" test >/dev/null 2>&1 &
  done
  wait
  sleep 0.3s
done
psql -c "select count(*) from stress;" test
psql -c "select key, count(*) from stress group by key having count(*) > 1 order by key::int;" test

"stress.sql" file:

CREATE FUNCTION stress_fn(target_key text)
  RETURNS VOID
  LANGUAGE plpgsql
  VOLATILE
AS $$
BEGIN
  PERFORM 1 AS one FROM "stress" WHERE "stress"."key" = target_key;
  IF FOUND THEN
    RAISE EXCEPTION 'already inserted';
  END IF;

  INSERT INTO "stress" ("key") VALUES (target_key);
END;
$$;

So far this has generated 1 to 3 duplicated "key" values per run.  I
have changed the sleep time to 10.0 seconds and still had an
occurrence.  Changing the "key" column and the corresponding
parameter to integer causes it to run without generating any
duplicates (so far, at least).

For each non-duplicated "key" value I get about 10 instances of this:

ERROR:  could not serialize access due to read/write dependencies among transactions
DETAIL:  Reason code: Canceled on identification as a pivot, during write.
HINT:  The transaction might succeed if retried.
CONTEXT:  SQL statement "INSERT INTO "stress" ("key") VALUES (target_key)"
        PL/pgSQL function stress_fn(text) line 8 at SQL statement
STATEMENT:  select stress_fn('1');

... followed by enough of these to add up to 127 errors per value:

ERROR:  already inserted
STATEMENT:  select stress_fn('1');

For the "key" values that generate one or more duplicates I get
different logging.  I see about 30 of these:

ERROR:  could not serialize access due to read/write dependencies among transactions
DETAIL:  Reason code: Canceled on conflict out to pivot 9137, during read.
HINT:  The transaction might succeed if retried.
CONTEXT:  SQL statement "SELECT 1 AS one FROM "stress" WHERE "stress"."key" = target_key"
        PL/pgSQL function stress_fn(text) line 3 at PERFORM
STATEMENT:  select stress_fn('339');

... before it switches to:

ERROR:  already inserted
STATEMENT:  select stress_fn('339');

We used a different reason code in the detail for each spot that
could generate a serialization failure based on what serializable
transactions add to the checking done for repeatable read
transactions.  This particular reason code can only be generated if
a "dangerous structure" of read-write dependencies is found where
the "pivot" has committed -- a fairly rare occurrence.  So far in
this test, I have not seen that reason code without the duplicates,
nor duplicates without that reason code.

For those not intimately familiar with our serializable
implementation but who want to follow along to some degree or
other: the only way that serialization anomalies can occur under
snapshot isolation (what we have for repeatable read) is when a
"pivot" transaction (Tpivot) has a read-write dependency out to a
transaction (Tout) and a read-write dependency in from a
transaction (Tin) -- in other words, Tout writes something that
would have been read by Tpivot if Tout had committed before Tpivot
began, and Tpivot writes something that would have been read by Tin
if Tpivot had committed before Tin began -- and Tout commits before
Tin or Tpivot commit.  As a further test to avoid false positives,
an anomaly cannot occur if Tin is read-only (or if it commits
without modifying data) and Tin started before Tout committed.
Just for completeness, Tin and Tout can be the same transaction.
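As a concrete illustration of the simplest such case -- Tin and
Tout being the same transaction -- here is the read-then-insert
sequence from stress_fn run by hand in two psql sessions (the
session labels and the literal key value are mine, purely for
illustration):

-- session A
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT 1 FROM stress WHERE key = '42';    -- finds no row

-- session B
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT 1 FROM stress WHERE key = '42';    -- finds no row
INSERT INTO stress (key) VALUES ('42');
COMMIT;                                   -- commits cleanly

-- session A, again
INSERT INTO stress (key) VALUES ('42');   -- I would expect the
COMMIT;                                   -- serialization failure here

Each session's insert conflicts with the other's earlier read, so
each transaction has both a rw-dependency in and a rw-dependency
out; with session B already committed, session A should be canceled
as a pivot at its write, rather than being allowed to create the
duplicate -- the same "identification as a pivot, during write"
case shown in the log above.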
So, to hit this code, Tin must be reading data written by Tpivot,
Tpivot must have committed (or at least be well into the commit
process), and Tout must have committed before Tpivot committed.
Tin is executing a SELECT, checks the MVCC visibility information
of a row, and finds that it was written by a transaction which is
committed (or at least past the point where it can throw a
serialization failure), *and* either Tpivot had read data written
by Tout, or Tout had written data that predicate locks should show
had already been read by Tpivot.  Either way, what Tin is seeing
should not be possible, because what I just described forms a
dangerous structure which should already have caused Tpivot to be
canceled.  It follows that the different reason code Tin is seeing
is probably a symptom of a failure which had already occurred --
otherwise how could both of the other transactions have committed?

Since there is no index on the "key" column, we must do a heap
scan, and that means a predicate lock on the heap relation.
Somehow, two connections get that predicate lock, scan the
relation, find that the key value doesn't yet exist, and insert
without seeing the other transaction's predicate lock.  But I'm not
seeing how that could happen.
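For anyone who wants to watch for that, the relation-level SIRead
lock should be visible in pg_locks while one of the serializable
transactions is still open; something like this from a separate
session (the query shape is mine):

-- Run while a serializable transaction that has scanned "stress"
-- is still open; with no usable index, the predicate lock should
-- appear at relation granularity (locktype = 'relation').
SELECT locktype, relation::regclass, page, tuple, pid, mode
FROM pg_locks
WHERE mode = 'SIReadLock';

If two overlapping transactions have both scanned the table, two
relation-level entries for "stress" should show up there; if one is
missing, that would point at the predicate-lock path.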
Another possibility is that this is somehow happening as the result
of some bug in the "commit sequence number" used by the
serializable implementation.  In the absence of a real CSN, and to
avoid additional contention on the PROCARRAY locking, we created an
approximation of a CSN which plays some tricks that assume atomic
memory access where perhaps that isn't safe or needs a memory
barrier.  If neither Tpivot nor Tout sees the other as having
committed first, we could see exactly this behavior.

Those aren't the only places this race condition bug could be
hiding, but they seem like the two most obvious candidates.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company