Re: incoherent view of serializable transactions - Mailing list pgsql-hackers

From Kevin Grittner
Subject Re: incoherent view of serializable transactions
Date
Msg-id 49637309.EE98.0025.0@wicourts.gov
Whole thread Raw
In response to Re: incoherent view of serializable transactions  (Paul Schlie <schlie@comcast.net>)
List pgsql-hackers
>>> Paul Schlie <schlie@comcast.net> wrote: 
>>  Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
>> There is a related thread on which I'm attempting to come up with
>> documentation to assist those familiar with true serializable
>> behavior who are attempting to recognize application coding
>> patterns where the differences between that and snapshot isolation
>> are material, with tips on how to handle these differences.  There
>> seems to be some question whether the patterns in which anomalies
>> occur are common enough to merit comment.
>>  
>> If you could reference any concise and accessible work on these
>> anomalies and practical workarounds in application code, it would
>> be much appreciated.
> 
> Personally; although compliance may reduce the execution performance
> of such so designated transactions, it will correspondingly warrant
> correct results, and should be the goal rather than documenting
> non-conformance; as those who wish to embed more direct control over
> transaction evaluation into their specification to enable their
> improved concurrent execution efficiency by utilizing more relaxed
> evaluation semantics, remain free to do without penalty. (Simple
> examples of the risk of non-compliance already seem sufficiently
> identified in your example and first reference cited).
> 
> Merely documenting that transactions designated to be evaluated at
> the isolation level SERIALIZABLE may not yield expected results, as
> currently identified, seems sufficient in the short term; and as/if
> enough interest develops otherwise, so may an effort to warrant
> compliance; I suspect.
> 
> (as that known to most often be fine, can't be relied upon in
> practice)
Thank you for your perspective.  I'm not sure that I totally followed
you, so let me restate to see if it sounds right to you.  You are
suggesting that minimal discussion of the problem, the initial example
I provided, and more discussion of how to ensure correct semantics
would be what is needed?  Filling in more detail if interest is
expressed by users?
If so, the draft of a partial replacement for the partial replacement
of text in "Serializable Isolation versus True Serializability" may be
close to what you're suggesting -- if additional guidance on when to
use what additional locks is provided.  I'll paste below my signature
for comment.  It's a little rough yet, but looking to see if I'm on
the right track.
The first paragraph is a slightly modified form of a suggestion from
Robert Haas in:
http://archives.postgresql.org/pgsql-hackers/2008-12/msg01732.php
-Kevin
PostgreSQL's MVCC framework, snapshot isolation, and limited automatic
row-level locking permit a greater degree of concurrency than some
other databases; however, even when the transaction isolation level is
set to serializable, serialization anomalies can occur in some
situations.  When it is important to prevent these anomalies, explicit
row-level or table-level locking can be used at the expense of reduced
concurrency.
Since PostgreSQL protects a serializable transaction against changes
in the view of the data, and uses locks to prevent modification of
data which is being modified by a concurrent transaction, the
anomalies can only occur when a transaction reads data which is
modified by a concurrent transaction, and uses that as the basis of
database modifications which are read by a concurrent transaction. 
Data consistency checks at the application level have a problem with
this in general, and are addressed in section 13.4.  Some examples of
other types of anomalies follow, with suggestions on how to use
explicit locking to prevent the anomalies where needed.
Consider a system which involves recording receipts, each of which
must go into a daily deposit.  There is a control table with one row
containing the current deposit date for receipts.  Each transaction
which is inserting a receipt selects the deposit date from the control
table within its transaction, and uses it for the receipt's deposit
date.  Somewhere mid-afternoon the control table's date is updated,
all subsequent receipts should fall into the new day, and a report is
run listing the receipts for the day and giving the deposit total.
If all transactions involved were truly serializable, any SELECT of
receipts for a date prior to the deposit date of the control table
would see the complete, final set of receipts.  Under the PostgreSQL
implementation, unless explicit locking is used, although data
eventually gets to that state there can be a window of time during
which a SELECT can return an incomplete list of receipts for a date
which appears to be closed, even if all transactions for modifying and
viewing data are SERIALIZABLE.  This window of time runs from the
commit of the transaction which updated the control table until the
commit of any pending transactions which are inserting receipts and
which obtained a snapshot before the update of the control table.
To prevent this anomaly, a lock can be taken out on the receipt table
to block all modification to that table.  This should be done in the
transaction which will update the control table, and should be
acquired before the transaction selects or modifies any data.



pgsql-hackers by date:

Previous
From: "Jaime Casanova"
Date:
Subject: Re: parallel restore
Next
From: "Jaime Casanova"
Date:
Subject: Re: parallel restore