Serializable Snapshot Isolation - Mailing list pgsql-hackers
From | Kevin Grittner |
---|---|
Subject | Serializable Snapshot Isolation |
Date | |
Msg-id | 4C8F5DB202000025000356A0@gw.wicourts.gov |
List | pgsql-hackers |
Attached is the latest Serializable Snapshot Isolation (SSI) patch.

With Joe's testing and review, and with stress tests adapted from those used by Florian for his patch, we were able to identify and fix several bugs. Stability seems good now. We have many tests for correct behavior, all of which are looking good. The only solid benchmarks we have so far show no impact on isolation levels other than SERIALIZABLE, and a 1.8% increase in run time for a saturation run of small, read-only SERIALIZABLE transactions against a fully cached database. Dan has been working on setting up some benchmarks using DBT-2, but doesn't yet have results to publish. If we can get more eyes on the code during this CF, I'm hoping we can get this patch committed this round.

This patch is basically an implementation of the techniques described in the 2008 paper by Cahill et al., which were further developed in Cahill's 2009 PhD thesis. The techniques needed to be adapted somewhat because of differences between PostgreSQL and the two databases used for the prototype implementations in those papers (Oracle Berkeley DB and InnoDB), and there are a few original ideas from Dan and myself used to optimize the implementation. One reason for hoping that this patch gets committed in this CF is that it will leave time to try out some other, more speculative optimizations before release.

Documentation is not included in this patch; I plan on submitting that to a later CF as a separate patch. Changes should be almost entirely within the Concurrency Control chapter. The current patch has one new GUC which (if kept) will need to be documented, and one of the potential optimizations could involve adding a new transaction property, which would then need documentation.
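For readers who have not seen the Cahill papers, the core bookkeeping they describe can be sketched roughly as follows. This is a simplified illustration in Python, not code from the patch: the `Txn` class, its field names, and `record_rw_dependency` are all hypothetical, and the sketch assumes the two transactions are already known to be concurrent. Each transaction carries two flags, one for an incoming rw-dependency and one for an outgoing one, and a transaction that ends up with both set is treated as a candidate for rollback.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Txn:
    """Hypothetical per-transaction state; names are illustrative only."""
    name: str
    in_conflict: bool = False   # a concurrent txn has an rw-dependency into this one
    out_conflict: bool = False  # this txn has an rw-dependency out to a concurrent txn
    aborted: bool = False


def record_rw_dependency(reader: Txn, writer: Txn) -> Optional[Txn]:
    """Record that `reader` read data which the concurrent `writer` changed.

    Returns the transaction chosen for rollback, or None if no rollback
    is needed yet.
    """
    reader.out_conflict = True
    writer.in_conflict = True
    # A transaction with both an incoming and an outgoing rw-dependency
    # may be part of a serialization anomaly, so roll it back.
    for txn in (writer, reader):
        if txn.in_conflict and txn.out_conflict and not txn.aborted:
            txn.aborted = True
            return txn
    return None


# Two transactions that each read what the other writes.
t1, t2 = Txn("T1"), Txn("T2")
first = record_rw_dependency(t1, t2)   # T1 read something T2 wrote
second = record_rw_dependency(t2, t1)  # T2 read something T1 wrote
```

In this run the first call flags both transactions but rolls nothing back, and the second call selects `T1`, which now has both flags set. Checking only two flags is conservative: it can roll back a transaction even when no real cycle exists, which is the price of not tracking the full dependency graph.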
The premise of the patch is simple: that snapshot isolation comes so close to supporting fully serializable transactions that strict two-phase locking (S2PL) is not necessary -- the database engine can watch for rw-dependencies among transactions, without introducing any blocking, and roll back transactions as required to prevent serialization anomalies. This eliminates the need for using the SELECT FOR SHARE or SELECT FOR UPDATE clauses, the need for explicit locking, and the need for additional updates to introduce conflict points.

While block-level locking is included in this patch for btree and GiST indexes, an index relation lock is still used for predicate locks when a search is made through a GIN or hash index. Finer-grained locking for these additional index types can be implemented separately. Dan is looking at bringing btree indexes to finer granularity, but wants to have good benchmarks first, to confirm that the net impact is a gain in performance.

Most of the work is in the new predicate.h and predicate.c files, which total 2,599 lines, over 39% of which are comment lines. There are 1,626 lines in the new pg_dtester.py.in file, which uses Markus Wanner's dtester software to implement a large number of correctness tests. We added 79 lines to lockfuncs.c to include the new SIReadLock entries in the pg_locks view. The rest of the patch affects 286 lines (counting an updated line twice) across 25 existing PostgreSQL source files to implement the actual feature.

The code organization and naming issues mentioned here remain:

http://archives.postgresql.org/pgsql-hackers/2010-07/msg00383.php

-Kevin
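To make the kind of serialization anomaly mentioned above concrete, here is a minimal Python simulation of write skew under plain snapshot isolation. It is only a sketch under stated assumptions: the on-call scenario, the dictionary standing in for a table, and the function name `run_write_skew` are all invented for illustration and have nothing to do with the patch's code.

```python
def run_write_skew() -> dict:
    """Simulate two snapshot-isolation transactions that break an invariant.

    Intended invariant: at least one person stays on call.
    """
    db = {"alice_on_call": True, "bob_on_call": True}

    # Both transactions take their snapshots before either one commits.
    snap1 = dict(db)
    snap2 = dict(db)

    # T1: take Alice off call if the snapshot shows Bob still on call.
    writes1 = {}
    if snap1["bob_on_call"]:
        writes1["alice_on_call"] = False

    # T2: take Bob off call if the snapshot shows Alice still on call.
    writes2 = {}
    if snap2["alice_on_call"]:
        writes2["bob_on_call"] = False

    # Snapshot isolation only rejects overlapping write sets. These are
    # disjoint, so both transactions are allowed to commit.
    if writes1.keys() & writes2.keys():
        raise RuntimeError("write-write conflict: one transaction must abort")
    db.update(writes1)
    db.update(writes2)
    return db


result = run_write_skew()
```

Both transactions commit and nobody is left on call, which no serial ordering of the two could produce. Each transaction read a row that the other one wrote, so there is an rw-dependency in each direction; that pattern is what the engine watches for in order to roll one of them back without SELECT FOR UPDATE or explicit locks.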