Re: eXtensible Transaction Manager API (v2) - Mailing list pgsql-hackers
From | Kevin Grittner |
---|---|
Subject | Re: eXtensible Transaction Manager API (v2) |
Date | |
Msg-id | CACjxUsPwv70_OVM_58BDsFM=5GGeV3ooTsVgvzbY_GbHdw4HbA@mail.gmail.com Whole thread Raw |
In response to | Re: eXtensible Transaction Manager API (v2) (Robert Haas <robertmhaas@gmail.com>) |
List | pgsql-hackers |
On Sat, Mar 12, 2016 at 11:21 AM, Robert Haas <robertmhaas@gmail.com> wrote: > I'd also be interested in hearing Kevin Grittner's thoughts about > serializability in a distributed environment, since he's obviously > thought about the topic of serializability quite a bit. I haven't done a thorough search of the academic literature on this, and I wouldn't be comfortable taking a really solid position without that; but in thinking about it it seems like there are at least three distinct problems which may have distinct solutions. *Physical replication* may be best handled by leveraging the "safe snapshot" idea already implemented in READ ONLY DEFERRABLE transactions, and passing through information in the WAL stream to allow the receiver to identify points where a snapshot can be taken which cannot see an anomaly. There should probably be an option to use the last known safe snapshot or wait for a point in the stream where one next appears. This might take as little as a bit or two per WAL commit record. It's not clear what the processing overhead would be -- it wouldn't surprise me if it was "in the noise", nor would it surprise me if it wasn't. We would need some careful benchmarking, and, if performance was an issue, A GUC to control whether the information was passed along (and, thus, whether SERIALIZABLE transactions were allowed on the replica). *Logical replication* (considered for the moment in a unidirectional context) might best be handled by some reordering of application of the commits on the replica into "apparent order of execution" -- which is pretty well defined on the primary based on commit order adjusted by read-write dependencies. Basically, the "simple" implementation would be that WAL is applied normally unless you receive a commit record which is flagged in some way to indicate that it is for a serializable transaction which wrote data and at the time of commit was concurrent with at least one other serializable transaction which had not completed and was not READ ONLY. Such a commit would await information in the WAL stream to tell it when all such concurrent transactions completed, and would indicate when such a transaction had a read-write dependency *in* to the transaction with the suspended commit; commits for any such transactions must be moved ahead of the suspended commit. This would allow logical replication, with all the filtering and such, to avoid ever showing a state on the replica which contained serialization anomalies. *Logical replication with cycles* (where there is some path for cluster A to replicate to cluster B, and some other path for cluster B to replicate the same or related data to cluster A) has a few options. You could opt for "eventual consistency" -- essentially giving up on the I in ACID and managing the anomalies. In practice this seems to lead to some form of S2PL at the application coding level, which is very bad for performance and concurrency, so I tend to think it should not be the only option. Built-in S2PL would probably perform better than having it pasted on at the application layer through some locking API, but for most workloads is still inferior to SSI in both concurrency and performance. Unless a search of the literature turns up some new alternative, I'm inclined to think that if you want to distribute a "logical" database over multiple clusters and still manage race conditions through use of SERIALIZABLE transactions, a distributed SSI implementation may be the best bet. That requires the transaction manager (or something like it) to track non-blocking predicate "locks" (what the implementation calls a SIReadLock) across the whole environment, as well as tracking rw-conflicts (our short name for read-write dependencies) across the whole environment. Since SSI also looks at the MVCC state, handling checks of that without falling victim to race conditions would also need to be handled somehow. If I remember correctly, the patches to add the SSI implementation of SERIALIZABLE transactions were about ten times the size of the patches to remove S2PL and initially replace it with MVCC. I don't have even a gut feel as to how much bigger the distributed form is likely to be. On the one hand the *fundamental logic* is all there and should not need to change; on the other hand the *mechanism* for acquiring the data to be *used* in that logic would be different and potentially complex. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: