Thread: SSI and 2PC
In going back through old emails to see what issues might have been
raised but not yet addressed for the SSI patch, I found the subject
issue described in a review by Jeff Davis here:

http://archives.postgresql.org/pgsql-hackers/2010-10/msg01159.php

I think this is already handled based on feedback from Heikki:

http://archives.postgresql.org/pgsql-hackers/2010-09/msg00789.php

Basically, the PREPARE TRANSACTION statement plays the same role for
the prepared transaction that COMMIT does for other transactions --
final conflict checking is done and the transaction becomes immune to
later serialization_failure rollback. A transaction which starts
after PREPARE TRANSACTION executes is not considered to overlap with
the prepared transaction. In Jeff's example, if Session2 runs a query
before Session1 executes PREPARE TRANSACTION, one of them will fail.
(I tested to make sure.)

Does that sound sane, or is something else needed here?

-Kevin
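A sketch of the non-overlap behavior Kevin describes, in the session
notation used elsewhere in this thread (table name, columns, and GID
are illustrative only, not Jeff's exact example):

    S1> BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE
    S1> SELECT * FROM t
    S1> UPDATE t SET val = val + 1 WHERE id = 1
    S1> PREPARE TRANSACTION 's1'

    S2> BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE  -- starts after the
    S2> SELECT * FROM t                                 -- PREPARE, so not
    S2> UPDATE t SET val = val + 1 WHERE id = 2         -- treated as
    S2> COMMIT                                          -- overlapping S1

    S1> COMMIT PREPARED 's1'

Per the behavior described above, S2 succeeds; had S2 run its SELECT
before S1's PREPARE TRANSACTION, the two would overlap and one would
fail.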
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:

> In going back through old emails to see what issues might have
> been raised but not yet addressed for the SSI patch, I found the
> subject issue described in a review by Jeff Davis here:
>
> http://archives.postgresql.org/pgsql-hackers/2010-10/msg01159.php

After reviewing the docs and testing things, I'm convinced that more
work is needed. Because the transaction's writes aren't visible until
COMMIT PREPARED is run, and write-write conflicts are still causing
serialization failures after PREPARE TRANSACTION, some of the work
being done for SSI on PREPARE TRANSACTION needs to be moved to COMMIT
PREPARED.

It seems likely that shops who use prepared transactions are more
likely than most to care about truly serializable transactions, so I
don't think I should write this off as a limitation for the 9.1
implementation. Unless someone sees some dire problem with the patch
which I've missed, this seems like my top priority to fix before
cutting a patch.

-Kevin
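To illustrate the gap Kevin is pointing at: under plain snapshot
isolation, a write-write conflict with a prepared transaction only
resolves once COMMIT PREPARED runs, so SSI bookkeeping finalized at
PREPARE time comes too early. A sketch (table and values are
hypothetical):

    T1> BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE
    T1> UPDATE t SET val = val + 1 WHERE id = 1
    T1> PREPARE TRANSACTION 't1'

    T2> BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE
    T2> UPDATE t SET val = val + 1 WHERE id = 1  -- blocks on T1's row lock,
                                                 -- still held after PREPARE
    T1> COMMIT PREPARED 't1'
    -- T2 now fails with a serialization error, even though it started
    -- after T1 was prepared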
On Mon, Jan 10, 2011 at 08:49:12AM -0600, Kevin Grittner wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:
>
> > In going back through old emails to see what issues might have
> > been raised but not yet addressed for the SSI patch, I found the
> > subject issue described in a review by Jeff Davis here:
> >
> > http://archives.postgresql.org/pgsql-hackers/2010-10/msg01159.php
>
> After reviewing the docs and testing things, I'm convinced that more
> work is needed. Because the transaction's writes aren't visible
> until COMMIT PREPARED is run, and write-write conflicts are still
> causing serialization failures after PREPARE TRANSACTION, some of
> the work being done for SSI on PREPARE TRANSACTION needs to be moved
> to COMMIT PREPARED.
>
> It seems likely that shops who use prepared transactions are more
> likely than most to care about truly serializable transactions, so I
> don't think I should write this off as a limitation for the 9.1
> implementation. Unless someone sees some dire problem with the
> patch which I've missed, this seems like my top priority to fix
> before cutting a patch.

Could people fix it after the patch? ISTM that a great way to test it
is to make very sure it's available ASAP to a wide range of people
via the next alpha (or beta, if that's where we're going next).

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
David Fetter <david@fetter.org> wrote:

> Could people fix it after the patch? ISTM that a great way to
> test it is to make very sure it's available ASAP to a wide range
> of people via the next alpha (or beta, if that's where we're going
> next).

People can always pull from the git repo:

git://git.postgresql.org/git/users/kgrittn/postgres.git

Also, I can post a patch against HEAD at any time. Should I post one
now, and then again after this is solved?

Full disclosure requires that I mention that while Dan has completed
code to fix the page split/combine issues Heikki raised, I don't
think he's done testing it. (It's hard to test because you don't hit
the problem unless you have a page split or combine right at the
point where the hash table for predicate locks becomes full.) So,
anyway, there could possibly be some wet paint there.

-Kevin
On Mon, Jan 10, 2011 at 08:59:45AM -0600, Kevin Grittner wrote:
> David Fetter <david@fetter.org> wrote:
> > Could people fix it after the patch? ISTM that a great way to
> > test it is to make very sure it's available ASAP to a wide range
> > of people via the next alpha (or beta, if that's where we're
> > going next).
>
> People can always pull from the git repo:
>
> git://git.postgresql.org/git/users/kgrittn/postgres.git
>
> Also, I can post a patch against HEAD at any time. Should I post
> one now, and then again after this is solved?
>
> Full disclosure requires that I mention that while Dan has completed
> code to fix the page split/combine issues Heikki raised, I don't
> think he's done testing it. (It's hard to test because you don't
> hit the problem unless you have a page split or combine right at the
> point where the hash table for predicate lock becomes full.) So,
> anyway, there could possibly be some wet paint there.

Short of a test suite that can inject faults at the exact kinds of
places where this occurs and a way to enumerate all those faults,
there's only so much testing that's possible to do /in vitro/. Oh,
and such enumerations tend to be combinatorial explosions anyhow. :P

At some point, and that point is rapidly approaching if it's not
already here, you've done what you can to shake out bugs and
infelicities, and the next steps are up to people testing alphas,
betas, and, to be completely frank, 9.1.0 and possibly later
versions. This is way, way too big a feature to expect you can get a
perfect handle on it by theory alone.

Cheers,
David.
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:
>
>> In going back through old emails to see what issues might have
>> been raised but not yet addressed for the SSI patch, I found the
>> subject issue described in a review by Jeff Davis here:
>>
>> http://archives.postgresql.org/pgsql-hackers/2010-10/msg01159.php
>
> After reviewing the docs and testing things, I'm convinced that
> more work is needed. Because the transaction's writes aren't
> visible until COMMIT PREPARED is run, and write-write conflicts
> are still causing serialization failures after PREPARE
> TRANSACTION, some of the work being done for SSI on PREPARE
> TRANSACTION needs to be moved to COMMIT PREPARED.

I'm now also convinced that Jeff is right in his assessment that when
a transaction is prepared, information about predicate locks and
conflicts with other prepared transactions must be persisted
somewhere. (Jeff referred to a "2PC state file".)

I'm trying not to panic here, but I haven't looked at 2PC before
yesterday and am just dipping into the code to support it, and time
is short. Can anyone give me a pointer to anything I should read
before I dig through the 2PC code, which might accelerate this?

-Kevin
On Mon, 2011-01-10 at 11:50 -0600, Kevin Grittner wrote:
> I'm trying not to panic here, but I haven't looked at 2PC before
> yesterday and am just dipping into the code to support it, and time
> is short. Can anyone give me a pointer to anything I should read
> before I dig through the 2PC code, which might accelerate this?

I don't see much about 2PC outside of twophase.c.

Regarding the original post, I agree that we should have two-phase
commit support for SSI. We opted not to support it for notifications,
but there was a fairly reasonable argument why users wouldn't value
the combination of 2PC and NOTIFY.

I don't expect this to be a huge roadblock for the feature though. It
seems fairly contained. I haven't read the 2PC code either, but I
don't expect that you'll need to change the rest of your algorithm
just to support it.

Regards,
Jeff Davis
On Jan10, 2011, at 18:50 , Kevin Grittner wrote:
> I'm trying not to panic here, but I haven't looked at 2PC before
> yesterday and am just dipping into the code to support it, and time
> is short. Can anyone give me a pointer to anything I should read
> before I dig through the 2PC code, which might accelerate this?

It roughly works as follows.

Upon PREPARE, the locks previously held by the transaction are
transferred to a kind of virtual backend which consists only of a
special proc array entry. The transaction will thus still appear to
be running, and will still be holding its locks, even after the
original backend is gone. The information necessary to reconstruct
that proc array entry is also written to the 2PC state, and used to
recreate the "virtual backend" after a restart or crash.

There are also some additional pieces of transaction state which are
stored in the 2PC state file, like the full list of subtransaction
xids (the proc array entry may not contain all of them if it
overflowed).

Upon COMMIT PREPARED, the information in the 2PC state file is used
to write a COMMIT WAL record and to update the clog. The transaction
is then committed, the special proc array entry is removed, and all
lockmgr locks it held are released.

For 2PC to work for SSI transactions, I guess you must check for
conflicts during PREPARE -- at any later point the COMMIT may only
fail transiently, not permanently. Any transaction that adds a
conflict with an already prepared transaction must check whether that
conflict completes a dangerous structure, and abort if this is the
case, since the already PREPAREd transaction can no longer be
aborted. COMMIT PREPARED then probably doesn't need to do anything
special for SSI transactions, apart from some cleanup actions maybe.

Unfortunately, it seems that doing things this way will undermine the
guarantee that retrying a failed SSI transaction won't fail due to
the same conflict as it did originally. Consider

T1> BEGIN TRANSACTION ISOLATION SERIALIZABLE
T1> SELECT * FROM T
T1> UPDATE T ...
T1> PREPARE TRANSACTION

T2> BEGIN TRANSACTION ISOLATION SERIALIZABLE
T2> SELECT * FROM T
T2> UPDATE T ...
-> Serialization Error

Retrying T2 won't help as long as T1 isn't COMMITTED. There doesn't
seem to be a way around that, however -- any correct implementation
of 2PC for SSI will have to behave that way, I fear :-(

Hope this helps & best regards,
Florian Pflug
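For readers new to the commands involved, the 2PC flow Florian
describes looks like this in SQL (table and GID are arbitrary):

    BEGIN;
    UPDATE t SET val = val + 1 WHERE id = 1;
    PREPARE TRANSACTION 'gid-1';   -- locks transferred to the special proc
                                   -- array entry; 2PC state file written,
                                   -- so the transaction survives a crash
    -- later, possibly from a different backend or after a restart:
    COMMIT PREPARED 'gid-1';       -- COMMIT WAL record written, clog
                                   -- updated, proc array entry removed,
                                   -- locks released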
Florian Pflug <fgp@phlo.org> wrote:
> On Jan10, 2011, at 18:50 , Kevin Grittner wrote:
>> I'm trying not to panic here, but I haven't looked at 2PC before
>> yesterday and am just dipping into the code to support it, and
>> time is short. Can anyone give me a pointer to anything I should
>> read before I dig through the 2PC code, which might accelerate
>> this?
>
> It roughly works as follows
>
> Upon PREPARE, the locks previously held by the transaction are
> transferred to a kind of virtual backend which only consists of a
> special proc array entry. The transaction will thus still appear
> to be running, and will still be holding its locks, even after the
> original backend is gone. The information necessary to reconstruct
> that proc array entry is also written to the 2PC state, and used
> to recreate the "virtual backend" after a restart or crash.
>
> There are also some additional pieces of transaction state which
> are stored in the 2PC state file like the full list of
> subtransaction xids (The proc array entry may not contain all of
> them if it overflowed).
>
> Upon COMMIT PREPARED, the information in the 2PC state file is
> used to write a COMMIT wal record and to update the clog. The
> transaction is then committed, and the special proc array entry is
> removed and all lockmgr locks it held are released.
>
> For 2PC to work for SSI transaction, I guess you must check for
> conflicts during PREPARE - at any later point the COMMIT may only
> fail transiently, not permanently. Any transaction that adds a
> conflict with an already prepared transaction must check if that
> conflict completes a dangerous structure, and abort if this is the
> case, since the already PREPAREd transaction can no longer be
> aborted. COMMIT PREPARED then probably doesn't need to do anything
> special for SSI transactions, apart from some cleanup actions
> maybe.

Thanks; that all makes sense. The devil, as they say, is in the
details.
As far as I've worked it out, the PREPARE must persist both the
predicate locks and any conflict pointers which are to other prepared
transactions. That leaves some fussy work around the coming and going
of prepared transactions, because on recovery you need to be prepared
to ignore conflict pointers to prepared transactions which committed
or rolled back.

What I haven't found yet is the right place and means to persist and
recover this stuff, but that's just a matter of digging through
enough source code. Any tips regarding that may save time. I'm also
not clear on what, if anything, needs to be written to WAL. I'm
really fuzzy on that, still.

> Unfortunately, it seems that doing things this way will undermine
> the guarantee that retrying a failed SSI transaction won't fail
> due to the same conflict as it did originally.

I hadn't thought of that, but you're right. Of course, I can't
enforce that guarantee, anyway, without some other patch first being
there to allow me to cancel other transactions with
serialization_failure, even if they are "idle in transaction".

> There doesn't seem to be a way around that, however -- any correct
> implementation of 2PC for SSI will have to behave that way, I fear
> :-(

I think you're right.

> Hope this helps & best regards,

It does. Even the parts which just confirm my tentative conclusions
save me time in not feeling like I need to cross-check so much. I can
move forward with more confidence. Thanks.

-Kevin
Jeff Davis <pgsql@j-davis.com> wrote:

> I don't expect this to be a huge roadblock for the feature though.
> It seems fairly contained. I haven't read the 2PC code either, but
> I don't expect that you'll need to change the rest of your
> algorithm just to support it.

Agreed; but I am starting to get concerned about whether this
particular area can be completed by the start of the CF. I might run
a few days over on 2PC support. Unless ... Dan? Could you look into
this while I chase down the issue Anssi raised?

-Kevin
On 11.01.2011 20:08, Florian Pflug wrote:
> Unfortunately, it seems that doing things this way will undermine
> the guarantee that retrying a failed SSI transaction won't fail due
> to the same conflict as it did originally. Consider
>
> T1> BEGIN TRANSACTION ISOLATION SERIALIZABLE
> T1> SELECT * FROM T
> T1> UPDATE T ...
> T1> PREPARE TRANSACTION
>
> T2> BEGIN TRANSACTION ISOLATION SERIALIZABLE
> T2> SELECT * FROM T
> T2> UPDATE T ...
> -> Serialization Error
>
> Retrying T2 won't help as long as T1 isn't COMMITTED.

T2 should block until T1 commits. I would be very surprised if it
doesn't behave like that already. In general, a prepared transaction
should be treated like an in-progress transaction - it might still
abort too.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
> On 11.01.2011 20:08, Florian Pflug wrote:
>> Unfortunately, it seems that doing things this way will undermine
>> the guarantee that retrying a failed SSI transaction won't fail
>> due to the same conflict as it did originally. Consider
>>
>> T1> BEGIN TRANSACTION ISOLATION SERIALIZABLE
>> T1> SELECT * FROM T
>> T1> UPDATE T ...
>> T1> PREPARE TRANSACTION
>>
>> T2> BEGIN TRANSACTION ISOLATION SERIALIZABLE
>> T2> SELECT * FROM T
>> T2> UPDATE T ...
>> -> Serialization Error
>>
>> Retrying T2 won't help as long as T1 isn't COMMITTED.
>
> T2 should block until T1 commits. I would be very surprised if it
> doesn't behave like that already. In general, a prepared
> transaction should be treated like an in-progress transaction - it
> might still abort too.

It shouldn't block if the updates were to different rows, which is
what I took Florian to mean; otherwise this would be handled by SI
and would have nothing to do with the SSI patch. SSI doesn't
introduce any new blocking (with the one exception of the READ ONLY
DEFERRABLE style we invented to support long-running reports and
backups, and all blocking there is at the front -- once it's running,
it's going full speed ahead).

-Kevin
On Jan11, 2011, at 19:41 , Heikki Linnakangas wrote:
> On 11.01.2011 20:08, Florian Pflug wrote:
>> Unfortunately, it seems that doing things this way will undermine
>> the guarantee that retrying a failed SSI transaction won't fail
>> due to the same conflict as it did originally. Consider
>>
>> T1> BEGIN TRANSACTION ISOLATION SERIALIZABLE
>> T1> SELECT * FROM T
>> T1> UPDATE T ...
>> T1> PREPARE TRANSACTION
>>
>> T2> BEGIN TRANSACTION ISOLATION SERIALIZABLE
>> T2> SELECT * FROM T
>> T2> UPDATE T ...
>> -> Serialization Error
>>
>> Retrying T2 won't help as long as T1 isn't COMMITTED.
>
> T2 should block until T1 commits.

The serialization error will occur even if T1 and T2 update
*different* rows. This is due to the SELECTs in the interleaved
schedule above returning the state of T prior to both T1 and T2,
which is of course never the case for a serial schedule.

best regards,
Florian Pflug
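Spelling out the different-rows case in concrete SQL (table and
values hypothetical): each transaction's SELECT misses the other's
uncommitted UPDATE, giving two rw-antidependencies that no serial
order can reproduce, so SSI raises an error without any blocking.

    T1> BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE
    T1> SELECT * FROM t
    T1> UPDATE t SET val = val + 1 WHERE id = 1
    T1> PREPARE TRANSACTION 't1'

    T2> BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE
    T2> SELECT * FROM t
    T2> UPDATE t SET val = val + 1 WHERE id = 2
    -> serialization error; no row lock is contended, since the
       updated rows differ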
On Tue, Jan 11, 2011 at 12:34:44PM -0600, Kevin Grittner wrote:
> Agreed; but I am starting to get concerned about whether this
> particular area can be completed by the start of the CF. I might
> run a few days over on 2PC support. Unless ... Dan? Could you look
> into this while I chase down the issue Anssi raised?

I'll take a look at it, but be forewarned that I currently know
extremely little about 2PC in Postgres...

Dan

--
Dan R. K. Ports  MIT CSAIL  http://drkp.net/