Thread: Prepared transaction releasing locks before deregistering its GID
Hello, We have an app that copies some metrics between predefined tables on the source and destination clusters, truncating the table at the source afterward. The app locks both the source and the destination table at the beginning and then, once copy concludes, prepares a transaction on each of the affected hosts and commits it at last. The naming schema for the prepared transaction involves the source and the destination table; while it doesn’t guarantee the uniqueness if multiple copy processes run on the same host, the possibility of that happening should be prevented by the locks held on tables involved. Or so we thought. From time to time, our cron jobs reported a transaction identifier already in use. After digging into this, my colleague Ildar Musin has produced a simple pgbench script to reproduce the issue: ---------------------------------------------------------------- $ cat custom.sql begin; lock abc in exclusive mode; insert into abc values ((random() * 1000)::int); prepare transaction 'test transaction'; commit prepared 'test transaction’; ---------------------------------------------------------------- pgbench launched with: pgbench -T 20 -c 10 -f custom.sql postgres (after creating the table abc) produces a number of complaints: ERROR: transaction identifier "test transaction" is already in use. I could easily reproduce that on Linux, but not Mac OS, with both Postgres 10.7 and 11.2. That looks like a race condition to me. What happens is that another transaction with the name identical to the running one can start and proceed to the prepare phase while the original one commits, failing at last instead of waiting for the original one to finish. By looking at the source code of FinishPreparedTransaction() I can see the RemoveGXact() call, which removes the prepared transaction from TwoPhaseState->prepXacts. It is placed at the very end of the function, after the post-commit callbacks that clear out the locks held by the transaction. Those callbacks are not guarded by the TwoPhaseStateLock, resulting in a possibility for a concurrent session to proceed will MarkAsPreparing after acquiring the locks released by them. I couldn’t find any documentation on the expected outcome in cases like this, so I assume it might not be a bug, but an undocumented behavior. Should I go about and produce a patch to put a note in the description of commit prepared, or is there any interest in changing the behavior to avoid such conflicts altogether (perhaps by holding the lock throughout the cleanup phase)? Regards, Oleksii Kliukin
On Mon, Feb 18, 2019 at 05:05:13PM +0100, Oleksii Kliukin wrote: > That looks like a race condition to me. What happens is that another > transaction with the name identical to the running one can start and proceed > to the prepare phase while the original one commits, failing at last instead > of waiting for the original one to finish. It took me 50 clients and a bit more than 20 seconds, but I have been able to reproduce the problem with one error. Thanks for the reproducer. This is indeed a race condition with 2PC. > By looking at the source code of FinishPreparedTransaction() I can see the > RemoveGXact() call, which removes the prepared transaction from > TwoPhaseState->prepXacts. It is placed at the very end of the function, > after the post-commit callbacks that clear out the locks held by the > transaction. Those callbacks are not guarded by the TwoPhaseStateLock, > resulting in a possibility for a concurrent session to proceed will > MarkAsPreparing after acquiring the locks released by them. Hm, I see. Taking a breakpoint just after ProcessRecords() or putting a sleep there makes the issue plain. The same issue can happen with predicate locks. > I couldn’t find any documentation on the expected outcome in cases like > this, so I assume it might not be a bug, but an undocumented behavior. If you run two transactions in parallel using your script, the second transaction would wait at LOCK time until the first transaction releases its locks with the COMMIT PREPARED. > Should I go about and produce a patch to put a note in the description of > commit prepared, or is there any interest in changing the behavior to avoid > such conflicts altogether (perhaps by holding the lock throughout the > cleanup phase)? That's a bug, let's fix it. I agree with your suggestion that we had better not release the 2PC lock using the callbacks for COMMIT PREPARED or ROLLBACK PREPARED until the shared memory state is cleared. At the same time, I think that we should be smarter in the ordering of the actions: we care about predicate locks here, but not about the on-disk file removal and the stat counters. One trick is that the post-commit callback calls TwoPhaseGetDummyProc() which would try to take TwoPhaseStateLock which needs to be applied so we need some extra logic to not take a lock in this case. From what I can see this is older than 9.4 as the removal of the GXACT entry in shared memory and the post-commit hooks are out of sync for a long time :( Attached is a patch that fixes the problem for me. Please note the safety net in TwoPhaseGetGXact(). -- Michael
Attachment
On 2019-Feb-19, Michael Paquier wrote: > diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h > index 6228b091d8..2dcd08e9fa 100644 > --- a/src/include/access/twophase.h > +++ b/src/include/access/twophase.h > @@ -34,7 +34,7 @@ extern void TwoPhaseShmemInit(void); > extern void AtAbort_Twophase(void); > extern void PostPrepare_Twophase(void); > > -extern PGPROC *TwoPhaseGetDummyProc(TransactionId xid); > +extern PGPROC *TwoPhaseGetDummyProc(TransactionId xid, bool lock_held); > extern BackendId TwoPhaseGetDummyBackendId(TransactionId xid); > > extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid, Hmm, ABI break ... -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Feb 19, 2019 at 01:07:06AM -0300, Alvaro Herrera wrote: > On 2019-Feb-19, Michael Paquier wrote: >> extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid, > > Hmm, ABI break ... Well, sure. I always post patches for HEAD first. And I was actually wondering if that's worth back-patching per the odds of facing the error and seeing how old it is. -- Michael
Attachment
Re: Prepared transaction releasing locks before deregistering its GID
From
Konstantin Knizhnik
Date:
On 19.02.2019 7:44, Michael Paquier wrote: > On Tue, Feb 19, 2019 at 01:07:06AM -0300, Alvaro Herrera wrote: >> On 2019-Feb-19, Michael Paquier wrote: >>> extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid, >> Hmm, ABI break ... > Well, sure. I always post patches for HEAD first. And I was actually > wondering if that's worth back-patching per the odds of facing the > error and seeing how old it is. > -- > Michael May be I missed something, but why it is not possible just to move removing 2PC GXact before releasing transaction locks: diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c index 9a8a6bb..574d28b 100644 --- a/src/backend/access/transam/twophase.c +++ b/src/backend/access/transam/twophase.c @@ -1560,17 +1560,6 @@ FinishPreparedTransaction(const char *gid, bool isCommit) if (hdr->initfileinval) RelationCacheInitFilePostInvalidate(); - /* And now do the callbacks */ - if (isCommit) - ProcessRecords(bufptr, xid, twophase_postcommit_callbacks); - else - ProcessRecords(bufptr, xid, twophase_postabort_callbacks); - - PredicateLockTwoPhaseFinish(xid, isCommit); - - /* Count the prepared xact as committed or aborted */ - AtEOXact_PgStat(isCommit); - /* * And now we can clean up any files we may have left. */ @@ -1582,6 +1571,17 @@ FinishPreparedTransaction(const char *gid, bool isCommit) LWLockRelease(TwoPhaseStateLock); MyLockedGxact = NULL; + /* And now do the callbacks */ + if (isCommit) + ProcessRecords(bufptr, xid, twophase_postcommit_callbacks); + else + ProcessRecords(bufptr, xid, twophase_postabort_callbacks); + + PredicateLockTwoPhaseFinish(xid, isCommit); + + /* Count the prepared xact as committed or aborted */ + AtEOXact_PgStat(isCommit); + RESUME_INTERRUPTS(); pfree(buf); -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Tue, Feb 19, 2019 at 12:26:04PM +0300, Konstantin Knizhnik wrote: > May be I missed something, but why it is not possible just to move removing > 2PC GXact before releasing transaction locks: Because we need to keep the 2PC reference in shared memory when releasing the locks before removing the physical file. The 2PC test suite for recovery has good coverage, and I imagine that these blow up with your suggestion (no time to test now, please see 009_twophase.pl). -- Michael
Attachment
Hi, Michael Paquier <michael@paquier.xyz> wrote: > On Mon, Feb 18, 2019 at 05:05:13PM +0100, Oleksii Kliukin wrote: >> That looks like a race condition to me. What happens is that another >> transaction with the name identical to the running one can start and proceed >> to the prepare phase while the original one commits, failing at last instead >> of waiting for the original one to finish. > > It took me 50 clients and a bit more than 20 seconds, but I have been > able to reproduce the problem with one error. Thanks for the > reproducer. This is indeed a race condition with 2PC. Yes, as a race condition I could consistently reproduce it on one system, but not on another. On my macbook it never manifested in an error; however, on one occasion I saw sessions waiting on a lock and an orphaned record in pg_prepared_xacts. I couldn’t find any evidences in the logs or elsewhere what has happened to that session, so I just assumed it was, perhaps, killed by the test cleanup. >> By looking at the source code of FinishPreparedTransaction() I can see the >> RemoveGXact() call, which removes the prepared transaction from >> TwoPhaseState->prepXacts. It is placed at the very end of the function, >> after the post-commit callbacks that clear out the locks held by the >> transaction. Those callbacks are not guarded by the TwoPhaseStateLock, >> resulting in a possibility for a concurrent session to proceed will >> MarkAsPreparing after acquiring the locks released by them. > > Hm, I see. Taking a breakpoint just after ProcessRecords() or putting > a sleep there makes the issue plain. The same issue can happen with > predicate locks. I did verify this on my part by putting a TwoPhaseStateLock right before the line saying "We assume it's safe to do this without taking TwoPhaseStateLock” and observing that the errors has stopped over multiple runs. >> I couldn’t find any documentation on the expected outcome in cases like >> this, so I assume it might not be a bug, but an undocumented behavior. > > If you run two transactions in parallel using your script, the second > transaction would wait at LOCK time until the first transaction > releases its locks with the COMMIT PREPARED. That is the desired outcome, right? > >> Should I go about and produce a patch to put a note in the description of >> commit prepared, or is there any interest in changing the behavior to avoid >> such conflicts altogether (perhaps by holding the lock throughout the >> cleanup phase)? > > That's a bug, let's fix it. I agree with your suggestion that we had > better not release the 2PC lock using the callbacks for COMMIT > PREPARED or ROLLBACK PREPARED until the shared memory state is > cleared. At the same time, I think that we should be smarter in the > ordering of the actions: we care about predicate locks here, but not > about the on-disk file removal and the stat counters. One trick is > that the post-commit callback calls TwoPhaseGetDummyProc() which would > try to take TwoPhaseStateLock which needs to be applied so we need > some extra logic to not take a lock in this case. From what I can see > this is older than 9.4 as the removal of the GXACT entry in shared > memory and the post-commit hooks are out of sync for a long time :( > > Attached is a patch that fixes the problem for me. Please note the > safety net in TwoPhaseGetGXact(). The approach looks good to me. Surprisingly, I saw no stalled backends because of the double acquisition of lock at TwoPhaseGetGXact once I put a simple TwoPhaseStateLock right before the "gxact->valid = false” line; I will test your patch and post the outcome. Regards, Oleksii Kliukin
On Tue, Feb 19, 2019 at 10:59:33AM +0100, Oleksii Kliukin wrote: > Michael Paquier <michael@paquier.xyz> wrote: >> If you run two transactions in parallel using your script, the second >> transaction would wait at LOCK time until the first transaction >> releases its locks with the COMMIT PREPARED. > > That is the desired outcome, right? Yes, that is the correct one in my opinion, and we should not have GID conflicts when running the scenario you provided upthread. -- Michael
Attachment
Hi, Oleksii Kliukin <alexk@hintbits.com> wrote: > > The approach looks good to me. Surprisingly, I saw no stalled backends > because of the double acquisition of lock at TwoPhaseGetGXact once I put a > simple TwoPhaseStateLock right before the "gxact->valid = false” line; I > will test your patch and post the outcome. I gave it a spin on the same VM host as shown to constantly reproduce the issue and observed neither 'identifier already in use' nor any locking issues over a few dozens of runs, so it looks good to me. That was HEAD, but since FinishPreparedTransaction seems to be identical there and on the back branches it should work for PG 10 and 11 as well. Regards, Oleksii Kliukin
On Tue, Feb 19, 2019 at 08:17:14PM +0100, Oleksii Kliukin wrote: > I gave it a spin on the same VM host as shown to constantly reproduce the > issue and observed neither 'identifier already in use' nor any locking > issues over a few dozens of runs, so it looks good to me. Thanks for the confirmation. I'll try to wrap up this one then. > That was HEAD, but since FinishPreparedTransaction seems to be identical > there and on the back branches it should work for PG 10 and 11 as well. The issue exists since two-phase commit has been added, down to 8.1 if my read of the code is right, so it took 14 years to be found. Congrats. I have also manually tested that the code is broken down to 9.4 though. It is good that you stress-test 2PC the way you do, and that's the second bug you have spotted since the duplicate XIDs in the standby snapshots which are now lazily discarded at recovery. Now, it seems to me that the potential ABI breakage argument (which can be solved by introducing an extra routine, which means more conflicts to handle when back-patching 2PC stuff), and the time it took to find the issue are pointing out that we should patch only HEAD. In the event of a back-patch, on top of the rest we would need to rework RemoveGXact so as it does not take a LWLock on two-phase state data in 10 and upwards, so that would be invasive, and we have way less automated testing in those versions.. -- Michael
Attachment
On Tue, Feb 19, 2019 at 12:27 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Mon, Feb 18, 2019 at 05:05:13PM +0100, Oleksii Kliukin wrote: > > That looks like a race condition to me. What happens is that another > > transaction with the name identical to the running one can start and proceed > > to the prepare phase while the original one commits, failing at last instead > > of waiting for the original one to finish. > > It took me 50 clients and a bit more than 20 seconds, but I have been > able to reproduce the problem with one error. Thanks for the > reproducer. This is indeed a race condition with 2PC. > > > By looking at the source code of FinishPreparedTransaction() I can see the > > RemoveGXact() call, which removes the prepared transaction from > > TwoPhaseState->prepXacts. It is placed at the very end of the function, > > after the post-commit callbacks that clear out the locks held by the > > transaction. Those callbacks are not guarded by the TwoPhaseStateLock, > > resulting in a possibility for a concurrent session to proceed will > > MarkAsPreparing after acquiring the locks released by them. > > Hm, I see. Taking a breakpoint just after ProcessRecords() or putting > a sleep there makes the issue plain. The same issue can happen with > predicate locks. > > > I couldn’t find any documentation on the expected outcome in cases like > > this, so I assume it might not be a bug, but an undocumented behavior. > > If you run two transactions in parallel using your script, the second > transaction would wait at LOCK time until the first transaction > releases its locks with the COMMIT PREPARED. > > > Should I go about and produce a patch to put a note in the description of > > commit prepared, or is there any interest in changing the behavior to avoid > > such conflicts altogether (perhaps by holding the lock throughout the > > cleanup phase)? > > That's a bug, let's fix it. I agree with your suggestion that we had > better not release the 2PC lock using the callbacks for COMMIT > PREPARED or ROLLBACK PREPARED until the shared memory state is > cleared. At the same time, I think that we should be smarter in the > ordering of the actions: we care about predicate locks here, but not > about the on-disk file removal and the stat counters. One trick is > that the post-commit callback calls TwoPhaseGetDummyProc() which would > try to take TwoPhaseStateLock which needs to be applied so we need > some extra logic to not take a lock in this case. From what I can see > this is older than 9.4 as the removal of the GXACT entry in shared > memory and the post-commit hooks are out of sync for a long time :( > > Attached is a patch that fixes the problem for me. Please note the > safety net in TwoPhaseGetGXact(). Thank you for working on this. @@ -811,6 +811,9 @@ TwoPhaseGetGXact(TransactionId xid) static TransactionId cached_xid = InvalidTransactionId; static GlobalTransaction cached_gxact = NULL; + Assert(!lock_held || + LWLockHeldByMeInMode(TwoPhaseStateLock, LW_EXCLUSIVE)); + /* * During a recovery, COMMIT PREPARED, or ABORT PREPARED, we'll be called * repeatedly for the same XID. We can save work with a simple cache. @@ -818,7 +821,8 @@ TwoPhaseGetGXact(TransactionId xid) if (xid == cached_xid) return cached_gxact; - LWLockAcquire(TwoPhaseStateLock, LW_SHARED); + if (!lock_held) + LWLockAcquire(TwoPhaseStateLock, LW_SHARED); for (i = 0; i < TwoPhaseState->numPrepXacts; i++) { It seems strange to me, why do we always require an exclusive lock here in spite of acquiring a shared lock? ----- @@ -854,7 +859,7 @@ TwoPhaseGetGXact(TransactionId xid) BackendId TwoPhaseGetDummyBackendId(TransactionId xid) { - GlobalTransaction gxact = TwoPhaseGetGXact(xid); + GlobalTransaction gxact = TwoPhaseGetGXact(xid, false); return gxact->dummyBackendId; } TwoPhaseGetDummyBackendId() is called by multixact_twophase_postcommit() which is called while holding TwoPhaseStateLock in exclusive mode in FinishPreparedTransaction(). Since we cache the global transaction when we call lock_twophase_commit() I couldn't produce issue but it seems to have a potential of locking problem. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Wed, Feb 20, 2019 at 03:14:07PM +0900, Masahiko Sawada wrote: > @@ -811,6 +811,9 @@ TwoPhaseGetGXact(TransactionId xid) > static TransactionId cached_xid = InvalidTransactionId; > static GlobalTransaction cached_gxact = NULL; > > + Assert(!lock_held || > + LWLockHeldByMeInMode(TwoPhaseStateLock, LW_EXCLUSIVE)); > + > > It seems strange to me, why do we always require an exclusive lock > here in spite of acquiring a shared lock? Thanks for looking at the patch. LWLockHeldByMe() is fine here. It seems that I had a brain fade. > TwoPhaseGetDummyBackendId() is called by > multixact_twophase_postcommit() which is called while holding > TwoPhaseStateLock in exclusive mode in FinishPreparedTransaction(). > Since we cache the global transaction when we call > lock_twophase_commit() I couldn't produce issue but it seems to have a > potential of locking problem. You are right: this could cause a deadlock problem, but it actually cannot be triggered now because the repetitive calls of TwoPhaseGetGXact() make the resulting XID to be cached, so the lock finishes by never be taken, but that could become broken in the future. From a consistency point of view, adding the same option to TwoPhaseGetGXact() and TwoPhaseGetGXact() to control the lock acquisition is much better as well. Please note that the recovery tests stress multixact post-commit callbacks already, so you can see the caching behavior with that one: BEGIN; CREATE TABLE t_009_tbl2 (id int, msg text); SAVEPOINT s1; INSERT INTO t_009_tbl2 VALUES (27, 'issued to stuff'); PREPARE TRANSACTION 'xact_009_13'; CHECKPOINT; COMMIT PREPARED 'xact_009_13'; The assertion really needs to be before the cached XID is checked though. Attached is an updated patch. Thanks for the feedback. -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> wrote: > > Attached is an updated patch. Thanks for the feedback. @@ -1755,7 +1755,7 @@ void multixact_twophase_recover(TransactionId xid, uint16 info, void *recdata, uint32 len) { - BackendId dummyBackendId = TwoPhaseGetDummyBackendId(xid); + BackendId dummyBackendId = TwoPhaseGetDummyBackendId(xid, true); MultiXactId oldestMember; RecoverPreparedTransactions calls ProcessRecords with twophase_recover_callbacks right after releasing the TwoPhaseStateLock, so I think lock_held should be false here (matching the similar call of TwoPhaseGetDummyProc at lock_twophase_recover). Since the patch touches TwoPhaseGetDummyBackendId, let’s fix the comment header to mention it instead of TwoPhaseGetDummyProc > Now, it seems > to me that the potential ABI breakage argument (which can be solved by > introducing an extra routine, which means more conflicts to handle > when back-patching 2PC stuff), and the time it took to find the issue > are pointing out that we should patch only HEAD. Sounds reasonable. Regards, Oleksii Kliukin
On Wed, Feb 20, 2019 at 11:50:42AM +0100, Oleksii Kliukin wrote: > RecoverPreparedTransactions calls ProcessRecords with > twophase_recover_callbacks right after releasing the TwoPhaseStateLock, so I > think lock_held should be false here (matching the similar call of > TwoPhaseGetDummyProc at lock_twophase_recover). Indeed. That would be a bad idea. We don't actually stress this routine in 009_twophase.pl or the assertion I added would have blown up immediately. So I propose to close the gap, and the updated patch attached adds dedicated tests causing the problem you spotted to be triggered (soft and hard restarts with 2PC transactions including DDLs). The previous version of the patch and those tests cause the assertion to blow up, failing the tests. > Since the patch touches TwoPhaseGetDummyBackendId, let’s fix the comment > header to mention it instead of TwoPhaseGetDummyProc Not sure I understand the issue you are pointing out here. The patch adds an extra argument to both TwoPhaseGetDummyProc() and TwoPhaseGetDummyBackendId(), and the headers of both functions document the new argument. One extra trick I have been using for testing this patch is the following: --- a/src/backend/access/transam/twophase.c +++ b/src/backend/access/transam/twophase.c @@ -816,13 +816,6 @@ TwoPhaseGetGXact(TransactionId xid, bool lock_held) Assert(!lock_held || LWLockHeldByMe(TwoPhaseStateLock)); - /* - * During a recovery, COMMIT PREPARED, or ABORT PREPARED, we'll be called - * repeatedly for the same XID. We can save work with a simple cache. - */ - if (xid == cached_xid) - return cached_gxact; This has the advantage to check more aggressively for lock conflicts, causing the tests to deadlock if the flag is not correctly set in the worst case scenario. -- Michael
Attachment
Hi, Michael Paquier <michael@paquier.xyz> wrote: > On Wed, Feb 20, 2019 at 11:50:42AM +0100, Oleksii Kliukin wrote: >> RecoverPreparedTransactions calls ProcessRecords with >> twophase_recover_callbacks right after releasing the TwoPhaseStateLock, so I >> think lock_held should be false here (matching the similar call of >> TwoPhaseGetDummyProc at lock_twophase_recover). > > Indeed. That would be a bad idea. We don't actually stress this > routine in 009_twophase.pl or the assertion I added would have blown > up immediately. So I propose to close the gap, and the updated patch > attached adds dedicated tests causing the problem you spotted to be > triggered (soft and hard restarts with 2PC transactions including > DDLs). The previous version of the patch and those tests cause the > assertion to blow up, failing the tests. Thank you for updating the patch and sorry for the delay, it looks good to me, the tests pass on my system. >> Since the patch touches TwoPhaseGetDummyBackendId, let’s fix the comment >> header to mention it instead of TwoPhaseGetDummyProc > > Not sure I understand the issue you are pointing out here. The patch > adds an extra argument to both TwoPhaseGetDummyProc() and > TwoPhaseGetDummyBackendId(), and the headers of both functions > document the new argument. Just this: @@ -844,17 +851,18 @@ TwoPhaseGetGXact(TransactionId xid) } /* - * TwoPhaseGetDummyProc + * TwoPhaseGetDummyBackendId * Get the dummy backend ID for prepared transaction specified by XID > > One extra trick I have been using for testing this patch is the > following: > --- a/src/backend/access/transam/twophase.c > +++ b/src/backend/access/transam/twophase.c > @@ -816,13 +816,6 @@ TwoPhaseGetGXact(TransactionId xid, bool lock_held) > > Assert(!lock_held || LWLockHeldByMe(TwoPhaseStateLock)); > > - /* > - * During a recovery, COMMIT PREPARED, or ABORT PREPARED, we'll be called > - * repeatedly for the same XID. We can save work with a simple cache. > - */ > - if (xid == cached_xid) > - return cached_gxact; > > This has the advantage to check more aggressively for lock conflicts, > causing the tests to deadlock if the flag is not correctly set in the > worst case scenario. Nice, thank you. I ran all my tests with it and found no further issues. Regards, Oleksii Kliukin
On Fri, Feb 22, 2019 at 12:17:01PM +0100, Oleksii Kliukin wrote: > Thank you for updating the patch and sorry for the delay, it looks good to > me, the tests pass on my system. Thanks. I am still looking at this patch an extra time, so this may take at most a couple of days from my side. For now I have committed the test portion, which is independently useful and caused recovery of multixact post-commit callbacks to never be stressed. > Just this: > > @@ -844,17 +851,18 @@ TwoPhaseGetGXact(TransactionId xid) > } > > /* > - * TwoPhaseGetDummyProc > + * TwoPhaseGetDummyBackendId > * Get the dummy backend ID for prepared transaction specified by XID I have not been paying much attention to this one, good catch. This is unrelated to the patch of this thread so I have fixed it separately. -- Michael
Attachment
On Sat, Feb 23, 2019 at 08:44:43AM +0900, Michael Paquier wrote: > Thanks. I am still looking at this patch an extra time, so this may > take at most a couple of days from my side. For now I have committed > the test portion, which is independently useful and caused recovery of > multixact post-commit callbacks to never be stressed. Done. I have spent some time today looking at the performance of the patch, designing a worst-case scenario to see how much bloat this adds in COMMIT PREPARED by running across many sessions 2PC transactions taking SHARE locks across many tables, as done in the script attached. This test case runs across multiple sessions: BEGIN; SELECT lock_tables($NUM_TABLES); PREPARE TRANSACTION 't_$i'; COMMIT PREPARED 't_$i'; lock_tables() is able to take a lock on a set of tables, bloating the number of locks registered in the 2PC transaction, still those do not conflict, so it gives an idea of the extra impact of holding TwoPhaseStateLock longer. The script also includes a function to create thousands of tables easily, and can be controlled with a couple of parameters: - number of tables to use, which will be locked. - number of sessions. - run time of pgbench. -- Michael
Attachment
On Mon, Feb 25, 2019 at 02:28:23PM +0900, Michael Paquier wrote: > Done. I have spent some time today looking at the performance of the > patch, designing a worst-case scenario to see how much bloat this adds > in COMMIT PREPARED by running across many sessions 2PC transactions > taking SHARE locks across many tables, as done in the script attached. And of course I forgot the script, which is now attached. -- Michael