Thread: SUBTRANS: Minimizing calls to SubTransSetParent()
On Thu, 4 Aug 2022 at 13:11, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>
> On Wed, 3 Aug 2022 at 20:18, Andres Freund <andres@anarazel.de> wrote:
>
> > I think we should consider redesigning subtrans more substantially - even with
> > the changes you propose here, there's still plenty ways to hit really bad
> > performance. And there's only so much we can do about that without more
> > fundamental design changes.
>
> I completely agree - you will be glad to hear that I've been working
> on a redesign of the subtrans module.
...
> I will post my patch, when complete, in a different thread.

The attached patch reduces the overhead of SUBTRANS by minimizing the number of times SubTransSetParent() is called, to below 1% of the current rate in common cases.

Instead of blindly calling SubTransSetParent() for every subxid, this proposal only calls SubTransSetParent() when that information will be required for later use. It does this by analyzing all of the callers of SubTransGetParent() and using these pre-conditions to filter out calls/subxids that will never be required, for various reasons. It redesigns the way XactLockTableWait() calls SubTransGetTopmostTransaction() to allow this.

This short patchset compiles and passes make check-world, with lengthy comments.

This might then make viable a simple rewrite of SUBTRANS using a hash table, as proposed by Andres. But in any case, it will allow us to design a densely packed SUBTRANS replacement that does not generate as much contention and I/O.

NOTE that this patchset does not touch SUBTRANS at all; it just minimizes the calls in preparation for a redesign in a later patch. If this patch, or a later version of it, is committed in the Sept CF, then we should be in good shape to post a subtrans redesign patch by the major patch deadline at the end of the year.

Patches 001 and 002 are common elements of a different patch, "Smoothing the subtrans performance catastrophe", but other than that, the two patches are independent of each other.

Where does this come from? I learnt a lot about subxids when coding Hot Standby, specifically commit 06da3c570f21394003. This patch just builds upon that earlier understanding.

Comments please.

--
Simon Riggs
http://www.EnterpriseDB.com/
Attachment
On Mon, Aug 8, 2022 at 6:41 PM Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>
> On Thu, 4 Aug 2022 at 13:11, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
> >
> > On Wed, 3 Aug 2022 at 20:18, Andres Freund <andres@anarazel.de> wrote:
> >
> > > I think we should consider redesigning subtrans more substantially - even with
> > > the changes you propose here, there's still plenty ways to hit really bad
> > > performance. And there's only so much we can do about that without more
> > > fundamental design changes.
> >
> > I completely agree - you will be glad to hear that I've been working
> > on a redesign of the subtrans module.
> ...
> > I will post my patch, when complete, in a different thread.
>
> The attached patch reduces the overhead of SUBTRANS by minimizing the
> number of times SubTransSetParent() is called, to below 1% of the
> current rate in common cases.
>
> Instead of blindly calling SubTransSetParent() for every subxid, this
> proposal only calls SubTransSetParent() when that information will be
> required for later use. It does this by analyzing all of the callers
> of SubTransGetParent() and using these pre-conditions to filter out
> calls/subxids that will never be required, for various reasons. It
> redesigns the way XactLockTableWait() calls
> SubTransGetTopmostTransaction() to allow this.
>
> This short patchset compiles and passes make check-world, with lengthy comments.

Does this patch set work independently, or does it have a dependency on the patches on the other thread "Smoothing the subtrans performance catastrophe"? I ask because in this patch I see no code where we are changing anything to control the access of SubTransGetParent() from SubTransGetTopmostTransaction().

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Tue, 9 Aug 2022 at 12:39, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> > This short patchset compiles and passes make check-world, with lengthy comments.
>
> Does this patch set work independently, or does it have a dependency on
> the patches on the other thread "Smoothing the subtrans performance
> catastrophe"?

Patches 001 and 002 are common elements of a different patch, "Smoothing the subtrans performance catastrophe", but other than that, the two patches are independent of each other; i.e. there are common elements in both patches.

001 puts all subxid data in a snapshot (up to a limit of 64 xids per topxid), even if one or more xids overflows.

> I ask because in this patch I see no code where we are
> changing anything to control the access of SubTransGetParent() from
> SubTransGetTopmostTransaction().

Those calls are unaffected, i.e. they both still work.

Right now, we register all subxids in subtrans. But not all xids are subxids, so in fact subtrans has many "holes" in it, where if you look up the parent for an xid it will just return InvalidTransactionId. There is a protection against that causing a problem: if you call TransactionIdDidCommit/Abort you can get a WARNING, and if you call SubTransGetTopmostTransaction() you can get an ERROR. But such a lookup only happens for an inappropriate xid, i.e. if you call TransactionIdDidCommit() without first calling TransactionIdIsInProgress() as you are supposed to do.

What this patch does is increase the number of "holes" in subtrans, reducing the overhead and making the subtrans data structure more amenable to a dense structure, rather than the sparse structure we have now, which leads to I/O overheads. But in this patch, we only have holes where we can prove that the subxid's parent will never be requested.

Specifically, with this patch, running PL/pgSQL with a few subtransactions will cause those subxids to be logged in subtrans about 1% as often as they are now, greatly reducing the number of subtrans calls.

Happy to provide more detailed review thoughts, so please keep asking questions.

--
Simon Riggs
http://www.EnterpriseDB.com/
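For reference, the lookup loop that the next message analyzes has roughly the shape below. This is a simplified sketch of SubTransGetTopmostTransaction() from src/backend/access/transam/subtrans.c, not the exact source (the real function also asserts that xid is no older than TransactionXmin). When SubTransGetParent() hits one of these "holes" it returns InvalidTransactionId on the first iteration, and the caller simply gets its input xid back:

    TransactionId
    SubTransGetTopmostTransaction(TransactionId xid)
    {
        TransactionId parentXid = xid;
        TransactionId previousXid = xid;

        while (TransactionIdIsValid(parentXid))
        {
            previousXid = parentXid;

            /* entries older than TransactionXmin may already be truncated */
            if (TransactionIdPrecedes(parentXid, TransactionXmin))
                break;

            /* a "hole" returns InvalidTransactionId here, ending the loop */
            parentXid = SubTransGetParent(parentXid);
        }

        return previousXid;
    }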
On Tue, Aug 9, 2022 at 9:46 PM Simon Riggs <simon.riggs@enterprisedb.com> wrote:

> Those calls are unaffected, i.e. they both still work.
>
> Right now, we register all subxids in subtrans. But not all xids are
> subxids, so in fact subtrans has many "holes" in it, where if you
> look up the parent for an xid it will just return
> InvalidTransactionId. There is a protection against that causing a
> problem: if you call TransactionIdDidCommit/Abort you can get a
> WARNING, and if you call SubTransGetTopmostTransaction() you can get
> an ERROR. But such a lookup only happens for an inappropriate xid,
> i.e. if you call TransactionIdDidCommit() without first calling
> TransactionIdIsInProgress() as you are supposed to do.

IIUC, if SubTransGetParent() returns InvalidTransactionId then the SubTransGetTopmostTransaction() loop will break and return the previous xid. So if we pass any topxid to SubTransGetTopmostTransaction() it will return back the same xid, and that's fine, as next we are going to search in the snapshot->xip array. But suppose we are calling this function with a subxid that might be there in the snapshot->subxip array: if we first call SubTransGetTopmostTransaction() then it will just return the same xid if the parent is not set for it. And now if we search for this in the snapshot->xip array, we will get the wrong answer? So I still think some adjustment is required in XidInMVCCSnapshot() such that we first search the snapshot->subxip array.

Am I still missing something?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Wed, 10 Aug 2022 at 08:34, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> Am I still missing something?

No, you have found a dependency between the patches that I was unaware of. So there is no bug if you apply both patches. Thanks for looking.

> So I still think some adjustment is required in XidInMVCCSnapshot()

That is one way to resolve the issue, but not the only one. I can also change AssignTransactionId() to recursively register parent xids for all of a subxid's parents.

I will add in a test case and resolve the dependency in my next patch.

Thanks again.

--
Simon Riggs
http://www.EnterpriseDB.com/
On Wed, Aug 10, 2022 at 6:31 PM Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>
> On Wed, 10 Aug 2022 at 08:34, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > Am I still missing something?
>
> No, you have found a dependency between the patches that I was unaware
> of. So there is no bug if you apply both patches.

Right.

> > So I still think some adjustment is required in XidInMVCCSnapshot()
>
> That is one way to resolve the issue, but not the only one. I can also
> change AssignTransactionId() to recursively register parent xids for
> all of a subxid's parents.
>
> I will add in a test case and resolve the dependency in my next patch.

Okay, thanks, I will look into the updated patch after you submit that.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Thu, 11 Aug 2022 at 06:32, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> > > So I still think some adjustment is required in XidInMVCCSnapshot()
> >
> > That is one way to resolve the issue, but not the only one. I can also
> > change AssignTransactionId() to recursively register parent xids for
> > all of a subxid's parents.
> >
> > I will add in a test case and resolve the dependency in my next patch.
>
> Okay, thanks, I will look into the updated patch after you submit that.

PFA two patches, replacing earlier work:
001_new_isolation_tests_for_subxids.v3.patch
002_minimize_calls_to_SubTransSetParent.v8.patch

001_new_isolation_tests_for_subxids.v3.patch
Adds new test cases to master without adding any new code, specifically addressing the two areas of code that are not tested by existing tests. This gives us a baseline from which we can do test-driven development. I'm hoping this can be reviewed and committed fairly smoothly.

002_minimize_calls_to_SubTransSetParent.v8.patch
Reduces the number of calls to subtrans below 1% for the first 64 subxids, so overall will substantially reduce subtrans contention on master for the typical case, as well as smoothing the overflow case. Some discussion is needed on this; there are various options. This combines the work originally posted here with another patch posted on the thread "Smoothing the subtrans performance catastrophe".

I will do some performance testing also, but more is welcome.

--
Simon Riggs
http://www.EnterpriseDB.com/
Attachment
On Tue, Aug 30, 2022 at 10:16 PM Simon Riggs <simon.riggs@enterprisedb.com> wrote:

> PFA two patches, replacing earlier work:
> 001_new_isolation_tests_for_subxids.v3.patch
> 002_minimize_calls_to_SubTransSetParent.v8.patch
>
> 001_new_isolation_tests_for_subxids.v3.patch
> Adds new test cases to master without adding any new code, specifically
> addressing the two areas of code that are not tested by existing tests.
> This gives us a baseline from which we can do test-driven development.
> I'm hoping this can be reviewed and committed fairly smoothly.
>
> 002_minimize_calls_to_SubTransSetParent.v8.patch
> Reduces the number of calls to subtrans below 1% for the first 64 subxids,
> so overall will substantially reduce subtrans contention on master for the
> typical case, as well as smoothing the overflow case.
> Some discussion is needed on this; there are various options.
> This combines the work originally posted here with another patch posted on
> the thread "Smoothing the subtrans performance catastrophe".
>
> I will do some performance testing also, but more is welcome.

Thanks for the updated patch, I have some questions/comments.

1.

+ * This has the downside that anyone waiting for a lock on aborted
+ * subtransactions would not be released immediately; that may or
+ * may not be an acceptable compromise. If not acceptable, this
+ * simple call needs to be replaced with a loop to register the
+ * parent for the current subxid stack, so we can walk back up it to
+ * the topxid.
+ */
+ SubTransSetParent(subxid, GetTopTransactionId());

I do not understand in which situation we will see this downside. I mean, if we look at the logic of XactLockTableWait(), then in the current situation also, if the subtransaction is committed, we directly wait on the top transaction by calling SubTransGetTopmostTransaction(xid).

So if the lock-taking subtransaction is committed then we will wait directly for the top-level transaction, and after that it doesn't matter if we abort any of the parent subtransactions, because it will wait for the topmost transaction to complete. And if the lock-taking subtransaction is aborted then it will anyway stop waiting, because TransactionIdIsInProgress() should return false.

2.

/*
 * Notice that we update pg_subtrans with the top-level xid, rather than
 * the parent xid. This is a difference between normal processing and
 * recovery, yet is still correct in all cases. The reason is that
 * subtransaction commit is not marked in clog until commit processing, so
 * all aborted subtransactions have already been clearly marked in clog.
 * As a result we are able to refer directly to the top-level
 * transaction's state rather than skipping through all the intermediate
 * states in the subtransaction tree. This should be the first time we
 * have attempted to SubTransSetParent().
 */
for (i = 0; i < nsubxids; i++)
    SubTransSetParent(subxids[i], topxid);

I think this comment needs some modification, because with this patch we now set the topxid as the parent in normal processing too, right?

3.

+ while (TransactionIdIsValid(parentXid))
+ {
+     previousXid = parentXid;
+
+     /*
+      * Stop as soon as we are earlier than the cutoff. This saves multiple
+      * lookups against subtrans when we have a deeply nested subxid with
+      * a later snapshot with an xmin much higher than TransactionXmin.
+      */
+     if (TransactionIdPrecedes(parentXid, cutoff_xid))
+     {
+         *xid = previousXid;
+         return true;
+     }
+     parentXid = SubTransGetParent(parentXid);

Do we need this while loop if we are directly setting topxid as the parent, so that we do not need multiple iterations to get to the top xid?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
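For context on question 1, the wait loop in XactLockTableWait() (src/backend/storage/lmgr/lmgr.c) behaves roughly as sketched below; error reporting and the XLTW_Oper bookkeeping are omitted, so this is a simplified sketch rather than the exact source. Once the lock-taking subtransaction has committed, its own xid lock is gone but it is still "in progress", so the wait transfers to the topmost xid, which is the behavior the reasoning above relies on.

    for (;;)
    {
        LOCKTAG     tag;

        /* block until the transaction releases the lock on its own xid */
        SET_LOCKTAG_TRANSACTION(tag, xid);
        (void) LockAcquire(&tag, ShareLock, false, false);
        LockRelease(&tag, ShareLock, false);

        /* an aborted (sub)transaction is no longer in progress: done */
        if (!TransactionIdIsInProgress(xid))
            break;

        /*
         * A committed subxact's work only becomes final when its topmost
         * transaction ends, so switch to waiting on that xid instead.
         */
        xid = SubTransGetTopmostTransaction(xid);
    }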
On Tue, 6 Sept 2022 at 12:37, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Aug 30, 2022 at 10:16 PM Simon Riggs
> <simon.riggs@enterprisedb.com> wrote:
>
> > PFA two patches, replacing earlier work:
> > 001_new_isolation_tests_for_subxids.v3.patch
> > 002_minimize_calls_to_SubTransSetParent.v8.patch
> >
> > 001_new_isolation_tests_for_subxids.v3.patch
> > Adds new test cases to master without adding any new code, specifically
> > addressing the two areas of code that are not tested by existing tests.
> > This gives us a baseline from which we can do test-driven development.
> > I'm hoping this can be reviewed and committed fairly smoothly.
> >
> > 002_minimize_calls_to_SubTransSetParent.v8.patch
> > Reduces the number of calls to subtrans below 1% for the first 64 subxids,
> > so overall will substantially reduce subtrans contention on master for the
> > typical case, as well as smoothing the overflow case.
> > Some discussion is needed on this; there are various options.
> > This combines the work originally posted here with another patch posted on
> > the thread "Smoothing the subtrans performance catastrophe".
> >
> > I will do some performance testing also, but more is welcome.
>
> Thanks for the updated patch, I have some questions/comments.

Thanks for the review.

> 1.
>
> + * This has the downside that anyone waiting for a lock on aborted
> + * subtransactions would not be released immediately; that may or
> + * may not be an acceptable compromise. If not acceptable, this
> + * simple call needs to be replaced with a loop to register the
> + * parent for the current subxid stack, so we can walk back up it to
> + * the topxid.
> + */
> + SubTransSetParent(subxid, GetTopTransactionId());
>
> I do not understand in which situation we will see this downside. I
> mean, if we look at the logic of XactLockTableWait(), then in the current
> situation also, if the subtransaction is committed, we directly wait on
> the top transaction by calling SubTransGetTopmostTransaction(xid).
>
> So if the lock-taking subtransaction is committed then we will wait
> directly for the top-level transaction, and after that it doesn't
> matter if we abort any of the parent subtransactions, because it will
> wait for the topmost transaction to complete. And if the lock-taking
> subtransaction is aborted then it will anyway stop waiting, because
> TransactionIdIsInProgress() should return false.

Yes, correct.

> 2.
>
> /*
>  * Notice that we update pg_subtrans with the top-level xid, rather than
>  * the parent xid. This is a difference between normal processing and
>  * recovery, yet is still correct in all cases. The reason is that
>  * subtransaction commit is not marked in clog until commit processing, so
>  * all aborted subtransactions have already been clearly marked in clog.
>  * As a result we are able to refer directly to the top-level
>  * transaction's state rather than skipping through all the intermediate
>  * states in the subtransaction tree. This should be the first time we
>  * have attempted to SubTransSetParent().
>  */
> for (i = 0; i < nsubxids; i++)
>     SubTransSetParent(subxids[i], topxid);
>
> I think this comment needs some modification, because with this patch we
> now set the topxid as the parent in normal processing too, right?

Correct.

> 3.
>
> + while (TransactionIdIsValid(parentXid))
> + {
> +     previousXid = parentXid;
> +
> +     /*
> +      * Stop as soon as we are earlier than the cutoff. This saves multiple
> +      * lookups against subtrans when we have a deeply nested subxid with
> +      * a later snapshot with an xmin much higher than TransactionXmin.
> +      */
> +     if (TransactionIdPrecedes(parentXid, cutoff_xid))
> +     {
> +         *xid = previousXid;
> +         return true;
> +     }
> +     parentXid = SubTransGetParent(parentXid);
>
> Do we need this while loop if we are directly setting topxid as the
> parent, so that we do not need multiple iterations to get to the top xid?

Correct. I think we can dispense with SubTransGetTopmostTransactionPrecedes() entirely. I was initially trying to leave options open, but that is confusing and, as a result, some parts are misleading after I merged the two patches.

I will update the patch, thanks for your scrutiny.

--
Simon Riggs
http://www.EnterpriseDB.com/
On Tue, 6 Sept 2022 at 13:14, Simon Riggs <simon.riggs@enterprisedb.com> wrote:

> I will update the patch, thanks for your scrutiny.

I attach a diff showing what has changed between v8 and v9, and will reattach a full set of new patches in the next post, so patchtester doesn't squeal.

--
Simon Riggs
http://www.EnterpriseDB.com/
Attachment
On Tue, 13 Sept 2022 at 11:56, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>
> On Tue, 6 Sept 2022 at 13:14, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>
> > I will update the patch, thanks for your scrutiny.
>
> I attach a diff showing what has changed between v8 and v9, and will
> reattach a full set of new patches in the next post, so patchtester
> doesn't squeal.

Full set of v9 patches.

--
Simon Riggs
http://www.EnterpriseDB.com/
Attachment
On 2022-Aug-30, Simon Riggs wrote:

> 001_new_isolation_tests_for_subxids.v3.patch
> Adds new test cases to master without adding any new code, specifically
> addressing the two areas of code that are not tested by existing tests.
> This gives us a baseline from which we can do test-driven development.
> I'm hoping this can be reviewed and committed fairly smoothly.

I gave this a quick run to confirm the claimed increase of coverage. It checks out, so pushed.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
On Wed, 14 Sept 2022 at 15:21, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2022-Aug-30, Simon Riggs wrote:
>
> > 001_new_isolation_tests_for_subxids.v3.patch
> > Adds new test cases to master without adding any new code, specifically
> > addressing the two areas of code that are not tested by existing tests.
> > This gives us a baseline from which we can do test-driven development.
> > I'm hoping this can be reviewed and committed fairly smoothly.
>
> I gave this a quick run to confirm the claimed increase of coverage. It
> checks out, so pushed.

Thank you.

So now we just have the main part of the patch, reattached here for the auto patch tester's benefit.

--
Simon Riggs
http://www.EnterpriseDB.com/
Attachment
On Thu, 15 Sep 2022 at 18:04, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
> On Wed, 14 Sept 2022 at 15:21, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>>
>> On 2022-Aug-30, Simon Riggs wrote:
>>
>> > 001_new_isolation_tests_for_subxids.v3.patch
>> > Adds new test cases to master without adding any new code, specifically
>> > addressing the two areas of code that are not tested by existing tests.
>> > This gives us a baseline from which we can do test-driven development.
>> > I'm hoping this can be reviewed and committed fairly smoothly.
>>
>> I gave this a quick run to confirm the claimed increase of coverage. It
>> checks out, so pushed.
>
> Thank you.
>
> So now we just have the main part of the patch, reattached here for
> the auto patch tester's benefit.

Hi Simon,

Thanks for the updated patch, here are some comments.

There is a typo, s/SubTransGetTopMostTransaction/SubTransGetTopmostTransaction/g.

+ * call SubTransGetTopMostTransaction() if that xact overflowed;

Is there a punctuation mark missing on the following first line?

+ * 2. When IsolationIsSerializable() we sometimes need to access topxid
+ *    This occurs only when SERIALIZABLE is requested by app user.

When we use a function name in comments, some places use parentheses, but others do not. Why? I think we should keep them consistent, at least in the same commit.

+ * 3. When TransactionIdSetTreeStatus will use a status of SUB_COMMITTED,
+ *    which then requires us to consult subtrans to find parent, which
+ *    is needed to avoid race condition. In this case we ask Clog/Xact
+ *    module if TransactionIdsAreOnSameXactPage(). Since we start a new
+ *    clog page every 32000 xids, this is usually <<1% of subxids.

Maybe we should declare a topxid variable to avoid calling GetTopTransactionId() twice when we set the subtrans parent?

+ TransactionId subxid = XidFromFullTransactionId(s->fullTransactionId);
+ TransactionId topxid = GetTopTransactionId();
...
+ if (MyProc->subxidStatus.overflowed ||
+     IsolationIsSerializable() ||
+     !TransactionIdsAreOnSameXactPage(topxid, subxid))
+ {
...
+     SubTransSetParent(subxid, topxid);
+ }

--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.
On Thu, 15 Sept 2022 at 12:36, Japin Li <japinli@hotmail.com> wrote:
>
> On Thu, 15 Sep 2022 at 18:04, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
> > On Wed, 14 Sept 2022 at 15:21, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >>
> >> On 2022-Aug-30, Simon Riggs wrote:
> >>
> >> > 001_new_isolation_tests_for_subxids.v3.patch
> >> > Adds new test cases to master without adding any new code, specifically
> >> > addressing the two areas of code that are not tested by existing tests.
> >> > This gives us a baseline from which we can do test-driven development.
> >> > I'm hoping this can be reviewed and committed fairly smoothly.
> >>
> >> I gave this a quick run to confirm the claimed increase of coverage. It
> >> checks out, so pushed.
> >
> > Thank you.
> >
> > So now we just have the main part of the patch, reattached here for
> > the auto patch tester's benefit.
>
> Hi Simon,
>
> Thanks for the updated patch, here are some comments.

Thanks for your comments.

> There is a typo, s/SubTransGetTopMostTransaction/SubTransGetTopmostTransaction/g.
>
> + * call SubTransGetTopMostTransaction() if that xact overflowed;
>
> Is there a punctuation mark missing on the following first line?
>
> + * 2. When IsolationIsSerializable() we sometimes need to access topxid
> + *    This occurs only when SERIALIZABLE is requested by app user.
>
> When we use a function name in comments, some places use parentheses,
> but others do not. Why? I think we should keep them consistent,
> at least in the same commit.
>
> + * 3. When TransactionIdSetTreeStatus will use a status of SUB_COMMITTED,
> + *    which then requires us to consult subtrans to find parent, which
> + *    is needed to avoid race condition. In this case we ask Clog/Xact
> + *    module if TransactionIdsAreOnSameXactPage(). Since we start a new
> + *    clog page every 32000 xids, this is usually <<1% of subxids.

I've reworded those comments, hoping to address all of your above points.

> Maybe we should declare a topxid variable to avoid calling GetTopTransactionId()
> twice when we set the subtrans parent?
>
> + TransactionId subxid = XidFromFullTransactionId(s->fullTransactionId);
> + TransactionId topxid = GetTopTransactionId();
> ...
> + if (MyProc->subxidStatus.overflowed ||
> +     IsolationIsSerializable() ||
> +     !TransactionIdsAreOnSameXactPage(topxid, subxid))
> + {
> ...
> +     SubTransSetParent(subxid, topxid);
> + }

Seems a minor point, but I've done this anyway.

Thanks for the review.

v10 attached

--
Simon Riggs
http://www.EnterpriseDB.com/
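Putting the review points together, the gating logic in v10 amounts to something like the sketch below, based on the hunks quoted above. The exact placement inside AssignTransactionId() is an assumption here, and TransactionIdsAreOnSameXactPage() is a helper introduced by the patch itself, not an existing core function.

    TransactionId subxid = XidFromFullTransactionId(s->fullTransactionId);
    TransactionId topxid = GetTopTransactionId();

    /*
     * Register the parent in pg_subtrans only when a later reader could
     * actually need it: (1) the PGPROC subxid cache has overflowed,
     * (2) the owning transaction is SERIALIZABLE, so SSI conflict checks
     * may chase the parent chain, or (3) topxid and subxid fall on
     * different clog pages, where commit processing uses
     * TRANSACTION_STATUS_SUB_COMMITTED.
     */
    if (MyProc->subxidStatus.overflowed ||
        IsolationIsSerializable() ||
        !TransactionIdsAreOnSameXactPage(topxid, subxid))
        SubTransSetParent(subxid, topxid);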
Attachment
On Fri, 16 Sept 2022 at 13:20, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>
> Thanks for the review.
>
> v10 attached

v11 attached, corrected for recent commit 14ff44f80c09718d43d853363941457f5468cc03.

--
Simon Riggs
http://www.EnterpriseDB.com/
Attachment
Hi,

On Mon, 26 Sept 2022 at 15:57, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>
> On Fri, 16 Sept 2022 at 13:20, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
> >
> > Thanks for the review.
> >
> > v10 attached
>
> v11 attached, corrected for recent commit
> 14ff44f80c09718d43d853363941457f5468cc03.

Please find below the performance test results I have produced for this patch. Attaching some charts and the scripts used to reproduce these tests.

1. Assumption

The number of sub-transactions issued by only one long-running transaction may affect global TPS throughput if the number of sub-transactions exceeds 64 (sub-overflow).

2. Testing scenario

Based on pgbench, 2 different types of DB activity are applied concurrently:
- 1 long-running transaction, including N sub-transactions
- X pgbench clients running a read-only workload

Tests are executed with a varying number of sub-transactions: from 0 to 128.
The key metric is the TPS rate reported by pgbench runs in read-only mode.

Tests are executed against:
- HEAD (14a737)
- HEAD (14a737) + 002_minimize_calls_to_SubTransSetParent.v11.patch

3. Long transaction anatomy

Two different long transactions are tested because they don't have the exact same impact on performance.

Transaction number 1 includes one UPDATE affecting each row of pgbench_accounts, plus an additional UPDATE affecting only one row but executed in its own rolled-back sub-transaction:

BEGIN;
SAVEPOINT s1;
SAVEPOINT s2;
-- ...
SAVEPOINT sN - 1;
UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid > 0;
SAVEPOINT sN;
UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid = 12345;
ROLLBACK TO SAVEPOINT sN;
-- sleeping until the end of the test
ROLLBACK;

Transaction 2 includes one UPDATE affecting each row of pgbench_accounts:

BEGIN;
SAVEPOINT s1;
SAVEPOINT s2;
-- ...
SAVEPOINT sN;
UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid > 0;
-- sleeping until the end of the test
ROLLBACK;

4. Test results with transaction 1

TPS vs number of sub-transactions

nsubx  HEAD    patched
----------------------
  0    441109  439474
  8    439045  438103
 16    439123  436993
 24    436269  434194
 32    439707  437429
 40    439997  437220
 48    439388  437422
 56    439409  437210
 64    439748  437366
 72     92869  434448
 80     66577  434100
 88     61243  434255
 96     57016  434419
104     52132  434917
112     49181  433755
120     46581  434044
128     44067  434268

Perf profiling on HEAD with 80 sub-transactions:

Overhead  Symbol
51.26%    [.] LWLockAttemptLock
24.59%    [.] LWLockRelease
 0.36%    [.] base_yyparse
 0.35%    [.] PinBuffer
 0.34%    [.] AllocSetAlloc
 0.33%    [.] hash_search_with_hash_value
 0.22%    [.] LWLockAcquire
 0.20%    [.] UnpinBuffer
 0.15%    [.] SimpleLruReadPage_ReadOnly
 0.15%    [.] _bt_compare

Perf profiling on patched with 80 sub-transactions:

Overhead  Symbol
 2.64%    [.] AllocSetAlloc
 2.09%    [.] base_yyparse
 1.76%    [.] hash_search_with_hash_value
 1.62%    [.] LWLockAttemptLock
 1.26%    [.] MemoryContextAllocZeroAligned
 0.93%    [.] _bt_compare
 0.92%    [.] expression_tree_walker_impl.part.4
 0.84%    [.] SearchCatCache1
 0.79%    [.] palloc
 0.64%    [.] core_yylex

5. Test results with transaction 2

nsubx  HEAD    patched
----------------------
  0    440145  443816
  8    438867  443081
 16    438634  441786
 24    436406  440187
 32    439203  442447
 40    439819  443574
 48    439314  442941
 56    439801  443736
 64    439074  441970
 72    439833  444132
 80    148737  439941
 88    413714  443343
 96    251098  442021
104     70190  443488
112    405507  438866
120    177827  443202
128    399431  441842

From the performance point of view, this patch clearly fixes the dramatic TPS collapse shown in these tests.

Regards,

--
Julien Tachoires
EDB
Attachment
On Fri, Oct 28, 2022 at 10:55 PM Julien Tachoires <julmon@gmail.com> wrote:
>
> Hi,
>
> On Mon, 26 Sept 2022 at 15:57, Simon Riggs
> <simon.riggs@enterprisedb.com> wrote:
> >
> > On Fri, 16 Sept 2022 at 13:20, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
> > >
> > > Thanks for the review.
> > >
> > > v10 attached
> >
> > v11 attached, corrected for recent commit
> > 14ff44f80c09718d43d853363941457f5468cc03.
>
> Please find below the performance test results I have produced for this patch.
> Attaching some charts and the scripts used to reproduce these tests.
>
> 1. Assumption
>
> The number of sub-transactions issued by only one long-running
> transaction may affect global TPS throughput if the number of
> sub-transactions exceeds 64 (sub-overflow).
>
> 2. Testing scenario
>
> Based on pgbench, 2 different types of DB activity are applied concurrently:
> - 1 long-running transaction, including N sub-transactions
> - X pgbench clients running a read-only workload
>
> Tests are executed with a varying number of sub-transactions: from 0 to 128.
> The key metric is the TPS rate reported by pgbench runs in read-only mode.
>
> Tests are executed against:
> - HEAD (14a737)
> - HEAD (14a737) + 002_minimize_calls_to_SubTransSetParent.v11.patch
>
> 3. Long transaction anatomy
>
> Two different long transactions are tested because they don't have the
> exact same impact on performance.
>
> Transaction number 1 includes one UPDATE affecting each row of
> pgbench_accounts, plus an additional UPDATE affecting only one row but
> executed in its own rolled-back sub-transaction:
>
> BEGIN;
> SAVEPOINT s1;
> SAVEPOINT s2;
> -- ...
> SAVEPOINT sN - 1;
> UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid > 0;
> SAVEPOINT sN;
> UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid = 12345;
> ROLLBACK TO SAVEPOINT sN;
> -- sleeping until the end of the test
> ROLLBACK;
>
> Transaction 2 includes one UPDATE affecting each row of pgbench_accounts:
>
> BEGIN;
> SAVEPOINT s1;
> SAVEPOINT s2;
> -- ...
> SAVEPOINT sN;
> UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid > 0;
> -- sleeping until the end of the test
> ROLLBACK;
>
> 4. Test results with transaction 1
>
> TPS vs number of sub-transactions
>
> nsubx  HEAD    patched
> ----------------------
>   0    441109  439474
>   8    439045  438103
>  16    439123  436993
>  24    436269  434194
>  32    439707  437429
>  40    439997  437220
>  48    439388  437422
>  56    439409  437210
>  64    439748  437366
>  72     92869  434448
>  80     66577  434100
>  88     61243  434255
>  96     57016  434419
> 104     52132  434917
> 112     49181  433755
> 120     46581  434044
> 128     44067  434268
>
> Perf profiling on HEAD with 80 sub-transactions:
>
> Overhead  Symbol
> 51.26%    [.] LWLockAttemptLock
> 24.59%    [.] LWLockRelease
>  0.36%    [.] base_yyparse
>  0.35%    [.] PinBuffer
>  0.34%    [.] AllocSetAlloc
>  0.33%    [.] hash_search_with_hash_value
>  0.22%    [.] LWLockAcquire
>  0.20%    [.] UnpinBuffer
>  0.15%    [.] SimpleLruReadPage_ReadOnly
>  0.15%    [.] _bt_compare
>
> Perf profiling on patched with 80 sub-transactions:
>
> Overhead  Symbol
>  2.64%    [.] AllocSetAlloc
>  2.09%    [.] base_yyparse
>  1.76%    [.] hash_search_with_hash_value
>  1.62%    [.] LWLockAttemptLock
>  1.26%    [.] MemoryContextAllocZeroAligned
>  0.93%    [.] _bt_compare
>  0.92%    [.] expression_tree_walker_impl.part.4
>  0.84%    [.] SearchCatCache1
>  0.79%    [.] palloc
>  0.64%    [.] core_yylex
>
> 5. Test results with transaction 2
>
> nsubx  HEAD    patched
> ----------------------
>   0    440145  443816
>   8    438867  443081
>  16    438634  441786
>  24    436406  440187
>  32    439203  442447
>  40    439819  443574
>  48    439314  442941
>  56    439801  443736
>  64    439074  441970
>  72    439833  444132
>  80    148737  439941
>  88    413714  443343
>  96    251098  442021
> 104     70190  443488
> 112    405507  438866
> 120    177827  443202
> 128    399431  441842
>
> From the performance point of view, this patch clearly fixes the
> dramatic TPS collapse shown in these tests.

I think these are really promising results. Although the perf result shows that the bottleneck on the SLRU is no longer there with the patch, I think it would be nice to see the wait events as well.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Hi,

On Tue, 1 Nov 2022 at 09:39, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Oct 28, 2022 at 10:55 PM Julien Tachoires <julmon@gmail.com> wrote:
> >
> > Please find below the performance test results I have produced for this patch.
> > Attaching some charts and the scripts used to reproduce these tests.
> >
> > 1. Assumption
> >
> > The number of sub-transactions issued by only one long-running
> > transaction may affect global TPS throughput if the number of
> > sub-transactions exceeds 64 (sub-overflow).
> >
> > 2. Testing scenario
> >
> > Based on pgbench, 2 different types of DB activity are applied concurrently:
> > - 1 long-running transaction, including N sub-transactions
> > - X pgbench clients running a read-only workload
> >
> > Tests are executed with a varying number of sub-transactions: from 0 to 128.
> > The key metric is the TPS rate reported by pgbench runs in read-only mode.
> >
> > Tests are executed against:
> > - HEAD (14a737)
> > - HEAD (14a737) + 002_minimize_calls_to_SubTransSetParent.v11.patch
> >
> > 3. Long transaction anatomy
> >
> > Two different long transactions are tested because they don't have the
> > exact same impact on performance.
> >
> > Transaction number 1 includes one UPDATE affecting each row of
> > pgbench_accounts, plus an additional UPDATE affecting only one row but
> > executed in its own rolled-back sub-transaction:
> >
> > BEGIN;
> > SAVEPOINT s1;
> > SAVEPOINT s2;
> > -- ...
> > SAVEPOINT sN - 1;
> > UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid > 0;
> > SAVEPOINT sN;
> > UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid = 12345;
> > ROLLBACK TO SAVEPOINT sN;
> > -- sleeping until the end of the test
> > ROLLBACK;
> >
> > Transaction 2 includes one UPDATE affecting each row of pgbench_accounts:
> >
> > BEGIN;
> > SAVEPOINT s1;
> > SAVEPOINT s2;
> > -- ...
> > SAVEPOINT sN;
> > UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid > 0;
> > -- sleeping until the end of the test
> > ROLLBACK;
> >
> > 4. Test results with transaction 1
> >
> > TPS vs number of sub-transactions
> >
> > nsubx  HEAD    patched
> > ----------------------
> >   0    441109  439474
> >   8    439045  438103
> >  16    439123  436993
> >  24    436269  434194
> >  32    439707  437429
> >  40    439997  437220
> >  48    439388  437422
> >  56    439409  437210
> >  64    439748  437366
> >  72     92869  434448
> >  80     66577  434100
> >  88     61243  434255
> >  96     57016  434419
> > 104     52132  434917
> > 112     49181  433755
> > 120     46581  434044
> > 128     44067  434268
> >
> > Perf profiling on HEAD with 80 sub-transactions:
> >
> > Overhead  Symbol
> > 51.26%    [.] LWLockAttemptLock
> > 24.59%    [.] LWLockRelease
> >  0.36%    [.] base_yyparse
> >  0.35%    [.] PinBuffer
> >  0.34%    [.] AllocSetAlloc
> >  0.33%    [.] hash_search_with_hash_value
> >  0.22%    [.] LWLockAcquire
> >  0.20%    [.] UnpinBuffer
> >  0.15%    [.] SimpleLruReadPage_ReadOnly
> >  0.15%    [.] _bt_compare
> >
> > Perf profiling on patched with 80 sub-transactions:
> >
> > Overhead  Symbol
> >  2.64%    [.] AllocSetAlloc
> >  2.09%    [.] base_yyparse
> >  1.76%    [.] hash_search_with_hash_value
> >  1.62%    [.] LWLockAttemptLock
> >  1.26%    [.] MemoryContextAllocZeroAligned
> >  0.93%    [.] _bt_compare
> >  0.92%    [.] expression_tree_walker_impl.part.4
> >  0.84%    [.] SearchCatCache1
> >  0.79%    [.] palloc
> >  0.64%    [.] core_yylex
> >
> > 5. Test results with transaction 2
> >
> > nsubx  HEAD    patched
> > ----------------------
> >   0    440145  443816
> >   8    438867  443081
> >  16    438634  441786
> >  24    436406  440187
> >  32    439203  442447
> >  40    439819  443574
> >  48    439314  442941
> >  56    439801  443736
> >  64    439074  441970
> >  72    439833  444132
> >  80    148737  439941
> >  88    413714  443343
> >  96    251098  442021
> > 104     70190  443488
> > 112    405507  438866
> > 120    177827  443202
> > 128    399431  441842
> >
> > From the performance point of view, this patch clearly fixes the
> > dramatic TPS collapse shown in these tests.
>
> I think these are really promising results. Although the perf result
> shows that the bottleneck on the SLRU is no longer there with the patch,
> I think it would be nice to see the wait events as well.

Please find attached samples returned by the following query when testing transaction 1 with 80 subxacts:

SELECT wait_event_type, wait_event, locktype, mode, database,
       relation, COUNT(*)
FROM pg_stat_activity AS psa
JOIN pg_locks AS pl ON (psa.pid = pl.pid)
GROUP BY 1, 2, 3, 4, 5, 6 ORDER BY 7 DESC;

Regards,

--
Julien Tachoires
EDB
Attachment
On Tue, 1 Nov 2022 at 08:55, Julien Tachoires <julmon@gmail.com> wrote:
>
> > > 4. Test results with transaction 1
> > >
> > > TPS vs number of sub-transactions
> > >
> > > nsubx  HEAD    patched
> > > ----------------------
> > >   0    441109  439474
> > >   8    439045  438103
> > >  16    439123  436993
> > >  24    436269  434194
> > >  32    439707  437429
> > >  40    439997  437220
> > >  48    439388  437422
> > >  56    439409  437210
> > >  64    439748  437366
> > >  72     92869  434448
> > >  80     66577  434100
> > >  88     61243  434255
> > >  96     57016  434419
> > > 104     52132  434917
> > > 112     49181  433755
> > > 120     46581  434044
> > > 128     44067  434268
> > >
> > > Perf profiling on HEAD with 80 sub-transactions:
> > >
> > > Overhead  Symbol
> > > 51.26%    [.] LWLockAttemptLock
> > > 24.59%    [.] LWLockRelease
> > >  0.36%    [.] base_yyparse
> > >  0.35%    [.] PinBuffer
> > >  0.34%    [.] AllocSetAlloc
> > >  0.33%    [.] hash_search_with_hash_value
> > >  0.22%    [.] LWLockAcquire
> > >  0.20%    [.] UnpinBuffer
> > >  0.15%    [.] SimpleLruReadPage_ReadOnly
> > >  0.15%    [.] _bt_compare
> > >
> > > Perf profiling on patched with 80 sub-transactions:
> > >
> > > Overhead  Symbol
> > >  2.64%    [.] AllocSetAlloc
> > >  2.09%    [.] base_yyparse
> > >  1.76%    [.] hash_search_with_hash_value
> > >  1.62%    [.] LWLockAttemptLock
> > >  1.26%    [.] MemoryContextAllocZeroAligned
> > >  0.93%    [.] _bt_compare
> > >  0.92%    [.] expression_tree_walker_impl.part.4
> > >  0.84%    [.] SearchCatCache1
> > >  0.79%    [.] palloc
> > >  0.64%    [.] core_yylex
> > >
> > > 5. Test results with transaction 2
> > >
> > > nsubx  HEAD    patched
> > > ----------------------
> > >   0    440145  443816
> > >   8    438867  443081
> > >  16    438634  441786
> > >  24    436406  440187
> > >  32    439203  442447
> > >  40    439819  443574
> > >  48    439314  442941
> > >  56    439801  443736
> > >  64    439074  441970
> > >  72    439833  444132
> > >  80    148737  439941
> > >  88    413714  443343
> > >  96    251098  442021
> > > 104     70190  443488
> > > 112    405507  438866
> > > 120    177827  443202
> > > 128    399431  441842
> > >
> > > From the performance point of view, this patch clearly fixes the
> > > dramatic TPS collapse shown in these tests.
> >
> > I think these are really promising results. Although the perf result
> > shows that the bottleneck on the SLRU is no longer there with the patch,
> > I think it would be nice to see the wait events as well.
>
> Please find attached samples returned by the following query when
> testing transaction 1 with 80 subxacts:
>
> SELECT wait_event_type, wait_event, locktype, mode, database,
>        relation, COUNT(*)
> FROM pg_stat_activity AS psa
> JOIN pg_locks AS pl ON (psa.pid = pl.pid)
> GROUP BY 1, 2, 3, 4, 5, 6 ORDER BY 7 DESC;

These results are compelling, thank you.

Setting this to Ready for Committer.

--
Simon Riggs
http://www.EnterpriseDB.com/
On Mon, 7 Nov 2022 at 21:14, Simon Riggs <simon.riggs@enterprisedb.com> wrote:

> These results are compelling, thank you.
>
> Setting this to Ready for Committer.

New version attached.

--
Simon Riggs
http://www.EnterpriseDB.com/
Attachment
On Tue, 15 Nov 2022 at 17:34, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
> On Mon, 7 Nov 2022 at 21:14, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>
>> These results are compelling, thank you.
>>
>> Setting this to Ready for Committer.
>
> New version attached.

Taking a quick look, I think it should be PGPROC instead of PG_PROC, right?

+ * 1. When there's no room in PG_PROC, as mentioned above.
+ *    During XactLockTableWait() we sometimes need to know the topxid.
+ *    If there is room in PG_PROC we can get a subxid's topxid direct
+ *    from the procarray if the topxid is still running, using

--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.
Simon Riggs <simon.riggs@enterprisedb.com> writes:
> New version attached.

I looked at this patch and I do not see how it can possibly be safe.

The most direct counterexample arises from the fact that HeapCheckForSerializableConflictOut checks SubTransGetTopmostTransaction in some cases. You haven't tried to analyze when, but just disabled the optimization in serializable mode:

+ * 2. When IsolationIsSerializable() we sometimes need to access topxid.
+ *    This occurs only when SERIALIZABLE is requested by app user.
+...
+ if (MyProc->subxidStatus.overflowed ||
+     IsolationIsSerializable() ||

However, what this is checking is whether *our current transaction* is serializable. If we're not serializable, but other transactions in the system are, then this fails to store information that they'll need for correctness. You'd have to have some way of knowing that no transaction in the system is using serializable mode, and that none will start to do so while this transaction is still in-doubt.

I fear that's already enough to kill the idea; but there's more. The subxidStatus.overflowed check quoted above has a similar sort of myopia: it's checking whether our current transaction has already suboverflowed. But (a) that doesn't prove it won't suboverflow later, and (b) the relevant logic in XidInMVCCSnapshot needs to run SubTransGetTopmostTransaction if *any* proc in the snapshot has suboverflowed.

Lastly, I don't see what the "transaction on same page" business has got to do with anything. The comment is certainly failing to make the case that it's safe to skip filling subtrans when that is true.

I think we could salvage this small idea:

+ * Insert entries into subtrans for this xid, noting that the entry
+ * points directly to the topxid, not the immediate parent. This is
+ * done for two reasons:
+ * (1) so it is faster in a long chain of subxids, because the
+ *     algorithm is then O(1), no matter how many subxids are assigned.

but some work would be needed to update the comments around SubTransGetParent and SubTransGetTopmostTransaction to explain that they're no longer reliably different. I think that that is okay for the existing use-cases, but they'd better be documented. In fact, couldn't we simplify them down to one function? Given the restriction that we don't look back in pg_subtrans further than TransactionXmin, I don't think that updated code would ever need to resolve cases written by older code. So we could remove the recursive checks entirely, or at least be confident that they don't recurse more than once.

regards, tom lane
On Tue, 15 Nov 2022 at 21:03, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Simon Riggs <simon.riggs@enterprisedb.com> writes:
> > New version attached.
>
> I looked at this patch and I do not see how it can possibly be safe.

I grant you it is complex, so please bear with me.

> The most direct counterexample arises from the fact that
> HeapCheckForSerializableConflictOut checks SubTransGetTopmostTransaction
> in some cases. You haven't tried to analyze when, but just disabled
> the optimization in serializable mode:
>
> + * 2. When IsolationIsSerializable() we sometimes need to access topxid.
> + *    This occurs only when SERIALIZABLE is requested by app user.
> +...
> + if (MyProc->subxidStatus.overflowed ||
> +     IsolationIsSerializable() ||
>
> However, what this is checking is whether *our current transaction*
> is serializable. If we're not serializable, but other transactions
> in the system are, then this fails to store information that they'll
> need for correctness. You'd have to have some way of knowing that
> no transaction in the system is using serializable mode, and that
> none will start to do so while this transaction is still in-doubt.

Not true. src/backend/storage/lmgr/README-SSI says this...

    * Any transaction which is run at a transaction isolation level
    other than SERIALIZABLE will not be affected by SSI. If you want
    to enforce business rules through SSI, all transactions should be
    run at the SERIALIZABLE transaction isolation level, and that
    should probably be set as the default.

If HeapCheckForSerializableConflictOut() cannot find a subxid's parent then it will not be involved in serialization errors. So skipping the storage of subxids in subtrans for non-serializable xacts is valid for both SSI and non-SSI xacts. Thus the owning transaction can decide to skip the insert into subtrans if it is not serializable.

> I fear that's already enough to kill the idea; but there's more.
> The subxidStatus.overflowed check quoted above has a similar sort
> of myopia: it's checking whether our current transaction has
> already suboverflowed. But (a) that doesn't prove it won't suboverflow
> later, and (b) the relevant logic in XidInMVCCSnapshot needs to run
> SubTransGetTopmostTransaction if *any* proc in the snapshot has
> suboverflowed.

Not the way it is coded now.

First, we search the subxid cache in snapshot->subxip. Then, and only if the snapshot overflowed (i.e. ANY xact overflowed), do we check subtrans. Thus, the owning xact knows that anyone else will find the first 64 xids in the subxid cache, so it need not insert them into subtrans, even if someone else overflowed. When the owning xact overflows, it knows it must now insert the subxid into subtrans before the xid is used anywhere in storage, which the patch does. This allows each owning xact to decide what to do, independent of the actions of others.

> Lastly, I don't see what the "transaction on same page" business
> has got to do with anything. The comment is certainly failing
> to make the case that it's safe to skip filling subtrans when that
> is true.

That seems strange, I grant you. It's the same logic that is used in TransactionIdSetTreeStatus(), in reverse. I understand it 'cos I wrote it.

TRANSACTION_STATUS_SUB_COMMITTED is only ever used if the topxid and subxid are on different pages. Therefore TransactionIdDidCommit() won't ever see a value of TRANSACTION_STATUS_SUB_COMMITTED unless they are on separate pages. So the owning transaction can predict in advance whether anyone will ever call SubTransGetParent() for one of its xids. If they might, then we record the values just in case. If they NEVER will, then we can skip recording them.

And just to be clear, all 3 of the above preconditions must be true before the owning xact decides to skip writing a subxid to subtrans.

> I think we could salvage this small idea:
>
> + * Insert entries into subtrans for this xid, noting that the entry
> + * points directly to the topxid, not the immediate parent. This is
> + * done for two reasons:
> + * (1) so it is faster in a long chain of subxids, because the
> + *     algorithm is then O(1), no matter how many subxids are assigned.
>
> but some work would be needed to update the comments around
> SubTransGetParent and SubTransGetTopmostTransaction to explain that
> they're no longer reliably different. I think that that is okay for
> the existing use-cases, but they'd better be documented. In fact,
> couldn't we simplify them down to one function? Given the restriction
> that we don't look back in pg_subtrans further than TransactionXmin,
> I don't think that updated code would ever need to resolve cases
> written by older code. So we could remove the recursive checks
> entirely, or at least be confident that they don't recurse more
> than once.

Happy to do so; I'd left it that way to soften the blow of absorbing the earlier thoughts. (Since we know all subxids point directly to the parent, we know we only ever need to do one lookup.)

I know that if there is a flaw in the above logic then you will find it. Happy to make any comment changes needed to record the above thoughts more permanently. I tried, but clearly didn't get everything down clearly.

Thanks for your detailed thoughts.

--
Simon Riggs
http://www.EnterpriseDB.com/
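The same-page test itself is cheap. Based on clog.c's existing page mapping, it plausibly reduces to comparing clog page numbers, as in the hedged sketch below; the patch's actual helper may differ in detail. With the default 8kB blocks and four xacts per byte, CLOG_XACTS_PER_PAGE is 32768, which is where the "every 32000 xids" approximation earlier in the thread comes from.

    /* TransactionIdToPage is clog.c's existing macro */
    #define TransactionIdToPage(xid) ((xid) / (TransactionId) CLOG_XACTS_PER_PAGE)

    static inline bool
    TransactionIdsAreOnSameXactPage(TransactionId xid1, TransactionId xid2)
    {
        return TransactionIdToPage(xid1) == TransactionIdToPage(xid2);
    }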
Simon Riggs <simon.riggs@enterprisedb.com> writes:
> On Tue, 15 Nov 2022 at 21:03, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The subxidStatus.overflowed check quoted above has a similar sort
>> of myopia: it's checking whether our current transaction has
>> already suboverflowed. But (a) that doesn't prove it won't suboverflow
>> later, and (b) the relevant logic in XidInMVCCSnapshot needs to run
>> SubTransGetTopmostTransaction if *any* proc in the snapshot has
>> suboverflowed.

> Not the way it is coded now.
> First, we search the subxid cache in snapshot->subxip.
> Then, and only if the snapshot overflowed (i.e. ANY xact overflowed),
> do we check subtrans.

No, that's not what XidInMVCCSnapshot does. If snapshot->suboverflowed is set (ie, somebody somewhere/somewhen overflowed), then it does SubTransGetTopmostTransaction and searches only the xips with the result. This behavior requires that all live subxids be correctly mapped by SubTransGetTopmostTransaction, or we'll draw false conclusions.

We could perhaps make it do what you suggest, but that would require a complete performance analysis to make sure we're not giving up more than we would gain.

Also, both GetSnapshotData and CopySnapshot assume that the subxips array is not used if suboverflowed is set, and don't bother (continuing to) populate it. So we would need code changes and additional cycles in those areas too.

I'm not sure about your other claims, but I'm pretty sure this one point is enough to kill the patch.

regards, tom lane
On Wed, 16 Nov 2022 at 00:09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Simon Riggs <simon.riggs@enterprisedb.com> writes:
> > On Tue, 15 Nov 2022 at 21:03, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> The subxidStatus.overflowed check quoted above has a similar sort
> >> of myopia: it's checking whether our current transaction has
> >> already suboverflowed. But (a) that doesn't prove it won't suboverflow
> >> later, and (b) the relevant logic in XidInMVCCSnapshot needs to run
> >> SubTransGetTopmostTransaction if *any* proc in the snapshot has
> >> suboverflowed.
>
> > Not the way it is coded now.
> > First, we search the subxid cache in snapshot->subxip.
> > Then, and only if the snapshot overflowed (i.e. ANY xact overflowed),
> > do we check subtrans.
>
> No, that's not what XidInMVCCSnapshot does. If snapshot->suboverflowed
> is set (ie, somebody somewhere/somewhen overflowed), then it does
> SubTransGetTopmostTransaction and searches only the xips with the result.
> This behavior requires that all live subxids be correctly mapped by
> SubTransGetTopmostTransaction, or we'll draw false conclusions.

Your comments are correct wrt the existing coding, but not to the patch, which is coded as described and does not suffer those issues.

> We could perhaps make it do what you suggest,

Already done in the patch since v5.

> but that would require
> a complete performance analysis to make sure we're not giving up
> more than we would gain.

I agree that a full performance analysis is sensible, and an objective analysis has been performed by Julien Tachoires. This is described here, along with other explanations:
https://docs.google.com/presentation/d/1A7Ar8_LM5EdC2OHL_j3U9J-QwjMiGw9mmXeBLJOmFlg/edit?usp=sharing

It is important to understand the context here: there is already a well-documented LOSS of performance with the current coding. The patch alleviates that, and I have not been able to find a performance case where there is any negative impact. Further tests welcome.

> Also, both GetSnapshotData and CopySnapshot assume that the subxips
> array is not used if suboverflowed is set, and don't bother
> (continuing to) populate it. So we would need code changes and
> additional cycles in those areas too.

Already done in the patch since v5. Any additional cycles apply only to the case of snapshot overflow, which currently performs very badly.

> I'm not sure about your other claims, but I'm pretty sure this one
> point is enough to kill the patch.

Then please look again, because there are misunderstandings above.

--
Simon Riggs
http://www.EnterpriseDB.com/
On Wed, Nov 16, 2022 at 8:41 AM Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>
> > No, that's not what XidInMVCCSnapshot does. If snapshot->suboverflowed
> > is set (ie, somebody somewhere/somewhen overflowed), then it does
> > SubTransGetTopmostTransaction and searches only the xips with the result.
> > This behavior requires that all live subxids be correctly mapped by
> > SubTransGetTopmostTransaction, or we'll draw false conclusions.
>
> Your comments are correct wrt the existing coding, but not to the
> patch, which is coded as described and does not suffer those issues.

This will work because of these two changes in the patch: 1) even though the snapshot is marked "overflowed" we will include all the subtransaction information in snapshot->subxip, and 2) as Simon mentioned, in XidInMVCCSnapshot() we first search the subxip cache in snapshot->subxip, and only if the xid is not found there do we look into the SLRU. So now, because of 1), we will always find any concurrent subtransaction in snapshot->subxip.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
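In other words, the subxid lookup order under the patch would look roughly like the fragment below. This is a hedged sketch of the relevant part of XidInMVCCSnapshot(), with the xmin/xmax bounds checks and special snapshot cases omitted; it is a reading of the patch as described above, not the committed code.

    /*
     * 1. The subxip cache is now always fully populated (up to 64 subxids
     *    per backend), so search it first even when suboverflowed is set.
     */
    for (int i = 0; i < snapshot->subxcnt; i++)
    {
        if (TransactionIdEquals(xid, snapshot->subxip[i]))
            return true;
    }

    /*
     * 2. Only an overflowed snapshot can still hold unlisted subxids;
     *    map those to the topmost parent via pg_subtrans.
     */
    if (snapshot->suboverflowed)
        xid = SubTransGetTopmostTransaction(xid);

    /* 3. Finally search the top-level xids. */
    for (int i = 0; i < snapshot->xcnt; i++)
    {
        if (TransactionIdEquals(xid, snapshot->xip[i]))
            return true;
    }

    return false;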
Simon Riggs <simon.riggs@enterprisedb.com> writes:
> On Wed, 16 Nov 2022 at 00:09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> No, that's not what XidInMVCCSnapshot does. If snapshot->suboverflowed
>> is set (ie, somebody somewhere/somewhen overflowed), then it does
>> SubTransGetTopmostTransaction and searches only the xips with the result.
>> This behavior requires that all live subxids be correctly mapped by
>> SubTransGetTopmostTransaction, or we'll draw false conclusions.

> Your comments are correct wrt the existing coding, but not to the
> patch, which is coded as described and does not suffer those issues.

Ah, OK.

Still ... I really do not like this patch. It introduces a number of extremely fragile assumptions, and I think those would come back to bite us someday, even if they hold now, which I'm still unsure about. It doesn't help that you've chosen to document them only at the place making them and not at the place(s) likely to break them.

Also, to be blunt, this is not Ready For Committer. It's more WIP, because even if the code is okay there are comments all over the system that you've invalidated. (At the very least, the file header comments in subtrans.c and the comments in struct SnapshotData need work; I've not looked hard but I'm sure there are more places with comments bearing on these data structures.)

Perhaps it would be a good idea to split up the patch. The business about making pg_subtrans flat rather than a tree seems like a good idea in any event, although as I said it doesn't seem like we've got a fleshed-out version of that here. We could push forward on getting that done and then separately consider the rest of it.

regards, tom lane
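The single-function suggestion falls out of the flat layout: once every recorded parent is the topmost xid, the recursive walk collapses to at most one probe. A minimal sketch, ignoring the TransactionXmin validity checks:

    /*
     * Sketch only: with a flat pg_subtrans, one lookup suffices. An xid
     * with no entry (a "hole", or a top-level xid) is its own topmost xid.
     */
    TransactionId
    SubTransGetTopmostTransaction(TransactionId xid)
    {
        TransactionId parent = SubTransGetParent(xid);

        return TransactionIdIsValid(parent) ? parent : xid;
    }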
On Wed, 16 Nov 2022 at 15:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Simon Riggs <simon.riggs@enterprisedb.com> writes:
> > On Wed, 16 Nov 2022 at 00:09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> No, that's not what XidInMVCCSnapshot does. If snapshot->suboverflowed
> >> is set (ie, somebody somewhere/somewhen overflowed), then it does
> >> SubTransGetTopmostTransaction and searches only the xips with the result.
> >> This behavior requires that all live subxids be correctly mapped by
> >> SubTransGetTopmostTransaction, or we'll draw false conclusions.
>
> > Your comments are correct wrt the existing coding, but not to the
> > patch, which is coded as described and does not suffer those issues.
>
> Ah, OK.
>
> Still ... I really do not like this patch. It introduces a number of
> extremely fragile assumptions, and I think those would come back to
> bite us someday, even if they hold now, which I'm still unsure about.

Completely understand. It took me months to think this through.

> It doesn't help that you've chosen to document them only at the place
> making them and not at the place(s) likely to break them.

Yes, apologies for that; I focused on the holistic explanation in the slides.

> Also, to be blunt, this is not Ready For Committer. It's more WIP,
> because even if the code is okay there are comments all over the system
> that you've invalidated. (At the very least, the file header comments
> in subtrans.c and the comments in struct SnapshotData need work; I've
> not looked hard but I'm sure there are more places with comments
> bearing on these data structures.)

New version with greatly improved comments coming very soon.

> Perhaps it would be a good idea to split up the patch. The business
> about making pg_subtrans flat rather than a tree seems like a good
> idea in any event, although as I said it doesn't seem like we've got
> a fleshed-out version of that here. We could push forward on getting
> that done and then separately consider the rest of it.

Yes, I thought you might ask that, so after some thought I have found a clean way to do that and have split this into two parts.

Julien has agreed to do further perf tests and is working on that now.

I will post new versions soon, earliest tomorrow.

--
Simon Riggs
http://www.EnterpriseDB.com/
On Thu, 17 Nov 2022 at 17:04, Simon Riggs <simon.riggs@enterprisedb.com> wrote: > > New version with greatly improved comments coming very soon. > > Perhaps it would be a good idea to split up the patch. The business > > about making pg_subtrans flat rather than a tree seems like a good > > idea in any event, although as I said it doesn't seem like we've got > > a fleshed-out version of that here. We could push forward on getting > > that done and then separately consider the rest of it. > > Yes, I thought you might ask that, so, after some thought, I have found a > clean way to do that and have split this into two parts. Attached.

002 includes many comment revisions, as well as flattening the loops in SubTransGetTopmostTransaction and TransactionIdDidCommit/Abort
003 includes the idea to not-always do SubTransSetParent()

> Julien has agreed to do further perf tests and is working on that now. > > I will post new versions soon, earliest tomorrow. Julien's results show that the 002 patch on its own is probably all we need, but I'm posting 003 also in case that situation changes based on later results with different test cases. Detailed numbers are shown below, plus a graph derived from them - thanks Julien!

nsubxacts   HEAD (3d0c95)   patched 002-v13   patched 002+003-v13
        0          434161            436778                437287
        8          432619            434718                435381
       16          432856            434710                435092
       24          429954            431835                431974
       32          434643            436134                436793
       40          433939            436121                435622
       48          434503            434368                435662
       56          432965            434229                436182
       64          433672            433951                436192
       72           93555            431626                433551
       80           66642            431421                434305
       88           61349            432776                433664
       96           55892            432306                434212
      104           52270            432571                434133
      112           49166            433655                434754
      120           46477            432817                434104
      128           43226            432258                432611

(yes, the last line shows x10 performance patched, that is not a typo) -- Simon Riggs http://www.EnterpriseDB.com/
Attachment
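To make the "flattening" in 002 concrete: pg_subtrans today records each subxid's immediate parent, so SubTransGetTopmostTransaction() issues one SLRU probe per nesting level; if pg_subtrans instead records the topmost xid directly, a single probe suffices. A minimal sketch of the two shapes, with hypothetical function names (the real loop also stops at TransactionXmin and carries wraparound assertions):

/* Assumes the usual backend headers: postgres.h, access/transam.h,
 * access/subtrans.h. */

/* Current shape: pg_subtrans stores the immediate parent, forming a tree. */
static TransactionId
TopmostViaTreeWalk(TransactionId xid)
{
    TransactionId parentXid = xid;
    TransactionId previousXid = xid;

    while (TransactionIdIsValid(parentXid))
    {
        previousXid = parentXid;
        parentXid = SubTransGetParent(parentXid);   /* one SLRU probe per level */
    }
    return previousXid;
}

/* Flattened shape: pg_subtrans stores the topmost xid, so one probe suffices. */
static TransactionId
TopmostViaFlatLookup(TransactionId xid)
{
    TransactionId topXid = SubTransGetParent(xid);

    /* A top-level xid has no entry and is its own topmost transaction. */
    return TransactionIdIsValid(topXid) ? topXid : xid;
}

Note that the naive loop still makes one final probe for the top parent's (nonexistent) parent even with flat storage; that is the "extra lookup" Tom asks to remove later in the thread.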
On 11/17/22 18:29, Simon Riggs wrote: > On Thu, 17 Nov 2022 at 17:04, Simon Riggs <simon.riggs@enterprisedb.com> wrote: >> >> New version with greatly improved comments coming very soon. > >>> Perhaps it would be a good idea to split up the patch. The business >>> about making pg_subtrans flat rather than a tree seems like a good >>> idea in any event, although as I said it doesn't seem like we've got >>> a fleshed-out version of that here. We could push forward on getting >>> that done and then separately consider the rest of it. >> >> Yes, I thought you might ask that so, after some thought, have found a >> clean way to do that and have split this into two parts. > > Attached. > > 002 includes many comment revisions, as well as flattening the loops > in SubTransGetTopmostTransaction and TransactionIdDidCommit/Abort > 003 includes the idea to not-always do SubTransSetParent() > I'm a bit confused by the TransactionIdsAreOnSameXactPage naming. Isn't this really checking clog pages? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, 17 Nov 2022 at 20:29, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > On 11/17/22 18:29, Simon Riggs wrote: > > On Thu, 17 Nov 2022 at 17:04, Simon Riggs <simon.riggs@enterprisedb.com> wrote: > >> > > 003 includes the idea to not-always do SubTransSetParent() > > > I'm a bit confused by the TransactionIdsAreOnSameXactPage naming. Isn't > this really checking clog pages? Yes, a clog page. I named it to match pg_xact, the new name of the clog directory. -- Simon Riggs http://www.EnterpriseDB.com/
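For context, a same-page test like this comes down to integer division by the number of status slots per clog page. A sketch under two assumptions: the constants below are copied from access/transam/clog.c, where they are private, so a real patch would need to expose them or duplicate the arithmetic; and the gating rationale, namely that a subxid whose status bits live on the same pg_xact page as its parent's never needs pg_subtrans to relate the two, is inferred from the description of 003 rather than stated outright in the thread.

/* Assumes postgres.h and access/subtrans.h.  As defined (privately) in
 * access/transam/clog.c: two status bits per xact, hence four xacts per
 * byte and BLCKSZ bytes per page. */
#define CLOG_BITS_PER_XACT  2
#define CLOG_XACTS_PER_BYTE 4
#define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)

static inline bool
TransactionIdsAreOnSameXactPage(TransactionId xid1, TransactionId xid2)
{
    return (xid1 / (TransactionId) CLOG_XACTS_PER_PAGE) ==
           (xid2 / (TransactionId) CLOG_XACTS_PER_PAGE);
}

/*
 * Hypothetical call site illustrating the 003 idea: record a subxid's
 * parent only when a later SubTransGetTopmostTransaction() could need it.
 */
static void
RecordSubxidParentIfNeeded(TransactionId subxid, TransactionId topxid)
{
    if (TransactionIdsAreOnSameXactPage(subxid, topxid))
        return;     /* assumed: same-page xids never need pg_subtrans */

    SubTransSetParent(subxid, topxid);
}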
On Thu, 17 Nov 2022 at 17:29, Simon Riggs <simon.riggs@enterprisedb.com> wrote: > (yes, the last line shows x10 performance patched, that is not a typo) New version of patch, now just a one-line patch! Results show it's still a good win for XidInMVCCSnapshot(). -- Simon Riggs http://www.EnterpriseDB.com/
Attachment
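The one-liner itself is only in the attachment, but the replies below ("a one-level structure in pg_subtrans", "losing information about the subtransaction tree structure") suggest its shape: record the topmost xid rather than the immediate parent at xid-assignment time. What follows is a guess at that change, paraphrasing rather than quoting the call site in xact.c's AssignTransactionId():

/* In AssignTransactionId(), once newXid has been assigned to
 * subtransaction state s: */
if (isSubXact)
{
    /* Before: link to the immediate parent, building a tree in pg_subtrans. */
    /* SubTransSetParent(newXid, s->parent->transactionId); */

    /*
     * After (guessed one-liner): link straight to the topmost xid, making
     * pg_subtrans flat.  The top-level xid is always assigned before any
     * of its subtransactions' xids, so it is already available here.
     */
    SubTransSetParent(newXid, GetTopTransactionIdIfAny());
}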
On 11/29/22 13:49, Simon Riggs wrote: > On Thu, 17 Nov 2022 at 17:29, Simon Riggs <simon.riggs@enterprisedb.com> wrote: > >> (yes, the last line shows x10 performance patched, that is not a typo) > > New version of patch, now just a one-line patch! > > Results show it's still a good win for XidInMVCCSnapshot(). > I'm a bit confused - for which workload/benchmark are these results? It's generally a good idea to share the scripts used to run the test and not just a chart. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi Tomas, On Tue, 29 Nov 2022 at 14:06, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > On 11/29/22 13:49, Simon Riggs wrote: > > On Thu, 17 Nov 2022 at 17:29, Simon Riggs <simon.riggs@enterprisedb.com> wrote: > > > >> (yes, the last line shows x10 performance patched, that is not a typo) > > > > New version of patch, now just a one-line patch! > > > > Results show it's still a good win for XidInMVCCSnapshot(). > > > > I'm a bit confused - for which workload/benchmark are these results? > It's generally a good idea to share the scripts used to run the test and > not just a chart. The scripts have been attached to this thread with the initial performance results. Anyway, re-sending those (including a minor fix). -- Julien Tachoires EDB
Attachment
Simon Riggs <simon.riggs@enterprisedb.com> writes: > New version of patch, now just a one-line patch! Of course, there's all the documentation and comments that you falsified. Also, what of the SubTransSetParent call in ProcessTwoPhaseBuffer? (The one in ProcArrayApplyXidAssignment is actually okay, though the comment making excuses for it no longer is.) Also, if we're going to go over to a one-level structure in pg_subtrans, we really ought to simplify the code in subtrans.c accordingly, and get rid of the extra lookup currently done for the top parent's parent. I still wonder whether we'll regret losing information about the subtransaction tree structure, as discussed in the other thread [1]. That seems like the main barrier to proceeding with this. regards, tom lane [1] https://www.postgresql.org/message-id/flat/CANbhV-HYfP0ebZRERkpt84ZCDsNX-UYJGYsjfS88jtbYzY%2BKcQ%40mail.gmail.com
Hi, On 2022-11-29 13:30:02 -0500, Tom Lane wrote: > I still wonder whether we'll regret losing information about the > subtransaction tree structure, as discussed in the other thread [1]. > That seems like the main barrier to proceeding with this. Yea, this has me worried. I suspect that we have a bunch of places relying on this. I'm not at all convinced that optimizing XidInMVCCSnapshot() is a good reason for this structural change, given that there are other possible ways to optimize (e.g. my proposal to add overflowed subxids to the Snapshot during lookup when there's space). Greetings, Andres Freund
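Andres's alternative, as understood from its one-line description here (a sketch of the idea only, not code from any posted patch): once the overflowed path in XidInMVCCSnapshot() has paid for the pg_subtrans lookup, remember the answer in the snapshot's preallocated subxip[] array if space remains, so that repeated visibility checks of the same subxid stay in memory. For this cache to pay off, the overflowed path would also need to search subxip[] before falling back to pg_subtrans.

/* Sketch: opportunistically cache an overflowed subxid in the snapshot. */
if (snapshot->suboverflowed)
{
    TransactionId topxid = SubTransGetTopmostTransaction(xid);
    uint32      i;

    for (i = 0; i < snapshot->xcnt; i++)
    {
        if (TransactionIdEquals(topxid, snapshot->xip[i]))
        {
            /* Running per this snapshot; cache the subxid if room remains. */
            if (snapshot->subxcnt < GetMaxSnapshotSubxidCount())
                snapshot->subxip[snapshot->subxcnt++] = xid;
            return true;
        }
    }
    return false;
}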
On Tue, Nov 29, 2022 at 1:35 PM Andres Freund <andres@anarazel.de> wrote: > On 2022-11-29 13:30:02 -0500, Tom Lane wrote: > > I still wonder whether we'll regret losing information about the > > subtransaction tree structure, as discussed in the other thread [1]. > > That seems like the main barrier to proceeding with this. > > Yea, this has me worried. Me, too. -- Robert Haas EDB: http://www.enterprisedb.com