Bug in wait time when waiting on nested subtransaction - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Bug in wait time when waiting on nested subtransaction |
Date | |
Msg-id | CANbhV-HYfP0ebZRERkpt84ZCDsNX-UYJGYsjfS88jtbYzY+KcQ@mail.gmail.com Whole thread Raw |
Responses |
Re: Bug in wait time when waiting on nested subtransaction
Re: Bug in wait time when waiting on nested subtransaction |
List | pgsql-hackers |
Issue in current XactLockTableWait, starting with 3c27944fb2141, discovered while reviewing https://commitfest.postgresql.org/40/3806/ Test demonstrating the problem is 001-isotest-tuplelock-subxact.v1.patch A narrative description of the issue follows: session1 - requests multiple nested subtransactions like this: BEGIN; ... SAVEPOINT subxid1; ... SAVEPOINT subxid2; ... If another session2 sees an xid from subxid2, it waits. If subxid2 aborts, then session2 sees the abort and can continue processing normally. However, if subxid2 subcommits, then the lock wait moves from subxid2 to the topxid. If subxid1 subsequently aborts, it will also abort subxid2, but session2 waiting for subxid2 to complete doesn't see this and waits for topxid instead. Which means that it waits longer than it should, and later arriving lock waiters may actually get served first. So it's a bug, but not too awful, since in many cases people don't use nested subtransactions, and if they do use SAVEPOINTs, they don't often close them using RELEASE. And in most cases the extra wait time is not very long, hence why nobody ever reported this issue. Thanks to Robert Haas and Julien Tachoires for asking the question "are you sure the existing coding is correct?". You were both right; it is not. How to fix? Correct lock wait can be achieved while a subxid is running if we do either * a lock table entry for the subxid OR * a subtrans entry that points to its immediate parent So we have these options 1. Removing the XactLockTableDelete() call in CommitSubTransaction(). That releases lock waiters earlier than expected, which requires pushups in XactLockTableWait() to cope with that (which are currently broken). Why do we do it? To save shmem space in the lock table should anyone want to run a transaction that contains thousands of subtransactions, or more. So option (1) alone would eventually cause us to run out of space in the lock table and a transaction would receive ERRORs rather than be allowed to run for a long time. 2. In XactLockTableWait(), replace the call to SubTransGetParent(), so we go up the levels one by one as we did before. However, (2) causes huge subtrans contention and if we implemented that and backpatched it the performance issues could be significant. So my feeling is that if we do (2) then we should not backpatch it. So both (1) and (2) have issues. The main result from patch https://commitfest.postgresql.org/40/3806/ is that having subtrans point direct to topxid is very good for performance in XidIsInMVCCSnapshot(), and presumably other places also. This bug prevents the simple application of a patch to improve performance. So now we need a stronger mix of ideas to both resolve the bug and fix the subtrans contention issue in HEAD. My preferred solution would be a mix of the above, call it option (3) 3. a) Include the lock table entry for the first 64 subtransactions only, so that we limit shmem. For those first 64 entries, have the subtrans point direct to top, since this makes a subtrans lookup into an O(1) operation, which is important for performance of later actions. b) For any subtransactions after first 64, delete the subxid lock on subcommit, to save shmem, but make subtrans point to the immediate parent (only), not the topxid. That fixes the bug, but causes performance problems in XidInMVCCSnapshot() and others, so we also do c) and d) c) At top level commit, go back and alter subtrans again for subxids so now it points to the topxid, so that we avoid O(N) behavior in XidInMVCCSnapshot() and other callers. Additional cost for longer transactions, but it saves considerable cost in later callers that need to call GetTopmostTransaction. d) Optimize SubTransGetTopmostTransaction() so it retrieves entries page-at-a-time. This will reduce the contention of repeatedly re-visiting the same page(s) and ensure that a page is less often paged out when we are still using it. Thoughts? -- Simon Riggs http://www.EnterpriseDB.com/
Attachment
pgsql-hackers by date: