Thread: Re: pgsql: Improve performance of subsystems on top of SLRU
On 2024-Feb-28, Alvaro Herrera wrote: > Improve performance of subsystems on top of SLRU Coverity had the following complaint about this commit: ________________________________________________________________________________________________________ *** CID NNNNNNN: Control flow issues (DEADCODE) /srv/coverity/git/pgsql-git/postgresql/src/backend/access/transam/multixact.c: 1375 in GetMultiXactIdMembers() 1369 * and acquire the lock of the new bank. 1370 */ 1371 lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno); 1372 if (lock != prevlock) 1373 { 1374 if (prevlock != NULL) >>> CID 1592913: Control flow issues (DEADCODE) >>> Execution cannot reach this statement: "LWLockRelease(prevlock);". 1375 LWLockRelease(prevlock); 1376 LWLockAcquire(lock, LW_EXCLUSIVE); 1377 prevlock = lock; 1378 } 1379 1380 slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi); And I think it's correct that this is somewhat bogus, or at least confusing: the only way to have control back here on line 1371 after having executed once is via the "goto retry" line below; and there we release "prevlock" and set it to NULL beforehand, so it's impossible for prevlock to be NULL. Looking closer I think this code is all confused, so I suggest to rework it as shown in the attached patch. I'll have a look at the other places where we use this "prevlock" coding pattern tomorrow. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Attachment
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > And I think it's correct that this is somewhat bogus, or at least > confusing: the only way to have control back here on line 1371 after > having executed once is via the "goto retry" line below; and there we > release "prevlock" and set it to NULL beforehand, so it's impossible for > prevlock to be NULL. Looking closer I think this code is all confused, > so I suggest to rework it as shown in the attached patch. This is certainly simpler, but I notice that it holds the current LWLock across the line ptr = (MultiXactMember *) palloc(length * sizeof(MultiXactMember)); where the old code did not. Could the palloc take long enough that holding the lock is bad? Also, with this coding the "lock = NULL;" assignment just before "goto retry" is a dead store. Not sure if Coverity or other static analyzers would whine about that. regards, tom lane
On Mon, Mar 4, 2024 at 1:56 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On 2024-Feb-28, Alvaro Herrera wrote: > > > Improve performance of subsystems on top of SLRU > > Coverity had the following complaint about this commit: > > ________________________________________________________________________________________________________ > *** CID NNNNNNN: Control flow issues (DEADCODE) > /srv/coverity/git/pgsql-git/postgresql/src/backend/access/transam/multixact.c: 1375 in GetMultiXactIdMembers() > 1369 * and acquire the lock of the new bank. > 1370 */ > 1371 lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno); > 1372 if (lock != prevlock) > 1373 { > 1374 if (prevlock != NULL) > >>> CID 1592913: Control flow issues (DEADCODE) > >>> Execution cannot reach this statement: "LWLockRelease(prevlock);". > 1375 LWLockRelease(prevlock); > 1376 LWLockAcquire(lock, LW_EXCLUSIVE); > 1377 prevlock = lock; > 1378 } > 1379 > 1380 slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi); > > And I think it's correct that this is somewhat bogus, or at least > confusing: the only way to have control back here on line 1371 after > having executed once is via the "goto retry" line below; and there we > release "prevlock" and set it to NULL beforehand, so it's impossible for > prevlock to be NULL. Looking closer I think this code is all confused, > so I suggest to rework it as shown in the attached patch. > > I'll have a look at the other places where we use this "prevlock" coding > pattern tomorrow. + /* Acquire the bank lock for the page we need. */ lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno); - if (lock != prevlock) - { - if (prevlock != NULL) - LWLockRelease(prevlock); - LWLockAcquire(lock, LW_EXCLUSIVE); - prevlock = lock; - } + LWLockAcquire(lock, LW_EXCLUSIVE); This part is definitely an improvement. I am not sure about the other changes, I mean that makes the code much simpler but now we are not releasing the 'MultiXactOffsetCtl' related bank lock, and later in the following loop, we are comparing that lock against 'MultiXactMemberCtl' related bank lock. This makes code simpler because now in the loop we are sure that we are always holding the lock but I do not like comparing the bank locks for 2 different SLRUs, although there is no problem as there would not be a common lock address, anyway, I do not have any strong objection to what you have done here. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On 2024-Mar-03, Tom Lane wrote: > This is certainly simpler, but I notice that it holds the current > LWLock across the line > > ptr = (MultiXactMember *) palloc(length * sizeof(MultiXactMember)); > > where the old code did not. Could the palloc take long enough that > holding the lock is bad? Hmm, I guess most of the time it shouldn't be much of a problem (if the length is small so we can palloc without malloc'ing); but it could be in the worst case. But the fix is simple: just release the lock before. There's no correctness argument for holding it all the way down. I was just confused about how the original code worked. > Also, with this coding the "lock = NULL;" assignment just before > "goto retry" is a dead store. Not sure if Coverity or other static > analyzers would whine about that. Oh, right. I removed that. On 2024-Mar-04, Dilip Kumar wrote: > I am not sure about the other changes, I mean that makes the code much > simpler but now we are not releasing the 'MultiXactOffsetCtl' related > bank lock, and later in the following loop, we are comparing that lock > against 'MultiXactMemberCtl' related bank lock. This makes code > simpler because now in the loop we are sure that we are always holding > the lock but I do not like comparing the bank locks for 2 different > SLRUs, although there is no problem as there would not be a common > lock address, True. This can be addressed in the same way Tom's first comment is: just release the lock before entering the second loop, and setting lock to NULL. This brings the code to a similar state than before, except that the additional LWLock * variables are in a tighter scope. That's in 0001. Now, I had a look at the other users of slru.c and noticed in subtrans.c that StartupSUBTRANS we have some duplicate code that I think could be rewritten by making the "while" block test the condition at the end instead of at the start; changed that in 0002. I'll leave this one for later, because I want to add some test code for it -- right now it's pretty much test-uncovered code. I also looked at slru.c for uses of shared->bank_locks and noticed a couple that could be made simpler. That's 0003 here. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "If it is not right, do not do it. If it is not true, do not say it." (Marcus Aurelius, Meditations)
Attachment
FWIW there's a stupid bug in 0002, which is fixed here. I'm writing a simple test for it. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Now I have my system running, not a byte was off the shelf; It rarely breaks and when it does I fix the code myself. It's stable, clean and elegant, and lightning fast as well, And it doesn't cost a nickel, so Bill Gates can go to hell."