Thread: Double partition lock in bufmgr
Hi hackers,

I am investigating an incident with one of our customers: the performance of the system dropped dramatically.
Stack traces of all backends can be found here:
http://www.garret.ru/diag_20201217_102056.stacks_59644
(this file is 6Mb, so I have not attached it to this mail).

What I see in these stack traces is that 642 backends are blocked in LWLockAcquire, mostly while obtaining a shared buffer lock:

#0  0x00007f0e7fe7a087 in semop () from /lib64/libc.so.6
#1  0x0000000000682fb1 in PGSemaphoreLock (sema=sema@entry=0x7f0e1c1f63a0) at pg_sema.c:387
#2  0x00000000006ed60b in LWLockAcquire (lock=lock@entry=0x7e8b6176d800, mode=mode@entry=LW_SHARED) at lwlock.c:1338
#3  0x00000000006c88a7 in BufferAlloc (foundPtr=0x7ffcc3c8de9b "\001", strategy=0x0, blockNum=997, forkNum=MAIN_FORKNUM, relpersistence=112 'p', smgr=0x2fb2df8) at bufmgr.c:1177
#4  ReadBuffer_common (smgr=0x2fb2df8, relpersistence=<optimized out>, relkind=<optimized out>, forkNum=forkNum@entry=MAIN_FORKNUM, blockNum=blockNum@entry=997, mode=RBM_NORMAL, strategy=0x0, hit=hit@entry=0x7ffcc3c8df97 "") at bufmgr.c:894
#5  0x00000000006c928b in ReadBufferExtended (reln=0x32c7ed0, forkNum=forkNum@entry=MAIN_FORKNUM, blockNum=997, mode=mode@entry=RBM_NORMAL, strategy=strategy@entry=0x0) at bufmgr.c:753
#6  0x00000000006c93ab in ReadBuffer (blockNum=<optimized out>, reln=<optimized out>) at bufmgr.c:685
...

Only 11 of these 642 locks are unique.
Moreover: 358 backends are waiting for one lock and 183 for another.

There are two backends (pids 291121 and 285927) which are trying to obtain an exclusive lock while already holding another exclusive lock.
And they block all other backends.

This is the single place in bufmgr (and in postgres) where a process tries to lock two buffer mapping partitions:

        /*
         * To change the association of a valid buffer, we'll need to have
         * exclusive lock on both the old and new mapping partitions.
         */
        if (oldFlags & BM_TAG_VALID)
        {
            ...
            /*
             * Must lock the lower-numbered partition first to avoid
             * deadlocks.
             */
            if (oldPartitionLock < newPartitionLock)
            {
                LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
                LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
            }
            else if (oldPartitionLock > newPartitionLock)
            {
                LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
                LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
            }

These two backends are blocked in the second lock request.
I read all the comments in bufmgr.c and the README file but didn't find an explanation of why we need to lock both partitions.
Why is it not possible to first free the old buffer (as is done in InvalidateBuffer) and then repeat the attempt to allocate the buffer?

Yes, it may require more effort than just "grabbing" the buffer.
But in this case there is no need to hold two locks at once.

I wonder if somebody in the past has faced similar symptoms, and whether this problem of holding locks on two partitions in bufmgr has already been discussed?

P.S. The customer is using the 9.6 version of Postgres, but I have checked that the same code fragment is present in master.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
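For illustration only, here is a minimal sketch of the "free the old mapping first, then retry" idea raised above. It assumes the existing buf_table helpers (BufTableHashCode, BufTableDelete, BufTableInsert) and the BufMappingPartitionLock macro; the function name and simplified control flow are hypothetical, this is not the attached patch, and buffer header locking, dirty-page write-out and flag updates are omitted.

#include "postgres.h"
#include "storage/buf_internals.h"
#include "storage/lwlock.h"

/*
 * Sketch: remap a victim buffer without ever holding two mapping
 * partition locks at once.  Returns true if the new tag was installed,
 * false if another backend installed the same page first and the caller
 * must pick a new victim and retry.
 */
static bool
remap_buffer_one_partition_at_a_time(BufferDesc *buf,
                                     BufferTag oldTag, BufferTag newTag)
{
    uint32      oldHash = BufTableHashCode(&oldTag);
    uint32      newHash = BufTableHashCode(&newTag);
    LWLock     *oldPartitionLock = BufMappingPartitionLock(oldHash);
    LWLock     *newPartitionLock = BufMappingPartitionLock(newHash);

    /* 1. Drop the old association, holding only the old partition lock. */
    LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
    BufTableDelete(&oldTag, oldHash);
    LWLockRelease(oldPartitionLock);

    /* 2. Insert the new association, holding only the new partition lock. */
    LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
    if (BufTableInsert(&newTag, newHash, buf->buf_id) >= 0)
    {
        /* Conflicting entry already exists: the caller has to retry. */
        LWLockRelease(newPartitionLock);
        return false;
    }
    LWLockRelease(newPartitionLock);
    return true;
}

As the mail notes, the price of this scheme is a possible retry when some other backend installs the same page in between the two steps, instead of simply grabbing the buffer while both partition locks are held.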
Hi,
w.r.t. the code in BufferAlloc(), the pointers are compared.
Should we instead compare the tranche Ids of the two LWLocks?
Cheers
On 19.12.2020 10:53, Zhihong Yu wrote:
> Hi,
> w.r.t. the code in BufferAlloc(), the pointers are compared.
>
> Should we instead compare the tranche Id of the two LWLock ?
>
> Cheers

Since the LWLocks are stored in an array, comparing indexes in this array (the tranche Id) is equivalent to comparing the elements' pointers.
So I do not see any problem here.

Just as an experiment I tried a version of BufferAlloc without double locking (patch is attached).
I am not absolutely sure that my patch is correct: my main intention was to estimate the influence of this buffer reassignment on performance.
I just ran standard pgbench for a database with scale 100 and the default shared buffers size (256Mb), so there should be a lot of page replacements.
I do not see any noticeable difference:

vanilla: 13087.596845
patch:   13184.442130
Attachment: bufmgr.patch
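To illustrate the pointer-versus-index question discussed in the two messages above, here is a small hypothetical helper. It only assumes what the reply states, namely that the buffer mapping partition locks are consecutive elements of one array (MainLWLockArray), so ordering by pointer and ordering by partition index select the same "lower" lock; the function itself is not part of the existing code.

#include "postgres.h"
#include "storage/buf_internals.h"
#include "storage/lwlock.h"

/*
 * Sketch: the partition lock with the lower address is also the one with
 * the lower partition index, because BufMappingPartitionLock() indexes
 * into one contiguous array.  Either comparison therefore yields the
 * same lock order for the deadlock-avoidance rule in BufferAlloc().
 */
static LWLock *
lower_partition_lock(uint32 oldHash, uint32 newHash)
{
    LWLock     *oldPartitionLock = BufMappingPartitionLock(oldHash);
    LWLock     *newPartitionLock = BufMappingPartitionLock(newHash);

    /* Comparison by pointer, as the existing code in BufferAlloc() does. */
    LWLock     *byPointer = (oldPartitionLock <= newPartitionLock)
        ? oldPartitionLock : newPartitionLock;

    /* Comparison by partition index gives the same answer. */
    LWLock     *byIndex =
        (BufTableHashPartition(oldHash) <= BufTableHashPartition(newHash))
        ? oldPartitionLock : newPartitionLock;

    Assert(byPointer == byIndex);
    return byPointer;
}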
Hi Konstantin,

On Sat, Dec 19, 2020 at 9:50 PM Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
>
> On 19.12.2020 10:53, Zhihong Yu wrote:
> > Hi,
> > w.r.t. the code in BufferAlloc(), the pointers are compared.
> >
> > Should we instead compare the tranche Id of the two LWLock ?
> >
> > Cheers
>
> As far as LWlocks are stored in the array, comparing indexes in this
> array (tranche Id) is equivalent to comparing element's pointers.
> So I do not see any problem here.
>
> Just as experiment I tried a version of BufferAlloc without double
> locking (patch is attached).
> I am not absolutely sure that my patch is correct: my main intention was
> to estimate influence of this buffer reassignment on performance.
> I just run standard pgbench for database with scale 100 and default
> shared buffers size (256Mb). So there are should be a lot of page
> replacements.
> I do not see any noticeable difference:
>
> vanilla: 13087.596845
> patch: 13184.442130
>

You sent in your patch, bufmgr.patch, to pgsql-hackers on Dec 19, but you did not post it to the next CommitFest[1].
If this was intentional, then you need to take no action.
However, if you want your patch to be reviewed as part of the upcoming CommitFest, then you need to add it yourself before 2021-01-01 AoE[2].
Thanks for your contributions.

Regards,

[1] https://commitfest.postgresql.org/31/
[2] https://en.wikipedia.org/wiki/Anywhere_on_Earth

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/
On Fri, 18/12/2020 at 15:20 +0300, Konstantin Knizhnik wrote:
> Hi hackers,
>
> I am investigating incident with one of out customers: performance of
> the system isdropped dramatically.
> Stack traces of all backends can be found here:
> http://www.garret.ru/diag_20201217_102056.stacks_59644
> (this file is 6Mb so I have not attached it to this mail).
>
> What I have see in this stack traces is that 642 backends and blocked in
> LWLockAcquire, mostly in obtaining shared buffer lock:
>
> #0  0x00007f0e7fe7a087 in semop () from /lib64/libc.so.6
> #1  0x0000000000682fb1 in PGSemaphoreLock (sema=sema@entry=0x7f0e1c1f63a0) at pg_sema.c:387
> #2  0x00000000006ed60b in LWLockAcquire (lock=lock@entry=0x7e8b6176d800, mode=mode@entry=LW_SHARED) at lwlock.c:1338
> #3  0x00000000006c88a7 in BufferAlloc (foundPtr=0x7ffcc3c8de9b "\001", strategy=0x0, blockNum=997, forkNum=MAIN_FORKNUM, relpersistence=112 'p', smgr=0x2fb2df8) at bufmgr.c:1177
> #4  ReadBuffer_common (smgr=0x2fb2df8, relpersistence=<optimized out>, relkind=<optimized out>, forkNum=forkNum@entry=MAIN_FORKNUM, blockNum=blockNum@entry=997, mode=RBM_NORMAL, strategy=0x0, hit=hit@entry=0x7ffcc3c8df97 "") at bufmgr.c:894
> #5  0x00000000006c928b in ReadBufferExtended (reln=0x32c7ed0, forkNum=forkNum@entry=MAIN_FORKNUM, blockNum=997, mode=mode@entry=RBM_NORMAL, strategy=strategy@entry=0x0) at bufmgr.c:753
> #6  0x00000000006c93ab in ReadBuffer (blockNum=<optimized out>, reln=<optimized out>) at bufmgr.c:685
> ...
>
> Only 11 locks from this 642 are unique.
> Moreover: 358 backends are waiting for one lock and 183 - for another.
>
> There are two backends (pids 291121 and 285927) which are trying to
> obtain exclusive lock while already holding another exclusive lock.
> And them block all other backends.
>
> This is single place in bufmgr (and in postgres) where process tries to
> lock two buffers:
>
>         /*
>          * To change the association of a valid buffer, we'll need to have
>          * exclusive lock on both the old and new mapping partitions.
>          */
>         if (oldFlags & BM_TAG_VALID)
>         {
>             ...
>             /*
>              * Must lock the lower-numbered partition first to avoid
>              * deadlocks.
>              */
>             if (oldPartitionLock < newPartitionLock)
>             {
>                 LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
>                 LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
>             }
>             else if (oldPartitionLock > newPartitionLock)
>             {
>                 LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
>                 LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
>             }
>
> This two backends are blocked in the second lock request.
> I read all connects in bufmgr.c and README file but didn't find
> explanation why do we need to lock both partitions.
> Why it is not possible first free old buffer (as it is done in
> InvalidateBuffer) and then repeat attempt to allocate the buffer?
>
> Yes, it may require more efforts than just "gabbing" the buffer.
> But in this case there is no need to keep two locks.
>
> I wonder if somebody in the past faced with the similar symptoms and
> was this problem with holding locks of two partitions in bufmgr already
> discussed?

Looks like there is no real need for this double lock, and the change to consecutive lock acquisition really provides a scalability gain: https://bit.ly/3AytNoN

regards
Sokolov Yura
y.sokolov@postgrespro.ru
funny.falcon@gmail.com