Thread: Double partition lock in bufmgr

Double partition lock in bufmgr

From: Konstantin Knizhnik
Hi hackers,

I am investigating an incident with one of our customers: performance of 
the system dropped dramatically.
Stack traces of all backends can be found here: 
http://www.garret.ru/diag_20201217_102056.stacks_59644
(this file is 6MB, so I have not attached it to this mail).

What I see in these stack traces is that 642 backends are blocked in 
LWLockAcquire, mostly while obtaining a shared buffer lock:

#0  0x00007f0e7fe7a087 in semop () from /lib64/libc.so.6
#1  0x0000000000682fb1 in PGSemaphoreLock 
(sema=sema@entry=0x7f0e1c1f63a0) at pg_sema.c:387
#2  0x00000000006ed60b in LWLockAcquire (lock=lock@entry=0x7e8b6176d800, 
mode=mode@entry=LW_SHARED) at lwlock.c:1338
#3  0x00000000006c88a7 in BufferAlloc (foundPtr=0x7ffcc3c8de9b "\001", 
strategy=0x0, blockNum=997, forkNum=MAIN_FORKNUM, relpersistence=112 
'p', smgr=0x2fb2df8) at bufmgr.c:1177
#4  ReadBuffer_common (smgr=0x2fb2df8, relpersistence=<optimized out>, 
relkind=<optimized out>, forkNum=forkNum@entry=MAIN_FORKNUM, 
blockNum=blockNum@entry=997, mode=RBM_NORMAL, strategy=0x0, 
hit=hit@entry=0x7ffcc3c8df97 "") at bufmgr.c:894
#5  0x00000000006c928b in ReadBufferExtended (reln=0x32c7ed0, 
forkNum=forkNum@entry=MAIN_FORKNUM, blockNum=997, 
mode=mode@entry=RBM_NORMAL, strategy=strategy@entry=0x0) at bufmgr.c:753
#6  0x00000000006c93ab in ReadBuffer (blockNum=<optimized out>, 
reln=<optimized out>) at bufmgr.c:685
...

Only 11 of these 642 locks are unique.
Moreover, 358 backends are waiting for one lock and 183 for another.

There are two backends (pids 291121 and 285927) which are trying to 
obtain an exclusive lock while already holding another exclusive lock, 
and they block all the other backends.

This is the only place in bufmgr (and in Postgres) where a process tries 
to lock two buffer mapping partitions:

         /*
          * To change the association of a valid buffer, we'll need to have
          * exclusive lock on both the old and new mapping partitions.
          */
         if (oldFlags & BM_TAG_VALID)
         {
             ...
             /*
              * Must lock the lower-numbered partition first to avoid
              * deadlocks.
              */
             if (oldPartitionLock < newPartitionLock)
             {
                 LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
                 LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
             }
             else if (oldPartitionLock > newPartitionLock)
             {
                 LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
                 LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
             }

These two backends are blocked on the second lock request.
I read all the comments in bufmgr.c and the README file but didn't find 
an explanation of why we need to lock both partitions.
Why is it not possible to first free the old buffer (as is done in 
InvalidateBuffer) and then repeat the attempt to allocate the buffer?

Yes, it may require more effort than just "grabbing" the buffer.
But in this case there is no need to hold two locks.
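
Roughly what I have in mind is something like the sketch below. This is 
just an illustration, not the actual patch: pinning, usage count, buffer 
header locking and error handling are all omitted, and the variables 
(oldTag, oldHash, newTag, newHash, buf, ...) are the ones already used 
in BufferAlloc:

         /*
          * Sketch only: reassign the victim buffer without ever holding
          * two partition locks at once.
          */
         if (oldFlags & BM_TAG_VALID)
         {
             /*
              * Drop the old mapping under the old partition lock alone,
              * essentially what InvalidateBuffer() does.
              */
             LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
             BufTableDelete(&oldTag, oldHash);
             LWLockRelease(oldPartitionLock);
         }

         /* Take only the new partition lock and try to install the new tag. */
         LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
         buf_id = BufTableInsert(&newTag, newHash, buf->buf_id);
         if (buf_id >= 0)
         {
             /*
              * Somebody else installed this page while we were not holding
              * any lock: give up our victim buffer and retry the lookup.
              */
         }
         LWLockRelease(newPartitionLock);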

I wonder if somebody has faced similar symptoms in the past, and whether 
this problem of holding locks on two partitions in bufmgr has already 
been discussed.

P.S.
The customer is using Postgres 9.6, but I have checked that the same 
code fragment is present in master.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Double partition lock in bufmgr

From: Zhihong Yu
Hi,
w.r.t. the code in BufferAlloc(), the pointers are compared.

Should we instead compare the tranche Ids of the two LWLocks?

Cheers

Re: Double partition lock in bufmgr

From: Konstantin Knizhnik

On 19.12.2020 10:53, Zhihong Yu wrote:
> Hi,
> w.r.t. the code in BufferAlloc(), the pointers are compared.
>
> Should we instead compare the tranche Ids of the two LWLocks?
>
> Cheers

Since the LWLocks are stored in an array, comparing indexes in this 
array (tranche Id) is equivalent to comparing the elements' pointers.
So I do not see any problem here.
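
For reference, the buffer mapping partition locks are just consecutive 
elements of MainLWLockArray, roughly like this (paraphrasing 
buf_internals.h; the exact spelling may differ between versions):

    /*
     * A buffer tag's hash code selects one of NUM_BUFFER_PARTITIONS locks
     * stored contiguously in MainLWLockArray, so ordering the locks by
     * pointer is the same as ordering them by partition index.
     */
    #define BufTableHashPartition(hashcode) \
        ((hashcode) % NUM_BUFFER_PARTITIONS)
    #define BufMappingPartitionLock(hashcode) \
        (&MainLWLockArray[BUFFER_MAPPING_LWLOCK_OFFSET + \
                          BufTableHashPartition(hashcode)].lock)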

Just as an experiment, I tried a version of BufferAlloc without double 
locking (patch attached).
I am not absolutely sure that my patch is correct: my main intention was 
to estimate the influence of this buffer reassignment on performance.
I just ran standard pgbench on a database with scale 100 and the default 
shared buffers size (256MB), so there should be a lot of page 
replacements.
I do not see any noticeable difference (pgbench TPS):

vanilla: 13087.596845
patch:   13184.442130


Attachment: bufmgr.patch

Re: Double partition lock in bufmgr

From: Masahiko Sawada
Hi Konstantin,

On Sat, Dec 19, 2020 at 9:50 PM Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
>
>
>
> On 19.12.2020 10:53, Zhihong Yu wrote:
> > Hi,
> > w.r.t. the code in BufferAlloc(), the pointers are compared.
> >
> > Should we instead compare the tranche Ids of the two LWLocks?
> >
> > Cheers
>
> Since the LWLocks are stored in an array, comparing indexes in this
> array (tranche Id) is equivalent to comparing the elements' pointers.
> So I do not see any problem here.
>
> Just as an experiment, I tried a version of BufferAlloc without double
> locking (patch attached).
> I am not absolutely sure that my patch is correct: my main intention was
> to estimate the influence of this buffer reassignment on performance.
> I just ran standard pgbench on a database with scale 100 and the default
> shared buffers size (256MB), so there should be a lot of page
> replacements.
> I do not see any noticeable difference (pgbench TPS):
>
> vanilla: 13087.596845
> patch:   13184.442130
>

You sent in your patch, bufmgr.patch, to pgsql-hackers on Dec 19, but
you did not post it to the next CommitFest[1].  If this was
intentional, then you need to take no action.  However, if you want
your patch to be reviewed as part of the upcoming CommitFest, then you
need to add it yourself before 2021-01-01 AoE[2]. Thanks for your
contributions.

Regards,

[1] https://commitfest.postgresql.org/31/
[2] https://en.wikipedia.org/wiki/Anywhere_on_Earth


-- 
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/



Re: Double partition lock in bufmgr

From: Yura Sokolov
On Fri, 18/12/2020 at 15:20 +0300, Konstantin Knizhnik wrote:
> Hi hackers,
> 
> I am investigating an incident with one of our customers: performance of 
> the system dropped dramatically.
> Stack traces of all backends can be found here: 
> http://www.garret.ru/diag_20201217_102056.stacks_59644
> (this file is 6MB, so I have not attached it to this mail).
> 
> What I see in these stack traces is that 642 backends are blocked in 
> LWLockAcquire, mostly while obtaining a shared buffer lock:
> 
> #0  0x00007f0e7fe7a087 in semop () from /lib64/libc.so.6
> #1  0x0000000000682fb1 in PGSemaphoreLock 
> (sema=sema@entry=0x7f0e1c1f63a0) at pg_sema.c:387
> #2  0x00000000006ed60b in LWLockAcquire (lock=lock@entry=0x7e8b6176d800, 
> mode=mode@entry=LW_SHARED) at lwlock.c:1338
> #3  0x00000000006c88a7 in BufferAlloc (foundPtr=0x7ffcc3c8de9b "\001", 
> strategy=0x0, blockNum=997, forkNum=MAIN_FORKNUM, relpersistence=112 
> 'p', smgr=0x2fb2df8) at bufmgr.c:1177
> #4  ReadBuffer_common (smgr=0x2fb2df8, relpersistence=<optimized out>, 
> relkind=<optimized out>, forkNum=forkNum@entry=MAIN_FORKNUM, 
> blockNum=blockNum@entry=997, mode=RBM_NORMAL, strategy=0x0, 
> hit=hit@entry=0x7ffcc3c8df97 "") at bufmgr.c:894
> #5  0x00000000006c928b in ReadBufferExtended (reln=0x32c7ed0, 
> forkNum=forkNum@entry=MAIN_FORKNUM, blockNum=997, 
> mode=mode@entry=RBM_NORMAL, strategy=strategy@entry=0x0) at bufmgr.c:753
> #6  0x00000000006c93ab in ReadBuffer (blockNum=<optimized out>, 
> reln=<optimized out>) at bufmgr.c:685
> ...
> 
> Only 11 of these 642 locks are unique.
> Moreover, 358 backends are waiting for one lock and 183 for another.
> 
> There are two backends (pids 291121 and 285927) which are trying to 
> obtain an exclusive lock while already holding another exclusive lock, 
> and they block all the other backends.
> 
> This is the only place in bufmgr (and in Postgres) where a process 
> tries to lock two buffer mapping partitions:
> 
>          /*
>           * To change the association of a valid buffer, we'll need to have
>           * exclusive lock on both the old and new mapping partitions.
>           */
>          if (oldFlags & BM_TAG_VALID)
>          {
>              ...
>              /*
>               * Must lock the lower-numbered partition first to avoid
>               * deadlocks.
>               */
>              if (oldPartitionLock < newPartitionLock)
>              {
>                  LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
>                  LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
>              }
>              else if (oldPartitionLock > newPartitionLock)
>              {
>                  LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
>                  LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
>              }
> 
> These two backends are blocked on the second lock request.
> I read all the comments in bufmgr.c and the README file but didn't find 
> an explanation of why we need to lock both partitions.
> Why is it not possible to first free the old buffer (as is done in 
> InvalidateBuffer) and then repeat the attempt to allocate the buffer?
> 
> Yes, it may require more effort than just "grabbing" the buffer.
> But in this case there is no need to hold two locks.
> 
> I wonder if somebody has faced similar symptoms in the past, and whether 
> this problem of holding locks on two partitions in bufmgr has already 
> been discussed.

Looks like there is no real need for this double lock, and the change to
consecutive lock acquisition really provides a scalability gain:
https://bit.ly/3AytNoN
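
To be concrete, by "consecutive" I mean never holding both partition locks
at the same time, i.e. something like the following pattern (a sketch only;
the real handling of concurrent inserts is more involved):

    /* current code: both partition locks held simultaneously,
     * acquired in address order to avoid deadlock */
    LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
    LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
    /* ... move the hash table entry ... */
    LWLockRelease(oldPartitionLock);
    LWLockRelease(newPartitionLock);

    /* consecutive acquisition: at most one partition lock held at a time */
    LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
    /* ... delete the old entry ... */
    LWLockRelease(oldPartitionLock);
    LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
    /* ... insert the new entry, or find that someone else already did ... */
    LWLockRelease(newPartitionLock);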

regards
Sokolov Yura
y.sokolov@postgrespro.ru
funny.falcon@gmail.com