Re: SyncRepLock acquired exclusively in default configuration - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: SyncRepLock acquired exclusively in default configuration
Date
Msg-id 4cb789b3-5b8d-7fb3-b458-8321bc7788ff@oss.nttdata.com
Whole thread Raw
In response to Re: SyncRepLock acquired exclusively in default configuration  (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>)
Responses Re: SyncRepLock acquired exclusively in default configuration  (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>)
List pgsql-hackers

On 2020/04/10 14:11, Masahiko Sawada wrote:
> On Fri, 10 Apr 2020 at 13:20, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>
>>
>>
>> On 2020/04/08 3:01, Ashwin Agrawal wrote:
>>>
>>> On Mon, Apr 6, 2020 at 2:14 PM Andres Freund <andres@anarazel.de <mailto:andres@anarazel.de>> wrote:
>>>
>>>       > How about we change it to this ?
>>>
>>>      Hm. Better. But I think it might need at least a compiler barrier /
>>>      volatile memory load?  Unlikely here, but otherwise the compiler could
>>>      theoretically just stash the variable somewhere locally (it's not likely
>>>      to be a problem because it'd not be long ago that we acquired an lwlock,
>>>      which is a full barrier).
>>>
>>>
>>> That's the part, I am not fully sure about. But reading the comment above SyncRepUpdateSyncStandbysDefined(), it
seemsfine.
 
>>>
>>>       > Bring back the check which existed based on GUC but instead of just blindly
>>>       > returning based on just GUC not being set, check
>>>       > WalSndCtl->sync_standbys_defined. Thoughts?
>>>
>>>      Hm. Is there any reason not to just check
>>>      WalSndCtl->sync_standbys_defined? rather than both !SyncStandbysDefined()
>>>      and WalSndCtl->sync_standbys_defined?
>>>
>>>
>>> Agree, just checking for WalSndCtl->sync_standbys_defined seems fine.
>>
>> So the consensus is something like the following? Patch attached.
>>
>>           /*
>> -        * Fast exit if user has not requested sync replication.
>> +        * Fast exit if user has not requested sync replication, or there are no
>> +        * sync replication standby names defined.
>>            */
>> -       if (!SyncRepRequested())
>> +       if (!SyncRepRequested() ||
>> +               !((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined)
>>                   return;
>>
> 
> I think we need more comments describing why checking
> sync_standby_defined without SyncRepLock is safe here. For example:

Yep, agreed!

> This routine gets called every commit time. So, to check if the
> synchronous standbys is defined as quick as possible we check
> WalSndCtl->sync_standbys_defined without acquiring SyncRepLock. Since
> we make this test unlocked, there's a change we might fail to notice
> that it has been turned off and continue processing.

Does this really happen? I was thinking that the problem by not taking
the lock here is that SyncRepWaitForLSN() can see that shared flag after
SyncRepUpdateSyncStandbysDefined() wakes up all the waiters and
before it sets the flag to false. Then if SyncRepWaitForLSN() adds  itself
into the wait queue becaues the flag was true, without lock, it may keep
sleeping infinitely.

> But since the
> subsequent check will check it again while holding SyncRepLock, it's
> no problem. Similarly even if we fail to notice that it has been
> turned on
Is this true? ISTM that after SyncRepUpdateSyncStandbysDefined()
sets the flag to true, SyncRepWaitForLSN() basically doesn't seem
to fail to notice that. No?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: Report error position in partition bound check
Next
From: Robert Haas
Date:
Subject: Re: Parallel copy