Thread: SyncRepLock acquired exclusively in default configuration
Hi, Due to the change below, when using the default postgres configuration of ynchronous_commit = on, max_wal_senders = 10, will now acquire a new exclusive lwlock after writing a commit record. commit 48c9f4926562278a2fd2b85e7486c6d11705f177 Author: Simon Riggs <simon@2ndQuadrant.com> Date: 2017-12-29 14:30:33 +0000 Fix race condition when changing synchronous_standby_names A momentary window exists when synchronous_standby_names changes that allows commands issued after the change to continue to act as async until the change becomes visible. Remove the race by using more appropriate test in syncrep.c Author: Asim Rama Praveen and Ashwin Agrawal Reported-by: Xin Zhang, Ashwin Agrawal, and Asim Rama Praveen Reviewed-by: Michael Paquier, Masahiko Sawada As far as I can tell there was no discussion about the added contention due this change in the relevant thread [1]. The default configuration has an empty synchronous_standby_names. Before this change we'd fall out of SyncRepWaitForLSN() before acquiring SyncRepLock in exlusive mode. Now we don't anymore. I'm really not ok with unneccessarily adding an exclusive lock acquisition to such a crucial path. Greetings, Andres Freund [1] https://postgr.es/m/CABrsG8j3kPD%2Bkbbsx_isEpFvAgaOBNGyGpsqSjQ6L8vwVUaZAQ%40mail.gmail.com
On Mon, 6 Apr 2020 at 14:04, Andres Freund <andres@anarazel.de> wrote: > > Hi, > > Due to the change below, when using the default postgres configuration > of ynchronous_commit = on, max_wal_senders = 10, will now acquire a new > exclusive lwlock after writing a commit record. Indeed. > > commit 48c9f4926562278a2fd2b85e7486c6d11705f177 > Author: Simon Riggs <simon@2ndQuadrant.com> > Date: 2017-12-29 14:30:33 +0000 > > Fix race condition when changing synchronous_standby_names > > A momentary window exists when synchronous_standby_names > changes that allows commands issued after the change to > continue to act as async until the change becomes visible. > Remove the race by using more appropriate test in syncrep.c > > Author: Asim Rama Praveen and Ashwin Agrawal > Reported-by: Xin Zhang, Ashwin Agrawal, and Asim Rama Praveen > Reviewed-by: Michael Paquier, Masahiko Sawada > > As far as I can tell there was no discussion about the added contention > due this change in the relevant thread [1]. > > The default configuration has an empty synchronous_standby_names. Before > this change we'd fall out of SyncRepWaitForLSN() before acquiring > SyncRepLock in exlusive mode. Now we don't anymore. > > > I'm really not ok with unneccessarily adding an exclusive lock > acquisition to such a crucial path. > I think we can acquire SyncRepLock in share mode once to check WalSndCtl->sync_standbys_defined and if it's true then check it again after acquiring it in exclusive mode. But it in turn ends up with adding one extra LWLockAcquire and LWLockRelease in sync rep path. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Apr 6, 2020 at 1:52 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:
On Mon, 6 Apr 2020 at 14:04, Andres Freund <andres@anarazel.de> wrote:
>
> commit 48c9f4926562278a2fd2b85e7486c6d11705f177
> Author: Simon Riggs <simon@2ndQuadrant.com>
> Date: 2017-12-29 14:30:33 +0000
>
> Fix race condition when changing synchronous_standby_names
>
> A momentary window exists when synchronous_standby_names
> changes that allows commands issued after the change to
> continue to act as async until the change becomes visible.
> Remove the race by using more appropriate test in syncrep.c
>
> Author: Asim Rama Praveen and Ashwin Agrawal
> Reported-by: Xin Zhang, Ashwin Agrawal, and Asim Rama Praveen
> Reviewed-by: Michael Paquier, Masahiko Sawada
>
> As far as I can tell there was no discussion about the added contention
> due this change in the relevant thread [1].
>
> The default configuration has an empty synchronous_standby_names. Before
> this change we'd fall out of SyncRepWaitForLSN() before acquiring
> SyncRepLock in exlusive mode. Now we don't anymore.
>
>
> I'm really not ok with unneccessarily adding an exclusive lock
> acquisition to such a crucial path.
>
I think we can acquire SyncRepLock in share mode once to check
WalSndCtl->sync_standbys_defined and if it's true then check it again
after acquiring it in exclusive mode. But it in turn ends up with
adding one extra LWLockAcquire and LWLockRelease in sync rep path.
How about we change it to this ?
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index ffd5b31eb2..cdb82a8b28 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -165,8 +165,11 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
/*
* Fast exit if user has not requested sync replication.
*/
- if (!SyncRepRequested())
- return;
+ if (!SyncRepRequested() || !SyncStandbysDefined())
+ {
+ if (!WalSndCtl->sync_standbys_defined)
+ return;
+ }
Assert(SHMQueueIsDetached(&(MyProc->syncRepLinks)));
Assert(WalSndCtl != NULL);
index ffd5b31eb2..cdb82a8b28 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -165,8 +165,11 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
/*
* Fast exit if user has not requested sync replication.
*/
- if (!SyncRepRequested())
- return;
+ if (!SyncRepRequested() || !SyncStandbysDefined())
+ {
+ if (!WalSndCtl->sync_standbys_defined)
+ return;
+ }
Assert(SHMQueueIsDetached(&(MyProc->syncRepLinks)));
Assert(WalSndCtl != NULL);
Hi, On 2020-04-06 11:51:06 -0700, Ashwin Agrawal wrote: > On Mon, Apr 6, 2020 at 1:52 AM Masahiko Sawada < > masahiko.sawada@2ndquadrant.com> wrote: > > On Mon, 6 Apr 2020 at 14:04, Andres Freund <andres@anarazel.de> wrote: > > > I'm really not ok with unneccessarily adding an exclusive lock > > > acquisition to such a crucial path. > > > > > > > I think we can acquire SyncRepLock in share mode once to check > > WalSndCtl->sync_standbys_defined and if it's true then check it again > > after acquiring it in exclusive mode. But it in turn ends up with > > adding one extra LWLockAcquire and LWLockRelease in sync rep path. That's still too much. Adding another lwlock acquisition, where the same lock is acquired by all backends (contrasting e.g. to buffer locks), to the commit path, for the benefit of a feature that the vast majority of people aren't going to use, isn't good. > How about we change it to this ? Hm. Better. But I think it might need at least a compiler barrier / volatile memory load? Unlikely here, but otherwise the compiler could theoretically just stash the variable somewhere locally (it's not likely to be a problem because it'd not be long ago that we acquired an lwlock, which is a full barrier). > Bring back the check which existed based on GUC but instead of just blindly > returning based on just GUC not being set, check > WalSndCtl->sync_standbys_defined. Thoughts? Hm. Is there any reason not to just check WalSndCtl->sync_standbys_defined? rather than both !SyncStandbysDefined() and WalSndCtl->sync_standbys_defined? Greetings, Andres Freund
On Tue, 7 Apr 2020 at 06:14, Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2020-04-06 11:51:06 -0700, Ashwin Agrawal wrote: > > On Mon, Apr 6, 2020 at 1:52 AM Masahiko Sawada < > > masahiko.sawada@2ndquadrant.com> wrote: > > > On Mon, 6 Apr 2020 at 14:04, Andres Freund <andres@anarazel.de> wrote: > > > > I'm really not ok with unneccessarily adding an exclusive lock > > > > acquisition to such a crucial path. > > > > > > > > > > I think we can acquire SyncRepLock in share mode once to check > > > WalSndCtl->sync_standbys_defined and if it's true then check it again > > > after acquiring it in exclusive mode. But it in turn ends up with > > > adding one extra LWLockAcquire and LWLockRelease in sync rep path. > > That's still too much. Adding another lwlock acquisition, where the same > lock is acquired by all backends (contrasting e.g. to buffer locks), to > the commit path, for the benefit of a feature that the vast majority of > people aren't going to use, isn't good. Agreed. In this case it seems okay to read WalSndCtl->sync_standbys_defined without SyncRepLock before we acquire SyncRepLock in exclusive mode. While changing WalSndCtl->sync_standbys_defined to true, in the current code a backend who reached SyncRepWaitForLSN() waits on SyncRepLock, see sync_standbys_defined is true and enqueue itself. With this change, since we don't acquire SyncRepLock to read WalSndCtl->sync_standbys_defined these backends return without waiting for the change of WalSndCtl->sync_standbys_defined but it would not be a problem. Similarly, I've considered the case where changing to false, but I think there is no problem. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Apr 6, 2020 at 2:14 PM Andres Freund <andres@anarazel.de> wrote:
> How about we change it to this ?
Hm. Better. But I think it might need at least a compiler barrier /
volatile memory load? Unlikely here, but otherwise the compiler could
theoretically just stash the variable somewhere locally (it's not likely
to be a problem because it'd not be long ago that we acquired an lwlock,
which is a full barrier).
That's the part, I am not fully sure about. But reading the comment above SyncRepUpdateSyncStandbysDefined(), it seems fine.
> Bring back the check which existed based on GUC but instead of just blindly
> returning based on just GUC not being set, check
> WalSndCtl->sync_standbys_defined. Thoughts?
Hm. Is there any reason not to just check
WalSndCtl->sync_standbys_defined? rather than both !SyncStandbysDefined()
and WalSndCtl->sync_standbys_defined?
Agree, just checking for WalSndCtl->sync_standbys_defined seems fine.
I wasn't fully thinking there, as I got distracted by if lock will be required or not for reading the same. If lock was required then checking for guc first would have been better, but seems not required.
On 2020/04/08 3:01, Ashwin Agrawal wrote: > > On Mon, Apr 6, 2020 at 2:14 PM Andres Freund <andres@anarazel.de <mailto:andres@anarazel.de>> wrote: > > > How about we change it to this ? > > Hm. Better. But I think it might need at least a compiler barrier / > volatile memory load? Unlikely here, but otherwise the compiler could > theoretically just stash the variable somewhere locally (it's not likely > to be a problem because it'd not be long ago that we acquired an lwlock, > which is a full barrier). > > > That's the part, I am not fully sure about. But reading the comment above SyncRepUpdateSyncStandbysDefined(), it seemsfine. > > > Bring back the check which existed based on GUC but instead of just blindly > > returning based on just GUC not being set, check > > WalSndCtl->sync_standbys_defined. Thoughts? > > Hm. Is there any reason not to just check > WalSndCtl->sync_standbys_defined? rather than both !SyncStandbysDefined() > and WalSndCtl->sync_standbys_defined? > > > Agree, just checking for WalSndCtl->sync_standbys_defined seems fine. So the consensus is something like the following? Patch attached. /* - * Fast exit if user has not requested sync replication. + * Fast exit if user has not requested sync replication, or there are no + * sync replication standby names defined. */ - if (!SyncRepRequested()) + if (!SyncRepRequested() || + !((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined) return; Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Attachment
On Fri, 10 Apr 2020 at 13:20, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > > On 2020/04/08 3:01, Ashwin Agrawal wrote: > > > > On Mon, Apr 6, 2020 at 2:14 PM Andres Freund <andres@anarazel.de <mailto:andres@anarazel.de>> wrote: > > > > > How about we change it to this ? > > > > Hm. Better. But I think it might need at least a compiler barrier / > > volatile memory load? Unlikely here, but otherwise the compiler could > > theoretically just stash the variable somewhere locally (it's not likely > > to be a problem because it'd not be long ago that we acquired an lwlock, > > which is a full barrier). > > > > > > That's the part, I am not fully sure about. But reading the comment above SyncRepUpdateSyncStandbysDefined(), it seemsfine. > > > > > Bring back the check which existed based on GUC but instead of just blindly > > > returning based on just GUC not being set, check > > > WalSndCtl->sync_standbys_defined. Thoughts? > > > > Hm. Is there any reason not to just check > > WalSndCtl->sync_standbys_defined? rather than both !SyncStandbysDefined() > > and WalSndCtl->sync_standbys_defined? > > > > > > Agree, just checking for WalSndCtl->sync_standbys_defined seems fine. > > So the consensus is something like the following? Patch attached. > > /* > - * Fast exit if user has not requested sync replication. > + * Fast exit if user has not requested sync replication, or there are no > + * sync replication standby names defined. > */ > - if (!SyncRepRequested()) > + if (!SyncRepRequested() || > + !((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined) > return; > I think we need more comments describing why checking sync_standby_defined without SyncRepLock is safe here. For example: This routine gets called every commit time. So, to check if the synchronous standbys is defined as quick as possible we check WalSndCtl->sync_standbys_defined without acquiring SyncRepLock. Since we make this test unlocked, there's a change we might fail to notice that it has been turned off and continue processing. But since the subsequent check will check it again while holding SyncRepLock, it's no problem. Similarly even if we fail to notice that it has been turned on, it's okay to return quickly since all backend consistently behaves so. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020/04/10 14:11, Masahiko Sawada wrote: > On Fri, 10 Apr 2020 at 13:20, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >> >> >> >> On 2020/04/08 3:01, Ashwin Agrawal wrote: >>> >>> On Mon, Apr 6, 2020 at 2:14 PM Andres Freund <andres@anarazel.de <mailto:andres@anarazel.de>> wrote: >>> >>> > How about we change it to this ? >>> >>> Hm. Better. But I think it might need at least a compiler barrier / >>> volatile memory load? Unlikely here, but otherwise the compiler could >>> theoretically just stash the variable somewhere locally (it's not likely >>> to be a problem because it'd not be long ago that we acquired an lwlock, >>> which is a full barrier). >>> >>> >>> That's the part, I am not fully sure about. But reading the comment above SyncRepUpdateSyncStandbysDefined(), it seemsfine. >>> >>> > Bring back the check which existed based on GUC but instead of just blindly >>> > returning based on just GUC not being set, check >>> > WalSndCtl->sync_standbys_defined. Thoughts? >>> >>> Hm. Is there any reason not to just check >>> WalSndCtl->sync_standbys_defined? rather than both !SyncStandbysDefined() >>> and WalSndCtl->sync_standbys_defined? >>> >>> >>> Agree, just checking for WalSndCtl->sync_standbys_defined seems fine. >> >> So the consensus is something like the following? Patch attached. >> >> /* >> - * Fast exit if user has not requested sync replication. >> + * Fast exit if user has not requested sync replication, or there are no >> + * sync replication standby names defined. >> */ >> - if (!SyncRepRequested()) >> + if (!SyncRepRequested() || >> + !((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined) >> return; >> > > I think we need more comments describing why checking > sync_standby_defined without SyncRepLock is safe here. For example: Yep, agreed! > This routine gets called every commit time. So, to check if the > synchronous standbys is defined as quick as possible we check > WalSndCtl->sync_standbys_defined without acquiring SyncRepLock. Since > we make this test unlocked, there's a change we might fail to notice > that it has been turned off and continue processing. Does this really happen? I was thinking that the problem by not taking the lock here is that SyncRepWaitForLSN() can see that shared flag after SyncRepUpdateSyncStandbysDefined() wakes up all the waiters and before it sets the flag to false. Then if SyncRepWaitForLSN() adds itself into the wait queue becaues the flag was true, without lock, it may keep sleeping infinitely. > But since the > subsequent check will check it again while holding SyncRepLock, it's > no problem. Similarly even if we fail to notice that it has been > turned on Is this true? ISTM that after SyncRepUpdateSyncStandbysDefined() sets the flag to true, SyncRepWaitForLSN() basically doesn't seem to fail to notice that. No? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On Fri, 10 Apr 2020 at 18:57, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > > On 2020/04/10 14:11, Masahiko Sawada wrote: > > On Fri, 10 Apr 2020 at 13:20, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > >> > >> > >> > >> On 2020/04/08 3:01, Ashwin Agrawal wrote: > >>> > >>> On Mon, Apr 6, 2020 at 2:14 PM Andres Freund <andres@anarazel.de <mailto:andres@anarazel.de>> wrote: > >>> > >>> > How about we change it to this ? > >>> > >>> Hm. Better. But I think it might need at least a compiler barrier / > >>> volatile memory load? Unlikely here, but otherwise the compiler could > >>> theoretically just stash the variable somewhere locally (it's not likely > >>> to be a problem because it'd not be long ago that we acquired an lwlock, > >>> which is a full barrier). > >>> > >>> > >>> That's the part, I am not fully sure about. But reading the comment above SyncRepUpdateSyncStandbysDefined(), it seemsfine. > >>> > >>> > Bring back the check which existed based on GUC but instead of just blindly > >>> > returning based on just GUC not being set, check > >>> > WalSndCtl->sync_standbys_defined. Thoughts? > >>> > >>> Hm. Is there any reason not to just check > >>> WalSndCtl->sync_standbys_defined? rather than both !SyncStandbysDefined() > >>> and WalSndCtl->sync_standbys_defined? > >>> > >>> > >>> Agree, just checking for WalSndCtl->sync_standbys_defined seems fine. > >> > >> So the consensus is something like the following? Patch attached. > >> > >> /* > >> - * Fast exit if user has not requested sync replication. > >> + * Fast exit if user has not requested sync replication, or there are no > >> + * sync replication standby names defined. > >> */ > >> - if (!SyncRepRequested()) > >> + if (!SyncRepRequested() || > >> + !((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined) > >> return; > >> > > > > I think we need more comments describing why checking > > sync_standby_defined without SyncRepLock is safe here. For example: > > Yep, agreed! > > > This routine gets called every commit time. So, to check if the > > synchronous standbys is defined as quick as possible we check > > WalSndCtl->sync_standbys_defined without acquiring SyncRepLock. Since > > we make this test unlocked, there's a change we might fail to notice > > that it has been turned off and continue processing. > > Does this really happen? I was thinking that the problem by not taking > the lock here is that SyncRepWaitForLSN() can see that shared flag after > SyncRepUpdateSyncStandbysDefined() wakes up all the waiters and > before it sets the flag to false. Then if SyncRepWaitForLSN() adds itself > into the wait queue becaues the flag was true, without lock, it may keep > sleeping infinitely. I think that because a backend does the following check after acquiring SyncRepLock, in that case, once the backend has taken SyncRepLock it can see that sync_standbys_defined is false and return. But you meant that we do both checks without SyncRepLock? /* * We don't wait for sync rep if WalSndCtl->sync_standbys_defined is not * set. See SyncRepUpdateSyncStandbysDefined. * * Also check that the standby hasn't already replied. Unlikely race * condition but we'll be fetching that cache line anyway so it's likely * to be a low cost check. */ if (!WalSndCtl->sync_standbys_defined || lsn <= WalSndCtl->lsn[mode]) { LWLockRelease(SyncRepLock); return; } > > > But since the > > subsequent check will check it again while holding SyncRepLock, it's > > no problem. Similarly even if we fail to notice that it has been > > turned on > Is this true? ISTM that after SyncRepUpdateSyncStandbysDefined() > sets the flag to true, SyncRepWaitForLSN() basically doesn't seem > to fail to notice that. No? What I wanted to say is, in the current code, while the checkpointer process is holding SyncRepLock to turn off sync_standbys_defined, backends who reach SyncRepWaitForLSN() wait for the lock. Then, after the checkpointer process releases SyncRepLock these backend can enqueue themselves to the wait queue because they can see that sync_standbys_defined is turned on. On the other hand if we do the check without SyncRepLock, backends who reach SyncRepWaitForLSN() will return instead of waiting, in spite of checkpointer process being turning on sync_standbys_defined. Which means these backends are failing to notice that it has been turned on, I thought. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020/04/10 20:56, Masahiko Sawada wrote: > On Fri, 10 Apr 2020 at 18:57, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >> >> >> >> On 2020/04/10 14:11, Masahiko Sawada wrote: >>> On Fri, 10 Apr 2020 at 13:20, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>>> >>>> >>>> >>>> On 2020/04/08 3:01, Ashwin Agrawal wrote: >>>>> >>>>> On Mon, Apr 6, 2020 at 2:14 PM Andres Freund <andres@anarazel.de <mailto:andres@anarazel.de>> wrote: >>>>> >>>>> > How about we change it to this ? >>>>> >>>>> Hm. Better. But I think it might need at least a compiler barrier / >>>>> volatile memory load? Unlikely here, but otherwise the compiler could >>>>> theoretically just stash the variable somewhere locally (it's not likely >>>>> to be a problem because it'd not be long ago that we acquired an lwlock, >>>>> which is a full barrier). >>>>> >>>>> >>>>> That's the part, I am not fully sure about. But reading the comment above SyncRepUpdateSyncStandbysDefined(), it seemsfine. >>>>> >>>>> > Bring back the check which existed based on GUC but instead of just blindly >>>>> > returning based on just GUC not being set, check >>>>> > WalSndCtl->sync_standbys_defined. Thoughts? >>>>> >>>>> Hm. Is there any reason not to just check >>>>> WalSndCtl->sync_standbys_defined? rather than both !SyncStandbysDefined() >>>>> and WalSndCtl->sync_standbys_defined? >>>>> >>>>> >>>>> Agree, just checking for WalSndCtl->sync_standbys_defined seems fine. >>>> >>>> So the consensus is something like the following? Patch attached. >>>> >>>> /* >>>> - * Fast exit if user has not requested sync replication. >>>> + * Fast exit if user has not requested sync replication, or there are no >>>> + * sync replication standby names defined. >>>> */ >>>> - if (!SyncRepRequested()) >>>> + if (!SyncRepRequested() || >>>> + !((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined) >>>> return; >>>> >>> >>> I think we need more comments describing why checking >>> sync_standby_defined without SyncRepLock is safe here. For example: >> >> Yep, agreed! >> >>> This routine gets called every commit time. So, to check if the >>> synchronous standbys is defined as quick as possible we check >>> WalSndCtl->sync_standbys_defined without acquiring SyncRepLock. Since >>> we make this test unlocked, there's a change we might fail to notice >>> that it has been turned off and continue processing. >> >> Does this really happen? I was thinking that the problem by not taking >> the lock here is that SyncRepWaitForLSN() can see that shared flag after >> SyncRepUpdateSyncStandbysDefined() wakes up all the waiters and >> before it sets the flag to false. Then if SyncRepWaitForLSN() adds itself >> into the wait queue becaues the flag was true, without lock, it may keep >> sleeping infinitely. > > I think that because a backend does the following check after > acquiring SyncRepLock, in that case, once the backend has taken > SyncRepLock it can see that sync_standbys_defined is false and return. Yes, but the backend can see that sync_standby_defined indicates false whether holding SyncRepLock or not, after the checkpointer sets it to false. > But you meant that we do both checks without SyncRepLock? Maybe No. The change that the latest patch provides should be applied, I think. That is, sync_standbys_defined should be check without lock at first, then only if it's true, it should be checked again with lock. ISTM that basically SyncRepLock is used in SyncRepWaitForLSN() and SyncRepUpdateSyncStandbysDefined() to make operation on the queue and enabling sync_standbys_defined atomic. Without lock, the issue that the comment in SyncRepUpdateSyncStandbysDefined() explains would happen. That is, the backend may keep waiting infinitely as follows. 1. checkpointer calls SyncRepUpdateSyncStandbysDefined() 2. checkpointer sees that the flag indicates true but the config indicates false 3. checkpointer takes lock and wakes up all the waiters 4. backend calls SyncRepWaitForLSN() can see that the flag indicates true 5. checkpointer sets the flag to false and releases the lock 6. backend adds itself to the queue and wait until it's waken up, but will not happen immediately So after the backend sees that the flag indicates true without lock, it must check the flag again with lock immediately without operating the queue. If this my understanding is right, I was thinking that the comment should mention these things. > /* > * We don't wait for sync rep if WalSndCtl->sync_standbys_defined is not > * set. See SyncRepUpdateSyncStandbysDefined. > * > * Also check that the standby hasn't already replied. Unlikely race > * condition but we'll be fetching that cache line anyway so it's likely > * to be a low cost check. > */ > if (!WalSndCtl->sync_standbys_defined || > lsn <= WalSndCtl->lsn[mode]) > { > LWLockRelease(SyncRepLock); > return; > } > >> >>> But since the >>> subsequent check will check it again while holding SyncRepLock, it's >>> no problem. Similarly even if we fail to notice that it has been >>> turned on >> Is this true? ISTM that after SyncRepUpdateSyncStandbysDefined() >> sets the flag to true, SyncRepWaitForLSN() basically doesn't seem >> to fail to notice that. No? > > What I wanted to say is, in the current code, while the checkpointer > process is holding SyncRepLock to turn off sync_standbys_defined, > backends who reach SyncRepWaitForLSN() wait for the lock. Then, after > the checkpointer process releases SyncRepLock these backend can > enqueue themselves to the wait queue because they can see that > sync_standbys_defined is turned on. In this case, since the checkpointer turned the flag off while holding the lock, the backend sees that the flag is turned off, and doesn't enqueue itself. No? > On the other hand if we do the > check without SyncRepLock, backends who reach SyncRepWaitForLSN() will > return instead of waiting, in spite of checkpointer process being > turning on sync_standbys_defined. Which means these backends are > failing to notice that it has been turned on, I thought. No. Or I'm missing something... In this case, the backend sees that the flag is turned on without lock since checkpointer turned it on. So you're thinking the following. Right? 1. sync_standbys_defined flag is false 2. checkpointer takes the lock and turns the flag on 3. backend sees the flag 4. checkpointer releases the lock In #3, the flag indicates true, I think. But you think it's false? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On Fri, 10 Apr 2020 at 21:52, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > > On 2020/04/10 20:56, Masahiko Sawada wrote: > > On Fri, 10 Apr 2020 at 18:57, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > >> > >> > >> > >> On 2020/04/10 14:11, Masahiko Sawada wrote: > >>> On Fri, 10 Apr 2020 at 13:20, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > >>>> > >>>> > >>>> > >>>> On 2020/04/08 3:01, Ashwin Agrawal wrote: > >>>>> > >>>>> On Mon, Apr 6, 2020 at 2:14 PM Andres Freund <andres@anarazel.de <mailto:andres@anarazel.de>> wrote: > >>>>> > >>>>> > How about we change it to this ? > >>>>> > >>>>> Hm. Better. But I think it might need at least a compiler barrier / > >>>>> volatile memory load? Unlikely here, but otherwise the compiler could > >>>>> theoretically just stash the variable somewhere locally (it's not likely > >>>>> to be a problem because it'd not be long ago that we acquired an lwlock, > >>>>> which is a full barrier). > >>>>> > >>>>> > >>>>> That's the part, I am not fully sure about. But reading the comment above SyncRepUpdateSyncStandbysDefined(), itseems fine. > >>>>> > >>>>> > Bring back the check which existed based on GUC but instead of just blindly > >>>>> > returning based on just GUC not being set, check > >>>>> > WalSndCtl->sync_standbys_defined. Thoughts? > >>>>> > >>>>> Hm. Is there any reason not to just check > >>>>> WalSndCtl->sync_standbys_defined? rather than both !SyncStandbysDefined() > >>>>> and WalSndCtl->sync_standbys_defined? > >>>>> > >>>>> > >>>>> Agree, just checking for WalSndCtl->sync_standbys_defined seems fine. > >>>> > >>>> So the consensus is something like the following? Patch attached. > >>>> > >>>> /* > >>>> - * Fast exit if user has not requested sync replication. > >>>> + * Fast exit if user has not requested sync replication, or there are no > >>>> + * sync replication standby names defined. > >>>> */ > >>>> - if (!SyncRepRequested()) > >>>> + if (!SyncRepRequested() || > >>>> + !((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined) > >>>> return; > >>>> > >>> > >>> I think we need more comments describing why checking > >>> sync_standby_defined without SyncRepLock is safe here. For example: > >> > >> Yep, agreed! > >> > >>> This routine gets called every commit time. So, to check if the > >>> synchronous standbys is defined as quick as possible we check > >>> WalSndCtl->sync_standbys_defined without acquiring SyncRepLock. Since > >>> we make this test unlocked, there's a change we might fail to notice > >>> that it has been turned off and continue processing. > >> > >> Does this really happen? I was thinking that the problem by not taking > >> the lock here is that SyncRepWaitForLSN() can see that shared flag after > >> SyncRepUpdateSyncStandbysDefined() wakes up all the waiters and > >> before it sets the flag to false. Then if SyncRepWaitForLSN() adds itself > >> into the wait queue becaues the flag was true, without lock, it may keep > >> sleeping infinitely. > > > > I think that because a backend does the following check after > > acquiring SyncRepLock, in that case, once the backend has taken > > SyncRepLock it can see that sync_standbys_defined is false and return. > > Yes, but the backend can see that sync_standby_defined indicates false > whether holding SyncRepLock or not, after the checkpointer sets it to false. > > > But you meant that we do both checks without SyncRepLock? > > Maybe No. The change that the latest patch provides should be applied, I think. > That is, sync_standbys_defined should be check without lock at first, then > only if it's true, it should be checked again with lock. Yes. My understanding is the same. After applying your patch, SyncRepWaitForLSN() is going to become something like: /* * Fast exit if user has not requested sync replication, or there are no * sync replication standby names defined. */ if (!SyncRepRequested() || !((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined) return; Assert(SHMQueueIsDetached(&(MyProc->syncRepLinks))); Assert(WalSndCtl != NULL); LWLockAcquire(SyncRepLock, LW_EXCLUSIVE); Assert(MyProc->syncRepState == SYNC_REP_NOT_WAITING); /* * We don't wait for sync rep if WalSndCtl->sync_standbys_defined is not * set. See SyncRepUpdateSyncStandbysDefined. * * Also check that the standby hasn't already replied. Unlikely race * condition but we'll be fetching that cache line anyway so it's likely * to be a low cost check. */ if (!WalSndCtl->sync_standbys_defined || lsn <= WalSndCtl->lsn[mode]) { LWLockRelease(SyncRepLock); return; } /* * Set our waitLSN so WALSender will know when to wake us, and add * ourselves to the queue. */ MyProc->waitLSN = lsn; MyProc->syncRepState = SYNC_REP_WAITING; SyncRepQueueInsert(mode); Assert(SyncRepQueueIsOrderedByLSN(mode)); LWLockRelease(SyncRepLock); There are two checks of sync_standbys_defined. The first check is performed without SyncRepLock and the second check is performed with SyncRepLock. That's what you and I are expecting. Right? > > ISTM that basically SyncRepLock is used in SyncRepWaitForLSN() and > SyncRepUpdateSyncStandbysDefined() to make operation on the queue > and enabling sync_standbys_defined atomic. Without lock, the issue that > the comment in SyncRepUpdateSyncStandbysDefined() explains would > happen. That is, the backend may keep waiting infinitely as follows. > Let me think the following sequence after applying your changes: > 1. checkpointer calls SyncRepUpdateSyncStandbysDefined() > 2. checkpointer sees that the flag indicates true but the config indicates false > 3. checkpointer takes lock and wakes up all the waiters > 4. backend calls SyncRepWaitForLSN() can see that the flag indicates true Yes, I suppose this is the first check of sync_standbys_defined. And before the second check, backend tries to acquire SyncRepLock but since the lock is already being held by checkohpointer it must wait. > 5. checkpointer sets the flag to false and releases the lock After checkpointer release the lock, the backend is woken up. > 6. backend adds itself to the queue and wait until it's waken up, but will not happen immediately The backend sees that the flag has been false at the second check, so return. If we didn't acquire SyncRepLock even for the second check I think the backend would keep waiting infinitely as you mentioned. > > So after the backend sees that the flag indicates true without lock, > it must check the flag again with lock immediately without operating > the queue. If this my understanding is right, I was thinking that > the comment should mention these things. I think that's right. I was going to describe why we do the first check without SyncRepLock and why it is safe but it seems to me that these things you mentioned are related to the second check, if I'm not missing something. > > > /* > > * We don't wait for sync rep if WalSndCtl->sync_standbys_defined is not > > * set. See SyncRepUpdateSyncStandbysDefined. > > * > > * Also check that the standby hasn't already replied. Unlikely race > > * condition but we'll be fetching that cache line anyway so it's likely > > * to be a low cost check. > > */ > > if (!WalSndCtl->sync_standbys_defined || > > lsn <= WalSndCtl->lsn[mode]) > > { > > LWLockRelease(SyncRepLock); > > return; > > } > > > >> > >>> But since the > >>> subsequent check will check it again while holding SyncRepLock, it's > >>> no problem. Similarly even if we fail to notice that it has been > >>> turned on > >> Is this true? ISTM that after SyncRepUpdateSyncStandbysDefined() > >> sets the flag to true, SyncRepWaitForLSN() basically doesn't seem > >> to fail to notice that. No? > > > > What I wanted to say is, in the current code, while the checkpointer > > process is holding SyncRepLock to turn off sync_standbys_defined, > > backends who reach SyncRepWaitForLSN() wait for the lock. Then, after > > the checkpointer process releases SyncRepLock these backend can > > enqueue themselves to the wait queue because they can see that > > sync_standbys_defined is turned on. > > In this case, since the checkpointer turned the flag off while holding > the lock, the backend sees that the flag is turned off, and doesn't > enqueue itself. No? Oops, I had mistake here. It should be "while the checkpointer process is holding SyncRepLock to turn *on* sync_standbys_defined, ...". > > > On the other hand if we do the > > check without SyncRepLock, backends who reach SyncRepWaitForLSN() will > > return instead of waiting, in spite of checkpointer process being > > turning on sync_standbys_defined. Which means these backends are > > failing to notice that it has been turned on, I thought. > > No. Or I'm missing something... In this case, the backend sees that > the flag is turned on without lock since checkpointer turned it on. > So you're thinking the following. Right? > > 1. sync_standbys_defined flag is false > 2. checkpointer takes the lock and turns the flag on > 3. backend sees the flag > 4. checkpointer releases the lock > > In #3, the flag indicates true, I think. But you think it's false? I meant the backends who reach SyncRepLock() while checkpointer is at after acquiring the lock but before turning the flag on, described at #3 step in the following sequence. Such backends will wait for the lock in the current code, but after applying the patch they return quickly. So what I'm thinking is: 1. sync_standbys_defined flag is false 2. checkpointer takes the lock 3. backend sees the flag, and return as it's still false 4. checkpointer turns the flag on 5. checkpointer releases the lock If a backend reaches SyncRepWaitForLSN() between #4 and #5 it will wait for the lock and then enqueue itself after acquiring the lock. But such behavior is not changed before and after applying the patch. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, 11 Apr 2020 at 09:30, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Fri, 10 Apr 2020 at 21:52, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > > > > > > On 2020/04/10 20:56, Masahiko Sawada wrote: > > > On Fri, 10 Apr 2020 at 18:57, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > >> > > >> > > >> > > >> On 2020/04/10 14:11, Masahiko Sawada wrote: > > >>> On Fri, 10 Apr 2020 at 13:20, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > >>>> > > >>>> > > >>>> > > >>>> On 2020/04/08 3:01, Ashwin Agrawal wrote: > > >>>>> > > >>>>> On Mon, Apr 6, 2020 at 2:14 PM Andres Freund <andres@anarazel.de <mailto:andres@anarazel.de>> wrote: > > >>>>> > > >>>>> > How about we change it to this ? > > >>>>> > > >>>>> Hm. Better. But I think it might need at least a compiler barrier / > > >>>>> volatile memory load? Unlikely here, but otherwise the compiler could > > >>>>> theoretically just stash the variable somewhere locally (it's not likely > > >>>>> to be a problem because it'd not be long ago that we acquired an lwlock, > > >>>>> which is a full barrier). > > >>>>> > > >>>>> > > >>>>> That's the part, I am not fully sure about. But reading the comment above SyncRepUpdateSyncStandbysDefined(), itseems fine. > > >>>>> > > >>>>> > Bring back the check which existed based on GUC but instead of just blindly > > >>>>> > returning based on just GUC not being set, check > > >>>>> > WalSndCtl->sync_standbys_defined. Thoughts? > > >>>>> > > >>>>> Hm. Is there any reason not to just check > > >>>>> WalSndCtl->sync_standbys_defined? rather than both !SyncStandbysDefined() > > >>>>> and WalSndCtl->sync_standbys_defined? > > >>>>> > > >>>>> > > >>>>> Agree, just checking for WalSndCtl->sync_standbys_defined seems fine. > > >>>> > > >>>> So the consensus is something like the following? Patch attached. > > >>>> > > >>>> /* > > >>>> - * Fast exit if user has not requested sync replication. > > >>>> + * Fast exit if user has not requested sync replication, or there are no > > >>>> + * sync replication standby names defined. > > >>>> */ > > >>>> - if (!SyncRepRequested()) > > >>>> + if (!SyncRepRequested() || > > >>>> + !((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined) > > >>>> return; > > >>>> > > >>> > > >>> I think we need more comments describing why checking > > >>> sync_standby_defined without SyncRepLock is safe here. For example: > > >> > > >> Yep, agreed! > > >> > > >>> This routine gets called every commit time. So, to check if the > > >>> synchronous standbys is defined as quick as possible we check > > >>> WalSndCtl->sync_standbys_defined without acquiring SyncRepLock. Since > > >>> we make this test unlocked, there's a change we might fail to notice > > >>> that it has been turned off and continue processing. > > >> > > >> Does this really happen? I was thinking that the problem by not taking > > >> the lock here is that SyncRepWaitForLSN() can see that shared flag after > > >> SyncRepUpdateSyncStandbysDefined() wakes up all the waiters and > > >> before it sets the flag to false. Then if SyncRepWaitForLSN() adds itself > > >> into the wait queue becaues the flag was true, without lock, it may keep > > >> sleeping infinitely. > > > > > > I think that because a backend does the following check after > > > acquiring SyncRepLock, in that case, once the backend has taken > > > SyncRepLock it can see that sync_standbys_defined is false and return. > > > > Yes, but the backend can see that sync_standby_defined indicates false > > whether holding SyncRepLock or not, after the checkpointer sets it to false. > > > > > But you meant that we do both checks without SyncRepLock? > > > > Maybe No. The change that the latest patch provides should be applied, I think. > > That is, sync_standbys_defined should be check without lock at first, then > > only if it's true, it should be checked again with lock. > > Yes. My understanding is the same. > > After applying your patch, SyncRepWaitForLSN() is going to become > something like: > > /* > * Fast exit if user has not requested sync replication, or there are no > * sync replication standby names defined. > */ > if (!SyncRepRequested() || > !((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined) > return; > > Assert(SHMQueueIsDetached(&(MyProc->syncRepLinks))); > Assert(WalSndCtl != NULL); > > LWLockAcquire(SyncRepLock, LW_EXCLUSIVE); > Assert(MyProc->syncRepState == SYNC_REP_NOT_WAITING); > > /* > * We don't wait for sync rep if WalSndCtl->sync_standbys_defined is not > * set. See SyncRepUpdateSyncStandbysDefined. > * > * Also check that the standby hasn't already replied. Unlikely race > * condition but we'll be fetching that cache line anyway so it's likely > * to be a low cost check. > */ > if (!WalSndCtl->sync_standbys_defined || > lsn <= WalSndCtl->lsn[mode]) > { > LWLockRelease(SyncRepLock); > return; > } > > /* > * Set our waitLSN so WALSender will know when to wake us, and add > * ourselves to the queue. > */ > MyProc->waitLSN = lsn; > MyProc->syncRepState = SYNC_REP_WAITING; > SyncRepQueueInsert(mode); > Assert(SyncRepQueueIsOrderedByLSN(mode)); > LWLockRelease(SyncRepLock); > > There are two checks of sync_standbys_defined. The first check is > performed without SyncRepLock and the second check is performed with > SyncRepLock. That's what you and I are expecting. Right? > > > > > ISTM that basically SyncRepLock is used in SyncRepWaitForLSN() and > > SyncRepUpdateSyncStandbysDefined() to make operation on the queue > > and enabling sync_standbys_defined atomic. Without lock, the issue that > > the comment in SyncRepUpdateSyncStandbysDefined() explains would > > happen. That is, the backend may keep waiting infinitely as follows. > > > > Let me think the following sequence after applying your changes: > > > 1. checkpointer calls SyncRepUpdateSyncStandbysDefined() > > 2. checkpointer sees that the flag indicates true but the config indicates false > > 3. checkpointer takes lock and wakes up all the waiters > > 4. backend calls SyncRepWaitForLSN() can see that the flag indicates true > > Yes, I suppose this is the first check of sync_standbys_defined. > > And before the second check, backend tries to acquire SyncRepLock but > since the lock is already being held by checkohpointer it must wait. > > > 5. checkpointer sets the flag to false and releases the lock > > After checkpointer release the lock, the backend is woken up. > > > 6. backend adds itself to the queue and wait until it's waken up, but will not happen immediately > > The backend sees that the flag has been false at the second check, so return. > > If we didn't acquire SyncRepLock even for the second check I think the > backend would keep waiting infinitely as you mentioned. > > > > > So after the backend sees that the flag indicates true without lock, > > it must check the flag again with lock immediately without operating > > the queue. If this my understanding is right, I was thinking that > > the comment should mention these things. > > I think that's right. I was going to describe why we do the first > check without SyncRepLock and why it is safe but it seems to me that > these things you mentioned are related to the second check, if I'm not > missing something. > > > > > > /* > > > * We don't wait for sync rep if WalSndCtl->sync_standbys_defined is not > > > * set. See SyncRepUpdateSyncStandbysDefined. > > > * > > > * Also check that the standby hasn't already replied. Unlikely race > > > * condition but we'll be fetching that cache line anyway so it's likely > > > * to be a low cost check. > > > */ > > > if (!WalSndCtl->sync_standbys_defined || > > > lsn <= WalSndCtl->lsn[mode]) > > > { > > > LWLockRelease(SyncRepLock); > > > return; > > > } > > > > > >> > > >>> But since the > > >>> subsequent check will check it again while holding SyncRepLock, it's > > >>> no problem. Similarly even if we fail to notice that it has been > > >>> turned on > > >> Is this true? ISTM that after SyncRepUpdateSyncStandbysDefined() > > >> sets the flag to true, SyncRepWaitForLSN() basically doesn't seem > > >> to fail to notice that. No? > > > > > > What I wanted to say is, in the current code, while the checkpointer > > > process is holding SyncRepLock to turn off sync_standbys_defined, > > > backends who reach SyncRepWaitForLSN() wait for the lock. Then, after > > > the checkpointer process releases SyncRepLock these backend can > > > enqueue themselves to the wait queue because they can see that > > > sync_standbys_defined is turned on. > > > > In this case, since the checkpointer turned the flag off while holding > > the lock, the backend sees that the flag is turned off, and doesn't > > enqueue itself. No? > > Oops, I had mistake here. It should be "while the checkpointer process > is holding SyncRepLock to turn *on* sync_standbys_defined, ...". > > > > > > On the other hand if we do the > > > check without SyncRepLock, backends who reach SyncRepWaitForLSN() will > > > return instead of waiting, in spite of checkpointer process being > > > turning on sync_standbys_defined. Which means these backends are > > > failing to notice that it has been turned on, I thought. > > > > No. Or I'm missing something... In this case, the backend sees that > > the flag is turned on without lock since checkpointer turned it on. > > So you're thinking the following. Right? > > > > 1. sync_standbys_defined flag is false > > 2. checkpointer takes the lock and turns the flag on > > 3. backend sees the flag > > 4. checkpointer releases the lock > > > > In #3, the flag indicates true, I think. But you think it's false? > > I meant the backends who reach SyncRepLock() while checkpointer is at > after acquiring the lock but before turning the flag on, described at > #3 step in the following sequence. Such backends will wait for the > lock in the current code, but after applying the patch they return > quickly. So what I'm thinking is: > > 1. sync_standbys_defined flag is false > 2. checkpointer takes the lock > 3. backend sees the flag, and return as it's still false > 4. checkpointer turns the flag on > 5. checkpointer releases the lock > > If a backend reaches SyncRepWaitForLSN() between #4 and #5 it will > wait for the lock and then enqueue itself after acquiring the lock. > But such behavior is not changed before and after applying the patch. > Fujii-san, I think we agree on how to fix this issue and on the patch you proposed so please add your comments. This item is for PG14, right? If so I'd like to add this item to the next commit fest. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, May 18, 2020 at 7:41 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:
This item is for PG14, right? If so I'd like to add this item to the
next commit fest.
Sure, add it to commit fest.
Seems though it should be backpatched to relevant branches as well.
On Tue, May 19, 2020 at 08:56:13AM -0700, Ashwin Agrawal wrote: > Sure, add it to commit fest. > Seems though it should be backpatched to relevant branches as well. It does not seem to be listed yet. Are you planning to add it under the section for bug fixes? -- Michael
Attachment
On 2020/05/19 11:41, Masahiko Sawada wrote: > On Sat, 11 Apr 2020 at 09:30, Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: >> >> On Fri, 10 Apr 2020 at 21:52, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>> >>> >>> >>> On 2020/04/10 20:56, Masahiko Sawada wrote: >>>> On Fri, 10 Apr 2020 at 18:57, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>>>> >>>>> >>>>> >>>>> On 2020/04/10 14:11, Masahiko Sawada wrote: >>>>>> On Fri, 10 Apr 2020 at 13:20, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 2020/04/08 3:01, Ashwin Agrawal wrote: >>>>>>>> >>>>>>>> On Mon, Apr 6, 2020 at 2:14 PM Andres Freund <andres@anarazel.de <mailto:andres@anarazel.de>> wrote: >>>>>>>> >>>>>>>> > How about we change it to this ? >>>>>>>> >>>>>>>> Hm. Better. But I think it might need at least a compiler barrier / >>>>>>>> volatile memory load? Unlikely here, but otherwise the compiler could >>>>>>>> theoretically just stash the variable somewhere locally (it's not likely >>>>>>>> to be a problem because it'd not be long ago that we acquired an lwlock, >>>>>>>> which is a full barrier). >>>>>>>> >>>>>>>> >>>>>>>> That's the part, I am not fully sure about. But reading the comment above SyncRepUpdateSyncStandbysDefined(), itseems fine. >>>>>>>> >>>>>>>> > Bring back the check which existed based on GUC but instead of just blindly >>>>>>>> > returning based on just GUC not being set, check >>>>>>>> > WalSndCtl->sync_standbys_defined. Thoughts? >>>>>>>> >>>>>>>> Hm. Is there any reason not to just check >>>>>>>> WalSndCtl->sync_standbys_defined? rather than both !SyncStandbysDefined() >>>>>>>> and WalSndCtl->sync_standbys_defined? >>>>>>>> >>>>>>>> >>>>>>>> Agree, just checking for WalSndCtl->sync_standbys_defined seems fine. >>>>>>> >>>>>>> So the consensus is something like the following? Patch attached. >>>>>>> >>>>>>> /* >>>>>>> - * Fast exit if user has not requested sync replication. >>>>>>> + * Fast exit if user has not requested sync replication, or there are no >>>>>>> + * sync replication standby names defined. >>>>>>> */ >>>>>>> - if (!SyncRepRequested()) >>>>>>> + if (!SyncRepRequested() || >>>>>>> + !((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined) >>>>>>> return; >>>>>>> >>>>>> >>>>>> I think we need more comments describing why checking >>>>>> sync_standby_defined without SyncRepLock is safe here. For example: >>>>> >>>>> Yep, agreed! >>>>> >>>>>> This routine gets called every commit time. So, to check if the >>>>>> synchronous standbys is defined as quick as possible we check >>>>>> WalSndCtl->sync_standbys_defined without acquiring SyncRepLock. Since >>>>>> we make this test unlocked, there's a change we might fail to notice >>>>>> that it has been turned off and continue processing. >>>>> >>>>> Does this really happen? I was thinking that the problem by not taking >>>>> the lock here is that SyncRepWaitForLSN() can see that shared flag after >>>>> SyncRepUpdateSyncStandbysDefined() wakes up all the waiters and >>>>> before it sets the flag to false. Then if SyncRepWaitForLSN() adds itself >>>>> into the wait queue becaues the flag was true, without lock, it may keep >>>>> sleeping infinitely. >>>> >>>> I think that because a backend does the following check after >>>> acquiring SyncRepLock, in that case, once the backend has taken >>>> SyncRepLock it can see that sync_standbys_defined is false and return. >>> >>> Yes, but the backend can see that sync_standby_defined indicates false >>> whether holding SyncRepLock or not, after the checkpointer sets it to false. >>> >>>> But you meant that we do both checks without SyncRepLock? >>> >>> Maybe No. The change that the latest patch provides should be applied, I think. >>> That is, sync_standbys_defined should be check without lock at first, then >>> only if it's true, it should be checked again with lock. >> >> Yes. My understanding is the same. >> >> After applying your patch, SyncRepWaitForLSN() is going to become >> something like: >> >> /* >> * Fast exit if user has not requested sync replication, or there are no >> * sync replication standby names defined. >> */ >> if (!SyncRepRequested() || >> !((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined) >> return; >> >> Assert(SHMQueueIsDetached(&(MyProc->syncRepLinks))); >> Assert(WalSndCtl != NULL); >> >> LWLockAcquire(SyncRepLock, LW_EXCLUSIVE); >> Assert(MyProc->syncRepState == SYNC_REP_NOT_WAITING); >> >> /* >> * We don't wait for sync rep if WalSndCtl->sync_standbys_defined is not >> * set. See SyncRepUpdateSyncStandbysDefined. >> * >> * Also check that the standby hasn't already replied. Unlikely race >> * condition but we'll be fetching that cache line anyway so it's likely >> * to be a low cost check. >> */ >> if (!WalSndCtl->sync_standbys_defined || >> lsn <= WalSndCtl->lsn[mode]) >> { >> LWLockRelease(SyncRepLock); >> return; >> } >> >> /* >> * Set our waitLSN so WALSender will know when to wake us, and add >> * ourselves to the queue. >> */ >> MyProc->waitLSN = lsn; >> MyProc->syncRepState = SYNC_REP_WAITING; >> SyncRepQueueInsert(mode); >> Assert(SyncRepQueueIsOrderedByLSN(mode)); >> LWLockRelease(SyncRepLock); >> >> There are two checks of sync_standbys_defined. The first check is >> performed without SyncRepLock and the second check is performed with >> SyncRepLock. That's what you and I are expecting. Right? >> >>> >>> ISTM that basically SyncRepLock is used in SyncRepWaitForLSN() and >>> SyncRepUpdateSyncStandbysDefined() to make operation on the queue >>> and enabling sync_standbys_defined atomic. Without lock, the issue that >>> the comment in SyncRepUpdateSyncStandbysDefined() explains would >>> happen. That is, the backend may keep waiting infinitely as follows. >>> >> >> Let me think the following sequence after applying your changes: >> >>> 1. checkpointer calls SyncRepUpdateSyncStandbysDefined() >>> 2. checkpointer sees that the flag indicates true but the config indicates false >>> 3. checkpointer takes lock and wakes up all the waiters >>> 4. backend calls SyncRepWaitForLSN() can see that the flag indicates true >> >> Yes, I suppose this is the first check of sync_standbys_defined. >> >> And before the second check, backend tries to acquire SyncRepLock but >> since the lock is already being held by checkohpointer it must wait. >> >>> 5. checkpointer sets the flag to false and releases the lock >> >> After checkpointer release the lock, the backend is woken up. >> >>> 6. backend adds itself to the queue and wait until it's waken up, but will not happen immediately >> >> The backend sees that the flag has been false at the second check, so return. >> >> If we didn't acquire SyncRepLock even for the second check I think the >> backend would keep waiting infinitely as you mentioned. >> >>> >>> So after the backend sees that the flag indicates true without lock, >>> it must check the flag again with lock immediately without operating >>> the queue. If this my understanding is right, I was thinking that >>> the comment should mention these things. >> >> I think that's right. I was going to describe why we do the first >> check without SyncRepLock and why it is safe but it seems to me that >> these things you mentioned are related to the second check, if I'm not >> missing something. >> >>> >>>> /* >>>> * We don't wait for sync rep if WalSndCtl->sync_standbys_defined is not >>>> * set. See SyncRepUpdateSyncStandbysDefined. >>>> * >>>> * Also check that the standby hasn't already replied. Unlikely race >>>> * condition but we'll be fetching that cache line anyway so it's likely >>>> * to be a low cost check. >>>> */ >>>> if (!WalSndCtl->sync_standbys_defined || >>>> lsn <= WalSndCtl->lsn[mode]) >>>> { >>>> LWLockRelease(SyncRepLock); >>>> return; >>>> } >>>> >>>>> >>>>>> But since the >>>>>> subsequent check will check it again while holding SyncRepLock, it's >>>>>> no problem. Similarly even if we fail to notice that it has been >>>>>> turned on >>>>> Is this true? ISTM that after SyncRepUpdateSyncStandbysDefined() >>>>> sets the flag to true, SyncRepWaitForLSN() basically doesn't seem >>>>> to fail to notice that. No? >>>> >>>> What I wanted to say is, in the current code, while the checkpointer >>>> process is holding SyncRepLock to turn off sync_standbys_defined, >>>> backends who reach SyncRepWaitForLSN() wait for the lock. Then, after >>>> the checkpointer process releases SyncRepLock these backend can >>>> enqueue themselves to the wait queue because they can see that >>>> sync_standbys_defined is turned on. >>> >>> In this case, since the checkpointer turned the flag off while holding >>> the lock, the backend sees that the flag is turned off, and doesn't >>> enqueue itself. No? >> >> Oops, I had mistake here. It should be "while the checkpointer process >> is holding SyncRepLock to turn *on* sync_standbys_defined, ...". >> >>> >>>> On the other hand if we do the >>>> check without SyncRepLock, backends who reach SyncRepWaitForLSN() will >>>> return instead of waiting, in spite of checkpointer process being >>>> turning on sync_standbys_defined. Which means these backends are >>>> failing to notice that it has been turned on, I thought. >>> >>> No. Or I'm missing something... In this case, the backend sees that >>> the flag is turned on without lock since checkpointer turned it on. >>> So you're thinking the following. Right? >>> >>> 1. sync_standbys_defined flag is false >>> 2. checkpointer takes the lock and turns the flag on >>> 3. backend sees the flag >>> 4. checkpointer releases the lock >>> >>> In #3, the flag indicates true, I think. But you think it's false? >> >> I meant the backends who reach SyncRepLock() while checkpointer is at >> after acquiring the lock but before turning the flag on, described at >> #3 step in the following sequence. Such backends will wait for the >> lock in the current code, but after applying the patch they return >> quickly. So what I'm thinking is: >> >> 1. sync_standbys_defined flag is false >> 2. checkpointer takes the lock >> 3. backend sees the flag, and return as it's still false >> 4. checkpointer turns the flag on >> 5. checkpointer releases the lock >> >> If a backend reaches SyncRepWaitForLSN() between #4 and #5 it will >> wait for the lock and then enqueue itself after acquiring the lock. >> But such behavior is not changed before and after applying the patch. >> > > Fujii-san, I think we agree on how to fix this issue and on the patch > you proposed so please add your comments. Sorry for the late reply... Regarding how to fix, don't we need memory barrier when reading sync_standbys_defined? Without that, after SyncRepUpdateSyncStandbysDefined() updates it to true, SyncRepWaitForLSN() can see the previous value, i.e., false, and then exit out of the function. Is this right? If this is right, we need memory barrier to avoid this issue? > This item is for PG14, right? Yes! Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
The proposed fix looks good, it resolves the lock contention problem as intended. +1 from my side. > On 09-Jul-2020, at 4:22 PM, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > Regarding how to fix, don't we need memory barrier when reading > sync_standbys_defined? Without that, after SyncRepUpdateSyncStandbysDefined() > updates it to true, SyncRepWaitForLSN() can see the previous value, > i.e., false, and then exit out of the function. Is this right? > If this is right, we need memory barrier to avoid this issue? > There is no out-of-order execution hazard in the scenario you are describing, memory barriers don’t seem to fit. Using locksto synchronise checkpointer process and a committing backend process is the right way. We have made a conscious decisionto bypass the lock, which looks correct in this case. As an aside, there is a small (?) window where a change to synchronous_standby_names GUC is partially propagated among committingbackends, checkpointer and walsender. Such a window may result in walsender declaring a standby as synchronouswhile a commit backend fails to wait for it in SyncRepWaitForLSN. The root cause is walsender uses sync_standby_priority,a per-walsender variable to tell if a standby is synchronous. It is updated when walsender processesa config change. Whereas sync_standbys_defined, a variable updated by checkpointer, is used by committing backendsto determine if they need to wait. If checkpointer is busy flushing buffers, it may take longer than walsender toreflect a change in sync_standbys_defined. This is a low impact problem, should be ok to live with it. Asim
On Tue, Aug 11, 2020 at 7:55 AM Asim Praveen <pasim@vmware.com> wrote: > There is no out-of-order execution hazard in the scenario you are describing, memory barriers don’t seem to fit. Usinglocks to synchronise checkpointer process and a committing backend process is the right way. We have made a consciousdecision to bypass the lock, which looks correct in this case. Yeah, I am not immediately seeing why a memory barrier would help anything here. > As an aside, there is a small (?) window where a change to synchronous_standby_names GUC is partially propagated amongcommitting backends, checkpointer and walsender. Such a window may result in walsender declaring a standby as synchronouswhile a commit backend fails to wait for it in SyncRepWaitForLSN. The root cause is walsender uses sync_standby_priority,a per-walsender variable to tell if a standby is synchronous. It is updated when walsender processesa config change. Whereas sync_standbys_defined, a variable updated by checkpointer, is used by committing backendsto determine if they need to wait. If checkpointer is busy flushing buffers, it may take longer than walsender toreflect a change in sync_standbys_defined. This is a low impact problem, should be ok to live with it. I think this gets to the root of the issue. If we check the flag without a lock, we might see a slightly stale value. But, considering that there's no particular amount of time within which configuration changes are guaranteed to take effect, maybe that's OK. However, there is one potential gotcha here: if the walsender declares the standby to be synchronous, a user can see that, right? So maybe there's this problem: a user sees that the standby is synchronous and expects a transaction committing afterward to provoke a wait, but really it doesn't. Now the user is unhappy, feeling that the system didn't perform according to expectations. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> On 11-Aug-2020, at 8:57 PM, Robert Haas <robertmhaas@gmail.com> wrote: > > I think this gets to the root of the issue. If we check the flag > without a lock, we might see a slightly stale value. But, considering > that there's no particular amount of time within which configuration > changes are guaranteed to take effect, maybe that's OK. However, there > is one potential gotcha here: if the walsender declares the standby to > be synchronous, a user can see that, right? So maybe there's this > problem: a user sees that the standby is synchronous and expects a > transaction committing afterward to provoke a wait, but really it > doesn't. Now the user is unhappy, feeling that the system didn't > perform according to expectations. Yes, pg_stat_replication reports a standby in sync as soon as walsender updates priority of the standby to something otherthan 0. The potential gotcha referred above doesn’t seem too severe. What is the likelihood of someone setting synchronous_standby_namesGUC with either “*” or a standby name and then immediately promoting that standby? If the standbyis promoted before the checkpointer on master gets a chance to update sync_standbys_defined in shared memory, commitsmade during this interval on master may not make it to standby. Upon promotion, those commits may be lost. Asim
On Wed, 12 Aug 2020 at 14:06, Asim Praveen <pasim@vmware.com> wrote: > > > > > On 11-Aug-2020, at 8:57 PM, Robert Haas <robertmhaas@gmail.com> wrote: > > > > I think this gets to the root of the issue. If we check the flag > > without a lock, we might see a slightly stale value. But, considering > > that there's no particular amount of time within which configuration > > changes are guaranteed to take effect, maybe that's OK. However, there > > is one potential gotcha here: if the walsender declares the standby to > > be synchronous, a user can see that, right? So maybe there's this > > problem: a user sees that the standby is synchronous and expects a > > transaction committing afterward to provoke a wait, but really it > > doesn't. Now the user is unhappy, feeling that the system didn't > > perform according to expectations. > > Yes, pg_stat_replication reports a standby in sync as soon as walsender updates priority of the standby to something otherthan 0. > > The potential gotcha referred above doesn’t seem too severe. What is the likelihood of someone setting synchronous_standby_namesGUC with either “*” or a standby name and then immediately promoting that standby? If the standbyis promoted before the checkpointer on master gets a chance to update sync_standbys_defined in shared memory, commitsmade during this interval on master may not make it to standby. Upon promotion, those commits may be lost. I think that if the standby is quite behind the primary and in case of the primary crashes, the likelihood of losing commits might get higher. The user can see the standby became synchronous standby via pg_stat_replication but commit completes without a wait because the checkpointer doesn't update sync_standbys_defined yet. If the primary crashes before standby catching up and the user does failover, the committed transaction will be lost, even though the user expects that transaction commit has been replicated to the standby synchronously. And this can happen even without the patch, right? Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020/08/12 15:32, Masahiko Sawada wrote: > On Wed, 12 Aug 2020 at 14:06, Asim Praveen <pasim@vmware.com> wrote: >> >> >> >>> On 11-Aug-2020, at 8:57 PM, Robert Haas <robertmhaas@gmail.com> wrote: >>> >>> I think this gets to the root of the issue. If we check the flag >>> without a lock, we might see a slightly stale value. But, considering >>> that there's no particular amount of time within which configuration >>> changes are guaranteed to take effect, maybe that's OK. However, there >>> is one potential gotcha here: if the walsender declares the standby to >>> be synchronous, a user can see that, right? So maybe there's this >>> problem: a user sees that the standby is synchronous and expects a >>> transaction committing afterward to provoke a wait, but really it >>> doesn't. Now the user is unhappy, feeling that the system didn't >>> perform according to expectations. >> >> Yes, pg_stat_replication reports a standby in sync as soon as walsender updates priority of the standby to something otherthan 0. >> >> The potential gotcha referred above doesn’t seem too severe. What is the likelihood of someone setting synchronous_standby_namesGUC with either “*” or a standby name and then immediately promoting that standby? If the standbyis promoted before the checkpointer on master gets a chance to update sync_standbys_defined in shared memory, commitsmade during this interval on master may not make it to standby. Upon promotion, those commits may be lost. > > I think that if the standby is quite behind the primary and in case of > the primary crashes, the likelihood of losing commits might get > higher. The user can see the standby became synchronous standby via > pg_stat_replication but commit completes without a wait because the > checkpointer doesn't update sync_standbys_defined yet. If the primary > crashes before standby catching up and the user does failover, the > committed transaction will be lost, even though the user expects that > transaction commit has been replicated to the standby synchronously. > And this can happen even without the patch, right? I think you're right. This issue can happen even without the patch. Maybe we should not mark the standby as "sync" whenever sync_standbys_defined is false even if synchronous_standby_names is actually set and walsenders have already detect that? Or we need more aggressive approach; make the checkpointer update sync_standby_priority values of all the walsenders? ISTM that the latter looks overkill... Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
> On 12-Aug-2020, at 12:02 PM, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Wed, 12 Aug 2020 at 14:06, Asim Praveen <pasim@vmware.com> wrote: >> >> >> >>> On 11-Aug-2020, at 8:57 PM, Robert Haas <robertmhaas@gmail.com> wrote: >>> >>> I think this gets to the root of the issue. If we check the flag >>> without a lock, we might see a slightly stale value. But, considering >>> that there's no particular amount of time within which configuration >>> changes are guaranteed to take effect, maybe that's OK. However, there >>> is one potential gotcha here: if the walsender declares the standby to >>> be synchronous, a user can see that, right? So maybe there's this >>> problem: a user sees that the standby is synchronous and expects a >>> transaction committing afterward to provoke a wait, but really it >>> doesn't. Now the user is unhappy, feeling that the system didn't >>> perform according to expectations. >> >> Yes, pg_stat_replication reports a standby in sync as soon as walsender updates priority of the standby to something otherthan 0. >> >> The potential gotcha referred above doesn’t seem too severe. What is the likelihood of someone setting synchronous_standby_namesGUC with either “*” or a standby name and then immediately promoting that standby? If the standbyis promoted before the checkpointer on master gets a chance to update sync_standbys_defined in shared memory, commitsmade during this interval on master may not make it to standby. Upon promotion, those commits may be lost. > > I think that if the standby is quite behind the primary and in case of > the primary crashes, the likelihood of losing commits might get > higher. The user can see the standby became synchronous standby via > pg_stat_replication but commit completes without a wait because the > checkpointer doesn't update sync_standbys_defined yet. If the primary > crashes before standby catching up and the user does failover, the > committed transaction will be lost, even though the user expects that > transaction commit has been replicated to the standby synchronously. > And this can happen even without the patch, right? > It is correct that the issue is orthogonal to the patch upthread and I’ve marked the commitfest entry as ready-for-committer. Regarding the issue described above, the amount by which the standby is lagging behind the primary does not affect the severity. A standby’s state will be reported as “sync” to the user only after the standby has caught up (state == WALSNDSTATE_STREAMING). The time it takes for the checkpointer to update the sync_standbys_defined flag in shared memory is the important factor. Once checkpointer sets this flag, commits start waiting for the standby (as long as it is in-sync). Asim
At Wed, 19 Aug 2020 13:20:46 +0000, Asim Praveen <pasim@vmware.com> wrote in > > > > On 12-Aug-2020, at 12:02 PM, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Wed, 12 Aug 2020 at 14:06, Asim Praveen <pasim@vmware.com> wrote: > >> > >> > >> > >>> On 11-Aug-2020, at 8:57 PM, Robert Haas <robertmhaas@gmail.com> wrote: > >>> > >>> I think this gets to the root of the issue. If we check the flag > >>> without a lock, we might see a slightly stale value. But, considering > >>> that there's no particular amount of time within which configuration > >>> changes are guaranteed to take effect, maybe that's OK. However, there > >>> is one potential gotcha here: if the walsender declares the standby to > >>> be synchronous, a user can see that, right? So maybe there's this > >>> problem: a user sees that the standby is synchronous and expects a > >>> transaction committing afterward to provoke a wait, but really it > >>> doesn't. Now the user is unhappy, feeling that the system didn't > >>> perform according to expectations. > >> > >> Yes, pg_stat_replication reports a standby in sync as soon as walsender updates priority of the standby to somethingother than 0. > >> > >> The potential gotcha referred above doesn’t seem too severe. What is the likelihood of someone setting synchronous_standby_namesGUC with either “*” or a standby name and then immediately promoting that standby? If the standbyis promoted before the checkpointer on master gets a chance to update sync_standbys_defined in shared memory, commitsmade during this interval on master may not make it to standby. Upon promotion, those commits may be lost. > > > > I think that if the standby is quite behind the primary and in case of > > the primary crashes, the likelihood of losing commits might get > > higher. The user can see the standby became synchronous standby via > > pg_stat_replication but commit completes without a wait because the > > checkpointer doesn't update sync_standbys_defined yet. If the primary > > crashes before standby catching up and the user does failover, the > > committed transaction will be lost, even though the user expects that > > transaction commit has been replicated to the standby synchronously. > > And this can happen even without the patch, right? > > > > It is correct that the issue is orthogonal to the patch upthread and I’ve marked > the commitfest entry as ready-for-committer. I find the name of SyncStandbysDefined macro is very confusing with the struct member sync_standbys_defined, but that might be another issue.. - * Fast exit if user has not requested sync replication. + * Fast exit if user has not requested sync replication, or there are no + * sync replication standby names defined. This comment sounds like we just do that twice. The reason for the check is to avoid wasteful exclusive locks on SyncRepLock, or to form double-checked locking on the variable. I think we should explain that here. > Regarding the issue described above, the amount by which the standby is lagging > behind the primary does not affect the severity. A standby’s state will be > reported as “sync” to the user only after the standby has caught up (state == > WALSNDSTATE_STREAMING). The time it takes for the checkpointer to update the > sync_standbys_defined flag in shared memory is the important factor. Once > checkpointer sets this flag, commits start waiting for the standby (as long as > it is in-sync). CATCHUP state is only entered at replication startup. It stays at STREAMING when sync_sby_names is switched from '' to a valid name, thus sync_state shows 'sync' even if checkpointer hasn't changed sync_standbys_defined. If the standby being switched had a big lag, the chance of losing commits getting larger (up to certain extent) for the same extent of checkpointer lag. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
At Wed, 19 Aug 2020 21:41:03 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in > > > On 2020/08/12 15:32, Masahiko Sawada wrote: > > On Wed, 12 Aug 2020 at 14:06, Asim Praveen <pasim@vmware.com> wrote: > >> > >> > >> > >>> On 11-Aug-2020, at 8:57 PM, Robert Haas <robertmhaas@gmail.com> wrote: > >>> > >>> I think this gets to the root of the issue. If we check the flag > >>> without a lock, we might see a slightly stale value. But, considering > >>> that there's no particular amount of time within which configuration > >>> changes are guaranteed to take effect, maybe that's OK. However, there > >>> is one potential gotcha here: if the walsender declares the standby to > >>> be synchronous, a user can see that, right? So maybe there's this > >>> problem: a user sees that the standby is synchronous and expects a > >>> transaction committing afterward to provoke a wait, but really it > >>> doesn't. Now the user is unhappy, feeling that the system didn't > >>> perform according to expectations. > >> > >> Yes, pg_stat_replication reports a standby in sync as soon as > >> walsender updates priority of the standby to something other than 0. > >> > >> The potential gotcha referred above doesn’t seem too severe. What is > >> the likelihood of someone setting synchronous_standby_names GUC with > >> either “*” or a standby name and then immediately promoting that > >> standby? If the standby is promoted before the checkpointer on master > >> gets a chance to update sync_standbys_defined in shared memory, > >> commits made during this interval on master may not make it to > >> standby. Upon promotion, those commits may be lost. > > I think that if the standby is quite behind the primary and in case of > > the primary crashes, the likelihood of losing commits might get > > higher. The user can see the standby became synchronous standby via > > pg_stat_replication but commit completes without a wait because the > > checkpointer doesn't update sync_standbys_defined yet. If the primary > > crashes before standby catching up and the user does failover, the > > committed transaction will be lost, even though the user expects that > > transaction commit has been replicated to the standby synchronously. > > And this can happen even without the patch, right? > > I think you're right. This issue can happen even without the patch. > > Maybe we should not mark the standby as "sync" whenever > sync_standbys_defined > is false even if synchronous_standby_names is actually set and > walsenders have > already detect that? Or we need more aggressive approach; > make the checkpointer update sync_standby_priority values of > all the walsenders? ISTM that the latter looks overkill... It seems to me that the issue here is what pg_stat_replication.sync_status doens't show "the working state of the walsdner", but "the state the walsender is commanded". Non-zero WalSnd.sync_standby_priority is immediately considered as "I am in sync" but actually it is "I am going to sync from async or am already in sync". And it is precisely "..., or am already in sync if checkpointer already notices any sync walsender exists". 1. if a walsender changes its state from async to sync, it should once change its state back to "CATCHUP" or something like that. 2. pg_stat_replication.sync_status need to consider WalSnd.state and WalSndCtlData.sync_standbys_defined. We might be able to let SyncRepUpdateSyncStandbysDefined postpone changing sync_standbys_defined until any sync standby actually comes, but it would be complex. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Wed, 19 Aug 2020 at 21:41, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > > On 2020/08/12 15:32, Masahiko Sawada wrote: > > On Wed, 12 Aug 2020 at 14:06, Asim Praveen <pasim@vmware.com> wrote: > >> > >> > >> > >>> On 11-Aug-2020, at 8:57 PM, Robert Haas <robertmhaas@gmail.com> wrote: > >>> > >>> I think this gets to the root of the issue. If we check the flag > >>> without a lock, we might see a slightly stale value. But, considering > >>> that there's no particular amount of time within which configuration > >>> changes are guaranteed to take effect, maybe that's OK. However, there > >>> is one potential gotcha here: if the walsender declares the standby to > >>> be synchronous, a user can see that, right? So maybe there's this > >>> problem: a user sees that the standby is synchronous and expects a > >>> transaction committing afterward to provoke a wait, but really it > >>> doesn't. Now the user is unhappy, feeling that the system didn't > >>> perform according to expectations. > >> > >> Yes, pg_stat_replication reports a standby in sync as soon as walsender updates priority of the standby to somethingother than 0. > >> > >> The potential gotcha referred above doesn’t seem too severe. What is the likelihood of someone setting synchronous_standby_namesGUC with either “*” or a standby name and then immediately promoting that standby? If the standbyis promoted before the checkpointer on master gets a chance to update sync_standbys_defined in shared memory, commitsmade during this interval on master may not make it to standby. Upon promotion, those commits may be lost. > > > > I think that if the standby is quite behind the primary and in case of > > the primary crashes, the likelihood of losing commits might get > > higher. The user can see the standby became synchronous standby via > > pg_stat_replication but commit completes without a wait because the > > checkpointer doesn't update sync_standbys_defined yet. If the primary > > crashes before standby catching up and the user does failover, the > > committed transaction will be lost, even though the user expects that > > transaction commit has been replicated to the standby synchronously. > > And this can happen even without the patch, right? > > I think you're right. This issue can happen even without the patch. > > Maybe we should not mark the standby as "sync" whenever sync_standbys_defined > is false even if synchronous_standby_names is actually set and walsenders have > already detect that? It seems good. I guess that we can set 'async' to sync_status and 0 to sync_priority when sync_standbys_defined is not true regardless of walsender's actual priority value. We print the message "standby \"%s\" now has synchronous standby priority %u" in SyncRepInitConfig() regardless of sync_standbys_defined but maybe it's fine as the message isn't incorrect and it's DEBUG1 message. > Or we need more aggressive approach; > make the checkpointer update sync_standby_priority values of > all the walsenders? ISTM that the latter looks overkill... I think so too. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020/08/20 11:29, Kyotaro Horiguchi wrote: > At Wed, 19 Aug 2020 13:20:46 +0000, Asim Praveen <pasim@vmware.com> wrote in >> >> >>> On 12-Aug-2020, at 12:02 PM, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: >>> >>> On Wed, 12 Aug 2020 at 14:06, Asim Praveen <pasim@vmware.com> wrote: >>>> >>>> >>>> >>>>> On 11-Aug-2020, at 8:57 PM, Robert Haas <robertmhaas@gmail.com> wrote: >>>>> >>>>> I think this gets to the root of the issue. If we check the flag >>>>> without a lock, we might see a slightly stale value. But, considering >>>>> that there's no particular amount of time within which configuration >>>>> changes are guaranteed to take effect, maybe that's OK. However, there >>>>> is one potential gotcha here: if the walsender declares the standby to >>>>> be synchronous, a user can see that, right? So maybe there's this >>>>> problem: a user sees that the standby is synchronous and expects a >>>>> transaction committing afterward to provoke a wait, but really it >>>>> doesn't. Now the user is unhappy, feeling that the system didn't >>>>> perform according to expectations. >>>> >>>> Yes, pg_stat_replication reports a standby in sync as soon as walsender updates priority of the standby to somethingother than 0. >>>> >>>> The potential gotcha referred above doesn’t seem too severe. What is the likelihood of someone setting synchronous_standby_namesGUC with either “*” or a standby name and then immediately promoting that standby? If the standbyis promoted before the checkpointer on master gets a chance to update sync_standbys_defined in shared memory, commitsmade during this interval on master may not make it to standby. Upon promotion, those commits may be lost. >>> >>> I think that if the standby is quite behind the primary and in case of >>> the primary crashes, the likelihood of losing commits might get >>> higher. The user can see the standby became synchronous standby via >>> pg_stat_replication but commit completes without a wait because the >>> checkpointer doesn't update sync_standbys_defined yet. If the primary >>> crashes before standby catching up and the user does failover, the >>> committed transaction will be lost, even though the user expects that >>> transaction commit has been replicated to the standby synchronously. >>> And this can happen even without the patch, right? >>> >> >> It is correct that the issue is orthogonal to the patch upthread and I’ve marked >> the commitfest entry as ready-for-committer. Yes, thanks for the review! > I find the name of SyncStandbysDefined macro is very confusing with > the struct member sync_standbys_defined, but that might be another > issue.. > > - * Fast exit if user has not requested sync replication. > + * Fast exit if user has not requested sync replication, or there are no > + * sync replication standby names defined. > > This comment sounds like we just do that twice. The reason for the > check is to avoid wasteful exclusive locks on SyncRepLock, or to form > double-checked locking on the variable. I think we should explain that > here. I added the following comments based on the suggestion by Sawada-san upthread. Thought? + * Since this routine gets called every commit time, it's important to + * exit quickly if sync replication is not requested. So we check + * WalSndCtl->sync_standbys_define without the lock and exit + * immediately if it's false. If it's true, we check it again later + * while holding the lock, to avoid the race condition described + * in SyncRepUpdateSyncStandbysDefined(). Attached is the updated version of the patch. I didn't change how to fix the issue. But I changed the check for fast exit so that it's called before setting the "mode", to avoid a few cycle. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Attachment
> On 26-Aug-2020, at 11:10 PM, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > I added the following comments based on the suggestion by Sawada-san upthread. Thought? > > + * Since this routine gets called every commit time, it's important to > + * exit quickly if sync replication is not requested. So we check > + * WalSndCtl->sync_standbys_define without the lock and exit > + * immediately if it's false. If it's true, we check it again later > + * while holding the lock, to avoid the race condition described > + * in SyncRepUpdateSyncStandbysDefined(). > +1. May I suggest the following addition to the above comment (feel free to rephrase / reject)? "If sync_standbys_defined was being set from false to true and we observe it as false, it ok to skip the wait. Replication was async and it is in the process of being changed to sync, due to user request. Subsequent commits will observe the change and start waiting.” > > Attached is the updated version of the patch. I didn't change how to > fix the issue. But I changed the check for fast exit so that it's called > before setting the "mode", to avoid a few cycle. > Looks good to me. There is a typo in the comment: s/sync_standbys_define/sync_standbys_defined/ Asim
On 2020/08/27 15:59, Asim Praveen wrote: > >> On 26-Aug-2020, at 11:10 PM, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >> >> I added the following comments based on the suggestion by Sawada-san upthread. Thought? >> >> + * Since this routine gets called every commit time, it's important to >> + * exit quickly if sync replication is not requested. So we check >> + * WalSndCtl->sync_standbys_define without the lock and exit >> + * immediately if it's false. If it's true, we check it again later >> + * while holding the lock, to avoid the race condition described >> + * in SyncRepUpdateSyncStandbysDefined(). >> > > +1. May I suggest the following addition to the above comment (feel free to > rephrase / reject)? > > "If sync_standbys_defined was being set from false to true and we observe it as > false, it ok to skip the wait. Replication was async and it is in the process > of being changed to sync, due to user request. Subsequent commits will observe > the change and start waiting.” Thanks for the suggestion! I'm not sure if it's worth adding this because it seems obvious thing. But maybe you imply that we need to comment why the lock is not necessary when sync_standbys_defined is false. Right? If so, what about updating the comments as follows? + * Since this routine gets called every commit time, it's important to + * exit quickly if sync replication is not requested. So we check + * WalSndCtl->sync_standbys_defined flag without the lock and exit + * immediately if it's false. If it's true, we need to check it again later + * while holding the lock, to check the flag and operate the sync rep + * queue atomically. This is necessary to avoid the race condition + * described in SyncRepUpdateSyncStandbysDefined(). On the other + * hand, if it's false, the lock is not necessary because we don't touch + * the queue. > >> >> Attached is the updated version of the patch. I didn't change how to >> fix the issue. But I changed the check for fast exit so that it's called >> before setting the "mode", to avoid a few cycle. >> > > > Looks good to me. There is a typo in the comment: > > s/sync_standbys_define/sync_standbys_defined/ Fixed. Thanks! Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Attachment
> On 28-Aug-2020, at 7:03 AM, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > On 2020/08/27 15:59, Asim Praveen wrote: >> >> +1. May I suggest the following addition to the above comment (feel free to >> rephrase / reject)? >> "If sync_standbys_defined was being set from false to true and we observe it as >> false, it ok to skip the wait. Replication was async and it is in the process >> of being changed to sync, due to user request. Subsequent commits will observe >> the change and start waiting.” > > Thanks for the suggestion! I'm not sure if it's worth adding this because > it seems obvious thing. But maybe you imply that we need to comment > why the lock is not necessary when sync_standbys_defined is false. Right? > If so, what about updating the comments as follows? > > + * Since this routine gets called every commit time, it's important to > + * exit quickly if sync replication is not requested. So we check > + * WalSndCtl->sync_standbys_defined flag without the lock and exit > + * immediately if it's false. If it's true, we need to check it again later > + * while holding the lock, to check the flag and operate the sync rep > + * queue atomically. This is necessary to avoid the race condition > + * described in SyncRepUpdateSyncStandbysDefined(). On the other > + * hand, if it's false, the lock is not necessary because we don't touch > + * the queue. Thank you for updating the comment. This looks better. Asim
On Fri, 28 Aug 2020 at 10:33, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > > On 2020/08/27 15:59, Asim Praveen wrote: > > > >> On 26-Aug-2020, at 11:10 PM, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > >> > >> I added the following comments based on the suggestion by Sawada-san upthread. Thought? > >> > >> + * Since this routine gets called every commit time, it's important to > >> + * exit quickly if sync replication is not requested. So we check > >> + * WalSndCtl->sync_standbys_define without the lock and exit > >> + * immediately if it's false. If it's true, we check it again later > >> + * while holding the lock, to avoid the race condition described > >> + * in SyncRepUpdateSyncStandbysDefined(). > >> > > > > +1. May I suggest the following addition to the above comment (feel free to > > rephrase / reject)? > > > > "If sync_standbys_defined was being set from false to true and we observe it as > > false, it ok to skip the wait. Replication was async and it is in the process > > of being changed to sync, due to user request. Subsequent commits will observe > > the change and start waiting.” > > Thanks for the suggestion! I'm not sure if it's worth adding this because > it seems obvious thing. But maybe you imply that we need to comment > why the lock is not necessary when sync_standbys_defined is false. Right? > If so, what about updating the comments as follows? > > + * Since this routine gets called every commit time, it's important to > + * exit quickly if sync replication is not requested. So we check > + * WalSndCtl->sync_standbys_defined flag without the lock and exit > + * immediately if it's false. If it's true, we need to check it again later > + * while holding the lock, to check the flag and operate the sync rep > + * queue atomically. This is necessary to avoid the race condition > + * described in SyncRepUpdateSyncStandbysDefined(). On the other > + * hand, if it's false, the lock is not necessary because we don't touch > + * the queue. > > > > >> > >> Attached is the updated version of the patch. I didn't change how to > >> fix the issue. But I changed the check for fast exit so that it's called > >> before setting the "mode", to avoid a few cycle. > >> > > > > > > Looks good to me. There is a typo in the comment: > > > > s/sync_standbys_define/sync_standbys_defined/ > > Fixed. Thanks! > Both v2 and v3 look good to me too. IMO I'm okay with and without the last sentence. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020/08/28 21:20, Masahiko Sawada wrote: > On Fri, 28 Aug 2020 at 10:33, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >> >> >> >> On 2020/08/27 15:59, Asim Praveen wrote: >>> >>>> On 26-Aug-2020, at 11:10 PM, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>>> >>>> I added the following comments based on the suggestion by Sawada-san upthread. Thought? >>>> >>>> + * Since this routine gets called every commit time, it's important to >>>> + * exit quickly if sync replication is not requested. So we check >>>> + * WalSndCtl->sync_standbys_define without the lock and exit >>>> + * immediately if it's false. If it's true, we check it again later >>>> + * while holding the lock, to avoid the race condition described >>>> + * in SyncRepUpdateSyncStandbysDefined(). >>>> >>> >>> +1. May I suggest the following addition to the above comment (feel free to >>> rephrase / reject)? >>> >>> "If sync_standbys_defined was being set from false to true and we observe it as >>> false, it ok to skip the wait. Replication was async and it is in the process >>> of being changed to sync, due to user request. Subsequent commits will observe >>> the change and start waiting.” >> >> Thanks for the suggestion! I'm not sure if it's worth adding this because >> it seems obvious thing. But maybe you imply that we need to comment >> why the lock is not necessary when sync_standbys_defined is false. Right? >> If so, what about updating the comments as follows? >> >> + * Since this routine gets called every commit time, it's important to >> + * exit quickly if sync replication is not requested. So we check >> + * WalSndCtl->sync_standbys_defined flag without the lock and exit >> + * immediately if it's false. If it's true, we need to check it again later >> + * while holding the lock, to check the flag and operate the sync rep >> + * queue atomically. This is necessary to avoid the race condition >> + * described in SyncRepUpdateSyncStandbysDefined(). On the other >> + * hand, if it's false, the lock is not necessary because we don't touch >> + * the queue. >> >>> >>>> >>>> Attached is the updated version of the patch. I didn't change how to >>>> fix the issue. But I changed the check for fast exit so that it's called >>>> before setting the "mode", to avoid a few cycle. >>>> >>> >>> >>> Looks good to me. There is a typo in the comment: >>> >>> s/sync_standbys_define/sync_standbys_defined/ >> >> Fixed. Thanks! >> > > Both v2 and v3 look good to me too. IMO I'm okay with and without the > last sentence. Asim and Sawada-san, thanks for the review! I pushed the patch. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2020-09-02 10:58:58 +0900, Fujii Masao wrote: > Asim and Sawada-san, thanks for the review! I pushed the patch. Thanks for all your combined work!