Thread: Problem with synchronous replication
Hi,
I recently discovered two possible bugs in synchronous replication.

1. SyncRepCleanupAtProcExit may delete an element that has already been deleted

SyncRepCleanupAtProcExit first checks whether the queue element is detached and, if it is not,
acquires SyncRepLock and deletes it. If the walsender deletes the element in the meantime,
it gets deleted twice and SHMQueueDelete crashes with a segmentation fault.

IMO, like SyncRepCancelWait, we should acquire SyncRepLock first and then check
whether the queue element is detached or not.
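For illustration, a minimal sketch of the ordering I have in mind, using the existing
SHMQueueIsDetached/SHMQueueDelete interfaces (the attached patch may differ in details):

    /* in src/backend/replication/syncrep.c */
    void
    SyncRepCleanupAtProcExit(void)
    {
        /*
         * Take SyncRepLock before testing the queue link, so that a concurrent
         * walsender cannot detach the element between the check and the delete.
         */
        LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
        if (!SHMQueueIsDetached(&(MyProc->syncRepLinks)))
            SHMQueueDelete(&(MyProc->syncRepLinks));
        LWLockRelease(SyncRepLock);
    }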
2. SyncRepWaitForLSN may not call SyncRepCancelWait if ereport processes an interrupt

In SyncRepWaitForLSN, if a query cancel interrupt arrives, we just terminate the wait
with a suitable warning, as follows:

a. set QueryCancelPending to false
b. ereport outputs a warning
c. call SyncRepCancelWait to delete the element from the queue

If another cancel interrupt arrives while we are emitting the warning in step b, errfinish
calls CHECK_FOR_INTERRUPTS, which can raise an ERROR such as "canceling autovacuum
task", and the process then jumps back to sigsetjmp. Unfortunately, step c is skipped
and the element that should have been deleted by SyncRepCancelWait remains in the queue.

The easiest way to fix this is to swap steps b and c, as sketched below. Alternatively,
letting the sigsetjmp block clean up the queue may also be a good choice. What do you think?
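A rough sketch of that reordering in SyncRepWaitForLSN()'s cancel branch (the WARNING
text is abbreviated and the errdetail omitted, so this is not the exact message):

    if (QueryCancelPending)
    {
        QueryCancelPending = false;

        /*
         * Remove ourselves from the wait queue before emitting the WARNING,
         * so that an interrupt raised inside errfinish() cannot skip the
         * cleanup.
         */
        SyncRepCancelWait();

        ereport(WARNING,
                (errmsg("canceling wait for synchronous replication due to user request")));
        break;
    }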
The patch is attached; any feedback is greatly appreciated.
Best regards,
--
Dongming Liu
Can someone help me to confirm that these two problems are bugs?
If they are bugs, please help review the patch or provide better fix suggestions.
Thanks.
Best regards,
--
Dongming Liu
Hello.

At Fri, 25 Oct 2019 15:18:34 +0800, "Dongming Liu" <lingce.ldm@alibaba-inc.com> wrote in
> 1. SyncRepCleanupAtProcExit may delete an element that has already been deleted
> [...]
> IMO, like SyncRepCancelWait, we should acquire SyncRepLock first and then check
> whether the queue element is detached or not.

I think you're right here.

> 2. SyncRepWaitForLSN may not call SyncRepCancelWait if ereport processes an interrupt
> [...]
> Unfortunately, step c is skipped and the element that should have been deleted by
> SyncRepCancelWait remains in the queue.

This is not right. It is in transaction commit so it is in a
HOLD_INTERRUPTS section. ProcessInterrupts does not respond to
cancel/die interrupts, thus the ereport should return.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
On Tue, Oct 29, 2019 at 01:40:41PM +0800, Dongming Liu wrote:
> Can someone help me to confirm that these two problems are bugs?
> If they are bugs, please help review the patch or provide better fix
> suggestions.

There is no need to send periodic pings. Sometimes it takes time to
get replies as time is an important resource that is always limited.
I can see that Horiguchi-san has already provided some feedback, and I
am looking now at your suggestions.
--
Michael
On Tue, Oct 29, 2019 at 07:50:01PM +0900, Kyotaro Horiguchi wrote:
> At Fri, 25 Oct 2019 15:18:34 +0800, "Dongming Liu" <lingce.ldm@alibaba-inc.com> wrote in
>> 1. SyncRepCleanupAtProcExit may delete an element that has already been deleted
>> [...]
>> IMO, like SyncRepCancelWait, we should acquire SyncRepLock first and then check
>> whether the queue element is detached or not.
>
> I think you're right here.

Indeed. Looking at the surroundings, we expect some code paths to hold
SyncRepLock in exclusive or shared mode, but we don't actually check
that the lock is held. So let's add some assertions while on it.

> This is not right. It is in transaction commit so it is in a
> HOLD_INTERRUPTS section. ProcessInterrupts does not respond to
> cancel/die interrupts, thus the ereport should return.

Yeah. There is an easy way to check that: InterruptHoldoffCount
needs to be strictly positive.

My suggestions are attached. Any thoughts?
--
Michael
At Wed, 30 Oct 2019 10:45:11 +0900, Michael Paquier <michael@paquier.xyz> wrote in
> Indeed. Looking at the surroundings, we expect some code paths to hold
> SyncRepLock in exclusive or shared mode, but we don't actually check
> that the lock is held. So let's add some assertions while on it.

I looked around closer. If we do that strictly, other functions like
SyncRepGetOldestSyncRecPtr need the same Assert()s. I think static
functions don't need Assert() and a caution in their comments would be
enough.

On the other hand, the similar-looking code in SyncRepInitConfig and
SyncRepUpdateSyncStandbysDefined seems safe since, AFAICS, it doesn't
have (this kind of) race condition on writes. It might need a comment
saying so. Or, we could go to the (apparently) safer side by applying
the same amendment to it.

SyncRepReleaseWaiters reads MyWalSnd->sync_standby_priority without
holding SyncRepLock, which could lead to a message with a wrong
priority. I'm not sure it matters, though.

> Yeah. There is an easy way to check that: InterruptHoldoffCount
> needs to be strictly positive.
>
> My suggestions are attached. Any thoughts?

Seems reasonable for the holdoffs. The same assertion would be needed in
more places, but that's another issue.

By the way, while I was looking at this I found a typo. Please find the
attached.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index a21f7d3347..16aee1de4c 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -1065,8 +1065,8 @@ SyncRepUpdateSyncStandbysDefined(void)
 
     /*
      * If synchronous_standby_names has been reset to empty, it's futile
-     * for backends to continue to waiting.  Since the user no longer
-     * wants synchronous replication, we'd better wake them up.
+     * for backends to continue waiting.  Since the user no longer wants
+     * synchronous replication, we'd better wake them up.
      */
     if (!sync_standbys_defined)
     {
On Wed, Oct 30, 2019 at 9:12 AM Michael Paquier <michael@paquier.xyz> wrote:
> On Tue, Oct 29, 2019 at 01:40:41PM +0800, Dongming Liu wrote:
>> Can someone help me to confirm that these two problems are bugs?
>> If they are bugs, please help review the patch or provide better fix
>> suggestions.
>
> There is no need to send periodic pings. Sometimes it takes time to
> get replies as time is an important resource that is always limited.

Thank you for your reply. I realize my mistake; thank you for correcting me.

> I can see that Horiguchi-san has already provided some feedback, and I
> am looking now at your suggestions.
Thanks again.
On Oct 29, 2019, at 18:50, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
> Hello.
>
> At Fri, 25 Oct 2019 15:18:34 +0800, "Dongming Liu" <lingce.ldm@alibaba-inc.com> wrote in
>> 1. SyncRepCleanupAtProcExit may delete an element that has already been deleted
>> [...]
>> IMO, like SyncRepCancelWait, we should acquire SyncRepLock first and then check
>> whether the queue element is detached or not.
>
> I think you're right here.

Thanks.

>> 2. SyncRepWaitForLSN may not call SyncRepCancelWait if ereport processes an interrupt
>> [...]
>
> This is not right. It is in transaction commit so it is in a
> HOLD_INTERRUPTS section. ProcessInterrupts does not respond to
> cancel/die interrupts, thus the ereport should return.

I read the relevant code and you are right. I called SyncRepWaitForLSN somewhere else,
but forgot to put it inside a HOLD_INTERRUPTS section, which triggered the exception.
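For the record, a rough illustration of the calling convention I had missed; the call
site and the commit_lsn variable are made up, and the two-argument form follows recent
branches:

    /*
     * SyncRepWaitForLSN() expects to run with interrupts held, as it does
     * in the transaction-commit path.
     */
    HOLD_INTERRUPTS();
    SyncRepWaitForLSN(commit_lsn, true);
    RESUME_INTERRUPTS();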
regards.
—
Dongming Liu
On Oct 30, 2019, at 09:45, Michael Paquier <michael@paquier.xyz> wrote:
> On Tue, Oct 29, 2019 at 07:50:01PM +0900, Kyotaro Horiguchi wrote:
>> At Fri, 25 Oct 2019 15:18:34 +0800, "Dongming Liu" <lingce.ldm@alibaba-inc.com> wrote in
>>> 1. SyncRepCleanupAtProcExit may delete an element that has already been deleted
>>> [...]
>>
>> I think you're right here.
>
> Indeed. Looking at the surroundings, we expect some code paths to hold
> SyncRepLock in exclusive or shared mode, but we don't actually check
> that the lock is held. So let's add some assertions while on it.
>
>> This is not right. It is in transaction commit so it is in a
>> HOLD_INTERRUPTS section. ProcessInterrupts does not respond to
>> cancel/die interrupts, thus the ereport should return.
>
> Yeah. There is an easy way to check that: InterruptHoldoffCount
> needs to be strictly positive.
>
> My suggestions are attached. Any thoughts?
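Just to confirm my understanding, the two kinds of assertions being discussed would look
roughly like this (my own sketch, not necessarily how the attached patch places them):

    /* on entry to SyncRepWaitForLSN(): the commit path must be holding interrupts */
    Assert(InterruptHoldoffCount > 0);

    /* in the queue-manipulating code paths: the caller must hold SyncRepLock */
    Assert(LWLockHeldByMe(SyncRepLock));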
Thanks, this patch looks good to me.
On Wed, Oct 30, 2019 at 4:16 PM lingce.ldm <lingce.ldm@alibaba-inc.com> wrote:
> On Oct 29, 2019, at 18:50, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
>> At Fri, 25 Oct 2019 15:18:34 +0800, "Dongming Liu" <lingce.ldm@alibaba-inc.com> wrote in
>>> 1. SyncRepCleanupAtProcExit may delete an element that has already been deleted
>>> [...]
>>> IMO, like SyncRepCancelWait, we should acquire SyncRepLock first and then check
>>> whether the queue element is detached or not.
>>
>> I think you're right here.

This change causes every exiting backend to always take the exclusive lock
even when it is not in the SyncRep queue. This may be problematic, for example,
when terminating multiple backends at the same time. If so, it might be better
to check SHMQueueIsDetached() again after taking the lock. That is:

    if (!SHMQueueIsDetached(&(MyProc->syncRepLinks)))
    {
        LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
        if (!SHMQueueIsDetached(&(MyProc->syncRepLinks)))
            SHMQueueDelete(&(MyProc->syncRepLinks));
        LWLockRelease(SyncRepLock);
    }

Regards,

--
Fujii Masao
Hello.

At Wed, 30 Oct 2019 17:21:17 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in
> This change causes every exiting backend to always take the exclusive lock
> even when it is not in the SyncRep queue. This may be problematic, for example,
> when terminating multiple backends at the same time. If so, it might be better
> to check SHMQueueIsDetached() again after taking the lock. That is:
> [...]

I'm not sure how much that harms, but double-checked locking
(releasing) is simple enough for reducing possible congestion here, I
think. In short, +1 for that.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
On Wed, Oct 30, 2019 at 05:21:17PM +0900, Fujii Masao wrote:
> This may be problematic, for example,
> when terminating multiple backends at the same time. If so, it might be better
> to check SHMQueueIsDetached() again after taking the lock.
> [...]

Makes sense. Thanks for the suggestion.
--
Michael
On Wed, Oct 30, 2019 at 12:34:28PM +0900, Kyotaro Horiguchi wrote:
> If we do that strictly, other functions like
> SyncRepGetOldestSyncRecPtr need the same Assert()s. I think static
> functions don't need Assert() and a caution in their comments would be
> enough.

Perhaps. I'd rather be careful though if we meddle again with this
code in the future, as it is shared across multiple places and callers.

> SyncRepReleaseWaiters reads MyWalSnd->sync_standby_priority without
> holding SyncRepLock, which could lead to a message with a wrong
> priority. I'm not sure it matters, though.

The WAL sender is the only writer of its own info in shared memory, so
there is no problem with it reading that data without holding its
spinlock.

> Seems reasonable for the holdoffs. The same assertion would be needed in
> more places, but that's another issue.

Sure.

> By the way, while I was looking at this I found a typo. Please find the
> attached.

Thanks, applied that one.
--
Michael
On Wed, Oct 30, 2019 at 05:43:04PM +0900, Kyotaro Horiguchi wrote:
> At Wed, 30 Oct 2019 17:21:17 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in
>> This change causes every exiting backend to always take the exclusive lock
>> even when it is not in the SyncRep queue. [...]
>> it might be better to check SHMQueueIsDetached() again after taking the lock.
>
> I'm not sure how much that harms, but double-checked locking
> (releasing) is simple enough for reducing possible congestion here, I
> think.

FWIW, I could not measure any actual difference with pgbench -C, up to
500 sessions, an empty input file (with just one meta-command) and -c 20.

I have added some comments in SyncRepCleanupAtProcExit(), and adjusted
the patch with the suggestion from Fujii-san. Any comments?
--
Michael
On Thu, Oct 31, 2019 at 11:12 AM Michael Paquier <michael@paquier.xyz> wrote:
> FWIW, I could not measure any actual difference with pgbench -C, up to
> 500 sessions, an empty input file (with just one meta-command) and -c 20.
>
> I have added some comments in SyncRepCleanupAtProcExit(), and adjusted
> the patch with the suggestion from Fujii-san. Any comments?

Thanks for the patch! Looks good to me.

Regards,

--
Fujii Masao
On Oct 31, 2019, at 10:11, Michael Paquier <michael@paquier.xyz> wrote:
> I have added some comments in SyncRepCleanupAtProcExit(), and adjusted
> the patch with the suggestion from Fujii-san. Any comments?

Thanks for the patch. Looks good to me, +1.

Regards,
—
Dongming Liu
On Thu, Oct 31, 2019 at 05:38:32PM +0900, Fujii Masao wrote:
> Thanks for the patch! Looks good to me.

Thanks. I have applied the core fix down to 9.4, leaving the new
assertion improvements only for HEAD.
--
Michael