Thread: Problem with synchronous replication

Problem with synchronous replication

From
"Dongming Liu"
Date:

Hi,

I recently discovered two possible bugs in synchronous replication.

1. SyncRepCleanupAtProcExit may delete an element that has already been deleted
SyncRepCleanupAtProcExit first checks whether the queue is detached; if it is not, it
acquires SyncRepLock and deletes the element. If the element has already been deleted
by the walsender, it gets deleted a second time and SHMQueueDelete crashes with a
segmentation fault.

IMO, as in SyncRepCancelWait, we should acquire SyncRepLock first and then check
whether the queue is detached.
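
A minimal sketch of the proposed ordering for SyncRepCleanupAtProcExit (illustrative
only, not necessarily identical to the attached patch):

void
SyncRepCleanupAtProcExit(void)
{
    LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);

    /*
     * Check the queue link only while holding the lock, so a concurrent
     * walsender cannot remove the element between the check and the delete.
     */
    if (!SHMQueueIsDetached(&(MyProc->syncRepLinks)))
        SHMQueueDelete(&(MyProc->syncRepLinks));

    LWLockRelease(SyncRepLock);
}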


2. SyncRepWaitForLSN may not call SyncRepCancelWait if ereport processes an interrupt.
In SyncRepWaitForLSN, if a query cancel interrupt arrives, we just terminate the wait
with a suitable warning, as follows:

a. set QueryCancelPending to false
b. ereport emits a warning
c. call SyncRepCancelWait to delete the element from the queue

If another cancel interrupt arrives while we are emitting the warning in step b, errfinish
calls CHECK_FOR_INTERRUPTS, which raises an ERROR such as "canceling autovacuum task",
and the process jumps back to the sigsetjmp. Unfortunately, step c is then skipped and
the element that SyncRepCancelWait should have deleted remains in the queue.

The easiest fix is to swap steps b and c, as in the sketch below. Alternatively, letting
the sigsetjmp handler clean up the queue may also be a good choice. What do you think?
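
A rough sketch of the swapped ordering inside SyncRepWaitForLSN's cancel handling
(illustrative only; the message wording is approximate and this is not necessarily
what the attached patch does):

if (QueryCancelPending)
{
    /* a. clear the pending cancel request */
    QueryCancelPending = false;

    /* c. leave the wait queue first, so a nested interrupt cannot skip it */
    SyncRepCancelWait();

    /* b. only then emit the warning */
    ereport(WARNING,
            (errmsg("canceling wait for synchronous replication due to user request"),
             errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
    break;
}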

The patch is attached; any feedback is greatly appreciated.

Best regards,
--
Dongming Liu
Attachment

Re: Problem with synchronous replication

From
"Dongming Liu"
Date:
Can someone help confirm whether these two problems are actually bugs?
If they are, please help review the patch or suggest a better fix.
Thanks.

Best regards,
--
Dongming Liu
------------------------------------------------------------------
From: LIU Dongming <lingce.ldm@alibaba-inc.com>
Sent At: 2019 Oct. 25 (Fri.) 15:18
To: pgsql-hackers <pgsql-hackers@postgresql.org>
Subject: Problem with synchronous replication


Hi,

I recently discovered two possible bugs in synchronous replication.

1. SyncRepCleanupAtProcExit may delete an element that has already been deleted
SyncRepCleanupAtProcExit first checks whether the queue is detached; if it is not, it
acquires SyncRepLock and deletes the element. If the element has already been deleted
by the walsender, it gets deleted a second time and SHMQueueDelete crashes with a
segmentation fault.

IMO, as in SyncRepCancelWait, we should acquire SyncRepLock first and then check
whether the queue is detached.


2. SyncRepWaitForLSN may not call SyncRepCancelWait if ereport processes an interrupt.
In SyncRepWaitForLSN, if a query cancel interrupt arrives, we just terminate the wait
with a suitable warning, as follows:

a. set QueryCancelPending to false
b. ereport emits a warning
c. call SyncRepCancelWait to delete the element from the queue

If another cancel interrupt arrives while we are emitting the warning in step b, errfinish
calls CHECK_FOR_INTERRUPTS, which raises an ERROR such as "canceling autovacuum task",
and the process jumps back to the sigsetjmp. Unfortunately, step c is then skipped and
the element that SyncRepCancelWait should have deleted remains in the queue.

The easiest fix is to swap steps b and c. Alternatively, letting the sigsetjmp handler
clean up the queue may also be a good choice. What do you think?

The patch is attached; any feedback is greatly appreciated.

Best regards,
--
Dongming Liu

Re: Problem with synchronous replication

From
Kyotaro Horiguchi
Date:
Hello.

At Fri, 25 Oct 2019 15:18:34 +0800, "Dongming Liu" <lingce.ldm@alibaba-inc.com> wrote in 
> 
> Hi,
> 
> I recently discovered two possible bugs in synchronous replication.
> 
> 1. SyncRepCleanupAtProcExit may delete an element that has already been deleted
> SyncRepCleanupAtProcExit first checks whether the queue is detached; if it is not, it
> acquires SyncRepLock and deletes the element. If the element has already been deleted
> by the walsender, it gets deleted a second time and SHMQueueDelete crashes with a
> segmentation fault.
> 
> IMO, as in SyncRepCancelWait, we should acquire SyncRepLock first and then check
> whether the queue is detached.

I think you're right here.

> 2. SyncRepWaitForLSN may not call SyncRepCancelWait if ereport processes an interrupt.
> In SyncRepWaitForLSN, if a query cancel interrupt arrives, we just terminate the wait
> with a suitable warning, as follows:
> 
> a. set QueryCancelPending to false
> b. ereport emits a warning
> c. call SyncRepCancelWait to delete the element from the queue
> 
> If another cancel interrupt arrives while we are emitting the warning in step b, errfinish
> calls CHECK_FOR_INTERRUPTS, which raises an ERROR such as "canceling autovacuum task",
> and the process jumps back to the sigsetjmp. Unfortunately, step c is then skipped and
> the element that SyncRepCancelWait should have deleted remains in the queue.
> 
> The easiest fix is to swap steps b and c. Alternatively, letting the sigsetjmp handler
> clean up the queue may also be a good choice. What do you think?
> 
> The patch is attached; any feedback is greatly appreciated.

This is not right. It is in transaction commit, so it is in a
HOLD_INTERRUPTS section. ProcessInterrupts does not respond to
cancel/die interrupts, so the ereport should return.
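
To spell that out (simplified from memory, not verbatim source): CommitTransaction()
wraps the commit work, including the call chain that reaches SyncRepWaitForLSN(), in
HOLD_INTERRUPTS()/RESUME_INTERRUPTS(), and the gate at the top of ProcessInterrupts()
is roughly

    if (InterruptHoldoffCount != 0 || CritSectionCount != 0)
        return;     /* interrupts held off: no ERROR can be thrown from here */

so the nested CHECK_FOR_INTERRUPTS() inside errfinish() comes straight back.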

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Problem with synchronous replication

From
Michael Paquier
Date:
On Tue, Oct 29, 2019 at 01:40:41PM +0800, Dongming Liu wrote:
> Can someone help confirm whether these two problems are actually bugs?
> If they are, please help review the patch or suggest a better fix.

There is no need to send periodic pings.  Sometimes it takes time to
get replies as time is an important resource that is always limited.
I can see that Horiguchi-san has already provided some feedback, and I
am looking now at your suggestions.
--
Michael

Attachment

Re: Problem with synchronous replication

From
Michael Paquier
Date:
On Tue, Oct 29, 2019 at 07:50:01PM +0900, Kyotaro Horiguchi wrote:
> At Fri, 25 Oct 2019 15:18:34 +0800, "Dongming Liu" <lingce.ldm@alibaba-inc.com> wrote in
>> I recently discovered two possible bugs in synchronous replication.
>>
>> 1. SyncRepCleanupAtProcExit may delete an element that has already been deleted
>> SyncRepCleanupAtProcExit first checks whether the queue is detached; if it is not, it
>> acquires SyncRepLock and deletes the element. If the element has already been deleted
>> by the walsender, it gets deleted a second time and SHMQueueDelete crashes with a
>> segmentation fault.
>>
>> IMO, as in SyncRepCancelWait, we should acquire SyncRepLock first and then check
>> whether the queue is detached.
>
> I think you're right here.

Indeed.  Looking at the surroundings we expect some code paths to hold
SyncRepLock in exclusive or shared mode, but we don't actually check
that the lock is held.  So let's add some assertions while we're at it.

> This is not right. It is in transaction commit, so it is in a
> HOLD_INTERRUPTS section. ProcessInterrupts does not respond to
> cancel/die interrupts, so the ereport should return.

Yeah.  There is an easy way to check for that: InterruptHoldoffCount
needs to be strictly positive.
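
Rough sketches of the kind of assertions being discussed (hypothetical placements,
not necessarily what the attached patch does):

/* e.g. in queue-manipulating code that expects the caller to hold the lock */
Assert(LWLockHeldByMeInMode(SyncRepLock, LW_EXCLUSIVE));

/* e.g. at the top of SyncRepWaitForLSN(), which relies on the commit path
 * having done HOLD_INTERRUPTS() */
Assert(InterruptHoldoffCount > 0);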

My suggestions are attached.  Any thoughts?
--
Michael

Attachment

Re: Problem with synchronous replication

From
Kyotaro Horiguchi
Date:
At Wed, 30 Oct 2019 10:45:11 +0900, Michael Paquier <michael@paquier.xyz> wrote in 
> On Tue, Oct 29, 2019 at 07:50:01PM +0900, Kyotaro Horiguchi wrote:
> > At Fri, 25 Oct 2019 15:18:34 +0800, "Dongming Liu" <lingce.ldm@alibaba-inc.com> wrote in 
> >> I recently discovered two possible bugs in synchronous replication.
> >> 
> >> 1. SyncRepCleanupAtProcExit may delete an element that has already been deleted
> >> SyncRepCleanupAtProcExit first checks whether the queue is detached; if it is not, it
> >> acquires SyncRepLock and deletes the element. If the element has already been deleted
> >> by the walsender, it gets deleted a second time and SHMQueueDelete crashes with a
> >> segmentation fault.
> >> 
> >> IMO, as in SyncRepCancelWait, we should acquire SyncRepLock first and then check
> >> whether the queue is detached.
> > 
> > I think you're right here.
> 
> Indeed.  Looking at the surroundings we expect some code paths to hold
> SyncRepLock in exclusive or shared mode, but we don't actually check
> that the lock is held.  So let's add some assertions while we're at it.

I looked at this more closely.

If we do that strictly, other functions like
SyncRepGetOldestSyncRecPtr need the same Assert()s. I think static
functions don't need Assert(); a caution in their comments would be
enough.

On the other hand, the similar-looking code in SyncRepInitConfig and
SyncRepUpdateSyncStandbysDefined seems safe since, AFAICS, it doesn't
have (this kind of) race condition on writes. It might need a comment
saying so. Or we could go to the (apparently) safer side by applying
the same amendment to it.

SyncRepReleaseWaiters reads MyWalSnd->sync_standby_priority without
holding SyncRepLock, which could lead to a message with the wrong
priority. I'm not sure it matters, though.

> > This is not right. It is in transaction commit, so it is in a
> > HOLD_INTERRUPTS section. ProcessInterrupts does not respond to
> > cancel/die interrupts, so the ereport should return.
> 
> Yeah.  There is an easy way to check for that: InterruptHoldoffCount
> needs to be strictly positive.
> 
> My suggestions are attached.  Any thoughts?

Seems reasonable for holdoffs. The same assertion would be needed in
more places but it's another issue.


By the way, while I was looking at this, I found a typo. Please find it
attached.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index a21f7d3347..16aee1de4c 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -1065,8 +1065,8 @@ SyncRepUpdateSyncStandbysDefined(void)
 
         /*
          * If synchronous_standby_names has been reset to empty, it's futile
-         * for backends to continue to waiting.  Since the user no longer
-         * wants synchronous replication, we'd better wake them up.
+         * for backends to continue waiting.  Since the user no longer wants
+         * synchronous replication, we'd better wake them up.
          */
         if (!sync_standbys_defined)
         {

Re: Problem with synchronous replication

From
Dongming Liu
Date:
On Wed, Oct 30, 2019 at 9:12 AM Michael Paquier <michael@paquier.xyz> wrote:
> On Tue, Oct 29, 2019 at 01:40:41PM +0800, Dongming Liu wrote:
>> Can someone help confirm whether these two problems are actually bugs?
>> If they are, please help review the patch or suggest a better fix.
>
> There is no need to send periodic pings.  Sometimes it takes time to
> get replies as time is an important resource that is always limited.

Thank you for your reply. I realized my mistake; thank you for correcting me.

> I can see that Horiguchi-san has already provided some feedback, and I
> am looking now at your suggestions.

Thanks again.

Re: Problem with synchronous replication

From
lingce.ldm
Date:
On Oct 29, 2019, at 18:50, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

> Hello.
> 
> At Fri, 25 Oct 2019 15:18:34 +0800, "Dongming Liu" <lingce.ldm@alibaba-inc.com> wrote in 
> 
>> Hi,
>> 
>> I recently discovered two possible bugs in synchronous replication.
>> 
>> 1. SyncRepCleanupAtProcExit may delete an element that has already been deleted
>> SyncRepCleanupAtProcExit first checks whether the queue is detached; if it is not, it
>> acquires SyncRepLock and deletes the element. If the element has already been deleted
>> by the walsender, it gets deleted a second time and SHMQueueDelete crashes with a
>> segmentation fault.
>> 
>> IMO, as in SyncRepCancelWait, we should acquire SyncRepLock first and then check
>> whether the queue is detached.
> 
> I think you're right here.

Thanks.

>> 2. SyncRepWaitForLSN may not call SyncRepCancelWait if ereport processes an interrupt.
>> In SyncRepWaitForLSN, if a query cancel interrupt arrives, we just terminate the wait
>> with a suitable warning, as follows:
>> 
>> a. set QueryCancelPending to false
>> b. ereport emits a warning
>> c. call SyncRepCancelWait to delete the element from the queue
>> 
>> If another cancel interrupt arrives while we are emitting the warning in step b, errfinish
>> calls CHECK_FOR_INTERRUPTS, which raises an ERROR such as "canceling autovacuum task",
>> and the process jumps back to the sigsetjmp. Unfortunately, step c is then skipped and
>> the element that SyncRepCancelWait should have deleted remains in the queue.
>> 
>> The easiest fix is to swap steps b and c. Alternatively, letting the sigsetjmp handler
>> clean up the queue may also be a good choice. What do you think?
>> 
>> The patch is attached; any feedback is greatly appreciated.
> 
> This is not right. It is in transaction commit, so it is in a
> HOLD_INTERRUPTS section. ProcessInterrupts does not respond to
> cancel/die interrupts, so the ereport should return.

I read the relevant code and you are right. I had called SyncRepWaitForLSN somewhere
else, but forgot to put it inside a HOLD_INTERRUPTS section, which triggered the
exception.

regards.

Dongming Liu
Attachment

Re: Problem with synchronous replication

From
lingce.ldm
Date:
On Oct 30, 2019, at 09:45, Michael Paquier <michael@paquier.xyz> wrote:

> On Tue, Oct 29, 2019 at 07:50:01PM +0900, Kyotaro Horiguchi wrote:
>> At Fri, 25 Oct 2019 15:18:34 +0800, "Dongming Liu" <lingce.ldm@alibaba-inc.com> wrote in 
>>> I recently discovered two possible bugs in synchronous replication.
>>> 
>>> 1. SyncRepCleanupAtProcExit may delete an element that has already been deleted
>>> SyncRepCleanupAtProcExit first checks whether the queue is detached; if it is not, it
>>> acquires SyncRepLock and deletes the element. If the element has already been deleted
>>> by the walsender, it gets deleted a second time and SHMQueueDelete crashes with a
>>> segmentation fault.
>>> 
>>> IMO, as in SyncRepCancelWait, we should acquire SyncRepLock first and then check
>>> whether the queue is detached.
>> 
>> I think you're right here.
> 
> Indeed.  Looking at the surroundings we expect some code paths to hold
> SyncRepLock in exclusive or shared mode, but we don't actually check
> that the lock is held.  So let's add some assertions while we're at it.
> 
>> This is not right. It is in transaction commit, so it is in a
>> HOLD_INTERRUPTS section. ProcessInterrupts does not respond to
>> cancel/die interrupts, so the ereport should return.
> 
> Yeah.  There is an easy way to check for that: InterruptHoldoffCount
> needs to be strictly positive.
> 
> My suggestions are attached.  Any thoughts?

Thanks, this patch looks good to me.
Attachment

Re: Problem with synchronous replication

From
Fujii Masao
Date:
On Wed, Oct 30, 2019 at 4:16 PM lingce.ldm <lingce.ldm@alibaba-inc.com> wrote:
>
> On Oct 29, 2019, at 18:50, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
>
>
> Hello.
>
> At Fri, 25 Oct 2019 15:18:34 +0800, "Dongming Liu" <lingce.ldm@alibaba-inc.com> wrote in
>
>
> Hi,
>
> I recently discovered two possible bugs in synchronous replication.
>
> 1. SyncRepCleanupAtProcExit may delete an element that has already been deleted
> SyncRepCleanupAtProcExit first checks whether the queue is detached; if it is not, it
> acquires SyncRepLock and deletes the element. If the element has already been deleted
> by the walsender, it gets deleted a second time and SHMQueueDelete crashes with a
> segmentation fault.
>
> IMO, as in SyncRepCancelWait, we should acquire SyncRepLock first and then check
> whether the queue is detached.
>
>
> I think you're right here.

This change causes every exiting backend to always take the exclusive lock
even when it's not in the SyncRep queue. This may be problematic, for example,
when terminating multiple backends at the same time. If so,
it might be better to check SHMQueueIsDetached() again after taking the lock.
That is,

if (!SHMQueueIsDetached(&(MyProc->syncRepLinks)))
{
    LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
    if (!SHMQueueIsDetached(&(MyProc->syncRepLinks)))
        SHMQueueDelete(&(MyProc->syncRepLinks));
    LWLockRelease(SyncRepLock);
}

Regards,

-- 
Fujii Masao



Re: Problem with synchronous replication

From
Kyotaro Horiguchi
Date:
Hello.

At Wed, 30 Oct 2019 17:21:17 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in 
> This change causes every exiting backend to always take the exclusive lock
> even when it's not in the SyncRep queue. This may be problematic, for example,
> when terminating multiple backends at the same time. If so,
> it might be better to check SHMQueueIsDetached() again after taking the lock.
> That is,

I'm not sure how much that harms, but double-checked locking
(releasing) is simple enough to reduce possible congestion here, I
think. In short, +1 for that.

> if (!SHMQueueIsDetached(&(MyProc->syncRepLinks)))
> {
>     LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
>     if (!SHMQueueIsDetached(&(MyProc->syncRepLinks)))
>         SHMQueueDelete(&(MyProc->syncRepLinks));
>     LWLockRelease(SyncRepLock);
> }

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Problem with synchronous replication

From
Michael Paquier
Date:
On Wed, Oct 30, 2019 at 05:21:17PM +0900, Fujii Masao wrote:
> This change causes every exiting backend to always take the exclusive lock
> even when it's not in the SyncRep queue. This may be problematic, for example,
> when terminating multiple backends at the same time. If so,
> it might be better to check SHMQueueIsDetached() again after taking the lock.
> That is,
>
> if (!SHMQueueIsDetached(&(MyProc->syncRepLinks)))
> {
>     LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
>     if (!SHMQueueIsDetached(&(MyProc->syncRepLinks)))
>         SHMQueueDelete(&(MyProc->syncRepLinks));
>     LWLockRelease(SyncRepLock);
> }

Makes sense.  Thanks for the suggestion.
--
Michael

Attachment

Re: Problem with synchronous replication

From
Michael Paquier
Date:
On Wed, Oct 30, 2019 at 12:34:28PM +0900, Kyotaro Horiguchi wrote:
> If we do that strictly, other functions like
> SyncRepGetOldestSyncRecPtr need the same Assert()s. I think static
> functions don't need Assert(); a caution in their comments would be
> enough.

Perhaps.  I'd still rather be careful if we meddle with this code again
in the future, as it is shared across multiple places and callers.

> SyncRepReleaseWaiters reads MyWalSnd->sync_standby_priority without
> holding SyncRepLock, which could lead to a message with wrong
> priority. I'm not sure it matters, though.

The WAL sender is the only writer of its own info in shared memory, so
there is no problem having it read that data without holding its spinlock.

> Seems reasonable for holdoffs. The same assertion would be needed in
> more places but it's another issue.

Sure.

> By the way, while I was looking at this, I found a typo. Please find it
> attached.

Thanks, applied that one.
--
Michael

Attachment

Re: Problem with synchronous replication

From
Michael Paquier
Date:
On Wed, Oct 30, 2019 at 05:43:04PM +0900, Kyotaro Horiguchi wrote:
> At Wed, 30 Oct 2019 17:21:17 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in
>> This change causes every exiting backend to always take the exclusive lock
>> even when it's not in the SyncRep queue. This may be problematic, for example,
>> when terminating multiple backends at the same time. If so,
>> it might be better to check SHMQueueIsDetached() again after taking the lock.
>> That is,
>
> I'm not sure how much that harms, but double-checked locking
> (releasing) is simple enough to reduce possible congestion here, I
> think.

FWIW, I could not measure any actual difference with pgbench -C, up to
500 sessions, with a near-empty input file (just one meta-command) and
-c 20.

I have added some comments in SyncRepCleanupAtProcExit(), and adjusted
the patch with the suggestion from Fujii-san.  Any comments?
--
Michael

Attachment

Re: Problem with synchronous replication

From
Fujii Masao
Date:
On Thu, Oct 31, 2019 at 11:12 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Oct 30, 2019 at 05:43:04PM +0900, Kyotaro Horiguchi wrote:
> > At Wed, 30 Oct 2019 17:21:17 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in
> >> This change causes every exiting backend to always take the exclusive lock
> >> even when it's not in the SyncRep queue. This may be problematic, for example,
> >> when terminating multiple backends at the same time. If so,
> >> it might be better to check SHMQueueIsDetached() again after taking the lock.
> >> That is,
> >
> > I'm not sure how much that harms, but double-checked locking
> > (releasing) is simple enough to reduce possible congestion here, I
> > think.
>
> FWIW, I could not measure any actual difference with pgbench -C, up to
> 500 sessions, with a near-empty input file (just one meta-command) and
> -c 20.
>
> I have added some comments in SyncRepCleanupAtProcExit(), and adjusted
> the patch with the suggestion from Fujii-san.  Any comments?

Thanks for the patch! Looks good to me.

Regards,

-- 
Fujii Masao



Re: Problem with synchronous replication

From
"lingce.ldm"
Date:
On Oct 31, 2019, at 10:11, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Oct 30, 2019 at 05:43:04PM +0900, Kyotaro Horiguchi wrote:
>> At Wed, 30 Oct 2019 17:21:17 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in
>>> This change causes every exiting backend to always take the exclusive lock
>>> even when it's not in the SyncRep queue. This may be problematic, for example,
>>> when terminating multiple backends at the same time. If so,
>>> it might be better to check SHMQueueIsDetached() again after taking the lock.
>>> That is,
>>
>> I'm not sure how much that harms, but double-checked locking
>> (releasing) is simple enough to reduce possible congestion here, I
>> think.
>
> FWIW, I could not measure any actual difference with pgbench -C, up to
> 500 sessions, with a near-empty input file (just one meta-command) and
> -c 20.
>
> I have added some comments in SyncRepCleanupAtProcExit(), and adjusted
> the patch with the suggestion from Fujii-san.  Any comments?

Thanks for the patch. Looks good to me, +1.

Regards,

--
Dongming Liu


Attachment

Re: Problem with synchronous replication

From
Michael Paquier
Date:
On Thu, Oct 31, 2019 at 05:38:32PM +0900, Fujii Masao wrote:
> Thanks for the patch! Looks good to me.

Thanks.  I have applied the core fix down to 9.4, leaving the new
assertion improvements only for HEAD.
--
Michael

Attachment