Re: Problem with synchronous replication - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: Problem with synchronous replication
Date
Msg-id CAHGQGwFKC89NhV2Ab=VRUzYZHZKfZkQLBr=FVQ2QDNEN7S3cnA@mail.gmail.com
In response to Re: Problem with synchronous replication  (lingce.ldm <lingce.ldm@alibaba-inc.com>)
Responses Re: Problem with synchronous replication
Re: Problem with synchronous replication
List pgsql-hackers
On Wed, Oct 30, 2019 at 4:16 PM lingce.ldm <lingce.ldm@alibaba-inc.com> wrote:
>
> On Oct 29, 2019, at 18:50, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
>
>
> Hello.
>
> At Fri, 25 Oct 2019 15:18:34 +0800, "Dongming Liu" <lingce.ldm@alibaba-inc.com> wrote in
>
>
> Hi,
>
> I recently discovered two possible bugs about synchronous replication.
>
> 1. SyncRepCleanupAtProcExit may delete an element that has already been deleted
> SyncRepCleanupAtProcExit first checks whether the queue is detached; if it is not detached,
> it acquires the SyncRepLock lock and deletes it. If this element has already been deleted by the walsender,
> it will be deleted a second time, and SHMQueueDelete will crash with a segmentation fault.
>
> IMO, like SyncRepCancelWait, we should lock the SyncRepLock first and then check
> whether the queue is detached or not.
>
>
> I think you're right here.

This change causes every exiting backend to always take the exclusive lock,
even when it's not in the SyncRep queue. Could this be problematic, for example,
when terminating multiple backends at the same time? If so,
it might be better to check SHMQueueIsDetached() again after taking the lock.
That is,

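/* Cheap unlocked check: backends that are not queued skip the lock. */
if (!SHMQueueIsDetached(&(MyProc->syncRepLinks)))
{
    LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
    /* Recheck under the lock: walsender may have removed us meanwhile. */
    if (!SHMQueueIsDetached(&(MyProc->syncRepLinks)))
        SHMQueueDelete(&(MyProc->syncRepLinks));
    LWLockRelease(SyncRepLock);
}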
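
To make the check/lock/recheck idea concrete outside the backend, here is a minimal standalone sketch. It is not PostgreSQL code: QueueLink, queue_lock, link_is_detached(), link_delete() and cleanup_at_exit() are hypothetical stand-ins for SHM_QUEUE, SyncRepLock, SHMQueueIsDetached(), SHMQueueDelete() and SyncRepCleanupAtProcExit(), and a pthread mutex is assumed in place of an LWLock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a shared-memory queue link (not SHM_QUEUE). */
typedef struct QueueLink
{
    struct QueueLink *prev;
    struct QueueLink *next;
} QueueLink;

/* Hypothetical stand-in for SyncRepLock. */
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;

/* An empty queue head points at itself. */
static void queue_init(QueueLink *head)
{
    head->prev = head->next = head;
}

/* A detached element has NULL links. */
static void link_init(QueueLink *link)
{
    link->prev = link->next = NULL;
}

static bool link_is_detached(const QueueLink *link)
{
    return link->prev == NULL;
}

/* Insert link right after head; the caller is assumed to hold queue_lock. */
static void queue_insert(QueueLink *head, QueueLink *link)
{
    link->next = head->next;
    link->prev = head;
    head->next->prev = link;
    head->next = link;
}

/*
 * Unlink an element. This dereferences prev/next, so calling it on an
 * already detached element would dereference NULL, which is the kind of
 * crash described in the report.
 */
static void link_delete(QueueLink *link)
{
    link->prev->next = link->next;
    link->next->prev = link->prev;
    link->prev = link->next = NULL;
}

/*
 * Exit-time cleanup: unlocked check first, recheck under the lock.
 * Returns true if this call removed the element. The unlocked read is
 * only a hint; the decision to delete is made while holding the lock.
 */
static bool cleanup_at_exit(QueueLink *link)
{
    bool removed = false;

    if (!link_is_detached(link))
    {
        pthread_mutex_lock(&queue_lock);
        if (!link_is_detached(link))
        {
            link_delete(link);
            removed = true;
        }
        pthread_mutex_unlock(&queue_lock);
    }
    return removed;
}
```

In this sketch an element that was never queued never touches the mutex, and a second cleanup call on an already removed element is a no-op rather than a second delete.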

Regards,

-- 
Fujii Masao


