Re: Race condition in SyncRepGetSyncStandbysPriority - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Race condition in SyncRepGetSyncStandbysPriority
Date
Msg-id CA+fd4k4oJZYmi8+U=_8_CH3Lwk2ZQoeV2rmGkYtvoAnHXsQsTQ@mail.gmail.com
Whole thread Raw
In response to Re: Race condition in SyncRepGetSyncStandbysPriority  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Race condition in SyncRepGetSyncStandbysPriority  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Sat, 18 Apr 2020 at 00:31, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:
> > At Fri, 17 Apr 2020 16:03:34 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
> >> I agree that it might be worth considering the removal of am_sync for
> >> the master branch or v14. But I think that it should not be
> >> back-patched.
>
> > Ah! Agreed.
>
> Yeah, that's not necessary to fix the bug.  I'd be inclined to leave
> it for v14 at this point.
>
> I don't much like the patch Fujii-san posted, though.  An important part
> of the problem, IMO, is that SyncRepGetSyncStandbysPriority is too
> complicated and it's unclear what dependencies it has on the set of
> priorities in shared memory being consistent.  His patch does not improve
> that situation; if anything it makes it worse.
>
> If we're concerned about not breaking ABI in the back branches, what
> I propose we do about that is just leave SyncRepGetSyncStandbys in
> place but not used by the core code, and remove it only in HEAD.
> We can do an absolutely minimal fix for the assertion failure, in
> case anybody is calling that code, by just dropping the Assert and
> letting SyncRepGetSyncStandbys return NIL if it falls out.  (Or we
> could let it return the incomplete list, which'd be the behavior
> you get today in a non-assert build.)

+1

>
> Also, I realized while re-reading my patch that Kyotaro-san is onto
> something about the is_sync_standby flag not being necessary: instead
> we can just have the new function SyncRepGetCandidateStandbys return
> a reduced count.  I'd initially believed that it was necessary for
> that function to return the rejected candidate walsenders along with
> the accepted ones, but that was a misunderstanding.  I still don't
> want its API spec to say anything about ordering of the result array,
> but we don't need to.
>
> So that leads me to the attached.  I propose applying this to the
> back branches except for the rearrangement of WALSnd field order.
> In HEAD, I'd remove SyncRepGetSyncStandbys and subroutines altogether.
>

+   /* Quick out if not even configured to be synchronous */
+   if (SyncRepConfig == NULL)
+       return false;

I felt strange a bit that we do the above check in
SyncRepGetSyncRecPtr() because SyncRepReleaseWaiters() which is the
only caller says the following before calling it:

    /*
     * We're a potential sync standby. Release waiters if there are enough
     * sync standbys and we are considered as sync.
     */
    LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);

Can we either change it to an assertion, move it to before acquiring
SyncRepLock in SyncRepReleaseWaiters or just remove it?

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-hackers by date:

Previous
From: James Coleman
Date:
Subject: Re: execExprInterp() questions / How to improve scalar array op expr eval?
Next
From: Pavel Stehule
Date:
Subject: Re: proposal - function string_to_table