Re: Race condition in SyncRepGetSyncStandbysPriority - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Race condition in SyncRepGetSyncStandbysPriority
Date
Msg-id 20200415.102127.149843245284107812.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: Race condition in SyncRepGetSyncStandbysPriority  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
At Tue, 14 Apr 2020 09:52:42 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in 
> Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:
> > SyncRepGetSyncStandbysPriority() is runing holding SyncRepLock so
> > sync_standby_priority of any walsender can be changed while the
> > function is scanning welsenders. The issue is we already have
> > inconsistent walsender information before we enter the function.  Thus
> > how many times we scan on the array doesn't make any difference.
> 
> *Yes it does*.  The existing code can deliver entirely broken results
> if some walsender exits between where we examine the priorities and
> where we fetch the WAL pointers.  While that doesn't seem to be the

Ah. I didn't take that as inconsistency. Actually walsender exit
inactivate the corresponding slot by setting pid = 0.  In a bad case
(as you mentioned upthread) the entry can be occupied by another wal
sender. However, sync_standby_priority cannot be updated until the
whole work is finished.

> exact issue we're seeing in the buildfarm, it's still another obvious
> bug in this code.  I will not accept a "fix" that doesn't fix that.

I think that the "inconsistency" that can be observed in a process is
disagreement between SyncRepConfig->nmembers and
<each_walsnd_entry>->sync_standby_priority.  If any one of walsenders
regards its priority as lower (larger in value) than nmembers in the
"current" process, the assertion fires.  If that is the issue, the
issue is not dynamic inconsistency.

# It's the assumption of my band-aid.

> > I think we need to do one of the followings.
> 
> >  A) prevent SyncRepGetSyncStandbysPriority from being entered while
> >     walsender priority is inconsistent.
> >  B) make SyncRepGetSyncStandbysPriority be tolerant of priority
> >     inconsistency.
> >  C) protect walsender priority array from beinig inconsistent.
> 
> (B) seems like the only practical solution from here.  We could
> probably arrange for synchronous update of the priorities when
> they change in response to a GUC change, but it doesn't seem to
> me to be practical to do that in response to walsender exit.
> You'd end up finding that an unexpected walsender exit results
> in panic'ing the system, which is no better than where we are now.

I agree to you as a whole.  I thought of several ways to keep the
consistency of the array but all of them looked too much.

> It doesn't seem to me to be that hard to implement the desired
> semantics for synchronous_standby_names with inconsistent info.
> In FIRST mode you basically just need to take the N smallest
> priorities you see in the array, but without assuming there are no
> duplicates or holes.  It might be a good idea to include ties at the
> end, that is if you see 1,2,2,4 or 1,3,3,4 and you want 2 sync
> standbys, include the first three of them in the calculation until
> the inconsistency is resolved.  In ANY mode I don't see that
> inconsistent priorities matter at all.

Mmm, the priority lists like 1,2,2,4 are not thought as inconsistency
at all in the context of walsender priority.  That happenes stablly if
any two or more walreceivers reported the same application_name. I
believe the existing code is already taking that case into
consideration.

> > If we accept to share variable-length information among processes,
> > sharing sync_standby_names or parsed SyncRepConfigData among processes
> > would work.
> 
> Not sure that we really need more than what's being shared now,
> ie each process's last-known index in the sync_standby_names list.

If we take the (B), we don't need anymore.  (A) and (C) would need
more.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: Perl modules for testing/viewing/corrupting/repairing your heap files
Next
From: David Rowley
Date:
Subject: remove_useless_groupby_columns does not need to record constraint dependencies