Re: Race condition in SyncRepGetSyncStandbysPriority - Mailing list pgsql-hackers
From | Kyotaro Horiguchi |
---|---|
Subject | Re: Race condition in SyncRepGetSyncStandbysPriority |
Date | |
Msg-id | 20200415.102127.149843245284107812.horikyota.ntt@gmail.com Whole thread Raw |
In response to | Re: Race condition in SyncRepGetSyncStandbysPriority (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
At Tue, 14 Apr 2020 09:52:42 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in > Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes: > > SyncRepGetSyncStandbysPriority() is runing holding SyncRepLock so > > sync_standby_priority of any walsender can be changed while the > > function is scanning welsenders. The issue is we already have > > inconsistent walsender information before we enter the function. Thus > > how many times we scan on the array doesn't make any difference. > > *Yes it does*. The existing code can deliver entirely broken results > if some walsender exits between where we examine the priorities and > where we fetch the WAL pointers. While that doesn't seem to be the Ah. I didn't take that as inconsistency. Actually walsender exit inactivate the corresponding slot by setting pid = 0. In a bad case (as you mentioned upthread) the entry can be occupied by another wal sender. However, sync_standby_priority cannot be updated until the whole work is finished. > exact issue we're seeing in the buildfarm, it's still another obvious > bug in this code. I will not accept a "fix" that doesn't fix that. I think that the "inconsistency" that can be observed in a process is disagreement between SyncRepConfig->nmembers and <each_walsnd_entry>->sync_standby_priority. If any one of walsenders regards its priority as lower (larger in value) than nmembers in the "current" process, the assertion fires. If that is the issue, the issue is not dynamic inconsistency. # It's the assumption of my band-aid. > > I think we need to do one of the followings. > > > A) prevent SyncRepGetSyncStandbysPriority from being entered while > > walsender priority is inconsistent. > > B) make SyncRepGetSyncStandbysPriority be tolerant of priority > > inconsistency. > > C) protect walsender priority array from beinig inconsistent. > > (B) seems like the only practical solution from here. We could > probably arrange for synchronous update of the priorities when > they change in response to a GUC change, but it doesn't seem to > me to be practical to do that in response to walsender exit. > You'd end up finding that an unexpected walsender exit results > in panic'ing the system, which is no better than where we are now. I agree to you as a whole. I thought of several ways to keep the consistency of the array but all of them looked too much. > It doesn't seem to me to be that hard to implement the desired > semantics for synchronous_standby_names with inconsistent info. > In FIRST mode you basically just need to take the N smallest > priorities you see in the array, but without assuming there are no > duplicates or holes. It might be a good idea to include ties at the > end, that is if you see 1,2,2,4 or 1,3,3,4 and you want 2 sync > standbys, include the first three of them in the calculation until > the inconsistency is resolved. In ANY mode I don't see that > inconsistent priorities matter at all. Mmm, the priority lists like 1,2,2,4 are not thought as inconsistency at all in the context of walsender priority. That happenes stablly if any two or more walreceivers reported the same application_name. I believe the existing code is already taking that case into consideration. > > If we accept to share variable-length information among processes, > > sharing sync_standby_names or parsed SyncRepConfigData among processes > > would work. > > Not sure that we really need more than what's being shared now, > ie each process's last-known index in the sync_standby_names list. If we take the (B), we don't need anymore. (A) and (C) would need more. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
pgsql-hackers by date: