Re: Race condition in SyncRepGetSyncStandbysPriority - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Race condition in SyncRepGetSyncStandbysPriority |
Date | |
Msg-id | CA+fd4k5Oy2OuiFHD-uY_j5rS9RAoSei5G62E0goLEfK2K+1dAw@mail.gmail.com Whole thread Raw |
In response to | Re: Race condition in SyncRepGetSyncStandbysPriority (Kyotaro Horiguchi <horikyota.ntt@gmail.com>) |
Responses |
Re: Race condition in SyncRepGetSyncStandbysPriority
|
List | pgsql-hackers |
On Tue, 14 Apr 2020 at 18:35, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > At Tue, 14 Apr 2020 13:06:14 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in > > On Tue, 14 Apr 2020 at 10:34, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > > > Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes: > > > > At Sat, 11 Apr 2020 18:30:30 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in > > > >> What I think we should do about this is, essentially, to get rid of > > > >> SyncRepGetSyncStandbys. Instead, let's have each walsender advertise > > > >> whether *it* believes that it is a sync standby, based on its last > > > >> evaluation of the relevant GUCs. This would be a bool that it'd > > > >> compute and set alongside sync_standby_priority. (Hm, maybe we'd not > > > > > > > Mmm.. SyncRepGetStandbyPriority returns the "priority" that a > > > > walsender thinks it is at, among synchronous_standby_names. Then to > > > > decide "I am a sync standby" we need to know how many walsenders with > > > > higher priority are alive now. SyncRepGetSyncStandbyPriority does the > > > > judgment now and suffers from the inconsistency of priority values. > > > > > > Yeah. After looking a bit closer, I think that the current definition > > > of sync_standby_priority (that is, as the result of local evaluation > > > of SyncRepGetStandbyPriority()) is OK. The problem is what we're doing > > > with it. I suggest that what we should do in SyncRepGetSyncRecPtr() > > > is make one sweep across the WalSnd array, collecting PID, > > > sync_standby_priority, *and* the WAL pointers from each valid entry. > > > Then examine that data and decide which WAL value we need, without assuming > > > that the sync_standby_priority values are necessarily totally consistent. > > > But in any case we must examine each entry just once while holding its > > > mutex, not go back to it later expecting it to still be the same. > > SyncRepGetSyncStandbysPriority() is runing holding SyncRepLock so > sync_standby_priority of any walsender can be changed while the > function is scanning welsenders. The issue is we already have > inconsistent walsender information before we enter the function. Thus > how many times we scan on the array doesn't make any difference. > > I think we need to do one of the followings. > > A) prevent SyncRepGetSyncStandbysPriority from being entered while > walsender priority is inconsistent. > > B) make SyncRepGetSyncStandbysPriority be tolerant of priority > inconsistency. > > C) protect walsender priority array from beinig inconsistent. > > The (B) is the band aids. To achieve A we need to central controller > of priority config handling. C is: > > > Can we have a similar approach of sync_standby_defined for > > sync_standby_priority? That is, checkpionter is responsible for > > changing sync_standby_priority of all walsenders when SIGHUP. That > > way, all walsenders can see a consistent view of > > sync_standby_priority. And when a walsender starts, it sets > > sync_standby_priority by itself. The logic to decide who's a sync > > standby doesn't change. SyncRepGetSyncRecPtr() gets all walsenders > > having higher priority along with their WAL positions. > > Yeah, it works if we do , but the problem of that way is that to > determin priority of walsenders, we need to know what walsenders are > running. That is, when new walsender comes the process needs to aware > of the arrival (or leaving) right away and reassign the priority of > every wal senders again. I think we don't need to reassign the priority when new walsender comes or leaves. IIUC The priority is calculated based on only synchronous_standby_names. Coming or leaving a walsender doesn't affect other's priorities. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: