Re: Race condition in SyncRepGetSyncStandbysPriority - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Race condition in SyncRepGetSyncStandbysPriority
Date
Msg-id 15616.1586872362@sss.pgh.pa.us
Whole thread Raw
In response to Re: Race condition in SyncRepGetSyncStandbysPriority  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: Race condition in SyncRepGetSyncStandbysPriority  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Race condition in SyncRepGetSyncStandbysPriority  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Re: Race condition in SyncRepGetSyncStandbysPriority  (Fujii Masao <masao.fujii@oss.nttdata.com>)
List pgsql-hackers
Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:
> SyncRepGetSyncStandbysPriority() is runing holding SyncRepLock so
> sync_standby_priority of any walsender can be changed while the
> function is scanning welsenders. The issue is we already have
> inconsistent walsender information before we enter the function.  Thus
> how many times we scan on the array doesn't make any difference.

*Yes it does*.  The existing code can deliver entirely broken results
if some walsender exits between where we examine the priorities and
where we fetch the WAL pointers.  While that doesn't seem to be the
exact issue we're seeing in the buildfarm, it's still another obvious
bug in this code.  I will not accept a "fix" that doesn't fix that.

> I think we need to do one of the followings.

>  A) prevent SyncRepGetSyncStandbysPriority from being entered while
>     walsender priority is inconsistent.
>  B) make SyncRepGetSyncStandbysPriority be tolerant of priority
>     inconsistency.
>  C) protect walsender priority array from beinig inconsistent.

(B) seems like the only practical solution from here.  We could
probably arrange for synchronous update of the priorities when
they change in response to a GUC change, but it doesn't seem to
me to be practical to do that in response to walsender exit.
You'd end up finding that an unexpected walsender exit results
in panic'ing the system, which is no better than where we are now.

It doesn't seem to me to be that hard to implement the desired
semantics for synchronous_standby_names with inconsistent info.
In FIRST mode you basically just need to take the N smallest
priorities you see in the array, but without assuming there are no
duplicates or holes.  It might be a good idea to include ties at the
end, that is if you see 1,2,2,4 or 1,3,3,4 and you want 2 sync
standbys, include the first three of them in the calculation until
the inconsistency is resolved.  In ANY mode I don't see that
inconsistent priorities matter at all.

> If we accept to share variable-length information among processes,
> sharing sync_standby_names or parsed SyncRepConfigData among processes
> would work.

Not sure that we really need more than what's being shared now,
ie each process's last-known index in the sync_standby_names list.

            regards, tom lane



pgsql-hackers by date:

Previous
From: tushar
Date:
Subject: While restoring -getting error if dump contain sql statementsgenerated from generated.sql file
Next
From: Andreas Karlsson
Date:
Subject: Re: Poll: are people okay with function/operator table redesign?