Re: Race condition in SyncRepGetSyncStandbysPriority - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: Race condition in SyncRepGetSyncStandbysPriority
Date
Msg-id 43ecec68-abff-6cdb-8412-204451d650ac@oss.nttdata.com
Whole thread Raw
In response to Re: Race condition in SyncRepGetSyncStandbysPriority  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Race condition in SyncRepGetSyncStandbysPriority  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers

On 2020/04/14 22:52, Tom Lane wrote:
> Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:
>> SyncRepGetSyncStandbysPriority() is runing holding SyncRepLock so
>> sync_standby_priority of any walsender can be changed while the
>> function is scanning welsenders. The issue is we already have
>> inconsistent walsender information before we enter the function.  Thus
>> how many times we scan on the array doesn't make any difference.
> 
> *Yes it does*.  The existing code can deliver entirely broken results
> if some walsender exits between where we examine the priorities and
> where we fetch the WAL pointers.

So, in this case, the oldest lsn that SyncRepGetOldestSyncRecPtr()
calculates may be based on also the lsn of already-exited walsender.
This is what you say "broken results"? If yes, ISTM that this issue still
remains even after applying your patch. No? The walsender marked
as sync may still exit just before SyncRepGetOldestSyncRecPtr()
calculates the oldest lsn.

IMO that the broken results can be delivered when walsender marked
as sync exits *and* new walsender comes at that moment. If this new
walsender uses the WalSnd slot that the exited walsender used,
SyncRepGetOldestSyncRecPtr() wronly calculates the oldest lsn based
on this new walsender (i.e., different walsender from one marked as sync).
If this is actually what you tried to say "broken results", your patch
seems fine and fixes the issue.

BTW, since the patch changes the API of SyncRepGetSyncStandbys(),
it should not be back-patched to avoid ABI break. Right?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: fixing old_snapshot_threshold's time->xid mapping
Next
From: Robert Haas
Date:
Subject: Re: fixing old_snapshot_threshold's time->xid mapping