On Thu, Aug 20, 2020 at 10:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> While trying to make sense of Adam Sjøgren's problem [1], I found
> myself staring at ReplicationSlotsComputeRequiredXmin() in slot.c.
> It seems to me that that is very shaky code, on two different
> grounds:
>
> 1. Sometimes it's called with ProcArrayLock already held exclusively.
> This means that any delay in acquiring the ReplicationSlotControlLock
> translates directly into a hold on ProcArrayLock; in other words,
> every acquisition of the ReplicationSlotControlLock is just as bad
> for concurrency as an acquisition of ProcArrayLock. While I didn't
> see any places that were doing really obviously slow things while
> holding ReplicationSlotControlLock, this is disturbing. Do we really
> need it to be like that?
>
> 2. On the other hand, the code is *releasing* the
> ReplicationSlotControlLock before it calls
> ProcArraySetReplicationSlotXmin, and that seems like a flat out
> concurrency bug. How can we be sure that the values we're storing
> into the shared xmin fields aren't stale by the time we acquire
> the ProcArrayLock (in the case where we didn't hold it already)?
> I'm concerned that in the worst case this code could make the
> shared xmin fields go backwards.
>
It is not clear to me how those values can go backward. Basically, we
install those values in slots after getting it from
GetOldestSafeDecodingTransactionId() and then those always seem to get
advanced. And GetOldestSafeDecodingTransactionId() takes into account
the already stored shared values of replication_slot_xmin and
replication_slot_catalog_xmin for computing the xmin_horizon for slot.
--
With Regards,
Amit Kapila.