ReplicationSlotsComputeRequiredXmin seems pretty questionable - Mailing list pgsql-hackers

From Tom Lane
Subject ReplicationSlotsComputeRequiredXmin seems pretty questionable
Date
Msg-id 1606337.1597942712@sss.pgh.pa.us
Whole thread Raw
Responses Re: ReplicationSlotsComputeRequiredXmin seems pretty questionable  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
While trying to make sense of Adam Sjøgren's problem [1], I found
myself staring at ReplicationSlotsComputeRequiredXmin() in slot.c.
It seems to me that that is very shaky code, on two different
grounds:

1. Sometimes it's called with ProcArrayLock already held exclusively.
This means that any delay in acquiring the ReplicationSlotControlLock
translates directly into a hold on ProcArrayLock; in other words,
every acquisition of the ReplicationSlotControlLock is just as bad
for concurrency as an acquisition of ProcArrayLock.  While I didn't
see any places that were doing really obviously slow things while
holding ReplicationSlotControlLock, this is disturbing.  Do we really
need it to be like that?

2. On the other hand, the code is *releasing* the
ReplicationSlotControlLock before it calls
ProcArraySetReplicationSlotXmin, and that seems like a flat out
concurrency bug.  How can we be sure that the values we're storing
into the shared xmin fields aren't stale by the time we acquire
the ProcArrayLock (in the case where we didn't hold it already)?
I'm concerned that in the worst case this code could make the
shared xmin fields go backwards.

Both of these issues could be solved, I think, if we got rid of
the provision for calling with ProcArrayLock already held and
moved the ProcArraySetReplicationSlotXmin call inside the hold
of ReplicationSlotControlLock.  But maybe I'm missing something
about why that would be worse.

            regards, tom lane

[1] https://www.postgresql.org/message-id/flat/87364kdsim.fsf%40tullinup.koldfront.dk



pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: [PG13] Planning (time + buffers) data structure in explain plan (format text)
Next
From: Alvaro Herrera
Date:
Subject: Re: "ccold" left by reindex concurrently are droppable?