On Fri, Jun 19, 2020 at 10:02:54AM +0900, Kyotaro Horiguchi wrote:
> At Thu, 18 Jun 2020 18:18:37 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
>> It is a little unclear to me how this or any proposed patch will solve
>> the original problem reported by Fujii-San? Basically, the problem
>> arises because we don't have an interlock between when the checkpoint
>> removes the WAL segment and the view tries to acquire the same. Am I
>> missing something?
The proposed patch computes the minimum LSN across all slots before
taking ReplicationSlotControlLock, so its value gets more lossy, and
potentially older than what the slots actually hold. So it is an
attempt to take the safest spot possible.
Honestly, I find the design of computing and using the same minimum
LSN value for all the tuples returned by pg_get_replication_slots a
bit silly, and you can actually get a pretty good estimate of that
value by emulating ReplicationSlotsComputeRequiredLSN() directly with
what pg_replication_slots provides, as we have a min() aggregate for
pg_lsn.
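
To make that emulation concrete, here is a rough client-side sketch of
what "SELECT min(restart_lsn) FROM pg_replication_slots" would give
you. The helper names (parse_pg_lsn, format_pg_lsn, min_restart_lsn)
are made up for this mail, and this is only an approximation of what
ReplicationSlotsComputeRequiredLSN() does: it takes the restart_lsn
values as text, skips slots with no valid restart_lsn, and compares
them numerically rather than as strings.

```python
from typing import Iterable, Optional


def parse_pg_lsn(text: str) -> int:
    """Convert the pg_lsn text form 'X/Y' (two hex halves) to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_pg_lsn(value: int) -> str:
    """Render a 64-bit LSN back into the 'X/Y' text form."""
    return f"{value >> 32:X}/{value & 0xFFFFFFFF:X}"


def min_restart_lsn(restart_lsns: Iterable[Optional[str]]) -> Optional[str]:
    """Return the smallest valid restart_lsn, or None if no slot has one.

    None stands for a NULL restart_lsn and 0/0 for an invalid LSN; both
    are skipped, as the min() aggregate skips NULLs.
    """
    parsed = [parse_pg_lsn(v) for v in restart_lsns if v is not None]
    valid = [v for v in parsed if v != 0]
    return format_pg_lsn(min(valid)) if valid else None


if __name__ == "__main__":
    # Hypothetical restart_lsn values, as read from pg_replication_slots.
    print(min_restart_lsn(["0/3000060", None, "0/2000028", "1/0"]))
```

Comparing the parsed integers matters here: a plain string comparison
would sort "0/10000000" before "0/A000000", which is the wrong order.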
For these reasons, I think that we should remove this information
from the view for now, and reconsider this part more carefully for
14~ with a clear definition of how much lossiness we are ready to
accept for the information provided here, if necessary. We could for
example have a separate SQL function that just grabs this value
(or a more global SQL view for XLogCtl data that includes this data).
> I'm not sure, but I don't get the point of blocking WAL segment
> removal until the view is completed.
We should really not do that anyway for a monitoring view.
--
Michael