At Tue, 16 Jun 2020 14:31:43 -0400, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in
> On 2020-Jun-16, Kyotaro Horiguchi wrote:
>
> > I noticed the another issue. If some required WALs are removed, the
> > slot will be "invalidated", that is, restart_lsn is set to invalid
> > value. As the result we hardly see the "lost" state.
> >
> > It can be "fixed" by remembering the validity of a slot separately
> > from restart_lsn. Is that worth doing?
>
> We discussed this before. I agree it would be better to do this
> in some way, but I fear that if we do it naively, some code might exist
> that reads the LSN without realizing that it needs to check the validity
> flag first.

Yes, that was my main concern about it. It's error-prone. How about
remembering the LSN at which the invalidation happened instead? That is
safe, since nothing other than the slot-monitoring functions would look
at last_invalidated_lsn. It can be reset if active_pid is a valid pid.

InvalidateObsoleteReplicationSlots:
 ...
     SpinLockAcquire(&s->mutex);
+    s->data.last_invalidated_lsn = s->data.restart_lsn;
     s->data.restart_lsn = InvalidXLogRecPtr;
     SpinLockRelease(&s->mutex);

> On the other hand, maybe this is not a problem in practice, because if
> such a bug occurs, what will happen is that trying to read WAL from such
> a slot will return the error message that the WAL file cannot be found.
> Maybe this is acceptable?

I'm not sure. For my part, the problem with that is that we would need
to look into the server logs to know what is actually going on.

regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center