On Wed, 21 Apr 2021 12:36:05 -0700
Andres Freund <andres@anarazel.de> wrote:
> [...]
>
> I don't think that concern equally applies for what I am proposing
> here. For one, we already have minRecoveryPoint in ControlData, and we
> already use it for the purpose of determining where we need to recover
> to, albeit only during crash recovery. Imo that's substantially
> different from adding actual recovery progress status information to the
> control file.
Just for the record, when I was talking about updating status of the startup
in the controldata, I was thinking about setting the last known LSN replayed.
Not some kind of variable string.
>
> I also think that it'd actually be a significant reliability improvement
> if we maintained an approximate minRecoveryPoint during normal running:
> I've seen way too many cases where WAL files were lost / removed and
> crash recovery just started up happily. Only hitting problems months
> down the line. Yes, it'd obviously not bullet proof, since we'd not want
> to add a significant stream of new fsyncs, but IME such WAL files
> lost/removed issues tend not to be about a few hundred bytes of WAL but
> many segments missing.
Maybe setting this minRecoveryPoint once per segment would be enough, near
from the beginning of the WAL. At least, the recovery process would be
forced to actually replay until the very last known segment.
Regards,