On 2020-Jun-23, Kyotaro Horiguchi wrote:
> At Tue, 23 Jun 2020 11:50:34 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
> > On Mon, Jun 22, 2020 at 6:32 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> > > Should we expose the LSN calculated as
> > > "the current WAL LSN - max(wal_keep_segments * 16MB, max_slot_wal_keep_size)"?
> > > This indicates the minimum LSN of WAL files that are guaranteed to be
> > > currently retained by wal_keep_segments and max_slot_wal_keep_size.
> > > That is, if a checkpoint occurs when the restart_lsn of a replication slot is
> > > smaller than that minimum LSN, some required WAL files may be removed.
> > >
> > > So DBAs can periodically monitor and compare restart_lsn and that minimum
> > > LSN. If they frequently see that the difference between those LSNs is very
> > > small, they can decide to increase wal_keep_segments or max_slot_wal_keep_size
> > > to prevent required WAL files from being removed. Thoughts?
> >
> > +1. This sounds like a good and useful stat for users.
>
> +1 for showing a number that does not involve lastRemovedSegNo. It is
> like returning to the initial version of this patch, which showed a
> number like ((the value suggested above) minus restart_lsn). The number is
> different for each slot, so it fits naturally in the view.
>
> The number is usable for the same purpose so I'm ok with it.
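A rough sketch of the computation discussed above, in Python for illustration only. It assumes the default 16MB WAL segment size (the value used in the formula), treats max_slot_wal_keep_size as a value in MB with -1 meaning unlimited, and ignores the segment-boundary rounding and checkpoint timing the server actually deals with. The helper names (parse_lsn, format_lsn, min_retained_lsn, slot_margin) are hypothetical, not server functions.

```python
# Illustrative sketch only: assumes the default 16MB WAL segment size and
# ignores segment-boundary rounding done by the server.
from typing import Optional

MB = 1024 * 1024
WAL_SEGMENT_SIZE = 16 * MB  # assumption: default segment size


def parse_lsn(text: str) -> int:
    """Convert an LSN in 'X/Y' hex notation into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(lsn: int) -> str:
    """Render a 64-bit integer back into 'X/Y' hex notation."""
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


def min_retained_lsn(current_lsn: int, wal_keep_segments: int,
                     max_slot_wal_keep_size_mb: int) -> Optional[int]:
    """current LSN - max(wal_keep_segments * 16MB, max_slot_wal_keep_size).

    Returns None when max_slot_wal_keep_size is -1 (unlimited), since slots
    then never lose WAL to this limit.
    """
    if max_slot_wal_keep_size_mb < 0:
        return None
    keep_bytes = max(wal_keep_segments * WAL_SEGMENT_SIZE,
                     max_slot_wal_keep_size_mb * MB)
    return max(current_lsn - keep_bytes, 0)  # clamp: nothing precedes LSN 0


def slot_margin(restart_lsn: int, current_lsn: int, wal_keep_segments: int,
                max_slot_wal_keep_size_mb: int) -> Optional[int]:
    """Bytes between restart_lsn and the minimum retained LSN.

    A small value means the slot is close to losing required WAL at the next
    checkpoint; a negative value means it is already at risk.
    """
    floor = min_retained_lsn(current_lsn, wal_keep_segments,
                             max_slot_wal_keep_size_mb)
    return None if floor is None else restart_lsn - floor
```

For example, with the current LSN at 0/5000000, wal_keep_segments = 2 and max_slot_wal_keep_size = 64MB, the minimum retained LSN is 0/1000000, and a slot whose restart_lsn is 0/1800000 has 8MB of margin left. A DBA watching that margin shrink over time is the use case described above.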
I think we should publish the value from wal_keep_segments separately
from max_slot_wal_keep_size. ISTM that the user might decide to change
or remove wal_keep_segments and suddenly be at risk of losing slots
because they overlooked that it was wal_keep_segments, not
max_slot_wal_keep_size, that was protecting them.
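To illustrate the point, here is a hypothetical Python sketch that checks each setting independently for one slot, so a monitoring script could flag slots covered by wal_keep_segments alone. The function names are made up for this example; it assumes 16MB segments, LSNs already converted to integers, and max_slot_wal_keep_size in MB with -1 meaning unlimited.

```python
# Hypothetical monitoring helper: checks each setting on its own so that
# slots protected only by wal_keep_segments can be flagged.
MB = 1024 * 1024
WAL_SEGMENT_SIZE = 16 * MB  # assumption: default segment size


def protection_breakdown(restart_lsn: int, current_lsn: int,
                         wal_keep_segments: int,
                         max_slot_wal_keep_size_mb: int) -> dict:
    """Report, per setting, whether it alone would retain the slot's WAL."""
    behind = current_lsn - restart_lsn  # WAL bytes the slot still needs
    return {
        "wal_keep_segments": behind <= wal_keep_segments * WAL_SEGMENT_SIZE,
        # -1 means unlimited, so the slot limit never drops the slot's WAL
        "max_slot_wal_keep_size": (max_slot_wal_keep_size_mb < 0
                                   or behind <= max_slot_wal_keep_size_mb * MB),
    }


def only_wal_keep_segments_protects(restart_lsn: int, current_lsn: int,
                                    wal_keep_segments: int,
                                    max_slot_wal_keep_size_mb: int) -> bool:
    """True if the slot is covered by wal_keep_segments alone, so removing
    that setting would put the slot at risk."""
    b = protection_breakdown(restart_lsn, current_lsn, wal_keep_segments,
                             max_slot_wal_keep_size_mb)
    return b["wal_keep_segments"] and not b["max_slot_wal_keep_size"]
```

For instance, a slot 50MB behind with wal_keep_segments = 4 (64MB) and max_slot_wal_keep_size = 32MB is kept only by wal_keep_segments, which is exactly the case that is easy to overlook when only a combined number is shown.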
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services