On Tue, Jun 09, 2020 at 09:01:13PM +0300, Alexey Kondratov wrote:
> Yes, there was a ReplicationSlotsComputeRequiredLSN() call inside
> pg_physical_replication_slot_advance() in the v5 of the patch:
>
> But later it has been missed and we have not noticed that.
>
> I think that adding it back as per attached will be enough.
[ scratches head... ]
Indeed, this part gets wrong and we would have to likely rely on a WAL
sender to do this calculation once a new flush location is received,
but that may not happen in some cases. It feels more natural to do
that in the location where the slot is marked as dirty, and there
is no need to move around an extra check to see if the slot has
actually been advanced or not. Or we could just call the routine once
any advancing is attempted? That would be unnecessary if no advancing
is done.
> > I find it really depressing how much obviously untested stuff gets
> > added in this area.
>
> Prior to this patch pg_replication_slot_advance was not being tested
> at all.
> Unfortunately, added tests appeared to be not enough to cover all
> cases. It
> seems that the whole machinery of WAL holding and trimming is worth
> to be
> tested more thoroughly.
I think that it would be interesting if we had a SQL representation of
the contents of XLogCtlData (wanted that a couple of times). Now we
are actually limited to use a checkpoint and check that past segments
are getting recycled by looking at the contents of pg_wal. Doing that
here does not cause the existing tests to be much more expensive as we
only need one extra call to pg_switch_wal(), mostly. Please see the
attached.
--
Michael