Re: Replication slot stats misgivings - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Replication slot stats misgivings
Date
Msg-id CAA4eK1+_U2zjjPHL8YFcbZCNCyK97k_vekWqWr-Fck+z6y92Yw@mail.gmail.com
In response to Re: Replication slot stats misgivings  (Andres Freund <andres@anarazel.de>)
Responses Re: Replication slot stats misgivings
List pgsql-hackers
On Tue, Mar 23, 2021 at 10:54 PM Andres Freund <andres@anarazel.de> wrote:
>
> On 2021-03-23 23:37:14 +0900, Masahiko Sawada wrote:
>
> > > > Maybe we can compare the slot name in the
> > > > received message to the name in the element of replSlotStats. If they
> > > > don’t match, we swap entries in replSlotStats to synchronize the index
> > > > of the replication slot in ReplicationSlotCtl->replication_slots and
> > > > replSlotStats. If we cannot find the entry in replSlotStats that has
> > > > the name in the received message, it probably means either it's a new
> > > > slot or the previous create message is dropped, we can create the new
> > > > stats for the slot. Is that what you mean, Andres?
>
> That doesn't seem great. Slot names are imo a poor identifier for
> something happening asynchronously. The stats collector regularly
> doesn't process incoming messages for periods of time because it is busy
> writing out the stats file. That's also when messages to it are most
> likely to be dropped (likely because the incoming buffer is full).
>

Leaving aside the restart case, without some such sanity check, if both
the drop message (for the old slot) and the create message (for the new
slot) are lost, we will keep accumulating stats in the old slot's entry.
However, if only one of them is lost, there won't be any such problem.
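
To make the lost-messages case concrete, here is a minimal sketch of a
name check on the collector side. It is not the actual pgstat.c code:
SketchSlotStats, sketch_stats and sketch_report are hypothetical names,
and a single counter stands in for the real per-slot stats.

```c
#include <stdio.h>
#include <string.h>

#define SKETCH_MAX_SLOTS 4
#define SKETCH_NAMELEN   64

/* Hypothetical collector-side entry; not the real per-slot stats struct. */
typedef struct SketchSlotStats
{
    char        name[SKETCH_NAMELEN];   /* empty string means unused */
    long        spill_txns;             /* stand-in for the real counters */
} SketchSlotStats;

static SketchSlotStats sketch_stats[SKETCH_MAX_SLOTS];

/*
 * Apply a stats report addressed by slot index. If the name stored at
 * that index differs from the name in the message, assume the drop
 * and/or create message was lost and start a fresh entry instead of
 * adding to the old slot's counters.
 */
static void
sketch_report(int idx, const char *name, long spill_txns)
{
    SketchSlotStats *entry = &sketch_stats[idx];

    if (strcmp(entry->name, name) != 0)
    {
        memset(entry, 0, sizeof(*entry));
        snprintf(entry->name, sizeof(entry->name), "%s", name);
    }
    entry->spill_txns += spill_txns;
}
```

With this check, a report for a slot named "new" arriving at an index
that still holds "old" resets the entry rather than inheriting the old
counters. It cannot help when the new slot reuses both the index and the
name of the dropped one.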

> Perhaps we could have RestoreSlotFromDisk() send something to the stats
> collector ensuring the mapping makes sense?
>

Say we send just the index location of each slot; then we can probably
set up replSlotStats. Now say that before the restart one of the drop
messages was missed by the stats collector, and the dropped slot happened
to be at some middle location. Then we would end up restoring the stats
of an already dropped slot and losing those of some slots that are still
required. However, if there were some sanity identifier such as the name
along with the index, I think that would work for such a case.
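
A rough sketch of the restore-time case, assuming each restored slot
reports its (index, name) pair and the collector rebuilds its array from
that. All names here (RestoreStatsEntry, restore_reconcile) are made up
for illustration and this is not how RestoreSlotFromDisk() or the stats
collector behave today.

```c
#include <string.h>

#define RESTORE_MAX_SLOTS 4
#define RESTORE_NAMELEN   64

/* Hypothetical collector-side entry; an empty name means unused. */
typedef struct RestoreStatsEntry
{
    char        name[RESTORE_NAMELEN];
    long        spill_txns;             /* stand-in for the real counters */
} RestoreStatsEntry;

/*
 * Rebuild the stats array so that entry i belongs to the slot restored
 * at index i. slot_names[i] is the name of the slot at index i, or an
 * empty string if that index is unused. Counters are carried over by
 * matching on name; entries whose name matches no restored slot (for
 * example, a slot whose drop message was missed) are discarded.
 */
static void
restore_reconcile(RestoreStatsEntry stats[RESTORE_MAX_SLOTS],
                  const char slot_names[RESTORE_MAX_SLOTS][RESTORE_NAMELEN])
{
    RestoreStatsEntry rebuilt[RESTORE_MAX_SLOTS];

    memset(rebuilt, 0, sizeof(rebuilt));

    for (int i = 0; i < RESTORE_MAX_SLOTS; i++)
    {
        if (slot_names[i][0] == '\0')
            continue;           /* no slot at this index */

        /* rebuilt is zeroed, so the copy stays NUL-terminated */
        strncpy(rebuilt[i].name, slot_names[i], RESTORE_NAMELEN - 1);

        for (int j = 0; j < RESTORE_MAX_SLOTS; j++)
        {
            if (strcmp(stats[j].name, slot_names[i]) == 0)
            {
                rebuilt[i].spill_txns = stats[j].spill_txns;
                break;
            }
        }
    }

    memcpy(stats, rebuilt, sizeof(rebuilt));
}
```

For example, if the collector still holds entries for "a", "b" and "c"
because the drop of "b" was missed, and the restored slots are "a" at
index 0 and "c" at index 1, matching on name keeps the counters of "a"
and "c" and discards "b". Taking the first two entries by index alone
would instead keep "b" and lose "c".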

I think it would have been easier if we had some OID-like identifier
for each slot. Without that, the combination of the index location in
ReplicationSlotCtl->replication_slots and the slot name may greatly
reduce the chances of slot stats going wrong, even if not to zero.
If not the name, do we have anything else in a slot that can be used for
some sort of sanity checking?

--
With Regards,
Amit Kapila.
