Re: Replication slot stats misgivings - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Replication slot stats misgivings
Date
Msg-id CAD21AoD9Orq=xuZhaxowqoEZvBpvFiT7hhtA+n3B1WJ_VM9pCQ@mail.gmail.com
In response to Re: Replication slot stats misgivings  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Replication slot stats misgivings
List pgsql-hackers
On Tue, Mar 23, 2021 at 3:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Mar 22, 2021 at 12:20 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Mon, Mar 22, 2021 at 1:25 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Sat, Mar 20, 2021 at 3:52 AM Andres Freund <andres@anarazel.de> wrote:
> > > >
> > > > - If max_replication_slots was lowered between a restart,
> > > >   pgstat_read_statfile() will happily write beyond the end of
> > > >   replSlotStats.
> > >
> > > I think we cannot restart the server after lowering
> > > max_replication_slots to a value less than the number of replication
> > > slots actually created on the server. No?
> >
> > This problem happens in the case where max_replication_slots is
> > lowered and there still are stats for a slot.
> >
>
> I think this can happen only if the drop message is lost, right?

Yes, I think you're right. In that case, the stats file could contain
statistics for more slots than the lowered max_replication_slots allows.
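
To make the hazard concrete, here is a minimal standalone sketch of a
bounds-checked load. It is not the actual pgstat_read_statfile() code;
SlotStats, SLOT_NAME_LEN and load_slot_stats() are hypothetical names
used only for illustration, and the array capacity stands in for
max_replication_slots:

```c
#include <assert.h>
#include <string.h>

#define SLOT_NAME_LEN 64    /* assumption: stands in for NAMEDATALEN */

/* Hypothetical, simplified per-slot stats entry. */
typedef struct SlotStats
{
    char    slotname[SLOT_NAME_LEN];
    long    spill_txns;
} SlotStats;

/*
 * Copy entries that were read from a stats file into a fixed-size array.
 * Entries that do not fit are counted in *ndropped instead of being
 * written past the end of dst.  Returns the number of entries stored.
 */
static int
load_slot_stats(const SlotStats *from_file, int nfile,
                SlotStats *dst, int capacity, int *ndropped)
{
    int     nstored = 0;

    assert(nfile >= 0 && capacity >= 0);
    *ndropped = 0;

    for (int i = 0; i < nfile; i++)
    {
        if (nstored >= capacity)
        {
            /* The file has more entries than the array can hold. */
            (*ndropped)++;
            continue;
        }
        dst[nstored++] = from_file[i];
    }
    return nstored;
}
```

With three entries in the file and a capacity of two, the third entry
is skipped rather than overwriting memory after the array.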

>
> > I understood the risk of running out of replSlotStats. If we use the
> > index in replSlotStats instead, IIUC we need to somehow synchronize
> > the indexes in between replSlotStats and
> > ReplicationSlotCtl->replication_slots. The order of replSlotStats is
> > preserved across restarting whereas the order of
> > ReplicationSlotCtl->replication_slots isn’t (readdir() that is used by
> > StartupReplicationSlots() doesn’t guarantee the order of the returned
> > entries in the directory). Maybe we can compare the slot name in the
> > received message to the name in the element of replSlotStats. If they
> > don’t match, we swap entries in replSlotStats to synchronize the index
> > of the replication slot in ReplicationSlotCtl->replication_slots and
> > replSlotStats. If we cannot find the entry in replSlotStats that has
> > the name in the received message, it probably means either it's a new
> > slot or the previous create message is dropped, we can create the new
> > stats for the slot. Is that what you mean, Andres?
> >
>
> I wonder how in this scheme, we will remove the risk of running out of
> 'replSlotStats' and still restore correct stats assuming the drop
> message is lost? Do we want to check after restoring each slot info
> whether the slot with that name exists?

Yeah, I think we need such a check, at least when the number of slot
stats in the stats file is larger than max_replication_slots.
Alternatively, we could do it at every startup to remove orphaned slot
stats.
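
Roughly, the startup check could look like the following standalone
sketch. It is only an illustration under assumptions, not a patch:
SlotStatsEntry, slot_exists() and prune_orphaned_stats() are
hypothetical names, and the list of live slot names stands in for
whatever lookup against ReplicationSlotCtl->replication_slots we would
actually use:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define STATS_NAME_LEN 64   /* assumption: stands in for NAMEDATALEN */

/* Hypothetical, simplified per-slot stats entry. */
typedef struct SlotStatsEntry
{
    char    slotname[STATS_NAME_LEN];
    long    stream_txns;
} SlotStatsEntry;

/* Return true if a slot with the given name currently exists. */
static bool
slot_exists(const char *name, const char *const *live, int nlive)
{
    for (int i = 0; i < nlive; i++)
    {
        if (strcmp(name, live[i]) == 0)
            return true;
    }
    return false;
}

/*
 * Remove stats entries whose slot no longer exists (for example because
 * the drop message was lost).  Surviving entries are compacted to the
 * front of the array in their original order.  Returns the new count.
 */
static int
prune_orphaned_stats(SlotStatsEntry *stats, int nstats,
                     const char *const *live, int nlive)
{
    int     nkept = 0;

    for (int i = 0; i < nstats; i++)
    {
        if (!slot_exists(stats[i].slotname, live, nlive))
            continue;           /* orphaned entry, skip it */
        if (nkept != i)
            stats[nkept] = stats[i];
        nkept++;
    }
    return nkept;
}
```

Running this before copying entries into replSlotStats would leave at
most one entry per existing slot, so the count cannot exceed the number
of slots that actually exist.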

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


