Re: Replication slot stats misgivings - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Replication slot stats misgivings
Date
Msg-id 20210323172445.xssyuj2gzpmxdq2d@alap3.anarazel.de
Whole thread Raw
In response to Re: Replication slot stats misgivings  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Replication slot stats misgivings  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
Hi,

On 2021-03-23 23:37:14 +0900, Masahiko Sawada wrote:
> On Tue, Mar 23, 2021 at 3:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Mar 22, 2021 at 12:20 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Mon, Mar 22, 2021 at 1:25 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > On Sat, Mar 20, 2021 at 3:52 AM Andres Freund <andres@anarazel.de> wrote:
> > > > >
> > > > > - If max_replication_slots was lowered between a restart,
> > > > >   pgstat_read_statfile() will happily write beyond the end of
> > > > >   replSlotStats.
> > > >
> > > > I think we cannot restart the server after lowering
> > > > max_replication_slots to a value less than the number of replication
> > > > slots actually created on the server. No?
> > >
> > > This problem happens in the case where max_replication_slots is
> > > lowered and there still are stats for a slot.
> > >
> >
> > I think this can happen only if the drop message is lost, right?
> 
> Yes, I think you're right. In that case, the stats file could have
> more slots statistics than the lowered max_replication_slots.

Or if slots are deleted on the file-system while the cluster is
shutdown. Which obviously is at best a semi-supported thing, but it
normally does work.


> > > I understood the risk of running out of replSlotStats. If we use the
> > > index in replSlotStats instead, IIUC we need to somehow synchronize
> > > the indexes in between replSlotStats and
> > > ReplicationSlotCtl->replication_slots. The order of replSlotStats is
> > > preserved across restarting whereas the order of
> > > ReplicationSlotCtl->replication_slots isn’t (readdir() that is used by
> > > StartupReplicationSlots() doesn’t guarantee the order of the returneda
> > > entries in the directory).

Very good point. Even if readdir() order were fixed, we'd still have the
problem because there can be "gaps" in the indexes for slots
(e.g. create slot_a, create slot_b, create slot_c, drop slot_b, leaving
you with index 0 and 2 used, and 1 unused).


> > > Maybe we can compare the slot name in the
> > > received message to the name in the element of replSlotStats. If they
> > > don’t match, we swap entries in replSlotStats to synchronize the index
> > > of the replication slot in ReplicationSlotCtl->replication_slots and
> > > replSlotStats. If we cannot find the entry in replSlotStats that has
> > > the name in the received message, it probably means either it's a new
> > > slot or the previous create message is dropped, we can create the new
> > > stats for the slot. Is that what you mean, Andres?

That doesn't seem great. Slot names are imo a poor identifier for
something happening asynchronously. The stats collector regularly
doesn't process incoming messages for periods of time because it is busy
writing out the stats file. That's also when messages to it are most
likely to be dropped (likely because the incoming buffer is full).

Perhaps we could have RestoreSlotFromDisk() send something to the stats
collector ensuring the mapping makes sense?

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: making update/delete of inheritance trees scale better
Next
From: Jan Wieck
Date:
Subject: Re: pg_upgrade failing for 200+ million Large Objects