Re: Replication slot stats misgivings - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Replication slot stats misgivings
Date
Msg-id CAA4eK1LNCBCsB9b6zXPFnqySZxfaeVjqACs5TgJUAoCCyKRA-Q@mail.gmail.com
In response to Re: Replication slot stats misgivings  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Mon, Mar 22, 2021 at 3:10 AM Andres Freund <andres@anarazel.de> wrote:
>
> On 2021-03-21 16:08:00 +0530, Amit Kapila wrote:
> > On Sun, Mar 21, 2021 at 2:57 AM Andres Freund <andres@anarazel.de> wrote:
> > > On 2021-03-20 10:28:06 +0530, Amit Kapila wrote:
> > > > On Sat, Mar 20, 2021 at 9:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > This idea is worth exploring to address the complaints but what do we
> > > > > do when we detect that the stats are from a different slot? It has
> > > > > a mix of stats from the old and new slot. We probably need to reset it
> > > > > after we detect that.
> > > > >
> > > >
> > > > What if the user created a slot with the same name after dropping the
> > > > slot and it has used the same index? I think the chances are low but
> > > > it is still a possibility, though maybe that is okay.
> > > >
> > > > > What if after some frequency (say whenever we
> > > > > run out of indexes) we check whether the slots we are maintaining in
> > > > > pgstat.c have some stale slot entry (entry exists but the actual slot
> > > > > is dropped)?
> > > > >
> > > >
> > > > A similar drawback (the user created a slot with the same name after
> > > > dropping it) exists with this as well.
> > >
> > > pgstat_report_replslot_drop() already prevents that, no?
> > >
> >
> > Yeah, normally it would prevent that but what if a drop message is lost?
>
> That already exists as a danger, no? pgstat_recv_replslot() uses
> pgstat_replslot_index() to find the slot by name. So if a drop message
> is lost we'd potentially accumulate into stats of an older slot.  It'd
> probably be a lower risk with what I suggested, because the initial stat
> report slot.c would use something like pgstat_report_replslot_create(),
> which the stats collector can use to reset the stats to 0?
>

Okay, but I guess if we miss the create message as well, then we will
have a similar danger. I think the benefit your idea brings is
index-based lookup instead of name-based lookup. IIRC, we initially
used the name here because we thought there was nothing like an OID
for slots, but your suggestion of using
ReplicationSlotCtl->replication_slots can address that.
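
To make the difference concrete, here is a minimal sketch of the two
lookup schemes. It is not pgstat.c code: the struct, the array sizes,
the spill_txns counter, and every function name below are hypothetical
and only model a collector that either matches entries by slot name or
trusts a slot index plus a create message that zeroes the entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_SLOTS 4   /* hypothetical stand-in for max_replication_slots */
#define NAME_LEN  64  /* hypothetical stand-in for NAMEDATALEN */

typedef struct SlotStats
{
    bool in_use;
    char name[NAME_LEN];
    long spill_txns;
} SlotStats;

static SlotStats by_name[MAX_SLOTS];   /* collector state keyed by name */
static SlotStats by_index[MAX_SLOTS];  /* collector state keyed by index */

static void
init_entry(SlotStats *s, const char *name)
{
    memset(s, 0, sizeof(*s));
    s->in_use = true;
    strncpy(s->name, name, NAME_LEN - 1);  /* memset left the trailing NUL */
}

/* Name-based: reuse any entry with a matching name, else take a free one. */
void
report_by_name(const char *name, long spill_txns)
{
    SlotStats *free_entry = NULL;

    for (int i = 0; i < MAX_SLOTS; i++)
    {
        if (by_name[i].in_use && strcmp(by_name[i].name, name) == 0)
        {
            by_name[i].spill_txns += spill_txns;
            return;
        }
        if (!by_name[i].in_use && free_entry == NULL)
            free_entry = &by_name[i];
    }
    if (free_entry != NULL)
    {
        init_entry(free_entry, name);
        free_entry->spill_txns = spill_txns;
    }
}

/* Returns the accumulated counter for a name, or -1 if there is no entry. */
long
total_by_name(const char *name)
{
    for (int i = 0; i < MAX_SLOTS; i++)
        if (by_name[i].in_use && strcmp(by_name[i].name, name) == 0)
            return by_name[i].spill_txns;
    return -1;
}

/* Index-based: a create message zeroes whatever was stored at that index. */
void
create_by_index(int idx, const char *name)
{
    assert(idx >= 0 && idx < MAX_SLOTS);
    init_entry(&by_index[idx], name);
}

void
report_by_index(int idx, long spill_txns)
{
    assert(idx >= 0 && idx < MAX_SLOTS);
    by_index[idx].spill_txns += spill_txns;
}

long
total_by_index(int idx)
{
    assert(idx >= 0 && idx < MAX_SLOTS);
    return by_index[idx].spill_txns;
}
```

In this model, if an old slot "s1" reports 5, its drop message is lost,
and a new "s1" reports 2, the name-based entry ends up at 7. With the
index-based variant, the create message for the new slot resets the
entry, so it ends up at 2. If the create message is lost as well, the
index-based entry also ends up at 7, which is the similar danger
mentioned above.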

> If we do it right the lossiness will be removed via shared memory stats
> patch...
>

Okay.

-- 
With Regards,
Amit Kapila.


