Re: Replication slot stats misgivings - Mailing list pgsql-hackers

From vignesh C
Subject Re: Replication slot stats misgivings
Date
Msg-id CALDaNm2sdhsF73gZ_PJ6TRv=0VKT=r98j1ZqKEW+wf1h1pmi8A@mail.gmail.com
Whole thread Raw
In response to Re: Replication slot stats misgivings  (vignesh C <vignesh21@gmail.com>)
Responses Re: Replication slot stats misgivings  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Wed, Mar 31, 2021 at 11:32 AM vignesh C <vignesh21@gmail.com> wrote:
>
> On Tue, Mar 30, 2021 at 11:00 AM Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
> >
> > On 2021-03-30 10:13:29 +0530, vignesh C wrote:
> > > On Tue, Mar 30, 2021 at 6:28 AM Andres Freund <andres@anarazel.de> wrote:
> > > > Any chance you could write a tap test exercising a few of these cases?
> > >
> > > I can try to write a patch for this if nobody objects.
> >
> > Cool!
> >
>
> Attached a patch which has the test for the first scenario.
>
> > > > E.g. things like:
> > > >
> > > > - create a few slots, drop one of them, shut down, start up, verify
> > > >   stats are still sane
> > > > - create a few slots, shut down, manually remove a slot, lower
> > > >   max_replication_slots, start up
> > >
> > > Here by "manually remove a slot", do you mean to remove the slot
> > > manually from the pg_replslot folder?
> >
> > Yep - thereby allowing max_replication_slots after the shutdown/start to
> > be lower than the number of slots-stats objects.
>
> I have not included the 2nd test in the patch as the test fails with
> following warnings and also displays the statistics of the removed
> slot:
> WARNING:  problem in alloc set Statistics snapshot: detected write
> past chunk end in block 0x55d038b8e410, chunk 0x55d038b8e438
> WARNING:  problem in alloc set Statistics snapshot: detected write
> past chunk end in block 0x55d038b8e410, chunk 0x55d038b8e438
>
> This happens because the statistics file has an additional slot
> present even though the replication slot was removed.  I felt this
> issue should be fixed. I will try to fix this issue and send the
> second test along with the fix.

I felt from the statistics collector process, there is no way in which
we can identify if the replication slot is present or not because the
statistic collector process does not have access to shared memory.
Anything that the statistic collector process does independently by
traversing and removing the statistics of the replication slot
exceeding the max_replication_slot has its drawback of removing some
valid replication slot's statistics data.
Any thoughts on how we can identify the replication slot which has been dropped?
Can someone point me to the shared stats patch link with which message
loss can be avoided. I wanted to see a scenario where something like
the slot is dropped but the statistics are not updated because of an
immediate shutdown or server going down abruptly can occur or not with
the shared stats patch.

Regards,
Vignesh



pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: Get memory contexts of an arbitrary backend process
Next
From: Amit Kapila
Date:
Subject: Re: Any objections to implementing LogicalDecodeMessageCB for pgoutput?