Re: Replication slot stats misgivings - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Replication slot stats misgivings |
Date | |
Msg-id | CAA4eK1Lyni4XaK+-dfy6Lix_X0JbfWqrH8mrtrx2h0QV_NvNpQ@mail.gmail.com |
In response to | Re: Replication slot stats misgivings (vignesh C <vignesh21@gmail.com>) |
Responses | Re: Replication slot stats misgivings |
List | pgsql-hackers |
On Thu, Apr 1, 2021 at 3:43 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Wed, Mar 31, 2021 at 11:32 AM vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Tue, Mar 30, 2021 at 11:00 AM Andres Freund <andres@anarazel.de> wrote:
> > >
> > > Hi,
> > >
> > > On 2021-03-30 10:13:29 +0530, vignesh C wrote:
> > > > On Tue, Mar 30, 2021 at 6:28 AM Andres Freund <andres@anarazel.de> wrote:
> > > > > Any chance you could write a tap test exercising a few of these cases?
> > > >
> > > > I can try to write a patch for this if nobody objects.
> > >
> > > Cool!
> > >
> >
> > Attached a patch which has the test for the first scenario.
> >
> > > > > E.g. things like:
> > > > >
> > > > > - create a few slots, drop one of them, shut down, start up, verify
> > > > >   stats are still sane
> > > > > - create a few slots, shut down, manually remove a slot, lower
> > > > >   max_replication_slots, start up
> > > >
> > > > Here by "manually remove a slot", do you mean to remove the slot
> > > > manually from the pg_replslot folder?
> > >
> > > Yep - thereby allowing max_replication_slots after the shutdown/start to
> > > be lower than the number of slots-stats objects.
> >
> > I have not included the 2nd test in the patch as the test fails with
> > following warnings and also displays the statistics of the removed
> > slot:
> > WARNING: problem in alloc set Statistics snapshot: detected write
> > past chunk end in block 0x55d038b8e410, chunk 0x55d038b8e438
> > WARNING: problem in alloc set Statistics snapshot: detected write
> > past chunk end in block 0x55d038b8e410, chunk 0x55d038b8e438
> >
> > This happens because the statistics file has an additional slot
> > present even though the replication slot was removed. I felt this
> > issue should be fixed. I will try to fix this issue and send the
> > second test along with the fix.
>
> I felt from the statistics collector process, there is no way in which
> we can identify if the replication slot is present or not because the
> statistic collector process does not have access to shared memory.
> Anything that the statistic collector process does independently by
> traversing and removing the statistics of the replication slot
> exceeding the max_replication_slot has its drawback of removing some
> valid replication slot's statistics data.
> Any thoughts on how we can identify the replication slot which has been dropped?
> Can someone point me to the shared stats patch link with which message
> loss can be avoided. I wanted to see a scenario where something like
> the slot is dropped but the statistics are not updated because of an
> immediate shutdown or server going down abruptly can occur or not with
> the shared stats patch.
>

I don't think it is easy to simulate a scenario where the 'drop'
message is lost, and I think that is why the test contains the step
to manually remove the slot. At this stage, you can probably provide a
test patch and a code-fix patch that just drops the extra slots
from the stats file. That will allow us to test it with the shared
memory stats patch on which Andres and Horiguchi-San are working. If
we continue to pursue the current approach, then, as Andres
suggested, we might send additional information from
RestoreSlotFromDisk to keep it in sync.

--
With Regards,
Amit Kapila.
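
To make the trade-off between the two options discussed above concrete, here is a minimal, self-contained sketch. It is not PostgreSQL code: `SlotStatsEntry`, `truncate_slot_stats`, `prune_slot_stats`, and `SLOT_NAME_LEN` are hypothetical names invented for illustration, and the list of live slot names stands in for whatever information could be sent from RestoreSlotFromDisk. The first function drops extra entries purely by count, which can discard a valid slot's statistics; the second keeps only entries whose slot name is known to exist.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SLOT_NAME_LEN 64		/* hypothetical; not PostgreSQL's NAMEDATALEN */

/* Hypothetical stand-in for one per-slot entry read from the stats file. */
typedef struct SlotStatsEntry
{
	char		slotname[SLOT_NAME_LEN];
	long		spill_txns;
} SlotStatsEntry;

/*
 * Option 1: drop "extra" entries by count only. Entries past max_slots are
 * discarded by position, so a stale entry that sorts earlier survives while
 * a valid later one may be lost. Returns the new entry count.
 */
size_t
truncate_slot_stats(const SlotStatsEntry *entries, size_t nentries,
					size_t max_slots)
{
	assert(entries != NULL || nentries == 0);
	return nentries > max_slots ? max_slots : nentries;
}

/*
 * Option 2: keep only entries whose name appears in live_slots, compacting
 * the array in place and preserving order. Returns the new entry count.
 */
size_t
prune_slot_stats(SlotStatsEntry *entries, size_t nentries,
				 const char *const *live_slots, size_t nlive)
{
	size_t		kept = 0;

	assert(entries != NULL || nentries == 0);

	for (size_t i = 0; i < nentries; i++)
	{
		int			found = 0;

		for (size_t j = 0; j < nlive; j++)
		{
			if (strcmp(entries[i].slotname, live_slots[j]) == 0)
			{
				found = 1;
				break;
			}
		}

		if (found)
		{
			/* Shift the surviving entry down over any pruned ones. */
			if (kept != i)
				entries[kept] = entries[i];
			kept++;
		}
	}

	return kept;
}
```

With entries for slot_a, slot_b, and slot_c, where slot_b was removed from disk and the limit is lowered to 2, truncating by count keeps slot_a and the stale slot_b and loses slot_c. Pruning by name keeps slot_a and slot_c, but it needs the list of existing slots, which the statistics collector cannot obtain on its own.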