Re: Replication slot stats misgivings - Mailing list pgsql-hackers
From | vignesh C |
---|---|
Subject | Re: Replication slot stats misgivings |
Date | |
Msg-id | CALDaNm2LRKnDy4Zo==h-CPZPcAkzD4TMre-0fViv_-P4rnWB4Q@mail.gmail.com Whole thread Raw |
In response to | Re: Replication slot stats misgivings (vignesh C <vignesh21@gmail.com>) |
Responses |
Re: Replication slot stats misgivings
|
List | pgsql-hackers |
On Sat, Apr 3, 2021 at 11:07 PM vignesh C <vignesh21@gmail.com> wrote: > > On Fri, Apr 2, 2021 at 9:29 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Fri, Apr 2, 2021 at 1:55 AM vignesh C <vignesh21@gmail.com> wrote: > > > > > > On Thu, Apr 1, 2021 at 5:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Thu, Apr 1, 2021 at 3:43 PM vignesh C <vignesh21@gmail.com> wrote: > > > > > > > > > > On Wed, Mar 31, 2021 at 11:32 AM vignesh C <vignesh21@gmail.com> wrote: > > > > > > > > > > > > On Tue, Mar 30, 2021 at 11:00 AM Andres Freund <andres@anarazel.de> wrote: > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > On 2021-03-30 10:13:29 +0530, vignesh C wrote: > > > > > > > > On Tue, Mar 30, 2021 at 6:28 AM Andres Freund <andres@anarazel.de> wrote: > > > > > > > > > Any chance you could write a tap test exercising a few of these cases? > > > > > > > > > > > > > > > > I can try to write a patch for this if nobody objects. > > > > > > > > > > > > > > Cool! > > > > > > > > > > > > > > > > > > > Attached a patch which has the test for the first scenario. > > > > > > > > > > > > > > > E.g. things like: > > > > > > > > > > > > > > > > > > - create a few slots, drop one of them, shut down, start up, verify > > > > > > > > > stats are still sane > > > > > > > > > - create a few slots, shut down, manually remove a slot, lower > > > > > > > > > max_replication_slots, start up > > > > > > > > > > > > > > > > Here by "manually remove a slot", do you mean to remove the slot > > > > > > > > manually from the pg_replslot folder? > > > > > > > > > > > > > > Yep - thereby allowing max_replication_slots after the shutdown/start to > > > > > > > be lower than the number of slots-stats objects. > > > > > > > > > > > > I have not included the 2nd test in the patch as the test fails with > > > > > > following warnings and also displays the statistics of the removed > > > > > > slot: > > > > > > WARNING: problem in alloc set Statistics snapshot: detected write > > > > > > past chunk end in block 0x55d038b8e410, chunk 0x55d038b8e438 > > > > > > WARNING: problem in alloc set Statistics snapshot: detected write > > > > > > past chunk end in block 0x55d038b8e410, chunk 0x55d038b8e438 > > > > > > > > > > > > This happens because the statistics file has an additional slot > > > > > > present even though the replication slot was removed. I felt this > > > > > > issue should be fixed. I will try to fix this issue and send the > > > > > > second test along with the fix. > > > > > > > > > > I felt from the statistics collector process, there is no way in which > > > > > we can identify if the replication slot is present or not because the > > > > > statistic collector process does not have access to shared memory. > > > > > Anything that the statistic collector process does independently by > > > > > traversing and removing the statistics of the replication slot > > > > > exceeding the max_replication_slot has its drawback of removing some > > > > > valid replication slot's statistics data. > > > > > Any thoughts on how we can identify the replication slot which has been dropped? > > > > > Can someone point me to the shared stats patch link with which message > > > > > loss can be avoided. I wanted to see a scenario where something like > > > > > the slot is dropped but the statistics are not updated because of an > > > > > immediate shutdown or server going down abruptly can occur or not with > > > > > the shared stats patch. > > > > > > > > > > > > > I don't think it is easy to simulate a scenario where the 'drop' > > > > message is dropped and I think that is why the test contains the step > > > > to manually remove the slot. At this stage, you can probably provide a > > > > test patch and a code-fix patch where it just drops the extra slots > > > > from the stats file. That will allow us to test it with a shared > > > > memory stats patch on which Andres and Horiguchi-San are working. If > > > > we still continue to pursue with current approach then as Andres > > > > suggested we might send additional information from > > > > RestoreSlotFromDisk to keep it in sync. > > > > > > Thanks for your comments, Attached patch has the fix for the same. > > > Also attached a couple of more patches which addresses the comments > > > which Andres had listed i.e changing char to NameData type and also to > > > display the unspilled/unstreamed transaction information in the > > > replication statistics. > > > Thoughts? > > > > Thank you for the patches! > > > > I've looked at those patches and here are some comments on 0001, 0002, > > and 0003 patch: > > Thanks for the comments. > > > 0001 patch: > > > > - values[0] = PointerGetDatum(cstring_to_text(s->slotname)); > > + values[0] = PointerGetDatum(cstring_to_text(s->slotname.data)); > > > > We can use NameGetDatum() instead. > > I felt we will not be able to use NameGetDatum because this function > will not have access to the value throughout the loop and NameGetDatum > must ensure the pointed-to value has adequate lifetime. > > > --- > > 0002 patch: > > > > The patch uses logical replication to test replication slots > > statistics but I think it's necessarily necessary. It would be more > > simple to use logical decoding. Maybe we can add TAP tests to > > contrib/test_decoding. > > > > I will try to change it to test_decoding if feasible and post in the > next version. > I have modified the patch to include tap tests in contrib/test_decoding. Attached v3 patch has the changes for the same. Thoughts? Regards, Vignesh
Attachment
pgsql-hackers by date: