Re: Replication slot stats misgivings - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Replication slot stats misgivings |
Date | |
Msg-id | 20210320212653.flbcpczmlmug5g5d@alap3.anarazel.de Whole thread Raw |
In response to | Re: Replication slot stats misgivings (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: Replication slot stats misgivings
|
List | pgsql-hackers |
Hi, On 2021-03-20 09:25:40 +0530, Amit Kapila wrote: > On Sat, Mar 20, 2021 at 12:22 AM Andres Freund <andres@anarazel.de> wrote: > > > > And then more generally about the feature: > > - If a slot was used to stream out a large amount of changes (say an > > initial data load), but then replication is interrupted before the > > transaction is committed/aborted, stream_bytes will not reflect the > > many gigabytes of data we may have sent. > > > > We can probably update the stats each time we spilled or streamed the > transaction data but it was not clear at that stage whether or how > much it will be useful. It seems like the obvious answer here is to sync stats when releasing the slot? > > - I seems weird that we went to the trouble of inventing replication > > slot stats, but then limit them to logical slots, and even there don't > > record the obvious things like the total amount of data sent. > > > > Won't spill_bytes and stream_bytes will give you the amount of data sent? I don't think either tracks changes that were neither spilled nor streamed? And if they are, they're terribly misnamed? > > > > I think the best way to address the more fundamental "pgstat related" > > complaints is to change how replication slot stats are > > "addressed". Instead of using the slots name, report stats using the > > index in ReplicationSlotCtl->replication_slots. > > > > That removes the risk of running out of "replication slot stat slots": > > If we loose a drop message, the index eventually will be reused and we > > likely can detect that the stats were for a different slot by comparing > > the slot name. > > > > This idea is worth exploring to address the complaints but what do we > do when we detect that the stats are from the different slot? I think it's pretty easy to make that bulletproof. Add a pgstat_report_replslot_create(), and use that in ReplicationSlotCreate(). That is called with ReplicationSlotAllocationLock held, so it can just safely zero out stats. I don't think: > It has mixed of stats from the old and new slot. Can happen in that scenario. > > It also makes it easy to handle the issue of max_replication_slots being > > lowered and there still being stats for a slot - we simply can skip > > restoring that slots data, because we know the relevant slot can't exist > > anymore. And we can make the initial pgstat_report_replslot() during > > slot creation use a > > > > Here, your last sentence seems to be incomplete. Oops, I was planning to suggest adding pgstat_report_replslot_create() that zeroes out the pre-existing stats (or a parameter to pgstat_report_replslot(), but I don't think that's better). > > I'm wondering if we should just remove the slot name entirely from the > > pgstat.c side of things, and have pg_stat_get_replication_slots() > > inquire about slots by index as well and get the list of slots to report > > stats for from slot.c infrastructure. > > > > But how will you detect in your idea that some of the stats from the > already dropped slot? I don't think that is possible with my sketch? Greetings, Andres Freund
pgsql-hackers by date: