Re: Replication slot stats misgivings - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Replication slot stats misgivings
Date
Msg-id 20210320212653.flbcpczmlmug5g5d@alap3.anarazel.de
Whole thread Raw
In response to Re: Replication slot stats misgivings  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Replication slot stats misgivings
List pgsql-hackers
Hi,

On 2021-03-20 09:25:40 +0530, Amit Kapila wrote:
> On Sat, Mar 20, 2021 at 12:22 AM Andres Freund <andres@anarazel.de> wrote:
> >
> > And then more generally about the feature:
> > - If a slot was used to stream out a large amount of changes (say an
> >   initial data load), but then replication is interrupted before the
> >   transaction is committed/aborted, stream_bytes will not reflect the
> >   many gigabytes of data we may have sent.
> >
> 
> We can probably update the stats each time we spilled or streamed the
> transaction data but it was not clear at that stage whether or how
> much it will be useful.

It seems like the obvious answer here is to sync stats when releasing
the slot?


> > - I seems weird that we went to the trouble of inventing replication
> >   slot stats, but then limit them to logical slots, and even there don't
> >   record the obvious things like the total amount of data sent.
> >
> 
> Won't spill_bytes and stream_bytes will give you the amount of data sent?

I don't think either tracks changes that were neither spilled nor
streamed? And if they are, they're terribly misnamed?

> >
> > I think the best way to address the more fundamental "pgstat related"
> > complaints is to change how replication slot stats are
> > "addressed". Instead of using the slots name, report stats using the
> > index in ReplicationSlotCtl->replication_slots.
> >
> > That removes the risk of running out of "replication slot stat slots":
> > If we loose a drop message, the index eventually will be reused and we
> > likely can detect that the stats were for a different slot by comparing
> > the slot name.
> >
> 
> This idea is worth exploring to address the complaints but what do we
> do when we detect that the stats are from the different slot?

I think it's pretty easy to make that bulletproof. Add a
pgstat_report_replslot_create(), and use that in
ReplicationSlotCreate(). That is called with
ReplicationSlotAllocationLock held, so it can just safely zero out stats.

I don't think:

> It has mixed of stats from the old and new slot.

Can happen in that scenario.


> > It also makes it easy to handle the issue of max_replication_slots being
> > lowered and there still being stats for a slot - we simply can skip
> > restoring that slots data, because we know the relevant slot can't exist
> > anymore. And we can make the initial pgstat_report_replslot() during
> > slot creation use a
> >
> 
> Here, your last sentence seems to be incomplete.

Oops, I was planning to suggest adding pgstat_report_replslot_create()
that zeroes out the pre-existing stats (or a parameter to
pgstat_report_replslot(), but I don't think that's better).


> > I'm wondering if we should just remove the slot name entirely from the
> > pgstat.c side of things, and have pg_stat_get_replication_slots()
> > inquire about slots by index as well and get the list of slots to report
> > stats for from slot.c infrastructure.
> >
> 
> But how will you detect in your idea that some of the stats from the
> already dropped slot?

I don't think that is possible with my sketch?

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [HACKERS] Custom compression methods (./configure)
Next
From: Andres Freund
Date:
Subject: Re: Replication slot stats misgivings