Replication slot stats misgivings - Mailing list pgsql-hackers

From Andres Freund
Subject Replication slot stats misgivings
Date
Msg-id 20210319185247.ldebgpdaxsowiflw@alap3.anarazel.de
Whole thread Raw
Responses Re: Replication slot stats misgivings  (Amit Kapila <amit.kapila16@gmail.com>)
Re: Replication slot stats misgivings  (Masahiko Sawada <sawada.mshk@gmail.com>)
Re: Replication slot stats misgivings  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
Hi,

I started to write this as a reply to
https://postgr.es/m/20210318015105.dcfa4ceybdjubf2i%40alap3.anarazel.de
but I think it doesn't really fit under that header anymore.

On 2021-03-17 18:51:05 -0700, Andres Freund wrote:
> It does make it easier for the shared memory stats patch, because if
> there's a fixed number + location, the relevant stats reporting doesn't
> need to go through a hashtable with the associated locking.  I guess
> that may have colored my perception that it's better to just have a
> statically sized memory allocation for this.  Noteworthy that SLRU stats
> are done in a fixed size allocation as well...

As part of reviewing the replication slot stats patch I looked at
replication slot stats a fair bit, and I've a few misgivings. First,
about the pgstat.c side of things:

- If somehow slot stat drop messages got lost (remember pgstat
  communication is lossy!), we'll just stop maintaining stats for slots
  created later, because there'll eventually be no space for keeping
  stats for another slot.

- If max_replication_slots was lowered between a restart,
  pgstat_read_statfile() will happily write beyond the end of
  replSlotStats.

- pgstat_reset_replslot_counter() acquires ReplicationSlotControlLock. I
  think pgstat.c has absolutely no business doing things on that level.

- We do a linear search through all replication slots whenever receiving
  stats for a slot. Even though there'd be a perfectly good index to
  just use all throughout - the slots index itself. It looks to me like
  slots stat reports can be fairly frequent in some workloads, so that
  doesn't seem great.

- PgStat_ReplSlotStats etc use slotname[NAMEDATALEN]. Why not just NameData?

- pgstat_report_replslot() already has a lot of stats parameters, it
  seems likely that we'll get more. Seems like we should just use a
  struct of stats updates.


And then more generally about the feature:
- If a slot was used to stream out a large amount of changes (say an
  initial data load), but then replication is interrupted before the
  transaction is committed/aborted, stream_bytes will not reflect the
  many gigabytes of data we may have sent.
- I seems weird that we went to the trouble of inventing replication
  slot stats, but then limit them to logical slots, and even there don't
  record the obvious things like the total amount of data sent.


I think the best way to address the more fundamental "pgstat related"
complaints is to change how replication slot stats are
"addressed". Instead of using the slots name, report stats using the
index in ReplicationSlotCtl->replication_slots.

That removes the risk of running out of "replication slot stat slots":
If we loose a drop message, the index eventually will be reused and we
likely can detect that the stats were for a different slot by comparing
the slot name.

It also makes it easy to handle the issue of max_replication_slots being
lowered and there still being stats for a slot - we simply can skip
restoring that slots data, because we know the relevant slot can't exist
anymore. And we can make the initial pgstat_report_replslot() during
slot creation use a

I'm wondering if we should just remove the slot name entirely from the
pgstat.c side of things, and have pg_stat_get_replication_slots()
inquire about slots by index as well and get the list of slots to report
stats for from slot.c infrastructure.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Matthias van de Meent
Date:
Subject: Re: non-HOT update not looking at FSM for large tuple update
Next
From: Andres Freund
Date:
Subject: Re: Do we work with LLVM 12 on s390x?