Nice finding.
At Tue, 13 Sep 2022 00:39:45 -0500, Jaime Casanova <jcasanov@systemguards.com.ec> wrote in
> and the problem seems to be that after zero'ing the stats that includes
> the name of the replication slot, this simple patch fixes it... not sure
> if it's the right fix though...
That doesn't work. since what that function clears is not the name in
the slot struct but that in stats entry.
The cause is what pg_stat_reset_replslot wants to do does not match
what pgstat feature thinks as reset.
Unfortunately, we don't have a function to leave a portion alone after
a reset. However, fetching the stats entry in pgstat_reset_replslot is
ugly..
I'm not sure this is less uglier but it works if
pgstat_report_replslot sets the name if it is found to be wiped
out... But that hafly nullify the protction by the assertion just
after.
--- a/src/backend/utils/activity/pgstat_replslot.c
+++ b/src/backend/utils/activity/pgstat_replslot.c
@@ -83,9 +83,11 @@ pgstat_report_replslot(ReplicationSlot *slot, const PgStat_StatReplSlotEntry *re
statent = &shstatent->stats;
/*
- * Any mismatch should have been fixed in pgstat_create_replslot() or
- * pgstat_acquire_replslot().
+ * pgstat_create_replslot() and pgstat_acquire_replslot() enters the name,
+ * but pgstat_reset_replslot() may clear it.
*/
+ if (statent->slotname.data[0] == 0)
+ namestrcpy(&shstatent->stats.slotname, NameStr(slot->data.name));
Assert(namestrcmp(&statent->slotname, NameStr(slot->data.name)) == 0);
Another measure would be to add the region to wipe-out on reset to
PgStat_KindInfo, but it seems too much.. (attached)
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center