Re: START_REPLICATION SLOT causing a crash in an assert build - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: START_REPLICATION SLOT causing a crash in an assert build
Date
Msg-id 20220913.184845.2130916662920954058.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: START_REPLICATION SLOT causing a crash in an assert build  (Jaime Casanova <jcasanov@systemguards.com.ec>)
Responses Re: START_REPLICATION SLOT causing a crash in an assert build
List pgsql-hackers
Nice finding.

At Tue, 13 Sep 2022 00:39:45 -0500, Jaime Casanova <jcasanov@systemguards.com.ec> wrote in 
> and the problem seems to be that after zero'ing the stats that includes
> the name of the replication slot, this simple patch fixes it... not sure
> if it's the right fix though...

That doesn't work. since what that function clears is not the name in
the slot struct but that in stats entry.

The cause is what pg_stat_reset_replslot wants to do does not match
what pgstat feature thinks as reset.

Unfortunately, we don't have a function to leave a portion alone after
a reset. However, fetching the stats entry in pgstat_reset_replslot is
ugly..

I'm not sure this is less uglier but it works if
pgstat_report_replslot sets the name if it is found to be wiped
out... But that hafly nullify the protction by the assertion just
after.


--- a/src/backend/utils/activity/pgstat_replslot.c
+++ b/src/backend/utils/activity/pgstat_replslot.c
@@ -83,9 +83,11 @@ pgstat_report_replslot(ReplicationSlot *slot, const PgStat_StatReplSlotEntry *re
     statent = &shstatent->stats;
 
     /*
-     * Any mismatch should have been fixed in pgstat_create_replslot() or
-     * pgstat_acquire_replslot().
+     * pgstat_create_replslot() and pgstat_acquire_replslot() enters the name,
+     * but pgstat_reset_replslot() may clear it.
      */
+    if (statent->slotname.data[0] == 0)
+        namestrcpy(&shstatent->stats.slotname, NameStr(slot->data.name));
     Assert(namestrcmp(&statent->slotname, NameStr(slot->data.name)) == 0);


Another measure would be to add the region to wipe-out on reset to
PgStat_KindInfo, but it seems too much.. (attached)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Perform streaming logical transactions by background workers and parallel apply
Next
From: Alvaro Herrera
Date:
Subject: Re: [PATCH v1] fix potential memory leak in untransformRelOptions