Re: Replication slot stats misgivings - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Replication slot stats misgivings
Date
Msg-id CAA4eK1+ycL2XFbCLG=bK-Wv8jBDVVBG7M3o2tZgrUs9qMqHzWQ@mail.gmail.com
Whole thread Raw
In response to Re: Replication slot stats misgivings  (vignesh C <vignesh21@gmail.com>)
Responses Re: Replication slot stats misgivings  (vignesh C <vignesh21@gmail.com>)
Re: Replication slot stats misgivings  (vignesh C <vignesh21@gmail.com>)
List pgsql-hackers
On Mon, Apr 5, 2021 at 8:51 PM vignesh C <vignesh21@gmail.com> wrote:
>

Few comments on the latest patches:
Comments on 0001
--------------------------------
1.
@@ -659,6 +661,8 @@ ReorderBufferTXNByXid(ReorderBuffer *rb,
TransactionId xid, bool create,
  dlist_push_tail(&rb->toplevel_by_lsn, &txn->node);
  AssertTXNLsnOrder(rb);
  }
+
+ rb->totalTxns++;
  }
  else
  txn = NULL; /* not found and not asked to create */
@@ -3078,6 +3082,7 @@ ReorderBufferChangeMemoryUpdate(ReorderBuffer *rb,
  {
  txn->size += sz;
  rb->size += sz;
+ rb->totalBytes += sz;

I think this will include the txns that are aborted and for which we
don't send anything. It might be better to update these stats in
ReorderBufferProcessTXN or ReorderBufferReplay where we are sure we
have sent the data. We can probably use size/total_size in txn. We
need to be careful to not double include the totaltxn or totalBytes
for streaming xacts as we might process the same txn multiple times.

2.
+        Amount of decoded transactions data sent to the decoding output plugin
+        while decoding the changes from WAL for this slot. This and total_txns
+        for this slot can be used to gauge the total amount of data during
+        logical decoding.

I think we can slightly modify the second line here: "This can be used
to gauge the total amount of data sent during logical decoding.". Why
we need to include total_txns along with it.

0002
----------
3.
+  -- we don't want to wait forever; loop will exit after 30 seconds
+  FOR i IN 1 .. 5 LOOP
+
...
...
+
+    -- wait a little
+    perform pg_sleep_for('100 milliseconds');

I think this loop needs to be executed 300 times instead of 5 times,
if the above comments and code needs to do what is expected here?


4.
+# Test to drop one of the subscribers and verify replication statistics data is
+# fine after publisher is restarted.
+$node->safe_psql('postgres', "SELECT
pg_drop_replication_slot('regression_slot4')");
+
+$node->stop;
+$node->start;
+
+# Verify statistics data present in pg_stat_replication_slots are sane after
+# publisher is restarted
+$result = $node->safe_psql('postgres',
+ "SELECT slot_name, total_txns > 0 AS total_txn, total_bytes > 0 AS total_bytes
+ FROM pg_stat_replication_slots ORDER BY slot_name"

Various comments in the 0002 refer to publisher/subscriber which is
not what we are using here.

5.
+# Create table.
+$node->safe_psql('postgres',
+        "CREATE TABLE test_repl_stat(col1 int)");
+$node->safe_psql('postgres',
+        "SELECT data FROM
pg_logical_slot_get_changes('regression_slot1', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1')");
+$node->safe_psql('postgres',
+        "SELECT data FROM
pg_logical_slot_get_changes('regression_slot2', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1')");
+$node->safe_psql('postgres',
+        "SELECT data FROM
pg_logical_slot_get_changes('regression_slot3', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1')");
+$node->safe_psql('postgres',
+        "SELECT data FROM
pg_logical_slot_get_changes('regression_slot4', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1')");

I think we can save the above calls to pg_logical_slot_get_changes if
we create table before creating the slots in this test.

0003
---------
6. In the tests/code, publisher is used at multiple places. I think
that is not required because this can happen via plugin as well.
7.
+ if (max_replication_slots == nReplSlotStats)
+ {
+ ereport(pgStatRunningInCollector ? LOG : WARNING,
+ (errmsg("skipping \"%s\" replication slot statistics as
pg_stat_replication_slots does not have enough slots",
+ NameStr(replSlotStats[nReplSlotStats].slotname))));
+ memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));

Do we need memset here? Isn't this location is past the max location?


-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: Re: Stronger safeguard for archive recovery not to miss data
Next
From: "osumi.takamichi@fujitsu.com"
Date:
Subject: RE: Stronger safeguard for archive recovery not to miss data