Re: Failed transaction statistics to measure the logical replication progress - Mailing list pgsql-hackers

From Peter Smith
Subject Re: Failed transaction statistics to measure the logical replication progress
Date
Msg-id CAHut+Pv9gW6+xFH-iQ+=2A9o3-y+Mt-xE-7L+Q59Dy7_YX1Ebg@mail.gmail.com
Whole thread Raw
In response to RE: Failed transaction statistics to measure the logical replication progress  ("osumi.takamichi@fujitsu.com" <osumi.takamichi@fujitsu.com>)
Responses RE: Failed transaction statistics to measure the logical replication progress  ("osumi.takamichi@fujitsu.com" <osumi.takamichi@fujitsu.com>)
List pgsql-hackers
Please see below my review comments for v25.

======

1. Commit message

Introduce cumulative columns of transactions of
logical replication subscriber to the pg_stat_subscription_stats view.

"cumulative columns of transactions" sounds a bit strange to me.

SUGGESTED
Introduce 2 new subscription statistics columns (apply_commit_count,
and apply_rollback_count) to the pg_stat_subscription_stats view for
counting cumulative transaction commits/rollbacks.

~~~

2. doc/src/sgml/monitoring.sgml - bug

The new SGML <row>s have been added in the wrong place!

I don't think this renders like you expect it does. Please regenerate
the help to see for yourself.

~~~

3. doc/src/sgml/monitoring.sgml - wording

+      <para>
+       Number of transactions rollbacked in this subscription. Both
+       <command>ROLLBACK</command> of transaction streamed as in-progress
+       transaction and <command>ROLLBACK PREPARED</command> increment this
+       counter.
+      </para></entry>

BEFORE
Number of transactions rollbacked in this subscription.

SUGGESTED
Number of transaction rollbacks in this subscription.

~~~

4. doc/src/sgml/monitoring.sgml - wording

+      <para>
+       Number of transactions rollbacked in this subscription. Both
+       <command>ROLLBACK</command> of transaction streamed as in-progress
+       transaction and <command>ROLLBACK PREPARED</command> increment this
+       counter.
+      </para></entry>

Trying to distinguish between the ROLLBACK of a transaction and of a
streamed in-progress transaction seems to have made this description
too complicated. I don’t think the user even cares/knows about this
(in-progress) distinction. So, I think this should just be written
more simply (like the COMMIT part was)

BEFORE
Both  <command>ROLLBACK</command> of transaction streamed as
in-progress  transaction and <command>ROLLBACK PREPARED</command>
increment this counter.

SUGGESTED
Both  <command>ROLLBACK</command> and <command>ROLLBACK
PREPARED</command> increment this counter.

~~~

5. Question - column names.

Just curious why the columns are called "apply_commit_count" and
"apply_rollback_count"? Specifically, what extra meaning do those
names have versus just calling them "commit_count" and
"rollback_count"?

~~~

6. src/backend/postmaster/pgstat.c - pgstat_report_subscription_xact

@@ -3421,6 +3425,60 @@ pgstat_send_slru(void)
 }

 /* ----------
+ * pgstat_report_subscription_xact() -
+ *
+ * Send a subscription transaction stats to the collector.
+ * The statistics are cleared upon sending.
+ *
+ * 'force' is true only when the subscription worker process exits.
+ * ----------
+ */
+void
+pgstat_report_subscription_xact(bool force)

6a.
I think this comment should be worded more like the other
pgstat_report_subscption_XXX comments

BEFORE
Send a subscription transaction stats to the collector.

SUGGESTED
Tell the collector about subscriptions transaction stats.

6b.
+ * 'force' is true only when the subscription worker process exits.

I thought this comment should just describe what the 'force' param
actually does in this function; not the scenario about who calls it...

~~~

7. src/backend/postmaster/pgstat.c - pgstat_report_subscription_xact

I think the entire function maybe should be relocated to be nearby the
other pgstat_report_subscription_XXX functions in the source.

~~~

8. src/backend/postmaster/pgstat.c - pgstat_report_subscription_xact

+ /*
+ * This function can be called even if nothing at all has happened. In
+ * this case, there's no need to go forward.
+ */

Too much information. Clearly, it is possible for this function to be
called for this case otherwise this code would not exist in the first
place :)
IMO the comment can be much simpler but still say all it needs to.

BEFORE
This function can be called even if nothing at all has happened. In
this case, there's no need to go forward.
SUGGESTED
Bailout early if nothing to do.

~~~

9. src/backend/postmaster/pgstat.c - pgstat_report_subscription_xact

+ if (subStats.subid == InvalidOid ||
+ (subStats.apply_commit_count == 0 && subStats.apply_rollback_count == 0))
+ return;

Maybe using !OisIsValid(subStats.subid) is better?

~~~

10. src/backend/postmaster/pgstat.c - pgstat_report_subscription_xact

+ /*
+ * Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL
+ * msec since we last sent one to avoid overloading the stats
+ * collector.
+ */

SUGGESTED (2 sentences instead of 1)
Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL
msec since we last sent one. This is to avoid overloading the stats
collector.

~~~

11. src/backend/postmaster/pgstat.c - pgstat_report_subscription_xact

+ if (!force)
+ {
+ TimestampTz now = GetCurrentTimestamp();
+
+ /*
+ * Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL
+ * msec since we last sent one to avoid overloading the stats
+ * collector.
+ */
+ if (!TimestampDifferenceExceeds(last_report, now, PGSTAT_STAT_INTERVAL))
+ return;
+ last_report = now;
+ }

(Yeah, I know there is similar code in this module but 2 wrongs do not
make a right)

I think logically it is better to put the 'now' and the 'last_report'
outside this if (!force) block. Otherwise, the forced report is not
setting the 'last_report' time and that just seems strange.

Rearranging this code is better IMO. e.g.
- the conditions are expressed positive instead of negative (!)
- only one return point instead of multiple
- the 'last_report' is always set so that strangeness is eliminated

SUGGESTED (it's the same code but rearranged)

TimestampTz now = GetCurrentTimestamp();

if (force || TimestampDifferenceExceeds(last_report, now, PGSTAT_STAT_INTERVAL))
{
/*
* Prepare and send the message.
*/
pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_SUBSCRIPTIONXACT);
msg.m_databaseid = MyDatabaseId;
msg.m_subid = subStats.subid;
msg.apply_commit_count = subStats.apply_commit_count;
msg.apply_rollback_count = subStats.apply_rollback_count;
pgstat_send(&msg, sizeof(PgStat_MsgSubscriptionXact));
last_report = now;

/*
* Clear out the statistics.
*/
subStats.apply_commit_count = 0;
subStats.apply_rollback_count = 0;
}

~~~

12. src/backend/replication/logical/worker.c - LogicalRepSubscriptionStats

@@ -238,6 +238,8 @@ static ApplyErrorCallbackArg apply_error_callback_arg =
  .ts = 0,
 };

+LogicalRepSubscriptionStats subStats = {InvalidOid, 0, 0};

Maybe better to show explicit the member assignments (like the struct
above this one) to make it more clear.

~~~

13. src/backend/replication/logical/worker.c - subscription_stats_update

@@ -3372,6 +3386,22 @@ TwoPhaseTransactionGid(Oid subid, TransactionId
xid, char *gid, int szgid)
  snprintf(gid, szgid, "pg_gid_%u_%u", subid, xid);
 }

+/*
+ * Update the statistics of subscription.
+ */
+static void
+subscription_stats_update(bool is_commit)
+{
+ Assert(OidIsValid(subStats.subid));
+
+ if (is_commit)
+ subStats.apply_commit_count++;
+ else
+ subStats.apply_rollback_count++;
+
+ pgstat_report_subscription_xact(false);
+}
+

I felt maybe this would be look better split into 2 functions: e.g.
subscription_stats_incr_commit() and
subscription_stats_incr_rollback(). Then it would be more readable
from all the callers instead of the vague looking
subscription_stats_update(true/false).

~~~

14. src/backend/replication/logical/worker.c - subscription_stats_update

+/*
+ * Update the statistics of subscription.
+ */
+static void
+subscription_stats_update(bool is_commit)
+{
+ Assert(OidIsValid(subStats.subid));
+
+ if (is_commit)
+ subStats.apply_commit_count++;
+ else
+ subStats.apply_rollback_count++;
+
+ pgstat_report_subscription_xact(false);
+}

Is it really necessary to be calling
pgstat_report_subscription_xactfrom here? That is already being called
in the LogicalRepApplyLoop. Isn't that enough?

~~~

15. src/backend/replication/logical/worker.c - LogicalRepApplyLoop

@@ -2717,6 +2729,8 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
  if (endofstream)
  break;

+ pgstat_report_subscription_xact(false);
+

Wondering if this call is better to be done a couple of lines up
(above the 'if/break'). Especially if you remove the call from the
subscription_stats_update as suggested in my review comment #14.

~~~

16. src/backend/utils/adt/pgstatfuncs.c - pg_stat_get_subscription_stats

@@ -2424,7 +2424,11 @@ pg_stat_get_subscription_stats(PG_FUNCTION_ARGS)
     INT8OID, -1, 0);
  TupleDescInitEntry(tupdesc, (AttrNumber) 3, "sync_error_count",
     INT8OID, -1, 0);
- TupleDescInitEntry(tupdesc, (AttrNumber) 4, "stats_reset",
+ TupleDescInitEntry(tupdesc, (AttrNumber) 4, "apply_commit_count",
+    INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 5, "sync_rollback_count",
+    INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 6, "stats_reset",
     TIMESTAMPTZOID, -1, 0);

Bug? What is "sync_rollback_count"? Looks like a cut/paste error.

~~~

17. src/include/pgstat.h

+typedef struct PgStat_MsgSubscriptionXact
+{
+ PgStat_MsgHdr m_hdr;
+
+ /* determine the worker entry */
+ Oid m_databaseid;
+ Oid m_subid;
+
+ PgStat_Counter apply_commit_count;
+ PgStat_Counter apply_rollback_count;
+} PgStat_MsgSubscriptionXact;

Is that m_databaseid even needed? I did not notice it getting used
(e.g. pgstat_recv_subscription_xact does not use it). Also, wasn't
similar removed from the other subscription error stats?

~~~

18. src/include/pgstat.h

+ PgStat_Counter apply_rollback_count;
+} PgStat_MsgSubscriptionXact;
+
+

The extra blank line can be removed.

~~~

19. src/include/pgstat.h

@@ -1177,6 +1201,8 @@ extern void pgstat_send_archiver(const char
*xlog, bool failed);
 extern void pgstat_send_bgwriter(void);
 extern void pgstat_send_checkpointer(void);
 extern void pgstat_send_wal(bool force);
+extern void pgstat_report_subscription_xact(bool force);
+

 /* ----------

The extra blank line can be removed.

~~~

20. Test for the column names.

The patch added a couple of new columns to statistics so I was
surprised there were no regression test updates needed for these? How
can that be? Shouldn’t there be just one regression test that
validates the view column names are what they are expected to be?

------
Kind Regards,
Peter Smith.
Fujitsu Australia



pgsql-hackers by date:

Previous
From: Wenjing Zeng
Date:
Subject: Re: [Proposal] Global temporary tables
Next
From: Yura Sokolov
Date:
Subject: Re: BufferAlloc: don't take two simultaneous locks