Re: Design of pg_stat_subscription_workers vs pgstats - Mailing list pgsql-hackers

From Peter Smith
Subject Re: Design of pg_stat_subscription_workers vs pgstats
Date
Msg-id CAHut+PtH-uN5rbGRh-=kCd8xvQYDf_JCcjLcVjW3OXGz6T+xCw@mail.gmail.com
Whole thread Raw
In response to Re: Design of pg_stat_subscription_workers vs pgstats  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Design of pg_stat_subscription_workers vs pgstats
Re: Design of pg_stat_subscription_workers vs pgstats
List pgsql-hackers
Hi. Below are my review comments for the v1 patch.

======

1. Commit message - wording

As the result of the discussion, we've concluded that the stats
collector is not an appropriate place to store the error information of
subscription workers.

SUGGESTION
It was decided (refer to the Discussion link below) that the stats
collector is not an appropriate place to store the error information of
subscription workers.

~~~

2. Commit message - wording

This commits changes the view so that it stores only statistics
counters: apply_error_count and sync_error_count.

SUGGESTION
"This commits changes the view" --> "This patch changes the
pg_stat_subscription_workers view"

~~~

3. Commit message - wording

Removing these error details, since we don't need to record the
error information for apply workers and tablesync workers separately,
the view now has one entry per subscription.

DID THIS MEAN...
After removing these error details, there is no longer separate
information for apply workers and tablesync workers, so the view now
has only one entry per subscription.

--

But anyway, that is not entirely true either because those counters
are separate information for the different workers, right?

~~

4. Commit message - wording

Also, it changes the view name to pg_stat_subscription_activity
since the word "worker" is an implementation detail that we use one
worker for one tablesync.

SUGGESTION
"Also, it changes" --> "The patch also changes" ...

~~~

5. doc/src/sgml/monitoring.sgml - wording

   <para>
-   The <structname>pg_stat_subscription_workers</structname> view will contain
-   one row per subscription worker on which errors have occurred, for workers
-   applying logical replication changes and workers handling the initial data
-   copy of the subscribed tables.  The statistics entry is removed when the
+   The <structname>pg_stat_subscription_activity</structname> view will contain
+   one row per subscription.  The statistics entry is removed when the
    corresponding subscription is dropped.
   </para>

SUGGESTION
"The statistics entry is removed" --> "This row is removed"

~~~

6.  doc/src/sgml/monitoring.sgml - why two counters?

Please forgive this noob question...

I see there are 2 error_count columns (one for each kind of worker)
but I was wondering why it is useful for users to be able to
distinguish if the error came from the tablesync workers or from the
apply workers? Do you have any example?

Also, IIRC sometimes the tablesync might actually do a few "apply"
changes itself... so the distinction may become a bit fuzzy...

~~~

7. src/backend/postmaster/pgstat.c - comment

@@ -1313,13 +1312,13 @@ pgstat_vacuum_stat(void)
  }

  /*
- * Repeat for subscription workers.  Similarly, we needn't bother in the
- * common case where no subscription workers' stats are being collected.
+ * Repeat for subscription.  Similarly, we needn't bother in the common
+ * case where no subscription stats are being collected.
  */

typo?

"Repeat for subscription." --> "Repeat for subscriptions."

~~~

8. src/backend/postmaster/pgstat.c

@@ -3000,32 +2968,29 @@ pgstat_fetch_stat_funcentry(Oid func_id)

 /*
  * ---------
- * pgstat_fetch_stat_subworker_entry() -
+ * pgstat_fetch_stat_subentry() -
  *
  * Support function for the SQL-callable pgstat* functions. Returns
- * the collected statistics for subscription worker or NULL.
+ * the collected statistics for subscription or NULL.
  * ---------
  */
-PgStat_StatSubWorkerEntry *
-pgstat_fetch_stat_subworker_entry(Oid subid, Oid subrelid)
+PgStat_StatSubEntry *
+pgstat_fetch_stat_subentry(Oid subid)

There seems some kind of inconsistency because the member name is
called "subscriptions" but sometimes it seems singular.

Some places (e.g. pgstat_vacuum_stat) will iterate multiple results,
but then other places (like this function) just return to a single
"subscription" (or "entry").

I suspect all the code may be fine; probably it is just some
inconsistent (singular/plural) comments that have confused things a
bit.

~~~

9. src/backend/replication/logical/worker.c - subscription id

+ /* Report the worker failed during table synchronization */
+ pgstat_report_subscription_error(MyLogicalRepWorker->subid, false);

and

+ /* Report the worker failed during the application of the change */
+ pgstat_report_subscription_error(MyLogicalRepWorker->subid, true);


Why don't these use MySubscription->oid instead of MyLogicalRepWorker->subid?

~~~

10. src/include/pgstat.h - enum order

@@ -84,8 +84,8 @@ typedef enum StatMsgType
  PGSTAT_MTYPE_REPLSLOT,
  PGSTAT_MTYPE_CONNECT,
  PGSTAT_MTYPE_DISCONNECT,
+ PGSTAT_MTYPE_SUBSCRIPTIONERROR,
  PGSTAT_MTYPE_SUBSCRIPTIONPURGE,
- PGSTAT_MTYPE_SUBWORKERERROR,
 } StatMsgType;

This change rearranges the enum order. Maybe it is safer not to do this?

~~~

11. src/include/pgstat.h

@@ -767,8 +747,8 @@ typedef union PgStat_Msg
  PgStat_MsgReplSlot msg_replslot;
  PgStat_MsgConnect msg_connect;
  PgStat_MsgDisconnect msg_disconnect;
+ PgStat_MsgSubscriptionError msg_subscriptionerror;
  PgStat_MsgSubscriptionPurge msg_subscriptionpurge;
- PgStat_MsgSubWorkerError msg_subworkererror;
 } PgStat_Msg;

This change also rearranges the order. Maybe there was no good reason
to do that?

~~~

12. src/include/pgstat.h - PgStat_StatDBEntry

@@ -823,16 +803,12 @@ typedef struct PgStat_StatDBEntry
  TimestampTz stats_timestamp; /* time of db stats file update */

  /*
- * tables, functions, and subscription workers must be last in the struct,
- * because we don't write the pointers out to the stats file.
- *
- * subworkers is the hash table of PgStat_StatSubWorkerEntry which stores
- * statistics of logical replication workers: apply worker and table sync
- * worker.
+ * tables, functions, and subscription must be last in the struct, because
+ * we don't write the pointers out to the stats file.
  */

Should that say "tables, functions, and subscriptions" (plural)

------

Kind Regards,
Peter Smith.
Fujitsu Australia



pgsql-hackers by date:

Previous
From: "tanghy.fnst@fujitsu.com"
Date:
Subject: RE: Design of pg_stat_subscription_workers vs pgstats
Next
From: Andres Freund
Date:
Subject: Re: Frontend error logging style