Re: Skipping logical replication transactions on subscriber side - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Skipping logical replication transactions on subscriber side
Date
Msg-id CAA4eK1+oj008H4p3cfdvxYADaMJGWCEZ0iWdVLs72t+T0FeQvg@mail.gmail.com
Whole thread Raw
In response to Re: Skipping logical replication transactions on subscriber side  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Skipping logical replication transactions on subscriber side
List pgsql-hackers
On Tue, Nov 16, 2021 at 12:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Right. I've fixed this issue and attached an updated patch.
>

Few comments/questions:
=====================
1.
+  <para>
+   The <structname>pg_stat_subscription_workers</structname> view will contain
+   one row per subscription error reported by workers applying logical
+   replication changes and workers handling the initial data copy of the
+   subscribed tables.  The statistics entry is removed when the subscription
+   the worker is running on is removed.
+  </para>

The last line of this paragraph is not clear to me. First "the" before
"worker" in the following part of the sentence seems unnecessary
"..when the subscription the worker..". Then the part "running on is
removed" is unclear because it could also mean that we remove the
entry when a subscription is disabled. Can we rephrase it to: "The
statistics entry is removed when the corresponding subscription is
dropped"?

2.
Between v20 and v23 versions of patch the size of hash table
PGSTAT_SUBWORKER_HASH_SIZE is increased from 32 to 256. I might have
missed the comment which lead to this change, can you point me to the
same or if you changed it for some other reason, can you let me know
the same?

3.
+
+ /*
+ * Repeat for subscription workers.  Similarly, we needn't bother
+ * in the common case where no function stats are being collected.
+ */

/function/subscription workers'

4.
+      <para>
+       Name of command being applied when the error occurred.  This field
+       is always NULL if the error was reported during the initial data
+       copy.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>xid</structfield> <type>xid</type>
+      </para>
+      <para>
+       Transaction ID of the publisher node being applied when the error
+       occurred.  This field is always NULL if the error was reported
+       during the initial data copy.
+      </para></entry>

Is it important to stress on 'always' in the above two descriptions?

5.
The current description of first/last_error_time seems sliglthy
misleading as one can interpret that these are about different errors.
Let's slightly change the description of first/last_error_time as
follows or something on those lines:

</para>
+      <para>
+       Time at which the first error occurred
+      </para></entry>
+     </row>

First time at which this error occurred

<structfield>last_error_time</structfield> <type>timestamp with time zone</type>
+      </para>
+      <para>
+       Time at which the last error occurred

Last time at which this error occurred. This will be the same as
first_error_time except when the same error occurred more than once
consecutively.

6.
+        </indexterm>
+        <function>pg_stat_reset_subscription_worker</function> (
<parameter>subid</parameter> <type>oid</type>, <optional>
<parameter>relid</parameter> <type>oid</type> </optional> )
+        <returnvalue>void</returnvalue>
+       </para>
+       <para>
+        Resets the statistics of a single subscription worker running on the
+        subscription with <parameter>subid</parameter> shown in the
+        <structname>pg_stat_subscription_worker</structname> view.  If the
+        argument <parameter>relid</parameter> is not <literal>NULL</literal>,
+        resets statistics of the subscription worker handling the initial data
+        copy of the relation with <parameter>relid</parameter>.  Otherwise,
+        resets the subscription worker statistics of the main apply worker.
+        If the argument <parameter>relid</parameter> is omitted, resets the
+        statistics of all subscription workers running on the subscription
+        with <parameter>subid</parameter>.
+       </para>

The first line of this description seems to indicate that we can only
reset the stats of a single worker but the later part indicates that
we can reset stats of all subscription workers. Can we change the
first line as: "Resets the statistics of subscription workers running
on the subscription with <parameter>subid</parameter> shown in the
<structname>pg_stat_subscription_worker</structname> view.".

7.
pgstat_vacuum_stat()
{
..
+ pgstat_setheader(&spmsg.m_hdr, PGSTAT_MTYPE_SUBSCRIPTIONPURGE);
+ spmsg.m_databaseid = MyDatabaseId;
+ spmsg.m_nentries = 0;
..
}

Do we really need to set the header here? It seems to be getting set
in pgstat_send_subscription_purge() while sending this message.

8.
pgstat_vacuum_stat()
{
..
+
+ if (hash_search(htab, (void *) &(subwentry->key.subid), HASH_FIND, NULL)
+ != NULL)
+ continue;
+
+ /* This subscription is dead, add the subid to the message */
+ spmsg.m_subids[spmsg.m_nentries++] = subwentry->key.subid;
..
}

I think it is better to use a separate variable here for subid as we
are using for funcid and tableid. That will make this part of the code
easier to follow and look consistent.

9.
+/* ----------
+ * PgStat_MsgSubWorkerError Sent by the apply worker or the table
sync worker to
+ * report the error occurred during logical replication.
+ * ----------

In this comment "during logical replication" sounds too generic. Can
we instead use "while processing changes." or something like that to
make it a bit more specific?

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Greg Nancarrow
Date:
Subject: Re: row filtering for logical replication
Next
From: Amit Kapila
Date:
Subject: Re: Skipping logical replication transactions on subscriber side