Re: Skipping logical replication transactions on subscriber side - Mailing list pgsql-hackers
| From | Masahiko Sawada |
|---|---|
| Subject | Re: Skipping logical replication transactions on subscriber side |
| Date | |
| Msg-id | CAD21AoDFQn8EGkB=zL_HXR1mnBmfuEYvz8m6aSeqZfc1skef2A@mail.gmail.com Whole thread Raw |
| In response to | Re: Skipping logical replication transactions on subscriber side (Amit Kapila <amit.kapila16@gmail.com>) |
| Responses |
Re: Skipping logical replication transactions on subscriber side
Re: Skipping logical replication transactions on subscriber side Re: Skipping logical replication transactions on subscriber side |
| List | pgsql-hackers |
On Wed, Nov 17, 2021 at 8:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Nov 16, 2021 at 12:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > Right. I've fixed this issue and attached an updated patch.
> >
>
> Few comments/questions:
> =====================
> 1.
> + <para>
> + The <structname>pg_stat_subscription_workers</structname> view will contain
> + one row per subscription error reported by workers applying logical
> + replication changes and workers handling the initial data copy of the
> + subscribed tables. The statistics entry is removed when the subscription
> + the worker is running on is removed.
> + </para>
>
> The last line of this paragraph is not clear to me. First "the" before
> "worker" in the following part of the sentence seems unnecessary
> "..when the subscription the worker..". Then the part "running on is
> removed" is unclear because it could also mean that we remove the
> entry when a subscription is disabled. Can we rephrase it to: "The
> statistics entry is removed when the corresponding subscription is
> dropped"?
Agreed. Fixed.
>
> 2.
> Between v20 and v23 versions of patch the size of hash table
> PGSTAT_SUBWORKER_HASH_SIZE is increased from 32 to 256. I might have
> missed the comment which lead to this change, can you point me to the
> same or if you changed it for some other reason, can you let me know
> the same?
I'd missed reverting this change. I considered increasing this value
since the lifetime of subscription is long. But when it comes to
unshared hashtable can be expanded on-the-fly, it's better to start
with a small value. Reverted.
>
> 3.
> +
> + /*
> + * Repeat for subscription workers. Similarly, we needn't bother
> + * in the common case where no function stats are being collected.
> + */
>
> /function/subscription workers'
Fixed.
>
> 4.
> + <para>
> + Name of command being applied when the error occurred. This field
> + is always NULL if the error was reported during the initial data
> + copy.
> + </para></entry>
> + </row>
> +
> + <row>
> + <entry role="catalog_table_entry"><para role="column_definition">
> + <structfield>xid</structfield> <type>xid</type>
> + </para>
> + <para>
> + Transaction ID of the publisher node being applied when the error
> + occurred. This field is always NULL if the error was reported
> + during the initial data copy.
> + </para></entry>
>
> Is it important to stress on 'always' in the above two descriptions?
No, removed.
>
> 5.
> The current description of first/last_error_time seems sliglthy
> misleading as one can interpret that these are about different errors.
> Let's slightly change the description of first/last_error_time as
> follows or something on those lines:
>
> </para>
> + <para>
> + Time at which the first error occurred
> + </para></entry>
> + </row>
>
> First time at which this error occurred
>
> <structfield>last_error_time</structfield> <type>timestamp with time zone</type>
> + </para>
> + <para>
> + Time at which the last error occurred
>
> Last time at which this error occurred. This will be the same as
> first_error_time except when the same error occurred more than once
> consecutively.
Changed. I've removed first_error_time as per discussion on the thread
for adding xact stats.
>
> 6.
> + </indexterm>
> + <function>pg_stat_reset_subscription_worker</function> (
> <parameter>subid</parameter> <type>oid</type>, <optional>
> <parameter>relid</parameter> <type>oid</type> </optional> )
> + <returnvalue>void</returnvalue>
> + </para>
> + <para>
> + Resets the statistics of a single subscription worker running on the
> + subscription with <parameter>subid</parameter> shown in the
> + <structname>pg_stat_subscription_worker</structname> view. If the
> + argument <parameter>relid</parameter> is not <literal>NULL</literal>,
> + resets statistics of the subscription worker handling the initial data
> + copy of the relation with <parameter>relid</parameter>. Otherwise,
> + resets the subscription worker statistics of the main apply worker.
> + If the argument <parameter>relid</parameter> is omitted, resets the
> + statistics of all subscription workers running on the subscription
> + with <parameter>subid</parameter>.
> + </para>
>
> The first line of this description seems to indicate that we can only
> reset the stats of a single worker but the later part indicates that
> we can reset stats of all subscription workers. Can we change the
> first line as: "Resets the statistics of subscription workers running
> on the subscription with <parameter>subid</parameter> shown in the
> <structname>pg_stat_subscription_worker</structname> view.".
>
Changed.
> 7.
> pgstat_vacuum_stat()
> {
> ..
> + pgstat_setheader(&spmsg.m_hdr, PGSTAT_MTYPE_SUBSCRIPTIONPURGE);
> + spmsg.m_databaseid = MyDatabaseId;
> + spmsg.m_nentries = 0;
> ..
> }
>
> Do we really need to set the header here? It seems to be getting set
> in pgstat_send_subscription_purge() while sending this message.
Removed.
>
> 8.
> pgstat_vacuum_stat()
> {
> ..
> +
> + if (hash_search(htab, (void *) &(subwentry->key.subid), HASH_FIND, NULL)
> + != NULL)
> + continue;
> +
> + /* This subscription is dead, add the subid to the message */
> + spmsg.m_subids[spmsg.m_nentries++] = subwentry->key.subid;
> ..
> }
>
> I think it is better to use a separate variable here for subid as we
> are using for funcid and tableid. That will make this part of the code
> easier to follow and look consistent.
Agreed, and changed.
>
> 9.
> +/* ----------
> + * PgStat_MsgSubWorkerError Sent by the apply worker or the table
> sync worker to
> + * report the error occurred during logical replication.
> + * ----------
>
> In this comment "during logical replication" sounds too generic. Can
> we instead use "while processing changes." or something like that to
> make it a bit more specific?
"while processing changes" sounds good.
I've attached an updated version patch. Unless I miss something, all
comments I got so far have been incorporated into this patch. Please
review it.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Attachment
pgsql-hackers by date: