Home > mailing lists

Re: Skipping logical replication transactions on subscriber side - Mailing list pgsql-hackers

From	Masahiko Sawada
Subject	Re: Skipping logical replication transactions on subscriber side
Date	November 24, 2021 11:43:35
Msg-id	CAD21AoDFQn8EGkB=zL_HXR1mnBmfuEYvz8m6aSeqZfc1skef2A@mail.gmail.com Whole thread Raw
In response to	Re: Skipping logical replication transactions on subscriber side (Amit Kapila <amit.kapila16@gmail.com>)
Responses	Re: Skipping logical replication transactions on subscriber side Re: Skipping logical replication transactions on subscriber side Re: Skipping logical replication transactions on subscriber side
List	pgsql-hackers

Tree view

On Wed, Nov 17, 2021 at 8:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Nov 16, 2021 at 12:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > Right. I've fixed this issue and attached an updated patch.
> >
>
> Few comments/questions:
> =====================
> 1.
> +  <para>
> +   The <structname>pg_stat_subscription_workers</structname> view will contain
> +   one row per subscription error reported by workers applying logical
> +   replication changes and workers handling the initial data copy of the
> +   subscribed tables.  The statistics entry is removed when the subscription
> +   the worker is running on is removed.
> +  </para>
>
> The last line of this paragraph is not clear to me. First "the" before
> "worker" in the following part of the sentence seems unnecessary
> "..when the subscription the worker..". Then the part "running on is
> removed" is unclear because it could also mean that we remove the
> entry when a subscription is disabled. Can we rephrase it to: "The
> statistics entry is removed when the corresponding subscription is
> dropped"?

Agreed. Fixed.

>
> 2.
> Between v20 and v23 versions of patch the size of hash table
> PGSTAT_SUBWORKER_HASH_SIZE is increased from 32 to 256. I might have
> missed the comment which lead to this change, can you point me to the
> same or if you changed it for some other reason, can you let me know
> the same?

I'd missed reverting this change. I considered increasing this value
since the lifetime of subscription is long. But when it comes to
unshared hashtable can be expanded on-the-fly, it's better to start
with a small value. Reverted.

>
> 3.
> +
> + /*
> + * Repeat for subscription workers.  Similarly, we needn't bother
> + * in the common case where no function stats are being collected.
> + */
>
> /function/subscription workers'

Fixed.

>
> 4.
> +      <para>
> +       Name of command being applied when the error occurred.  This field
> +       is always NULL if the error was reported during the initial data
> +       copy.
> +      </para></entry>
> +     </row>
> +
> +     <row>
> +      <entry role="catalog_table_entry"><para role="column_definition">
> +       <structfield>xid</structfield> <type>xid</type>
> +      </para>
> +      <para>
> +       Transaction ID of the publisher node being applied when the error
> +       occurred.  This field is always NULL if the error was reported
> +       during the initial data copy.
> +      </para></entry>
>
> Is it important to stress on 'always' in the above two descriptions?

No, removed.

>
> 5.
> The current description of first/last_error_time seems sliglthy
> misleading as one can interpret that these are about different errors.
> Let's slightly change the description of first/last_error_time as
> follows or something on those lines:
>
> </para>
> +      <para>
> +       Time at which the first error occurred
> +      </para></entry>
> +     </row>
>
> First time at which this error occurred
>
> <structfield>last_error_time</structfield> <type>timestamp with time zone</type>
> +      </para>
> +      <para>
> +       Time at which the last error occurred
>
> Last time at which this error occurred. This will be the same as
> first_error_time except when the same error occurred more than once
> consecutively.

Changed. I've removed first_error_time as per discussion on the thread
for adding xact stats.

>
> 6.
> +        </indexterm>
> +        <function>pg_stat_reset_subscription_worker</function> (
> <parameter>subid</parameter> <type>oid</type>, <optional>
> <parameter>relid</parameter> <type>oid</type> </optional> )
> +        <returnvalue>void</returnvalue>
> +       </para>
> +       <para>
> +        Resets the statistics of a single subscription worker running on the
> +        subscription with <parameter>subid</parameter> shown in the
> +        <structname>pg_stat_subscription_worker</structname> view.  If the
> +        argument <parameter>relid</parameter> is not <literal>NULL</literal>,
> +        resets statistics of the subscription worker handling the initial data
> +        copy of the relation with <parameter>relid</parameter>.  Otherwise,
> +        resets the subscription worker statistics of the main apply worker.
> +        If the argument <parameter>relid</parameter> is omitted, resets the
> +        statistics of all subscription workers running on the subscription
> +        with <parameter>subid</parameter>.
> +       </para>
>
> The first line of this description seems to indicate that we can only
> reset the stats of a single worker but the later part indicates that
> we can reset stats of all subscription workers. Can we change the
> first line as: "Resets the statistics of subscription workers running
> on the subscription with <parameter>subid</parameter> shown in the
> <structname>pg_stat_subscription_worker</structname> view.".
>

Changed.

> 7.
> pgstat_vacuum_stat()
> {
> ..
> + pgstat_setheader(&spmsg.m_hdr, PGSTAT_MTYPE_SUBSCRIPTIONPURGE);
> + spmsg.m_databaseid = MyDatabaseId;
> + spmsg.m_nentries = 0;
> ..
> }
>
> Do we really need to set the header here? It seems to be getting set
> in pgstat_send_subscription_purge() while sending this message.

Removed.

>
> 8.
> pgstat_vacuum_stat()
> {
> ..
> +
> + if (hash_search(htab, (void *) &(subwentry->key.subid), HASH_FIND, NULL)
> + != NULL)
> + continue;
> +
> + /* This subscription is dead, add the subid to the message */
> + spmsg.m_subids[spmsg.m_nentries++] = subwentry->key.subid;
> ..
> }
>
> I think it is better to use a separate variable here for subid as we
> are using for funcid and tableid. That will make this part of the code
> easier to follow and look consistent.

Agreed, and changed.

>
> 9.
> +/* ----------
> + * PgStat_MsgSubWorkerError Sent by the apply worker or the table
> sync worker to
> + * report the error occurred during logical replication.
> + * ----------
>
> In this comment "during logical replication" sounds too generic. Can
> we instead use "while processing changes." or something like that to
> make it a bit more specific?

"while processing changes" sounds good.

I've attached an updated version patch. Unless I miss something, all
comments I got so far have been incorporated into this patch. Please
review it.


Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/

Attachment

v24-0001-Add-a-subscription-worker-statistics-view-pg_sta.patch

pgsql-hackers by date:

From: Michael Paquier
Date: 24 November 2021, 11:36:27
Subject: Re: VS2022: Support Visual Studio 2022 on Windows

From: Jeevan Ladhe
Date: 24 November 2021, 11:54:16
Subject: Re: Teach pg_receivewal to use lz4 compression

Re: Skipping logical replication transactions on subscriber side - Mailing list pgsql-hackers

Attachment

Previous

Next