Re: Skipping logical replication transactions on subscriber side - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Skipping logical replication transactions on subscriber side
Date
Msg-id CAD21AoAvqBA3FiZS-2Jbh0bj_Hp0AmWu5o-RsSLNgFr1BMiYow@mail.gmail.com
Whole thread Raw
In response to Re: Skipping logical replication transactions on subscriber side  (vignesh C <vignesh21@gmail.com>)
List pgsql-hackers
On Fri, Nov 5, 2021 at 12:57 AM vignesh C <vignesh21@gmail.com> wrote:
>
> On Fri, Oct 29, 2021 at 10:55 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Thu, Oct 28, 2021 at 7:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Thu, Oct 28, 2021 at 10:36 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > On Wed, Oct 27, 2021 at 7:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > > On Thu, Oct 21, 2021 at 10:29 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > > >
> > > > > >
> > > > > > I've attached updated patches.
> > > >
> > > > Thank you for the comments!
> > > >
> > > > >
> > > > > Few comments:
> > > > > ==============
> > > > > 1. Is the patch cleaning tablesync error entries except via vacuum? If
> > > > > not, can't we send a message to remove tablesync errors once tablesync
> > > > > is successful (say when we reset skip_xid or when tablesync is
> > > > > finished) or when we drop subscription? I think the same applies to
> > > > > apply worker. I think we may want to track it in some way whether an
> > > > > error has occurred before sending the message but relying completely
> > > > > on a vacuum might be the recipe of bloat. I think in the case of a
> > > > > drop subscription we can simply send the message as that is not a
> > > > > frequent operation. I might be missing something here because in the
> > > > > tests after drop subscription you are expecting the entries from the
> > > > > view to get cleared
> > > >
> > > > Yes, I think we can have tablesync worker send a message to drop stats
> > > > once tablesync is successful. But if we do that also when dropping a
> > > > subscription, I think we need to do that only the transaction is
> > > > committed since we can drop a subscription that doesn't have a
> > > > replication slot and rollback the transaction. Probably we can send
> > > > the message only when the subscritpion does have a replication slot.
> > > >
> > >
> > > Right. And probably for apply worker after updating skip xid.
> >
> > I'm not sure it's better to drop apply worker stats after resetting
> > skip xid (i.g., after skipping the transaction). Since the view is a
> > cumulative view and has last_error_time, I thought we can have the
> > apply worker stats until the subscription gets dropped. Since the
> > error reporting message could get lost, no entry in the view doesn’t
> > mean the worker doesn’t face an issue.
> >
> > >
> > > > In other cases, we can remember the subscriptions being dropped and
> > > > send the message to drop the statistics of them after committing the
> > > > transaction but I’m not sure it’s worth having it.
> > > >
> > >
> > > Yeah, let's not go to that extent. I think in most cases subscriptions
> > > will have corresponding slots.
> >
> > Agreed.
> >
> > >
> > >  FWIW, we completely
> > > > rely on pg_stat_vacuum_stats() for cleaning up the dead tables and
> > > > functions. And we don't expect there are many subscriptions on the
> > > > database.
> > > >
> > >
> > > True, but we do send it for the database, so let's do it for the cases
> > > you explained in the first paragraph.
> >
> > Agreed.
> >
> > I've attached a new version patch. Since the syntax of skipping
> > transaction id is under the discussion I've attached only the error
> > reporting patch for now.
>
> Thanks for the updated patch, few comments:
> 1) This check and return can be moved above CreateTemplateTupleDesc so
> that the tuple descriptor need not be created if there is no worker
> statistics
> +       BlessTupleDesc(tupdesc);
> +
> +       /* Get subscription worker stats */
> +       wentry = pgstat_fetch_subworker(subid, subrelid);
> +
> +       /* Return NULL if there is no worker statistics */
> +       if (wentry == NULL)
> +               PG_RETURN_NULL();
> +
> +       /* Initialise values and NULL flags arrays */
> +       MemSet(values, 0, sizeof(values));
> +       MemSet(nulls, 0, sizeof(nulls));
>
> 2) "NULL for the main apply worker" is mentioned as "null for the main
> apply worker" in case of pg_stat_subscription view, we can mention it
> similarly.
> +      <para>
> +       OID of the relation that the worker is synchronizing; NULL for the
> +       main apply worker
> +      </para></entry>
>
> 3) Variable assignment can be done during declaration and this the
> assignment can be removed
> +       i = 0;
> +       /* subid */
> +       values[i++] = ObjectIdGetDatum(subid);
>
> 4) I noticed that the worker error is still present when queried from
> pg_stat_subscription_workers even after conflict is resolved in the
> subscriber and the worker proceeds with applying the other
> transactions, should this be documented somewhere?
>
> 5) This needs to be aligned, the columns in select have used TAB, we
> should align it using spaces.
> +CREATE VIEW pg_stat_subscription_workers AS
> +    SELECT
> +       w.subid,
> +       s.subname,
> +       w.subrelid,
> +       w.relid,
> +       w.command,
> +       w.xid,
> +       w.error_count,
> +       w.error_message,
> +       w.last_error_time,
> +       w.stats_reset
>

Thank you for the comments! These comments are incorporated into the
latest (v20) patch I just submitted[1].

Regards,

[1] https://www.postgresql.org/message-id/CAD21AoAT42mhcqeB1jPfRL1%2BEUHbZk8MMY_fBgsyZvJeKNpG%2Bw%40mail.gmail.com

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: Skipping logical replication transactions on subscriber side
Next
From: Zhihong Yu
Date:
Subject: Re: [PROPOSAL] new diagnostic items for the dynamic sql