Re: Skipping logical replication transactions on subscriber side - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Skipping logical replication transactions on subscriber side
Date
Msg-id CAA4eK1+28p7eZ919Xp=VmtiF+zcEyk3u0-nUWysbxrCkg-ObZw@mail.gmail.com
Whole thread Raw
In response to Re: Skipping logical replication transactions on subscriber side  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Skipping logical replication transactions on subscriber side
List pgsql-hackers
On Tue, Nov 2, 2021 at 2:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Nov 2, 2021 at 2:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Nov 1, 2021 at 7:18 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Fri, Oct 29, 2021 at 8:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > >
> > > > Fair enough. So statistics can be removed either by vacuum or drop
> > > > subscription. Also, if we go by this logic then there is no harm in
> > > > retaining the stat entries for tablesync errors. Why have different
> > > > behavior for apply and tablesync workers?
> > >
> > > My understanding is that the subscription worker statistics entry
> > > corresponds to workers (but not physical workers since the physical
> > > process is changed after restarting). So if the worker finishes its
> > > jobs, it is no longer necessary to show errors since further problems
> > > will not occur after that. Table sync worker’s job finishes when
> > > completing table copy (unless table sync is performed again by REFRESH
> > > PUBLICATION) whereas apply worker’s job finishes when the subscription
> > > is dropped.
> > >
> >
> > Actually, I am not very sure how users can use the old error
> > information after we allowed skipping the conflicting xid. Say, if
> > they want to add/remove some constraints on the table based on
> > previous errors then they might want to refer to errors of both the
> > apply worker and table sync worker.
>
> I think that in general, statistics should be retained as long as a
> corresponding object exists on the database, like other cumulative
> statistic views. So I’m concerned that an entry of a cumulative stats
> view is automatically removed by a non-stats-related function (i.g.,
> ALTER SUBSCRIPTION SKIP). Which seems a new behavior for cumulative
> stats views.
>
> We can retain the stats entries for table sync worker but what I want
> to avoid is that the view shows many old entries that will never be
> updated. I've sometimes seen cases where the user mistakenly restored
> table data on the subscriber before creating a subscription, failed
> table sync on many tables due to unique violation, and truncated
> tables on the subscriber. I think that unlike the stats entries for
> apply worker, retaining the stats entries for table sync could be
> harmful since it’s likely to be a large amount (even hundreds of
> entries). Especially, it could lead to bloat the stats file since it
> has an error message. So if we do that, I'd like to provide a function
> for users to remove (not reset) stats entries manually.
>

If we follow the idea of keeping stats at db level (in
PgStat_StatDBEntry) as discussed above then I think we already have a
way to remove stat entries via pg_stat_reset which removes the stats
corresponding to tables, functions and after this patch corresponding
to subscriptions as well for the current database. Won't that be
sufficient? I see your point but I think it may be better if we keep
the same behavior for stats of apply and table sync workers.

Following the tables, functions, I thought of keeping the name of the
reset function similar to "pg_stat_reset_single_table_counters" but I
feel the currently used name "pg_stat_reset_subscription_worker" in
the patch is better. Do let me know what you think?

--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Jan Wieck
Date:
Subject: Re: should we enable log_checkpoints out of the box?
Next
From: Amit Kapila
Date:
Subject: Re: parallel vacuum comments