Re: Skipping logical replication transactions on subscriber side - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Skipping logical replication transactions on subscriber side |
Date | |
Msg-id | CAA4eK1+28p7eZ919Xp=VmtiF+zcEyk3u0-nUWysbxrCkg-ObZw@mail.gmail.com Whole thread Raw |
In response to | Re: Skipping logical replication transactions on subscriber side (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
Re: Skipping logical replication transactions on subscriber side
|
List | pgsql-hackers |
On Tue, Nov 2, 2021 at 2:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Tue, Nov 2, 2021 at 2:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Mon, Nov 1, 2021 at 7:18 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > On Fri, Oct 29, 2021 at 8:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > Fair enough. So statistics can be removed either by vacuum or drop > > > > subscription. Also, if we go by this logic then there is no harm in > > > > retaining the stat entries for tablesync errors. Why have different > > > > behavior for apply and tablesync workers? > > > > > > My understanding is that the subscription worker statistics entry > > > corresponds to workers (but not physical workers since the physical > > > process is changed after restarting). So if the worker finishes its > > > jobs, it is no longer necessary to show errors since further problems > > > will not occur after that. Table sync worker’s job finishes when > > > completing table copy (unless table sync is performed again by REFRESH > > > PUBLICATION) whereas apply worker’s job finishes when the subscription > > > is dropped. > > > > > > > Actually, I am not very sure how users can use the old error > > information after we allowed skipping the conflicting xid. Say, if > > they want to add/remove some constraints on the table based on > > previous errors then they might want to refer to errors of both the > > apply worker and table sync worker. > > I think that in general, statistics should be retained as long as a > corresponding object exists on the database, like other cumulative > statistic views. So I’m concerned that an entry of a cumulative stats > view is automatically removed by a non-stats-related function (i.g., > ALTER SUBSCRIPTION SKIP). Which seems a new behavior for cumulative > stats views. > > We can retain the stats entries for table sync worker but what I want > to avoid is that the view shows many old entries that will never be > updated. I've sometimes seen cases where the user mistakenly restored > table data on the subscriber before creating a subscription, failed > table sync on many tables due to unique violation, and truncated > tables on the subscriber. I think that unlike the stats entries for > apply worker, retaining the stats entries for table sync could be > harmful since it’s likely to be a large amount (even hundreds of > entries). Especially, it could lead to bloat the stats file since it > has an error message. So if we do that, I'd like to provide a function > for users to remove (not reset) stats entries manually. > If we follow the idea of keeping stats at db level (in PgStat_StatDBEntry) as discussed above then I think we already have a way to remove stat entries via pg_stat_reset which removes the stats corresponding to tables, functions and after this patch corresponding to subscriptions as well for the current database. Won't that be sufficient? I see your point but I think it may be better if we keep the same behavior for stats of apply and table sync workers. Following the tables, functions, I thought of keeping the name of the reset function similar to "pg_stat_reset_single_table_counters" but I feel the currently used name "pg_stat_reset_subscription_worker" in the patch is better. Do let me know what you think? -- With Regards, Amit Kapila.
pgsql-hackers by date: