Re: Skipping logical replication transactions on subscriber side - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Skipping logical replication transactions on subscriber side |
Date | |
Msg-id | CAA4eK1J+PDjw_E6tT_AaCn+WSfMhJ1vC+ZuAwBYCSQzc7x9Yxw@mail.gmail.com Whole thread Raw |
In response to | Re: Skipping logical replication transactions on subscriber side (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
Re: Skipping logical replication transactions on subscriber side
|
List | pgsql-hackers |
On Thu, Oct 28, 2021 at 10:36 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Wed, Oct 27, 2021 at 7:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Oct 21, 2021 at 10:29 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > > I've attached updated patches. > > Thank you for the comments! > > > > > Few comments: > > ============== > > 1. Is the patch cleaning tablesync error entries except via vacuum? If > > not, can't we send a message to remove tablesync errors once tablesync > > is successful (say when we reset skip_xid or when tablesync is > > finished) or when we drop subscription? I think the same applies to > > apply worker. I think we may want to track it in some way whether an > > error has occurred before sending the message but relying completely > > on a vacuum might be the recipe of bloat. I think in the case of a > > drop subscription we can simply send the message as that is not a > > frequent operation. I might be missing something here because in the > > tests after drop subscription you are expecting the entries from the > > view to get cleared > > Yes, I think we can have tablesync worker send a message to drop stats > once tablesync is successful. But if we do that also when dropping a > subscription, I think we need to do that only the transaction is > committed since we can drop a subscription that doesn't have a > replication slot and rollback the transaction. Probably we can send > the message only when the subscritpion does have a replication slot. > Right. And probably for apply worker after updating skip xid. > In other cases, we can remember the subscriptions being dropped and > send the message to drop the statistics of them after committing the > transaction but I’m not sure it’s worth having it. > Yeah, let's not go to that extent. I think in most cases subscriptions will have corresponding slots. FWIW, we completely > rely on pg_stat_vacuum_stats() for cleaning up the dead tables and > functions. And we don't expect there are many subscriptions on the > database. > True, but we do send it for the database, so let's do it for the cases you explained in the first paragraph. > > > > 5. > > +# Check if the view doesn't show any entries after dropping the subscriptions. > > +$node_subscriber->safe_psql( > > + 'postgres', > > + q[ > > +DROP SUBSCRIPTION tap_sub; > > +DROP SUBSCRIPTION tap_sub_streaming; > > +]); > > +$result = $node_subscriber->safe_psql('postgres', > > + "SELECT count(1) FROM pg_stat_subscription_workers"); > > +is($result, q(0), 'no error after dropping subscription'); > > > > Don't we need to wait after dropping the subscription and before > > checking the view as there might be a slight delay in messages to get > > cleared? > > I think the test always passes without waiting for the statistics to > be updated since we fetch the subscription worker statistics from the > stats collector based on the entries of pg_subscription catalog. So > this test checks if statistics of already-dropped subscription doesn’t > show up in the view after DROP SUBSCRIPTION, but does not check if the > subscription worker statistics entry in the stats collector gets > removed. The primary reason is that as I mentioned above, the patch > relies on pgstat_vacuum_stat() for cleaning up the dead subscriptions. > That makes sense. > > > > 7. > > +# Create subscriptions. The table sync for test_tab2 on tap_sub will enter to > > +# infinite error due to violating the unique constraint. > > +my $appname = 'tap_sub'; > > +$node_subscriber->safe_psql( > > + 'postgres', > > + "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr > > application_name=$appname' PUBLICATION tap_pub WITH (streaming = off, > > two_phase = on);"); > > +my $appname_streaming = 'tap_sub_streaming'; > > +$node_subscriber->safe_psql( > > + 'postgres', > > + "CREATE SUBSCRIPTION tap_sub_streaming CONNECTION > > '$publisher_connstr application_name=$appname_streaming' PUBLICATION > > tap_pub_streaming WITH (streaming = on, two_phase = on);"); > > + > > +$node_publisher->wait_for_catchup($appname); > > +$node_publisher->wait_for_catchup($appname_streaming); > > > > How can we ensure that subscriber would have caught up when one of the > > tablesync workers is constantly in the error loop? Isn't it possible > > that the subscriber didn't send the latest lsn feedback till the table > > sync worker is finished? > > > > I thought that even if tablesync for a table is still ongoing, the > apply worker can apply commit records, update write LSN and flush LSN, > and send the feedback to the wal sender. No? > You are right, this case will work. -- With Regards, Amit Kapila.
pgsql-hackers by date: