Re: Skipping logical replication transactions on subscriber side - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Skipping logical replication transactions on subscriber side
Date
Msg-id CAA4eK1J+PDjw_E6tT_AaCn+WSfMhJ1vC+ZuAwBYCSQzc7x9Yxw@mail.gmail.com
Whole thread Raw
In response to Re: Skipping logical replication transactions on subscriber side  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Skipping logical replication transactions on subscriber side
List pgsql-hackers
On Thu, Oct 28, 2021 at 10:36 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Oct 27, 2021 at 7:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Oct 21, 2021 at 10:29 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > >
> > > I've attached updated patches.
>
> Thank you for the comments!
>
> >
> > Few comments:
> > ==============
> > 1. Is the patch cleaning tablesync error entries except via vacuum? If
> > not, can't we send a message to remove tablesync errors once tablesync
> > is successful (say when we reset skip_xid or when tablesync is
> > finished) or when we drop subscription? I think the same applies to
> > apply worker. I think we may want to track it in some way whether an
> > error has occurred before sending the message but relying completely
> > on a vacuum might be the recipe of bloat. I think in the case of a
> > drop subscription we can simply send the message as that is not a
> > frequent operation. I might be missing something here because in the
> > tests after drop subscription you are expecting the entries from the
> > view to get cleared
>
> Yes, I think we can have tablesync worker send a message to drop stats
> once tablesync is successful. But if we do that also when dropping a
> subscription, I think we need to do that only the transaction is
> committed since we can drop a subscription that doesn't have a
> replication slot and rollback the transaction. Probably we can send
> the message only when the subscritpion does have a replication slot.
>

Right. And probably for apply worker after updating skip xid.

> In other cases, we can remember the subscriptions being dropped and
> send the message to drop the statistics of them after committing the
> transaction but I’m not sure it’s worth having it.
>

Yeah, let's not go to that extent. I think in most cases subscriptions
will have corresponding slots.

 FWIW, we completely
> rely on pg_stat_vacuum_stats() for cleaning up the dead tables and
> functions. And we don't expect there are many subscriptions on the
> database.
>

True, but we do send it for the database, so let's do it for the cases
you explained in the first paragraph.

> >
> > 5.
> > +# Check if the view doesn't show any entries after dropping the subscriptions.
> > +$node_subscriber->safe_psql(
> > +    'postgres',
> > +    q[
> > +DROP SUBSCRIPTION tap_sub;
> > +DROP SUBSCRIPTION tap_sub_streaming;
> > +]);
> > +$result = $node_subscriber->safe_psql('postgres',
> > +       "SELECT count(1) FROM pg_stat_subscription_workers");
> > +is($result, q(0), 'no error after dropping subscription');
> >
> > Don't we need to wait after dropping the subscription and before
> > checking the view as there might be a slight delay in messages to get
> > cleared?
>
> I think the test always passes without waiting for the statistics to
> be updated since we fetch the subscription worker statistics from the
> stats collector based on the entries of pg_subscription catalog. So
> this test checks if statistics of already-dropped subscription doesn’t
> show up in the view after DROP SUBSCRIPTION, but does not check if the
> subscription worker statistics entry in the stats collector gets
> removed. The primary reason is that as I mentioned above, the patch
> relies on pgstat_vacuum_stat() for cleaning up the dead subscriptions.
>

That makes sense.

> >
> > 7.
> > +# Create subscriptions. The table sync for test_tab2 on tap_sub will enter to
> > +# infinite error due to violating the unique constraint.
> > +my $appname = 'tap_sub';
> > +$node_subscriber->safe_psql(
> > +    'postgres',
> > +    "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr
> > application_name=$appname' PUBLICATION tap_pub WITH (streaming = off,
> > two_phase = on);");
> > +my $appname_streaming = 'tap_sub_streaming';
> > +$node_subscriber->safe_psql(
> > +    'postgres',
> > +    "CREATE SUBSCRIPTION tap_sub_streaming CONNECTION
> > '$publisher_connstr application_name=$appname_streaming' PUBLICATION
> > tap_pub_streaming WITH (streaming = on, two_phase = on);");
> > +
> > +$node_publisher->wait_for_catchup($appname);
> > +$node_publisher->wait_for_catchup($appname_streaming);
> >
> > How can we ensure that subscriber would have caught up when one of the
> > tablesync workers is constantly in the error loop? Isn't it possible
> > that the subscriber didn't send the latest lsn feedback till the table
> > sync worker is finished?
> >
>
> I thought that even if tablesync for a table is still ongoing, the
> apply worker can apply commit records, update write LSN and flush LSN,
> and send the feedback to the wal sender. No?
>

You are right, this case will work.

--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Andrew Bille
Date:
Subject: Re: [Proposal] Global temporary tables
Next
From: vignesh C
Date:
Subject: Re: Skipping logical replication transactions on subscriber side