Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION - Mailing list pgsql-hackers

From japin
Subject Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
Date
Msg-id MEYP282MB1669D9136B6A1EC71C4F7082B6A90@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM
Whole thread Raw
In response to Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Wed, 13 Jan 2021 at 13:26, Amit Kapila wrote:
> On Tue, Jan 12, 2021 at 4:59 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>>
>> On Tue, Jan 12, 2021 at 12:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>> > > Here's my analysis:
>> > > 1) in the publisher, alter publication drop table successfully
>> > > removes(PublicationDropTables) the table from the catalogue
>> > > pg_publication_rel
>> > > 2) in the subscriber, alter subscription refresh publication
>> > > successfully removes the table from the catalogue pg_subscription_rel
>> > > (AlterSubscription_refresh->RemoveSubscriptionRel)
>> > > so far so good
>> > >
>> >
>> > Here, it should register the worker to stop on commit, and then on
>> > commit it should call AtEOXact_ApplyLauncher to stop the apply worker.
>> > Once the apply worker is stopped, the corresponding WALSender will
>> > also be stopped. Something here is not happening as per expected
>> > behavior.
>>
>> On the subscriber, an entry for worker stop is created in AlterSubscription_refresh -->
logicalrep_worker_stop_at_commit.At the end of txn, in AtEOXact_ApplyLauncher, we try to stop that worker, but it
cannotbe stopped because logicalrep_worker_find returns null (AtEOXact_ApplyLauncher --> logicalrep_worker_stop -->
logicalrep_worker_find).The worker entry for that subscriber is having relid as 0 [1], due to which the following if
conditionwill not be hit. The apply worker on the subscriber related to the subscription on which refresh publication
wasrun is not closed. It looks like relid 0 is valid because it will be applicable only during the table sync phase,
thecomment in the LogicalRepWorker structure says that.
 
>>
>> And also, I think, expecting the apply worker to be closed this way doesn't make sense because the apply worker is a
per-subscriptionbase, and the subscription can have other tables too.
 
>>
>
> Okay, that makes sense. As responded to Li Japin, let's focus on
> figuring out why we are sending the changes from the publisher node in
> some cases and not in other cases.

After some analysis, I find that the dropped tables always replicate to subscriber.
The difference is that if we drop the table from publication and refresh
publication (on subscriber), the LogicalRepRelMapEntry in should_apply_changes_for_rel()
set state to SUBREL_STATE_UNKNOWN.

(gdb) p *rel
$2 = {remoterel = {remoteid = 16410, nspname = 0x5564fb0177c0 "public",
    relname = 0x5564fb0177a0 "t1", natts = 1, attnames = 0x5564fb0177e0, atttyps = 0x5564fb017780,
    replident = 100 'd', relkind = 0 '\000', attkeys = 0x0}, localrelvalid = true,
  localreloid = 16412, localrel = 0x7f78705da1b8, attrmap = 0x5564fb017800, updatable = false,
  *state = 0 '\000'*, statelsn = 0}

If we insert data between drop table from publication and refresh publication, the
LogicalRepRelMapEntry state is always SUBREL_STATE_READY.

(gdb) p *rel
$2 = {remoterel = {remoteid = 16410, nspname = 0x5564fb0177c0 "public",
    relname = 0x5564fb0177a0 "t1", natts = 1, attnames = 0x5564fb0177e0, atttyps = 0x5564fb017780,
    replident = 100 'd', relkind = 0 '\000', attkeys = 0x0}, localrelvalid = true,
  localreloid = 16412, localrel = 0x7f78705d9d38, attrmap = 0x5564fb017800, updatable = false,
  *state = 114 'r'*, statelsn = 23545672}

I will dig why the state of LogicalRepRelMapEntry doesn't change in second case.

Any suggestion is welcome!

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.



pgsql-hackers by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: Wrong usage of RelationNeedsWAL
Next
From: Heikki Linnakangas
Date:
Subject: Re: ResourceOwner refactoring