Thread: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

Hi,

While providing thoughts on the design in [1], I found a strange
behaviour with the $subject. The use case is shown below as a sequence
of steps that need to be run on publisher and subscriber to arrive at
the strange behaviour.  In step 5, the table is dropped from the
publication and in step 6, the refresh publication is run on the
subscriber, from here onwards, the expectation is that no further
inserts into the publisher table have to be replicated on to the
subscriber, but the opposite happens i.e. the inserts are still
replicated to the subscriber. ISTM as a bug. Let me know if I'm
missing anything.

Thoughts?

step 1) on the publisher:
DROP TABLE t1;
DROP PUBLICATION mypub1;
CREATE TABLE t1 (a int);
INSERT INTO t1 VALUES (1);
CREATE PUBLICATION mypub1 FOR TABLE t1;
postgres=# SELECT r1.*, r2.relname, r3.* FROM pg_publication_rel r1,
pg_class r2, pg_publication r3 WHERE r1.prrelid = r2.oid AND
r1.prpubid = r3.oid;
  oid  | prpubid | prrelid | relname |  oid  | pubname | pubowner |
puballtables | pubinsert | pubupdate | pubdelete | pubtruncate |
pubviaroot

-------+---------+---------+---------+-------+---------+----------+--------------+-----------+-----------+-----------+-------------+------------
 16462 |   16461 |   16458 | t1      | 16461 | mypub1  |       10 | f
          | t         | t         | t         | t           | f
(1 row)

step 2) on the subscriber:
DROP TABLE t1;
DROP SUBSCRIPTION mysub1;
CREATE TABLE t1 (a int);
CREATE SUBSCRIPTION mysub1 CONNECTION 'host=localhost dbname=postgres
user=bharath port=5432' PUBLICATION mypub1;
postgres=# SELECT r1.*, r2.relname, r3.* FROM pg_subscription_rel r1,
pg_class r2, pg_subscription r3 WHERE r1.srrelid = r2.oid AND
r1.srsubid = r3.oid;
 srsubid | srrelid | srsubstate | srsublsn | relname |  oid  | subdbid
| subname | subowner | subenabled | subbinary | substream |
 subconninfo                      | subslotname | subsynccommit |
subpublications

---------+---------+------------+----------+---------+-------+---------+---------+----------+------------+-----------+-----------+---------------------
----------------------------------+-------------+---------------+-----------------
   16446 |   16443 | i          |          | t1      | 16446 |   12872
| mysub1  |       10 | t          | f         | f         |
host=localhost dbnam
e=postgres user=bharath port=5432 | mysub1      | off           | {mypub1}
(1 row)
postgres=# SELECT * FROM t1;
 a
---
 1
(1 row)

step 3) on the publisher:
INSERT INTO t1 VALUES (2);

step 4) on the subscriber:
postgres=# SELECT * FROM t1;
 a
---
 1
 2
(2 rows)

step 5) on the publisher:
ALTER PUBLICATION mypub1 DROP TABLE t1;
postgres=# SELECT r1.*, r2.relname, r3.* FROM pg_publication_rel r1,
pg_class r2, pg_publication r3 WHERE r1.prrelid = r2.oid AND
r1.prpubid = r3.oid;
 oid | prpubid | prrelid | relname | oid | pubname | pubowner |
puballtables | pubinsert | pubupdate | pubdelete | pubtruncate |
pubviaroot

-----+---------+---------+---------+-----+---------+----------+--------------+-----------+-----------+-----------+-------------+------------
(0 rows)
INSERT INTO t1 VALUES (3);

step 6) on the subscriber:
postgres=# SELECT * FROM t1;
 a
---
 1
 2
 3
(3 rows)
ALTER SUBSCRIPTION mysub1 REFRESH PUBLICATION;
postgres=# SELECT r1.*, r2.relname, r3.* FROM pg_subscription_rel r1,
pg_class r2, pg_subscription r3 WHERE r1.srrelid = r2.oid AND
r1.srsubid = r3.oid;
 srsubid | srrelid | srsubstate | srsublsn | relname | oid | subdbid |
subname | subowner | subenabled | subbinary | substream | subconninfo
| subslotn
ame | subsynccommit | subpublications

---------+---------+------------+----------+---------+-----+---------+---------+----------+------------+-----------+-----------+-------------+---------
----+---------------+-----------------
(0 rows)

step 7) on the publisher:
INSERT INTO t1 VALUES (4);

step 8) on the subscriber:
postgres=# SELECT * FROM t1;
 a
---
 1
 2
 3
 4
(4 rows)

step 9) on the publisher:
INSERT INTO t1 SELECT * FROM generate_series(5,100);

step 10) on the subscriber:
postgres=# SELECT count(*) FROM t1;
 count
-------
   100
(1 row)

[1] - https://www.postgresql.org/message-id/CAA4eK1L5TejNHNctyPB3GVuEriRQw6xxU32iMyv%3Dh4tCJKkLew%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Hi,
>
> While providing thoughts on the design in [1], I found a strange
> behaviour with the $subject. The use case is shown below as a sequence
> of steps that need to be run on publisher and subscriber to arrive at
> the strange behaviour.  In step 5, the table is dropped from the
> publication and in step 6, the refresh publication is run on the
> subscriber, from here onwards, the expectation is that no further
> inserts into the publisher table have to be replicated on to the
> subscriber, but the opposite happens i.e. the inserts are still
> replicated to the subscriber. ISTM as a bug. Let me know if I'm
> missing anything.
>

Did you try to investigate what's going on? Can you please check what
is the behavior if, after step-5, you restart the subscriber and
separately try creating a new subscription (maybe on a different
server) for that publication after step-5 and see if that allows the
relation to be replicated? AFAIU, in AlterSubscription_refresh, we
remove such dropped rels and stop their corresponding apply workers
which should stop the further replication of such relations but that
doesn't seem to be happening in your case.

-- 
With Regards,
Amit Kapila.



On Tue, 12 Jan 2021 at 11:37, Amit Kapila wrote:
> On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>>
>> Hi,
>>
>> While providing thoughts on the design in [1], I found a strange
>> behaviour with the $subject. The use case is shown below as a sequence
>> of steps that need to be run on publisher and subscriber to arrive at
>> the strange behaviour.  In step 5, the table is dropped from the
>> publication and in step 6, the refresh publication is run on the
>> subscriber, from here onwards, the expectation is that no further
>> inserts into the publisher table have to be replicated on to the
>> subscriber, but the opposite happens i.e. the inserts are still
>> replicated to the subscriber. ISTM as a bug. Let me know if I'm
>> missing anything.
>>
>
> Did you try to investigate what's going on? Can you please check what
> is the behavior if, after step-5, you restart the subscriber and
> separately try creating a new subscription (maybe on a different
> server) for that publication after step-5 and see if that allows the
> relation to be replicated? AFAIU, in AlterSubscription_refresh, we
> remove such dropped rels and stop their corresponding apply workers
> which should stop the further replication of such relations but that
> doesn't seem to be happening in your case.

If we restart the subscriber after step-5, it will not replicate the records.

As I said in [1], if we don't insert a new data in step-5, it will not
replicate the records.

In both cases, the AlterSubscription_refresh() call RemoveSubscriptionRel()
and logicalrep_worker_stop_at_commit().  However, if we insert a data in
step-5, it doesn't work as expected.  Any thoughts?

[1] https://www.postgresql.org/message-id/A7A618FB-F87C-439C-90A3-93CF9E7341FF@hotmail.com

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.



On Tue, Jan 12, 2021 at 9:58 AM japin <japinli@hotmail.com> wrote:
>
>
> On Tue, 12 Jan 2021 at 11:37, Amit Kapila wrote:
> > On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> While providing thoughts on the design in [1], I found a strange
> >> behaviour with the $subject. The use case is shown below as a sequence
> >> of steps that need to be run on publisher and subscriber to arrive at
> >> the strange behaviour.  In step 5, the table is dropped from the
> >> publication and in step 6, the refresh publication is run on the
> >> subscriber, from here onwards, the expectation is that no further
> >> inserts into the publisher table have to be replicated on to the
> >> subscriber, but the opposite happens i.e. the inserts are still
> >> replicated to the subscriber. ISTM as a bug. Let me know if I'm
> >> missing anything.
> >>
> >
> > Did you try to investigate what's going on? Can you please check what
> > is the behavior if, after step-5, you restart the subscriber and
> > separately try creating a new subscription (maybe on a different
> > server) for that publication after step-5 and see if that allows the
> > relation to be replicated? AFAIU, in AlterSubscription_refresh, we
> > remove such dropped rels and stop their corresponding apply workers
> > which should stop the further replication of such relations but that
> > doesn't seem to be happening in your case.
>
> If we restart the subscriber after step-5, it will not replicate the records.
>
> As I said in [1], if we don't insert a new data in step-5, it will not
> replicate the records.
>

Hmm, but in Bharath's test, it is replicating the Inserts in step-7
and step-9 as well. Are you seeing something different?

> In both cases, the AlterSubscription_refresh() call RemoveSubscriptionRel()
> and logicalrep_worker_stop_at_commit().  However, if we insert a data in
> step-5, it doesn't work as expected.  Any thoughts?
>

I think the data inserted in step-5 might be visible because we have
stopped the apply process after that but it is not clear why the data
inserted in steps 7 and 9 is getting replicated. I think due to some
reason apply worker is not getting stopped even after Refresh
Publication statement is finished due to which the data is being
replicated after that as well and after restart the apply worker won't
be restarted so the data replication doesn't happen.

-- 
With Regards,
Amit Kapila.



On Tue, 12 Jan 2021 at 13:39, Amit Kapila wrote:
> On Tue, Jan 12, 2021 at 9:58 AM japin <japinli@hotmail.com> wrote:
>>
>>
>> On Tue, 12 Jan 2021 at 11:37, Amit Kapila wrote:
>> > On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy
>> > <bharath.rupireddyforpostgres@gmail.com> wrote:
>> >>
>> >> Hi,
>> >>
>> >> While providing thoughts on the design in [1], I found a strange
>> >> behaviour with the $subject. The use case is shown below as a sequence
>> >> of steps that need to be run on publisher and subscriber to arrive at
>> >> the strange behaviour.  In step 5, the table is dropped from the
>> >> publication and in step 6, the refresh publication is run on the
>> >> subscriber, from here onwards, the expectation is that no further
>> >> inserts into the publisher table have to be replicated on to the
>> >> subscriber, but the opposite happens i.e. the inserts are still
>> >> replicated to the subscriber. ISTM as a bug. Let me know if I'm
>> >> missing anything.
>> >>
>> >
>> > Did you try to investigate what's going on? Can you please check what
>> > is the behavior if, after step-5, you restart the subscriber and
>> > separately try creating a new subscription (maybe on a different
>> > server) for that publication after step-5 and see if that allows the
>> > relation to be replicated? AFAIU, in AlterSubscription_refresh, we
>> > remove such dropped rels and stop their corresponding apply workers
>> > which should stop the further replication of such relations but that
>> > doesn't seem to be happening in your case.
>>
>> If we restart the subscriber after step-5, it will not replicate the records.
>>
>> As I said in [1], if we don't insert a new data in step-5, it will not
>> replicate the records.
>>
>
> Hmm, but in Bharath's test, it is replicating the Inserts in step-7
> and step-9 as well. Are you seeing something different?
>

Yes, however if we don't Inserts in step-5, the Inserts in step-7 and
step-9 will not replicate.


-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.



On Tue, Jan 12, 2021 at 9:05 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > Hi,
> >
> > While providing thoughts on the design in [1], I found a strange
> > behaviour with the $subject. The use case is shown below as a sequence
> > of steps that need to be run on publisher and subscriber to arrive at
> > the strange behaviour.  In step 5, the table is dropped from the
> > publication and in step 6, the refresh publication is run on the
> > subscriber, from here onwards, the expectation is that no further
> > inserts into the publisher table have to be replicated on to the
> > subscriber, but the opposite happens i.e. the inserts are still
> > replicated to the subscriber. ISTM as a bug. Let me know if I'm
> > missing anything.
> >
>
> Did you try to investigate what's going on? Can you please check what
> is the behavior if, after step-5, you restart the subscriber and
> separately try creating a new subscription (maybe on a different
> server) for that publication after step-5 and see if that allows the
> relation to be replicated? AFAIU, in AlterSubscription_refresh, we
> remove such dropped rels and stop their corresponding apply workers
> which should stop the further replication of such relations but that
> doesn't seem to be happening in your case.

Here's my analysis:
1) in the publisher, alter publication drop table successfully
removes(PublicationDropTables) the table from the catalogue
pg_publication_rel
2) in the subscriber, alter subscription refresh publication
successfully removes the table from the catalogue pg_subscription_rel
(AlterSubscription_refresh->RemoveSubscriptionRel)
so far so good
3) after the insertion into the table in the publisher(remember that
it's dropped from the publication in (1)), the walsender process is
unable detect that the table has been dropped from the publication
i.e. it doesn't look at the pg_publication_rel catalogue or some
other, but it only does is_publishable_relation() check which returns
true in pgoutput_change(). Maybe the walsender should look at the
catalogue pg_publication_rel in is_publishable_relation()?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On Tue, Jan 12, 2021 at 11:39 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Tue, Jan 12, 2021 at 9:05 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > While providing thoughts on the design in [1], I found a strange
> > > behaviour with the $subject. The use case is shown below as a sequence
> > > of steps that need to be run on publisher and subscriber to arrive at
> > > the strange behaviour.  In step 5, the table is dropped from the
> > > publication and in step 6, the refresh publication is run on the
> > > subscriber, from here onwards, the expectation is that no further
> > > inserts into the publisher table have to be replicated on to the
> > > subscriber, but the opposite happens i.e. the inserts are still
> > > replicated to the subscriber. ISTM as a bug. Let me know if I'm
> > > missing anything.
> > >
> >
> > Did you try to investigate what's going on? Can you please check what
> > is the behavior if, after step-5, you restart the subscriber and
> > separately try creating a new subscription (maybe on a different
> > server) for that publication after step-5 and see if that allows the
> > relation to be replicated? AFAIU, in AlterSubscription_refresh, we
> > remove such dropped rels and stop their corresponding apply workers
> > which should stop the further replication of such relations but that
> > doesn't seem to be happening in your case.
>
> Here's my analysis:
> 1) in the publisher, alter publication drop table successfully
> removes(PublicationDropTables) the table from the catalogue
> pg_publication_rel
> 2) in the subscriber, alter subscription refresh publication
> successfully removes the table from the catalogue pg_subscription_rel
> (AlterSubscription_refresh->RemoveSubscriptionRel)
> so far so good
>

Here, it should register the worker to stop on commit, and then on
commit it should call AtEOXact_ApplyLauncher to stop the apply worker.
Once the apply worker is stopped, the corresponding WALSender will
also be stopped. Something here is not happening as per expected
behavior.

> 3) after the insertion into the table in the publisher(remember that
> it's dropped from the publication in (1)), the walsender process is
> unable detect that the table has been dropped from the publication
> i.e. it doesn't look at the pg_publication_rel catalogue or some
> other, but it only does is_publishable_relation() check which returns
> true in pgoutput_change(). Maybe the walsender should look at the
> catalogue pg_publication_rel in is_publishable_relation()?
>

We must be somewhere checking pg_publication_rel before sending the
decoded change because otherwise, we would have sent the changes for
the table which are not even part of this publication. I think you can
try to create a separate table that is not part of the publication
under test and see how the changes for that are filtered.

-- 
With Regards,
Amit Kapila.



On Tue, 12 Jan 2021 at 14:38, Amit Kapila wrote:
> On Tue, Jan 12, 2021 at 11:39 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>>
>> On Tue, Jan 12, 2021 at 9:05 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>> >
>> > On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy
>> > <bharath.rupireddyforpostgres@gmail.com> wrote:
>> > >
>> > > Hi,
>> > >
>> > > While providing thoughts on the design in [1], I found a strange
>> > > behaviour with the $subject. The use case is shown below as a sequence
>> > > of steps that need to be run on publisher and subscriber to arrive at
>> > > the strange behaviour.  In step 5, the table is dropped from the
>> > > publication and in step 6, the refresh publication is run on the
>> > > subscriber, from here onwards, the expectation is that no further
>> > > inserts into the publisher table have to be replicated on to the
>> > > subscriber, but the opposite happens i.e. the inserts are still
>> > > replicated to the subscriber. ISTM as a bug. Let me know if I'm
>> > > missing anything.
>> > >
>> >
>> > Did you try to investigate what's going on? Can you please check what
>> > is the behavior if, after step-5, you restart the subscriber and
>> > separately try creating a new subscription (maybe on a different
>> > server) for that publication after step-5 and see if that allows the
>> > relation to be replicated? AFAIU, in AlterSubscription_refresh, we
>> > remove such dropped rels and stop their corresponding apply workers
>> > which should stop the further replication of such relations but that
>> > doesn't seem to be happening in your case.
>>
>> Here's my analysis:
>> 1) in the publisher, alter publication drop table successfully
>> removes(PublicationDropTables) the table from the catalogue
>> pg_publication_rel
>> 2) in the subscriber, alter subscription refresh publication
>> successfully removes the table from the catalogue pg_subscription_rel
>> (AlterSubscription_refresh->RemoveSubscriptionRel)
>> so far so good
>>
>
> Here, it should register the worker to stop on commit, and then on
> commit it should call AtEOXact_ApplyLauncher to stop the apply worker.
> Once the apply worker is stopped, the corresponding WALSender will
> also be stopped. Something here is not happening as per expected
> behavior.
>
>> 3) after the insertion into the table in the publisher(remember that
>> it's dropped from the publication in (1)), the walsender process is
>> unable detect that the table has been dropped from the publication
>> i.e. it doesn't look at the pg_publication_rel catalogue or some
>> other, but it only does is_publishable_relation() check which returns
>> true in pgoutput_change(). Maybe the walsender should look at the
>> catalogue pg_publication_rel in is_publishable_relation()?
>>
>
> We must be somewhere checking pg_publication_rel before sending the
> decoded change because otherwise, we would have sent the changes for
> the table which are not even part of this publication. I think you can
> try to create a separate table that is not part of the publication
> under test and see how the changes for that are filtered.

I find that pgoutput_change() use a hash table RelationSyncCache to
cache the publication info for tables.  When we drop tables from the
publication, the RelationSyncCache doesn't updated, so it replicate
records.

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




On Jan 12, 2021, at 5:47 PM, japin <japinli@hotmail.com> wrote:


On Tue, 12 Jan 2021 at 14:38, Amit Kapila wrote:
On Tue, Jan 12, 2021 at 11:39 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Tue, Jan 12, 2021 at 9:05 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

Hi,

While providing thoughts on the design in [1], I found a strange
behaviour with the $subject. The use case is shown below as a sequence
of steps that need to be run on publisher and subscriber to arrive at
the strange behaviour.  In step 5, the table is dropped from the
publication and in step 6, the refresh publication is run on the
subscriber, from here onwards, the expectation is that no further
inserts into the publisher table have to be replicated on to the
subscriber, but the opposite happens i.e. the inserts are still
replicated to the subscriber. ISTM as a bug. Let me know if I'm
missing anything.


Did you try to investigate what's going on? Can you please check what
is the behavior if, after step-5, you restart the subscriber and
separately try creating a new subscription (maybe on a different
server) for that publication after step-5 and see if that allows the
relation to be replicated? AFAIU, in AlterSubscription_refresh, we
remove such dropped rels and stop their corresponding apply workers
which should stop the further replication of such relations but that
doesn't seem to be happening in your case.

Here's my analysis:
1) in the publisher, alter publication drop table successfully
removes(PublicationDropTables) the table from the catalogue
pg_publication_rel
2) in the subscriber, alter subscription refresh publication
successfully removes the table from the catalogue pg_subscription_rel
(AlterSubscription_refresh->RemoveSubscriptionRel)
so far so good


Here, it should register the worker to stop on commit, and then on
commit it should call AtEOXact_ApplyLauncher to stop the apply worker.
Once the apply worker is stopped, the corresponding WALSender will
also be stopped. Something here is not happening as per expected
behavior.

3) after the insertion into the table in the publisher(remember that
it's dropped from the publication in (1)), the walsender process is
unable detect that the table has been dropped from the publication
i.e. it doesn't look at the pg_publication_rel catalogue or some
other, but it only does is_publishable_relation() check which returns
true in pgoutput_change(). Maybe the walsender should look at the
catalogue pg_publication_rel in is_publishable_relation()?


We must be somewhere checking pg_publication_rel before sending the
decoded change because otherwise, we would have sent the changes for
the table which are not even part of this publication. I think you can
try to create a separate table that is not part of the publication
under test and see how the changes for that are filtered.

I find that pgoutput_change() use a hash table RelationSyncCache to
cache the publication info for tables.  When we drop tables from the
publication, the RelationSyncCache doesn't updated, so it replicate
records.


IIUC the logical replication only replicate the tables in publication, I think
when the tables that aren't in publication should not be replicated.

Attached the patch that fixes it.  Thought?
 
-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Attachment
On Tue, Jan 12, 2021 at 12:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Here's my analysis:
> > 1) in the publisher, alter publication drop table successfully
> > removes(PublicationDropTables) the table from the catalogue
> > pg_publication_rel
> > 2) in the subscriber, alter subscription refresh publication
> > successfully removes the table from the catalogue pg_subscription_rel
> > (AlterSubscription_refresh->RemoveSubscriptionRel)
> > so far so good
> >
>
> Here, it should register the worker to stop on commit, and then on
> commit it should call AtEOXact_ApplyLauncher to stop the apply worker.
> Once the apply worker is stopped, the corresponding WALSender will
> also be stopped. Something here is not happening as per expected
> behavior.

On the subscriber, an entry for worker stop is created in AlterSubscription_refresh --> logicalrep_worker_stop_at_commit. At the end of txn, in AtEOXact_ApplyLauncher, we try to stop that worker, but it cannot be stopped because logicalrep_worker_find returns null (AtEOXact_ApplyLauncher --> logicalrep_worker_stop --> logicalrep_worker_find). The worker entry for that subscriber is having relid as 0 [1], due to which the following if condition will not be hit. The apply worker on the subscriber related to the subscription on which refresh publication was run is not closed. It looks like relid 0 is valid because it will be applicable only during the table sync phase, the comment in the LogicalRepWorker structure says that.

And also, I think, expecting the apply worker to be closed this way doesn't make sense because the apply worker is a per-subscription base, and the subscription can have other tables too.

    /* Search for attached worker for a given subscription id. */
    for (i = 0; i < max_logical_replication_workers; i++)
    {
        LogicalRepWorker *w = &LogicalRepCtx->workers[i];

        if (w->in_use && w->subid == subid && w->relid == relid &&
            (!only_running || w->proc))
        {
            res = w;
            break;
        }
    }

[1]
(gdb) p subid
$5 = 16391
(gdb) p relid
$6 = 16388
(gdb) p *w
$4 = {launch_time = 663760343317760, in_use = true, generation = 1, proc = 0x7fdfd9a7cc90,
  dbid = 12872, userid = 10, subid = 16391, relid = 0, relstate = 0 '\000', relstate_lsn = 0,
  relmutex = 0 '\000', last_lsn = 22798424, last_send_time = 663760483945980,
  last_recv_time = 663760483946087, reply_lsn = 22798424, reply_time = 663760483945980}

postgres=# select * from pg_stat_get_subscription(16391);
 subid | relid |  pid   | received_lsn |        last_msg_send_time        |      last_msg_receipt_time       | latest_end_lsn |         latest_end_time          
-------+-------+--------+--------------+----------------------------------+----------------------------------+----------------+----------------------------------
 16391 |       | 466779 | 0/15BE140    | 2021-01-12 15:26:48.778813+05:30 | 2021-01-12 15:26:48.778878+05:30 | 0/15BE140      | 2021-01-12 15:26:48.778813+05:30
(1 row)

> > 3) after the insertion into the table in the publisher(remember that
> > it's dropped from the publication in (1)), the walsender process is
> > unable detect that the table has been dropped from the publication
> > i.e. it doesn't look at the pg_publication_rel catalogue or some
> > other, but it only does is_publishable_relation() check which returns
> > true in pgoutput_change(). Maybe the walsender should look at the
> > catalogue pg_publication_rel in is_publishable_relation()?
> >
>
> We must be somewhere checking pg_publication_rel before sending the
> decoded change because otherwise, we would have sent the changes for
> the table which are not even part of this publication. I think you can
> try to create a separate table that is not part of the publication
> under test and see how the changes for that are filtered.

As pointed out by japin in the next thread, the walsender process in the publisher uses RelationSyncCache to hold the relations to which the insertions need to be sent to the subscriber. The RelationSyncCache gets created during startup of the walsender process(pgoutput_startup->init_rel_sync_cache) and also rel_sync_cache_publication_cb callback gets registered. So, if any alters happen to the pg_publication_rel catalog table, the callback rel_sync_cache_publication_cb is called, all the entries in the RelationSyncCache are marked as invalid, with the expectation that on the next use of any cache entry in get_rel_sync_entry (pgoutput_change->get_rel_sync_entry), that entry is validated again.

In the use case, the invalidation happens as expected in rel_sync_cache_publication_cb and while revalidating the entry in get_rel_sync_entry, since there is only one publication to which the given relation is attached to, the pubids will be null and we don't set the entry->pubactions.pubinsert/update/delete/truncate to false, so the publisher keeps publishing the inserts to the relation.

/* Validate the entry */
if (!entry->replicate_valid)
{
List   *pubids = GetRelationPublications(relid);

But, we cannot right away set to entry->pubactions.pubinsert/update/delete/truncate to false, when there are no publications attached to the given relation, because in that case, we don't know whether the user has run the alter subscription...refresh publication; on the subscriber.

If we want to achieve the drop table behaviour what's stated in the document (i.e. after publisher drops a table from the publication, the subscriber keeps receiving the data until refresh publication is run on it), the right place to set those false i.e. tell the walsender process not to publish the insertions further, is if there is a way in the publisher we could know that the user has run the alter subscription...refresh publication on the subscriber. But I don't see currently, the subscriber informing back to the publisher after AlterSubscription_refresh().

Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

On Tue, Jan 12, 2021 at 4:47 PM Li Japin <japinli@hotmail.com> wrote:
> IIUC the logical replication only replicate the tables in publication, I think
> when the tables that aren't in publication should not be replicated.
>
> Attached the patch that fixes it.  Thought?

With that change, we don't get the behaviour that's stated in the
document - "The ADD TABLE and DROP TABLE clauses will add and remove
one or more tables from the publication. Note that adding tables to a
publication that is already subscribed to will require a ALTER
SUBSCRIPTION ... REFRESH PUBLICATION action on the subscribing side in
order to become effective" -
https://www.postgresql.org/docs/devel/sql-alterpublication.html.

The publisher stops sending the tuples whenever the relation gets
dropped from the publication, not waiting until alter subscription ...
refresh publication on the subscriber.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On Tue, 12 Jan 2021 at 19:32, Bharath Rupireddy wrote:
> On Tue, Jan 12, 2021 at 4:47 PM Li Japin <japinli@hotmail.com> wrote:
>> IIUC the logical replication only replicate the tables in publication, I think
>> when the tables that aren't in publication should not be replicated.
>>
>> Attached the patch that fixes it.  Thought?
>
> With that change, we don't get the behaviour that's stated in the
> document - "The ADD TABLE and DROP TABLE clauses will add and remove
> one or more tables from the publication. Note that adding tables to a
> publication that is already subscribed to will require a ALTER
> SUBSCRIPTION ... REFRESH PUBLICATION action on the subscribing side in
> order to become effective" -
> https://www.postgresql.org/docs/devel/sql-alterpublication.html.
>

The documentation only emphasize adding tables to a publication, not
include dropping tables from a publication.

> The publisher stops sending the tuples whenever the relation gets
> dropped from the publication, not waiting until alter subscription ...
> refresh publication on the subscriber.
>

If we want to wait the subscriber executing alter subscription ... refresh publication,
maybe we should send some feedback to walsender.  How can we send this feedback to
walsender in non-walreceiver process?

> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com


-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.



On Tue, Jan 12, 2021 at 5:23 PM japin <japinli@hotmail.com> wrote:
>
> On Tue, 12 Jan 2021 at 19:32, Bharath Rupireddy wrote:
> > On Tue, Jan 12, 2021 at 4:47 PM Li Japin <japinli@hotmail.com> wrote:
> >> IIUC the logical replication only replicate the tables in publication, I think
> >> when the tables that aren't in publication should not be replicated.
> >>
> >> Attached the patch that fixes it.  Thought?
> >
> > With that change, we don't get the behaviour that's stated in the
> > document - "The ADD TABLE and DROP TABLE clauses will add and remove
> > one or more tables from the publication. Note that adding tables to a
> > publication that is already subscribed to will require a ALTER
> > SUBSCRIPTION ... REFRESH PUBLICATION action on the subscribing side in
> > order to become effective" -
> > https://www.postgresql.org/docs/devel/sql-alterpublication.html.
> >
>
> The documentation only emphasize adding tables to a publication, not
> include dropping tables from a publication.
>

Right and I think that is ensured by the subscriber by calling
should_apply_changes_for_rel() which won't return true unless the
newly added relation is not synced via Refresh Publication. So, this
means with or without this patch we should be sending the changes of
the newly published table from the publisher?

I have another question on your patch which is why in some cases like
when we have not inserted in step-5 (as mentioned by you) the
following insertions are not sent. Is somehow we are setting the
pubactions as false in that case, if so, how?

> > The publisher stops sending the tuples whenever the relation gets
> > dropped from the publication, not waiting until alter subscription ...
> > refresh publication on the subscriber.
> >
>
> If we want to wait the subscriber executing alter subscription ... refresh publication,
> maybe we should send some feedback to walsender.  How can we send this feedback to
> walsender in non-walreceiver process?
>

I don't think we need this if what I said above is correct.

-- 
With Regards,
Amit Kapila.



On Tue, Jan 12, 2021 at 4:59 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Tue, Jan 12, 2021 at 12:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > Here's my analysis:
> > > 1) in the publisher, alter publication drop table successfully
> > > removes(PublicationDropTables) the table from the catalogue
> > > pg_publication_rel
> > > 2) in the subscriber, alter subscription refresh publication
> > > successfully removes the table from the catalogue pg_subscription_rel
> > > (AlterSubscription_refresh->RemoveSubscriptionRel)
> > > so far so good
> > >
> >
> > Here, it should register the worker to stop on commit, and then on
> > commit it should call AtEOXact_ApplyLauncher to stop the apply worker.
> > Once the apply worker is stopped, the corresponding WALSender will
> > also be stopped. Something here is not happening as per expected
> > behavior.
>
> On the subscriber, an entry for worker stop is created in AlterSubscription_refresh -->
logicalrep_worker_stop_at_commit.At the end of txn, in AtEOXact_ApplyLauncher, we try to stop that worker, but it
cannotbe stopped because logicalrep_worker_find returns null (AtEOXact_ApplyLauncher --> logicalrep_worker_stop -->
logicalrep_worker_find).The worker entry for that subscriber is having relid as 0 [1], due to which the following if
conditionwill not be hit. The apply worker on the subscriber related to the subscription on which refresh publication
wasrun is not closed. It looks like relid 0 is valid because it will be applicable only during the table sync phase,
thecomment in the LogicalRepWorker structure says that. 
>
> And also, I think, expecting the apply worker to be closed this way doesn't make sense because the apply worker is a
per-subscriptionbase, and the subscription can have other tables too. 
>

Okay, that makes sense. As responded to Li Japin, let's focus on
figuring out why we are sending the changes from the publisher node in
some cases and not in other cases.

--
With Regards,
Amit Kapila.



On Wed, Jan 13, 2021 at 10:33 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jan 12, 2021 at 5:23 PM japin <japinli@hotmail.com> wrote:
> >
> > On Tue, 12 Jan 2021 at 19:32, Bharath Rupireddy wrote:
> > > On Tue, Jan 12, 2021 at 4:47 PM Li Japin <japinli@hotmail.com> wrote:
> > >> IIUC the logical replication only replicate the tables in publication, I think
> > >> when the tables that aren't in publication should not be replicated.
> > >>
> > >> Attached the patch that fixes it.  Thought?
> > >
> > > With that change, we don't get the behaviour that's stated in the
> > > document - "The ADD TABLE and DROP TABLE clauses will add and remove
> > > one or more tables from the publication. Note that adding tables to a
> > > publication that is already subscribed to will require a ALTER
> > > SUBSCRIPTION ... REFRESH PUBLICATION action on the subscribing side in
> > > order to become effective" -
> > > https://www.postgresql.org/docs/devel/sql-alterpublication.html.
> > >
> >
> > The documentation only emphasize adding tables to a publication, not
> > include dropping tables from a publication.
> >
>
> Right and I think that is ensured by the subscriber by calling
> should_apply_changes_for_rel() which won't return true unless the
> newly added relation is not synced via Refresh Publication. So, this
> means with or without this patch we should be sending the changes of
> the newly published table from the publisher?

Oh, my bad, alter subscription...refresh publication is required only when the tables are added to the publisher. Patch by japin makes the walsender process to stop sending the data to the subscriber/apply worker. The patch is based on the idea of looking at the PUBLICATIONRELMAP in get_rel_sync_entry when the entries have been invalidated in rel_sync_cache_publication_cb because of alter publication...drop table.
,
When the alter subscription...refresh publication is run on the subscriber, the SUBSCRIPTIONRELMAP catalogue gets invalidated but the corresponding cache entries in the LogicalRepRelMap which is used by logicalrep_rel_open are not invalidated. LogicalRepRelMap is used to know the relations that are associated with the subscription. But we seem to have not taken care of invalidating those entries, though we have the invalidation callback invalidate_syncing_table_states registered for SUBSCRIPTIONRELMAP in ApplyWorkerMain. So, we miss to work on updating the entries in LogicalRepRelMap.

IMO, the ideal way to fix this issue is 1) stop the walsender sending the changes to dropped tables, for this japin patch works 2) we must mark all the LogicalRepRelMap entries as invalid in invalidate_syncing_table_states so that in the next logicalrep_rel_open, if the entry is invalidated, then we have to call GetSubscriptionRelState to get the latest state, as shown in [1]. Likewise, we might also have to mark the cache entries invalid in  subscription_change_cb which is invalidation callback for pg_subscription

Thoughts?

[1] -
    if (entry->state != SUBREL_STATE_READY || entry->invalid)
        entry->state = GetSubscriptionRelState(MySubscription->oid,
                                               entry->localreloid,
                                               &entry->statelsn);
   
   if (entry->invalid)
       entry->invalid = false;

    return entry;

> I have another question on your patch which is why in some cases like
> when we have not inserted in step-5 (as mentioned by you) the
> following insertions are not sent. Is somehow we are setting the
> pubactions as false in that case, if so, how?

The reason is that the issue reported in this thread occurs - when we have the walsender process running, RelationSyncCache is initialized, we inserted some data into the table that's sent to the subscriber and the table is dropped, we miss to set the pubactions to false in get_rel_sync_entry, though the cache entries have been invalidated.

In some cases, it works properly because the old walsender process was stopped and when a new walsender is started, then the cache RelationSyncCache gets initialized again and the pubactions will be set to false in get_rel_sync_entry.

if (!found)
    {
        /* immediately make a new entry valid enough to satisfy callbacks */
        entry->schema_sent = false;
        entry->streamed_txns = NIL;
        entry->replicate_valid = false;
        entry->pubactions.pubinsert = entry->pubactions.pubupdate =
            entry->pubactions.pubdelete = entry->pubactions.pubtruncate = false;
        entry->publish_as_relid = InvalidOid;
    }

Hope that clarifies why the issue happens in some cases and not in other cases.

> > > The publisher stops sending the tuples whenever the relation gets
> > > dropped from the publication, not waiting until alter subscription ...
> > > refresh publication on the subscriber.
> > >
> >
> > If we want to wait the subscriber executing alter subscription ... refresh publication,
> > maybe we should send some feedback to walsender.  How can we send this feedback to
> > walsender in non-walreceiver process?
> >
>
> I don't think we need this if what I said above is correct.

My bad. This thought was emanated from my poor reading of the documentation.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Jan 13, 2021 at 11:08 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Jan 13, 2021 at 10:33 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Jan 12, 2021 at 5:23 PM japin <japinli@hotmail.com> wrote:
> > >
> > > On Tue, 12 Jan 2021 at 19:32, Bharath Rupireddy wrote:
> > > > On Tue, Jan 12, 2021 at 4:47 PM Li Japin <japinli@hotmail.com> wrote:
> > > >> IIUC the logical replication only replicate the tables in publication, I think
> > > >> when the tables that aren't in publication should not be replicated.
> > > >>
> > > >> Attached the patch that fixes it.  Thought?
> > > >
> > > > With that change, we don't get the behaviour that's stated in the
> > > > document - "The ADD TABLE and DROP TABLE clauses will add and remove
> > > > one or more tables from the publication. Note that adding tables to a
> > > > publication that is already subscribed to will require a ALTER
> > > > SUBSCRIPTION ... REFRESH PUBLICATION action on the subscribing side in
> > > > order to become effective" -
> > > > https://www.postgresql.org/docs/devel/sql-alterpublication.html.
> > > >
> > >
> > > The documentation only emphasize adding tables to a publication, not
> > > include dropping tables from a publication.
> > >
> >
> > Right and I think that is ensured by the subscriber by calling
> > should_apply_changes_for_rel() which won't return true unless the
> > newly added relation is not synced via Refresh Publication. So, this
> > means with or without this patch we should be sending the changes of
> > the newly published table from the publisher?
>
> Oh, my bad, alter subscription...refresh publication is required only when the tables are added to the publisher.
Patchby japin makes the walsender process to stop sending the data to the subscriber/apply worker. The patch is based
onthe idea of looking at the PUBLICATIONRELMAP in get_rel_sync_entry when the entries have been invalidated in
rel_sync_cache_publication_cbbecause of alter publication...drop table. 
> ,
> When the alter subscription...refresh publication is run on the subscriber, the SUBSCRIPTIONRELMAP catalogue gets
invalidatedbut the corresponding cache entries in the LogicalRepRelMap which is used by logicalrep_rel_open are not
invalidated.LogicalRepRelMap is used to know the relations that are associated with the subscription. But we seem to
havenot taken care of invalidating those entries, though we have the invalidation callback
invalidate_syncing_table_statesregistered for SUBSCRIPTIONRELMAP in ApplyWorkerMain. So, we miss to work on updating
theentries in LogicalRepRelMap. 
>
> IMO, the ideal way to fix this issue is 1) stop the walsender sending the changes to dropped tables, for this japin
patchworks 2) we must mark all the LogicalRepRelMap entries as invalid in invalidate_syncing_table_states so that in
thenext logicalrep_rel_open, if the entry is invalidated, then we have to call GetSubscriptionRelState to get the
lateststate, as shown in [1]. Likewise, we might also have to mark the cache entries invalid in  subscription_change_cb
whichis invalidation callback for pg_subscription 
>

Is the second point you described here is related to the original
bug-reported or will it cause some other problem?

> Thoughts?
>
> [1] -
>     if (entry->state != SUBREL_STATE_READY || entry->invalid)
>         entry->state = GetSubscriptionRelState(MySubscription->oid,
>                                                entry->localreloid,
>                                                &entry->statelsn);
>
>    if (entry->invalid)
>        entry->invalid = false;
>
>     return entry;
>
> > I have another question on your patch which is why in some cases like
> > when we have not inserted in step-5 (as mentioned by you) the
> > following insertions are not sent. Is somehow we are setting the
> > pubactions as false in that case, if so, how?
>
> The reason is that the issue reported in this thread occurs - when we have the walsender process running,
RelationSyncCacheis initialized, we inserted some data into the table that's sent to the subscriber and the table is
dropped,we miss to set the pubactions to false in get_rel_sync_entry, though the cache entries have been invalidated. 
>
> In some cases, it works properly because the old walsender process was stopped and when a new walsender is started,
thenthe cache RelationSyncCache gets initialized again and the pubactions will be set to false in get_rel_sync_entry. 
>

Why is walsender process was getting stopped in one case but not in another?

--
With Regards,
Amit Kapila.



On Wed, 13 Jan 2021 at 13:26, Amit Kapila wrote:
> On Tue, Jan 12, 2021 at 4:59 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>>
>> On Tue, Jan 12, 2021 at 12:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>> > > Here's my analysis:
>> > > 1) in the publisher, alter publication drop table successfully
>> > > removes(PublicationDropTables) the table from the catalogue
>> > > pg_publication_rel
>> > > 2) in the subscriber, alter subscription refresh publication
>> > > successfully removes the table from the catalogue pg_subscription_rel
>> > > (AlterSubscription_refresh->RemoveSubscriptionRel)
>> > > so far so good
>> > >
>> >
>> > Here, it should register the worker to stop on commit, and then on
>> > commit it should call AtEOXact_ApplyLauncher to stop the apply worker.
>> > Once the apply worker is stopped, the corresponding WALSender will
>> > also be stopped. Something here is not happening as per expected
>> > behavior.
>>
>> On the subscriber, an entry for worker stop is created in AlterSubscription_refresh -->
logicalrep_worker_stop_at_commit.At the end of txn, in AtEOXact_ApplyLauncher, we try to stop that worker, but it
cannotbe stopped because logicalrep_worker_find returns null (AtEOXact_ApplyLauncher --> logicalrep_worker_stop -->
logicalrep_worker_find).The worker entry for that subscriber is having relid as 0 [1], due to which the following if
conditionwill not be hit. The apply worker on the subscriber related to the subscription on which refresh publication
wasrun is not closed. It looks like relid 0 is valid because it will be applicable only during the table sync phase,
thecomment in the LogicalRepWorker structure says that.
 
>>
>> And also, I think, expecting the apply worker to be closed this way doesn't make sense because the apply worker is a
per-subscriptionbase, and the subscription can have other tables too.
 
>>
>
> Okay, that makes sense. As responded to Li Japin, let's focus on
> figuring out why we are sending the changes from the publisher node in
> some cases and not in other cases.

After some analysis, I find that the dropped tables always replicate to subscriber.
The difference is that if we drop the table from publication and refresh
publication (on subscriber), the LogicalRepRelMapEntry in should_apply_changes_for_rel()
set state to SUBREL_STATE_UNKNOWN.

(gdb) p *rel
$2 = {remoterel = {remoteid = 16410, nspname = 0x5564fb0177c0 "public",
    relname = 0x5564fb0177a0 "t1", natts = 1, attnames = 0x5564fb0177e0, atttyps = 0x5564fb017780,
    replident = 100 'd', relkind = 0 '\000', attkeys = 0x0}, localrelvalid = true,
  localreloid = 16412, localrel = 0x7f78705da1b8, attrmap = 0x5564fb017800, updatable = false,
  *state = 0 '\000'*, statelsn = 0}

If we insert data between drop table from publication and refresh publication, the
LogicalRepRelMapEntry state is always SUBREL_STATE_READY.

(gdb) p *rel
$2 = {remoterel = {remoteid = 16410, nspname = 0x5564fb0177c0 "public",
    relname = 0x5564fb0177a0 "t1", natts = 1, attnames = 0x5564fb0177e0, atttyps = 0x5564fb017780,
    replident = 100 'd', relkind = 0 '\000', attkeys = 0x0}, localrelvalid = true,
  localreloid = 16412, localrel = 0x7f78705d9d38, attrmap = 0x5564fb017800, updatable = false,
  *state = 114 'r'*, statelsn = 23545672}

I will dig why the state of LogicalRepRelMapEntry doesn't change in second case.

Any suggestion is welcome!

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.



On Wed, Jan 13, 2021 at 11:27 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Jan 13, 2021 at 11:08 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Wed, Jan 13, 2021 at 10:33 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Tue, Jan 12, 2021 at 5:23 PM japin <japinli@hotmail.com> wrote:
> > > >
> > > > On Tue, 12 Jan 2021 at 19:32, Bharath Rupireddy wrote:
> > > > > On Tue, Jan 12, 2021 at 4:47 PM Li Japin <japinli@hotmail.com> wrote:
> > > > >> IIUC the logical replication only replicate the tables in publication, I think
> > > > >> when the tables that aren't in publication should not be replicated.
> > > > >>
> > > > >> Attached the patch that fixes it.  Thought?
> > > > >
> > > > > With that change, we don't get the behaviour that's stated in the
> > > > > document - "The ADD TABLE and DROP TABLE clauses will add and remove
> > > > > one or more tables from the publication. Note that adding tables to a
> > > > > publication that is already subscribed to will require a ALTER
> > > > > SUBSCRIPTION ... REFRESH PUBLICATION action on the subscribing side in
> > > > > order to become effective" -
> > > > > https://www.postgresql.org/docs/devel/sql-alterpublication.html.
> > > > >
> > > >
> > > > The documentation only emphasize adding tables to a publication, not
> > > > include dropping tables from a publication.
> > > >
> > >
> > > Right and I think that is ensured by the subscriber by calling
> > > should_apply_changes_for_rel() which won't return true unless the
> > > newly added relation is not synced via Refresh Publication. So, this
> > > means with or without this patch we should be sending the changes of
> > > the newly published table from the publisher?
> >
> > Oh, my bad, alter subscription...refresh publication is required only when the tables are added to the publisher.
Patchby japin makes the walsender process to stop sending the data to the subscriber/apply worker. The patch is based
onthe idea of looking at the PUBLICATIONRELMAP in get_rel_sync_entry when the entries have been invalidated in
rel_sync_cache_publication_cbbecause of alter publication...drop table. 
> > ,
> > When the alter subscription...refresh publication is run on the subscriber, the SUBSCRIPTIONRELMAP catalogue gets
invalidatedbut the corresponding cache entries in the LogicalRepRelMap which is used by logicalrep_rel_open are not
invalidated.LogicalRepRelMap is used to know the relations that are associated with the subscription. But we seem to
havenot taken care of invalidating those entries, though we have the invalidation callback
invalidate_syncing_table_statesregistered for SUBSCRIPTIONRELMAP in ApplyWorkerMain. So, we miss to work on updating
theentries in LogicalRepRelMap. 
> >
> > IMO, the ideal way to fix this issue is 1) stop the walsender sending the changes to dropped tables, for this japin
patchworks 2) we must mark all the LogicalRepRelMap entries as invalid in invalidate_syncing_table_states so that in
thenext logicalrep_rel_open, if the entry is invalidated, then we have to call GetSubscriptionRelState to get the
lateststate, as shown in [1]. Likewise, we might also have to mark the cache entries invalid in  subscription_change_cb
whichis invalidation callback for pg_subscription 
> >
>
> Is the second point you described here is related to the original
> bug-reported or will it cause some other problem?

It can cause some other problems. I found this while investigating
from the subscriber perspective why the subscriber is accepting the
inserts even though the relation is removed from the
pg_subscription_rel catalogue after refresh publication. I ended up
finding that the cache entries are not being invalidated in
invalidate_syncing_table_states.

Having said that, the patch proposed by japin is enough to solve the
bug reported here.

While we are fixing the bug, I thought it's better to fix this as
well, maybe as a 0002 patch? If okay, I can work on the patch and post
it in a separate thread?

> > Thoughts?
> >
> > [1] -
> >     if (entry->state != SUBREL_STATE_READY || entry->invalid)
> >         entry->state = GetSubscriptionRelState(MySubscription->oid,
> >                                                entry->localreloid,
> >                                                &entry->statelsn);
> >
> >    if (entry->invalid)
> >        entry->invalid = false;
> >
> >     return entry;
> >
> > > I have another question on your patch which is why in some cases like
> > > when we have not inserted in step-5 (as mentioned by you) the
> > > following insertions are not sent. Is somehow we are setting the
> > > pubactions as false in that case, if so, how?
> >
> > The reason is that the issue reported in this thread occurs - when we have the walsender process running,
RelationSyncCacheis initialized, we inserted some data into the table that's sent to the subscriber and the table is
dropped,we miss to set the pubactions to false in get_rel_sync_entry, though the cache entries have been invalidated. 
> >
> > In some cases, it works properly because the old walsender process was stopped and when a new walsender is started,
thenthe cache RelationSyncCache gets initialized again and the pubactions will be set to false in get_rel_sync_entry. 
> >
>
> Why is walsender process was getting stopped in one case but not in another?

Both walsender and logical replication apply workers are getting
stopped because of the default values of 60s for wal_sender_timeout
and wal_receiver_timeout respectively. To analyze further, I increased
both timeouts to 600s.

Here's what happening:

Note that for the sake of simplicity, I'm skipping the create
publication, subscription, initial insertion statements, I'm starting
from alter publication ... drop table statement.

case 1 - issue reported in this thread is observed:
1) on publisher - alter publication ... drop table t1; and insert into
t1 values(67);   Though the table is dropped from the publication, the
walsender process publishes the inserts, which will be fixed by the
patch proposed by japin.

2) on subscriber - on the insert into t1(1) on the publisher,
logicalrep_relmap_update gets called in the subscriber because the
publisher informs the subscriber that it has dropped the table from
the publication. In logicalrep_relmap_update, the LogicalRepRelMap
cache entry corresponding to the dropped relation is reset. Since the
alter subscription ... refresh publication is not yet run, the insert
is taken to the target table and the LogicalRepRelMap cache entry for
that table is updated in logicalrep_rel_open, it's state is read from
the catalogues using GetSubscriptionRelState, because of the entry
reset in logicalrep_relmap_update, entry->state is
SUBREL_STATE_UNKNOWN. After GetSubscriptionRelState, entry->state
becomes SUBREL_STATE_READY
    if (entry->state != SUBREL_STATE_READY)
        entry->state = GetSubscriptionRelState(MySubscription->oid,
                                               entry->localreloid,
                                               &entry->statelsn);

3) on subscriber - run alter subscription ... refresh publication, the
dropped relation entry is removed from the pg_subscription_rel
catalogue and invalidate_syncing_table_states gets called. But note
that we have not invalidated the LogicalRepRelMap. The
LogicalRepRelMap cache entry for the table remains what we have set in
(2) i.e. entry->state is SUBREL_STATE_READY.

4) on publisher - insert into t1 values(68); as mentioned in (1),
walsender process keeps sending the inserts.

5) on subscriber - apply_handle_insert is called and the
logicalrep_rel_open will not call GetSubscriptionRelState to get the
new catalogue entry for pg_subscription_rel (3), because the
entry->state is still SUBREL_STATE_READY which was set (2), so it ends
up in applying the incoming inserts, until the GetSubscriptionRelState
is called for that entry which happens either the apply worker stops
and restarts due to timeout or another change to the publication
happens in the publisher so that logicalrep_relmap_update is called on
the subscriber. If we had marked all the LogicalRepRelMap cache
entries as invalid in the invalidate_syncing_table_states callback as
pointed out by me in the earlier mail, then this problem will be
solved.

case 2 - issue is not observed:
1) on publisher - alter publication ... drop table t1; no inserts into
the table t1, subscriber will not receive logicalrep_relmap_update
yet.

2) on subscriber - run alter subscription ... refresh publication, the
dropped relation entry is removed from the pg_subscription_rel
catalogue and invalidate_syncing_table_states gets called. But note
that we have not invalidated the LogicalRepRelMap. The
LogicalRepRelMap cache entry for the table remains the same.

3) on publisher - insert into t1 values(67);   Though the table is
dropped from the publication (1), the walsender process publishes the
inserts, which will be fixed by the patch proposed by japin.

4) on subscriber - on the insert into t1(3) on the publisher,
logicalrep_relmap_update gets called in the subscriber because the
publisher informs the subscriber that it has dropped the table from
the publication. In logicalrep_relmap_update, the LogicalRepRelMap
cache entry corresponding to the dropped relation is reset, because of
which the next logicalrep_rel_open will call GetSubscriptionRelState
(as entry->state is SUBREL_STATE_UNKNOWN due to the reset in
logicalrep_relmap_update). GetSubscriptionRelState will return
SUBREL_STATE_UNKNOWN, because it doesn't find the tuple in the updated
pg_subscription_rel catalogue because of the alter subscription ...
refresh publication would have removed tuple related to the table from
the pg_subscription_rel. So, the inserts are ignored because the
should_apply_changes_for_rel returns false after the
logicalrep_rel_open in apply_handle_insert.
    /* Try finding the mapping. */
    tup = SearchSysCache2(SUBSCRIPTIONRELMAP,
                          ObjectIdGetDatum(relid),
                          ObjectIdGetDatum(subid));

    if (!HeapTupleIsValid(tup))
    {
        *sublsn = InvalidXLogRecPtr;
        return SUBREL_STATE_UNKNOWN;
    }

5) Here on, all further inserts into the publication table are ignored
by the subscriber because of the same reason (4). So, the issue
reported in this thread doesn't occur.

I'm sorry for the huge write up. I hope I clarified the point this time.

In summary, I feel we need to fix the publisher sending the inserts
even though the table is dropped from the publication, that is the
patch patch proposed by japin. This solves the bug reported in this
thread.

And also, it's good to have the LogicalRepRelMap invalidation fix as a
0002 patch in invalidate_syncing_table_states, subscription_change_cb
and logicalrep_rel_open as proposed by me.

Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On Wed, 13 Jan 2021 at 16:49, Bharath Rupireddy wrote:
> On Wed, Jan 13, 2021 at 11:27 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> On Wed, Jan 13, 2021 at 11:08 AM Bharath Rupireddy
>> <bharath.rupireddyforpostgres@gmail.com> wrote:
>> >
>> > On Wed, Jan 13, 2021 at 10:33 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>> > >
>> > > On Tue, Jan 12, 2021 at 5:23 PM japin <japinli@hotmail.com> wrote:
>> > > >
>> > > > On Tue, 12 Jan 2021 at 19:32, Bharath Rupireddy wrote:
>> > > > > On Tue, Jan 12, 2021 at 4:47 PM Li Japin <japinli@hotmail.com> wrote:
>> > > > >> IIUC the logical replication only replicate the tables in publication, I think
>> > > > >> when the tables that aren't in publication should not be replicated.
>> > > > >>
>> > > > >> Attached the patch that fixes it.  Thought?
>> > > > >
>> > > > > With that change, we don't get the behaviour that's stated in the
>> > > > > document - "The ADD TABLE and DROP TABLE clauses will add and remove
>> > > > > one or more tables from the publication. Note that adding tables to a
>> > > > > publication that is already subscribed to will require a ALTER
>> > > > > SUBSCRIPTION ... REFRESH PUBLICATION action on the subscribing side in
>> > > > > order to become effective" -
>> > > > > https://www.postgresql.org/docs/devel/sql-alterpublication.html.
>> > > > >
>> > > >
>> > > > The documentation only emphasize adding tables to a publication, not
>> > > > include dropping tables from a publication.
>> > > >
>> > >
>> > > Right and I think that is ensured by the subscriber by calling
>> > > should_apply_changes_for_rel() which won't return true unless the
>> > > newly added relation is not synced via Refresh Publication. So, this
>> > > means with or without this patch we should be sending the changes of
>> > > the newly published table from the publisher?
>> >
>> > Oh, my bad, alter subscription...refresh publication is required only when the tables are added to the publisher.
Patchby japin makes the walsender process to stop sending the data to the subscriber/apply worker. The patch is based
onthe idea of looking at the PUBLICATIONRELMAP in get_rel_sync_entry when the entries have been invalidated in
rel_sync_cache_publication_cbbecause of alter publication...drop table.
 
>> > ,
>> > When the alter subscription...refresh publication is run on the subscriber, the SUBSCRIPTIONRELMAP catalogue gets
invalidatedbut the corresponding cache entries in the LogicalRepRelMap which is used by logicalrep_rel_open are not
invalidated.LogicalRepRelMap is used to know the relations that are associated with the subscription. But we seem to
havenot taken care of invalidating those entries, though we have the invalidation callback
invalidate_syncing_table_statesregistered for SUBSCRIPTIONRELMAP in ApplyWorkerMain. So, we miss to work on updating
theentries in LogicalRepRelMap.
 
>> >
>> > IMO, the ideal way to fix this issue is 1) stop the walsender sending the changes to dropped tables, for this
japinpatch works 2) we must mark all the LogicalRepRelMap entries as invalid in invalidate_syncing_table_states so that
inthe next logicalrep_rel_open, if the entry is invalidated, then we have to call GetSubscriptionRelState to get the
lateststate, as shown in [1]. Likewise, we might also have to mark the cache entries invalid in  subscription_change_cb
whichis invalidation callback for pg_subscription
 
>> >
>>
>> Is the second point you described here is related to the original
>> bug-reported or will it cause some other problem?
>
> It can cause some other problems. I found this while investigating
> from the subscriber perspective why the subscriber is accepting the
> inserts even though the relation is removed from the
> pg_subscription_rel catalogue after refresh publication. I ended up
> finding that the cache entries are not being invalidated in
> invalidate_syncing_table_states.
>
> Having said that, the patch proposed by japin is enough to solve the
> bug reported here.
>
> While we are fixing the bug, I thought it's better to fix this as
> well, maybe as a 0002 patch? If okay, I can work on the patch and post
> it in a separate thread?
>
>> > Thoughts?
>> >
>> > [1] -
>> >     if (entry->state != SUBREL_STATE_READY || entry->invalid)
>> >         entry->state = GetSubscriptionRelState(MySubscription->oid,
>> >                                                entry->localreloid,
>> >                                                &entry->statelsn);
>> >
>> >    if (entry->invalid)
>> >        entry->invalid = false;
>> >
>> >     return entry;
>> >
>> > > I have another question on your patch which is why in some cases like
>> > > when we have not inserted in step-5 (as mentioned by you) the
>> > > following insertions are not sent. Is somehow we are setting the
>> > > pubactions as false in that case, if so, how?
>> >
>> > The reason is that the issue reported in this thread occurs - when we have the walsender process running,
RelationSyncCacheis initialized, we inserted some data into the table that's sent to the subscriber and the table is
dropped,we miss to set the pubactions to false in get_rel_sync_entry, though the cache entries have been invalidated.
 
>> >
>> > In some cases, it works properly because the old walsender process was stopped and when a new walsender is
started,then the cache RelationSyncCache gets initialized again and the pubactions will be set to false in
get_rel_sync_entry.
>> >
>>
>> Why is walsender process was getting stopped in one case but not in another?
>
> Both walsender and logical replication apply workers are getting
> stopped because of the default values of 60s for wal_sender_timeout
> and wal_receiver_timeout respectively. To analyze further, I increased
> both timeouts to 600s.
>
> Here's what happening:
>
> Note that for the sake of simplicity, I'm skipping the create
> publication, subscription, initial insertion statements, I'm starting
> from alter publication ... drop table statement.
>
> case 1 - issue reported in this thread is observed:
> 1) on publisher - alter publication ... drop table t1; and insert into
> t1 values(67);   Though the table is dropped from the publication, the
> walsender process publishes the inserts, which will be fixed by the
> patch proposed by japin.
>
> 2) on subscriber - on the insert into t1(1) on the publisher,
> logicalrep_relmap_update gets called in the subscriber because the
> publisher informs the subscriber that it has dropped the table from
> the publication. In logicalrep_relmap_update, the LogicalRepRelMap
> cache entry corresponding to the dropped relation is reset. Since the
> alter subscription ... refresh publication is not yet run, the insert
> is taken to the target table and the LogicalRepRelMap cache entry for
> that table is updated in logicalrep_rel_open, it's state is read from
> the catalogues using GetSubscriptionRelState, because of the entry
> reset in logicalrep_relmap_update, entry->state is
> SUBREL_STATE_UNKNOWN. After GetSubscriptionRelState, entry->state
> becomes SUBREL_STATE_READY
>     if (entry->state != SUBREL_STATE_READY)
>         entry->state = GetSubscriptionRelState(MySubscription->oid,
>                                                entry->localreloid,
>                                                &entry->statelsn);
>
> 3) on subscriber - run alter subscription ... refresh publication, the
> dropped relation entry is removed from the pg_subscription_rel
> catalogue and invalidate_syncing_table_states gets called. But note
> that we have not invalidated the LogicalRepRelMap. The
> LogicalRepRelMap cache entry for the table remains what we have set in
> (2) i.e. entry->state is SUBREL_STATE_READY.
>
> 4) on publisher - insert into t1 values(68); as mentioned in (1),
> walsender process keeps sending the inserts.
>
> 5) on subscriber - apply_handle_insert is called and the
> logicalrep_rel_open will not call GetSubscriptionRelState to get the
> new catalogue entry for pg_subscription_rel (3), because the
> entry->state is still SUBREL_STATE_READY which was set (2), so it ends
> up in applying the incoming inserts, until the GetSubscriptionRelState
> is called for that entry which happens either the apply worker stops
> and restarts due to timeout or another change to the publication
> happens in the publisher so that logicalrep_relmap_update is called on
> the subscriber. If we had marked all the LogicalRepRelMap cache
> entries as invalid in the invalidate_syncing_table_states callback as
> pointed out by me in the earlier mail, then this problem will be
> solved.
>
> case 2 - issue is not observed:
> 1) on publisher - alter publication ... drop table t1; no inserts into
> the table t1, subscriber will not receive logicalrep_relmap_update
> yet.
>
> 2) on subscriber - run alter subscription ... refresh publication, the
> dropped relation entry is removed from the pg_subscription_rel
> catalogue and invalidate_syncing_table_states gets called. But note
> that we have not invalidated the LogicalRepRelMap. The
> LogicalRepRelMap cache entry for the table remains the same.
>
> 3) on publisher - insert into t1 values(67);   Though the table is
> dropped from the publication (1), the walsender process publishes the
> inserts, which will be fixed by the patch proposed by japin.
>
> 4) on subscriber - on the insert into t1(3) on the publisher,
> logicalrep_relmap_update gets called in the subscriber because the
> publisher informs the subscriber that it has dropped the table from
> the publication. In logicalrep_relmap_update, the LogicalRepRelMap
> cache entry corresponding to the dropped relation is reset, because of
> which the next logicalrep_rel_open will call GetSubscriptionRelState
> (as entry->state is SUBREL_STATE_UNKNOWN due to the reset in
> logicalrep_relmap_update). GetSubscriptionRelState will return
> SUBREL_STATE_UNKNOWN, because it doesn't find the tuple in the updated
> pg_subscription_rel catalogue because of the alter subscription ...
> refresh publication would have removed tuple related to the table from
> the pg_subscription_rel. So, the inserts are ignored because the
> should_apply_changes_for_rel returns false after the
> logicalrep_rel_open in apply_handle_insert.
>     /* Try finding the mapping. */
>     tup = SearchSysCache2(SUBSCRIPTIONRELMAP,
>                           ObjectIdGetDatum(relid),
>                           ObjectIdGetDatum(subid));
>
>     if (!HeapTupleIsValid(tup))
>     {
>         *sublsn = InvalidXLogRecPtr;
>         return SUBREL_STATE_UNKNOWN;
>     }
>
> 5) Here on, all further inserts into the publication table are ignored
> by the subscriber because of the same reason (4). So, the issue
> reported in this thread doesn't occur.
>
> I'm sorry for the huge write up. I hope I clarified the point this time.
>

Yes! Very clearly.

> In summary, I feel we need to fix the publisher sending the inserts
> even though the table is dropped from the publication, that is the
> patch patch proposed by japin. This solves the bug reported in this
> thread.
>
> And also, it's good to have the LogicalRepRelMap invalidation fix as a
> 0002 patch in invalidate_syncing_table_states, subscription_change_cb
> and logicalrep_rel_open as proposed by me.
>
> Thoughts?
>

I think invalidate the LogicalRepRelMap is necessary.  If the table isn't in
subscription, can we remove the LogicalRepRelMapEntry from LogicalRepRelMap?

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.



On Tue, Jan 12, 2021 at 4:47 PM Li Japin <japinli@hotmail.com> wrote:
>
>
> On Jan 12, 2021, at 5:47 PM, japin <japinli@hotmail.com> wrote:
>
>
> On Tue, 12 Jan 2021 at 14:38, Amit Kapila wrote:
>
> On Tue, Jan 12, 2021 at 11:39 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>
>
> On Tue, Jan 12, 2021 at 9:05 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
>
> On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>
>
> Hi,
>
> While providing thoughts on the design in [1], I found a strange
> behaviour with the $subject. The use case is shown below as a sequence
> of steps that need to be run on publisher and subscriber to arrive at
> the strange behaviour.  In step 5, the table is dropped from the
> publication and in step 6, the refresh publication is run on the
> subscriber, from here onwards, the expectation is that no further
> inserts into the publisher table have to be replicated on to the
> subscriber, but the opposite happens i.e. the inserts are still
> replicated to the subscriber. ISTM as a bug. Let me know if I'm
> missing anything.
>
>
> Did you try to investigate what's going on? Can you please check what
> is the behavior if, after step-5, you restart the subscriber and
> separately try creating a new subscription (maybe on a different
> server) for that publication after step-5 and see if that allows the
> relation to be replicated? AFAIU, in AlterSubscription_refresh, we
> remove such dropped rels and stop their corresponding apply workers
> which should stop the further replication of such relations but that
> doesn't seem to be happening in your case.
>
>
> Here's my analysis:
> 1) in the publisher, alter publication drop table successfully
> removes(PublicationDropTables) the table from the catalogue
> pg_publication_rel
> 2) in the subscriber, alter subscription refresh publication
> successfully removes the table from the catalogue pg_subscription_rel
> (AlterSubscription_refresh->RemoveSubscriptionRel)
> so far so good
>
>
> Here, it should register the worker to stop on commit, and then on
> commit it should call AtEOXact_ApplyLauncher to stop the apply worker.
> Once the apply worker is stopped, the corresponding WALSender will
> also be stopped. Something here is not happening as per expected
> behavior.
>
> 3) after the insertion into the table in the publisher(remember that
> it's dropped from the publication in (1)), the walsender process is
> unable detect that the table has been dropped from the publication
> i.e. it doesn't look at the pg_publication_rel catalogue or some
> other, but it only does is_publishable_relation() check which returns
> true in pgoutput_change(). Maybe the walsender should look at the
> catalogue pg_publication_rel in is_publishable_relation()?
>
>
> We must be somewhere checking pg_publication_rel before sending the
> decoded change because otherwise, we would have sent the changes for
> the table which are not even part of this publication. I think you can
> try to create a separate table that is not part of the publication
> under test and see how the changes for that are filtered.
>
>
> I find that pgoutput_change() use a hash table RelationSyncCache to
> cache the publication info for tables.  When we drop tables from the
> publication, the RelationSyncCache doesn't updated, so it replicate
> records.
>
>
> IIUC the logical replication only replicate the tables in publication, I think
> when the tables that aren't in publication should not be replicated.
>
> Attached the patch that fixes it.  Thought?
>

Instead of doing this, I would expect that the RelationSyncCache entry
should be removed when the relation is dropped from the publication.
So if that is done then it will reload the publication and then it
will not find that that relation as published and it will ignore the
changes.  But the patch doesn't seem to be exactly on that line.  Am I
missing something here?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



On Wed, 13 Jan 2021 at 17:23, Dilip Kumar wrote:
> On Tue, Jan 12, 2021 at 4:47 PM Li Japin <japinli@hotmail.com> wrote:
>>
>>
>> On Jan 12, 2021, at 5:47 PM, japin <japinli@hotmail.com> wrote:
>>
>>
>> On Tue, 12 Jan 2021 at 14:38, Amit Kapila wrote:
>>
>> On Tue, Jan 12, 2021 at 11:39 AM Bharath Rupireddy
>> <bharath.rupireddyforpostgres@gmail.com> wrote:
>>
>>
>> On Tue, Jan 12, 2021 at 9:05 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>>
>> On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy
>> <bharath.rupireddyforpostgres@gmail.com> wrote:
>>
>>
>> Hi,
>>
>> While providing thoughts on the design in [1], I found a strange
>> behaviour with the $subject. The use case is shown below as a sequence
>> of steps that need to be run on publisher and subscriber to arrive at
>> the strange behaviour.  In step 5, the table is dropped from the
>> publication and in step 6, the refresh publication is run on the
>> subscriber, from here onwards, the expectation is that no further
>> inserts into the publisher table have to be replicated on to the
>> subscriber, but the opposite happens i.e. the inserts are still
>> replicated to the subscriber. ISTM as a bug. Let me know if I'm
>> missing anything.
>>
>>
>> Did you try to investigate what's going on? Can you please check what
>> is the behavior if, after step-5, you restart the subscriber and
>> separately try creating a new subscription (maybe on a different
>> server) for that publication after step-5 and see if that allows the
>> relation to be replicated? AFAIU, in AlterSubscription_refresh, we
>> remove such dropped rels and stop their corresponding apply workers
>> which should stop the further replication of such relations but that
>> doesn't seem to be happening in your case.
>>
>>
>> Here's my analysis:
>> 1) in the publisher, alter publication drop table successfully
>> removes(PublicationDropTables) the table from the catalogue
>> pg_publication_rel
>> 2) in the subscriber, alter subscription refresh publication
>> successfully removes the table from the catalogue pg_subscription_rel
>> (AlterSubscription_refresh->RemoveSubscriptionRel)
>> so far so good
>>
>>
>> Here, it should register the worker to stop on commit, and then on
>> commit it should call AtEOXact_ApplyLauncher to stop the apply worker.
>> Once the apply worker is stopped, the corresponding WALSender will
>> also be stopped. Something here is not happening as per expected
>> behavior.
>>
>> 3) after the insertion into the table in the publisher(remember that
>> it's dropped from the publication in (1)), the walsender process is
>> unable detect that the table has been dropped from the publication
>> i.e. it doesn't look at the pg_publication_rel catalogue or some
>> other, but it only does is_publishable_relation() check which returns
>> true in pgoutput_change(). Maybe the walsender should look at the
>> catalogue pg_publication_rel in is_publishable_relation()?
>>
>>
>> We must be somewhere checking pg_publication_rel before sending the
>> decoded change because otherwise, we would have sent the changes for
>> the table which are not even part of this publication. I think you can
>> try to create a separate table that is not part of the publication
>> under test and see how the changes for that are filtered.
>>
>>
>> I find that pgoutput_change() use a hash table RelationSyncCache to
>> cache the publication info for tables.  When we drop tables from the
>> publication, the RelationSyncCache doesn't updated, so it replicate
>> records.
>>
>>
>> IIUC the logical replication only replicate the tables in publication, I think
>> when the tables that aren't in publication should not be replicated.
>>
>> Attached the patch that fixes it.  Thought?
>>
>
> Instead of doing this, I would expect that the RelationSyncCache entry
> should be removed when the relation is dropped from the publication.
> So if that is done then it will reload the publication and then it
> will not find that that relation as published and it will ignore the
> changes.  But the patch doesn't seem to be exactly on that line.  Am I
> missing something here?

Even if we remove RelationSyncCache entry when the relation is dropped from
the publication, when the relation is change (insert/update), AFAIK it also
create an entry in RelationSyncCache, so removing the RelationSyncCache entry
is unnecessary.

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.



On Wed, Jan 13, 2021 at 2:53 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > IIUC the logical replication only replicate the tables in publication, I think
> > when the tables that aren't in publication should not be replicated.
> >
> > Attached the patch that fixes it.  Thought?
> >
>
> Instead of doing this, I would expect that the RelationSyncCache entry
> should be removed when the relation is dropped from the publication.
> So if that is done then it will reload the publication and then it
> will not find that that relation as published and it will ignore the
> changes.  But the patch doesn't seem to be exactly on that line.  Am I
> missing something here?

IIUC, it's not possible to remove the cache entry inside
rel_sync_cache_publication_cb, because we don't receive the relid of
the alter publication .. dropped relation in the invalidation
callback. See the below comment,

    /*
     * There is no way to find which entry in our cache the hash belongs to so
     * mark the whole cache as invalid.
     */
    hash_seq_init(&status, RelationSyncCache);
    while ((entry = (RelationSyncEntry *) hash_seq_search(&status)) != NULL)
        entry->replicate_valid = false;

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On Wed, Jan 13, 2021 at 3:06 PM japin <japinli@hotmail.com> wrote:
>
>
> On Wed, 13 Jan 2021 at 17:23, Dilip Kumar wrote:
> > On Tue, Jan 12, 2021 at 4:47 PM Li Japin <japinli@hotmail.com> wrote:
> >>
> >>
> >> On Jan 12, 2021, at 5:47 PM, japin <japinli@hotmail.com> wrote:
> >>
> >>
> >> On Tue, 12 Jan 2021 at 14:38, Amit Kapila wrote:
> >>
> >> On Tue, Jan 12, 2021 at 11:39 AM Bharath Rupireddy
> >> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >>
> >>
> >> On Tue, Jan 12, 2021 at 9:05 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >>
> >>
> >> On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy
> >> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >>
> >>
> >> Hi,
> >>
> >> While providing thoughts on the design in [1], I found a strange
> >> behaviour with the $subject. The use case is shown below as a sequence
> >> of steps that need to be run on publisher and subscriber to arrive at
> >> the strange behaviour.  In step 5, the table is dropped from the
> >> publication and in step 6, the refresh publication is run on the
> >> subscriber, from here onwards, the expectation is that no further
> >> inserts into the publisher table have to be replicated on to the
> >> subscriber, but the opposite happens i.e. the inserts are still
> >> replicated to the subscriber. ISTM as a bug. Let me know if I'm
> >> missing anything.
> >>
> >>
> >> Did you try to investigate what's going on? Can you please check what
> >> is the behavior if, after step-5, you restart the subscriber and
> >> separately try creating a new subscription (maybe on a different
> >> server) for that publication after step-5 and see if that allows the
> >> relation to be replicated? AFAIU, in AlterSubscription_refresh, we
> >> remove such dropped rels and stop their corresponding apply workers
> >> which should stop the further replication of such relations but that
> >> doesn't seem to be happening in your case.
> >>
> >>
> >> Here's my analysis:
> >> 1) in the publisher, alter publication drop table successfully
> >> removes(PublicationDropTables) the table from the catalogue
> >> pg_publication_rel
> >> 2) in the subscriber, alter subscription refresh publication
> >> successfully removes the table from the catalogue pg_subscription_rel
> >> (AlterSubscription_refresh->RemoveSubscriptionRel)
> >> so far so good
> >>
> >>
> >> Here, it should register the worker to stop on commit, and then on
> >> commit it should call AtEOXact_ApplyLauncher to stop the apply worker.
> >> Once the apply worker is stopped, the corresponding WALSender will
> >> also be stopped. Something here is not happening as per expected
> >> behavior.
> >>
> >> 3) after the insertion into the table in the publisher(remember that
> >> it's dropped from the publication in (1)), the walsender process is
> >> unable detect that the table has been dropped from the publication
> >> i.e. it doesn't look at the pg_publication_rel catalogue or some
> >> other, but it only does is_publishable_relation() check which returns
> >> true in pgoutput_change(). Maybe the walsender should look at the
> >> catalogue pg_publication_rel in is_publishable_relation()?
> >>
> >>
> >> We must be somewhere checking pg_publication_rel before sending the
> >> decoded change because otherwise, we would have sent the changes for
> >> the table which are not even part of this publication. I think you can
> >> try to create a separate table that is not part of the publication
> >> under test and see how the changes for that are filtered.
> >>
> >>
> >> I find that pgoutput_change() use a hash table RelationSyncCache to
> >> cache the publication info for tables.  When we drop tables from the
> >> publication, the RelationSyncCache doesn't updated, so it replicate
> >> records.
> >>
> >>
> >> IIUC the logical replication only replicate the tables in publication, I think
> >> when the tables that aren't in publication should not be replicated.
> >>
> >> Attached the patch that fixes it.  Thought?
> >>
> >
> > Instead of doing this, I would expect that the RelationSyncCache entry
> > should be removed when the relation is dropped from the publication.
> > So if that is done then it will reload the publication and then it
> > will not find that that relation as published and it will ignore the
> > changes.  But the patch doesn't seem to be exactly on that line.  Am I
> > missing something here?
>
> Even if we remove RelationSyncCache entry when the relation is dropped from
> the publication, when the relation is change (insert/update), AFAIK it also
> create an entry in RelationSyncCache, so removing the RelationSyncCache entry
> is unnecessary.
>

That is true but the entry->replicate_valid will be false and that
will be only set to true in get_rel_sync_entry.  So now based on the
new publication the pubaction will be updated ?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



On Wed, Jan 13, 2021 at 3:08 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Jan 13, 2021 at 2:53 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > IIUC the logical replication only replicate the tables in publication, I think
> > > when the tables that aren't in publication should not be replicated.
> > >
> > > Attached the patch that fixes it.  Thought?
> > >
> >
> > Instead of doing this, I would expect that the RelationSyncCache entry
> > should be removed when the relation is dropped from the publication.
> > So if that is done then it will reload the publication and then it
> > will not find that that relation as published and it will ignore the
> > changes.  But the patch doesn't seem to be exactly on that line.  Am I
> > missing something here?
>
> IIUC, it's not possible to remove the cache entry inside
> rel_sync_cache_publication_cb, because we don't receive the relid of
> the alter publication .. dropped relation in the invalidation
> callback. See the below comment,

Hmm, yeah because nothing changed to the relation the change is for
the publication so the invalidation is not registered for the relation
entry.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



On Wed, Jan 13, 2021 at 2:36 PM japin <japinli@hotmail.com> wrote:
> > In summary, I feel we need to fix the publisher sending the inserts
> > even though the table is dropped from the publication, that is the
> > patch patch proposed by japin. This solves the bug reported in this
> > thread.
> >
> > And also, it's good to have the LogicalRepRelMap invalidation fix as a
> > 0002 patch in invalidate_syncing_table_states, subscription_change_cb
> > and logicalrep_rel_open as proposed by me.
> >
> > Thoughts?
> >
>
> I think invalidate the LogicalRepRelMap is necessary.  If the table isn't in
> subscription, can we remove the LogicalRepRelMapEntry from LogicalRepRelMap?

IIUC, it's not possible to know the relid of the alter
publication...dropped table in the invalidation callbacks
invalidate_syncing_table_states or subscription_change_cb, so it's
better to set state of all the entries to SUBREL_STATE_UNKNOWN, so
that, in the next logicalrep_rel_open call the function
GetSubscriptionRelState will be called and the pg_subscription_rel
will be read freshly.

This is inline with the reasoning specified at [1].

[1] - https://www.postgresql.org/message-id/CAFiTN-udwuc_h0TaFrSEKb-Yo1vVvkjz9TDRw7VE3P-KiPSGbQ%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On Wed, Jan 13, 2021 at 3:08 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Jan 13, 2021 at 2:53 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > IIUC the logical replication only replicate the tables in publication, I think
> > > when the tables that aren't in publication should not be replicated.
> > >
> > > Attached the patch that fixes it.  Thought?
> > >
> >
> > Instead of doing this, I would expect that the RelationSyncCache entry
> > should be removed when the relation is dropped from the publication.
> > So if that is done then it will reload the publication and then it
> > will not find that that relation as published and it will ignore the
> > changes.  But the patch doesn't seem to be exactly on that line.  Am I
> > missing something here?
>
> IIUC, it's not possible to remove the cache entry inside
> rel_sync_cache_publication_cb, because we don't receive the relid of
> the alter publication .. dropped relation in the invalidation
> callback. See the below comment,
>
>     /*
>      * There is no way to find which entry in our cache the hash belongs to so
>      * mark the whole cache as invalid.
>      */
>     hash_seq_init(&status, RelationSyncCache);
>     while ((entry = (RelationSyncEntry *) hash_seq_search(&status)) != NULL)
>         entry->replicate_valid = false;

So basically your point is that all the RelationSyncEntry are getting
invalidated right?  And if so then in get_rel_sync_entry we will just
check the updated publication no?
I am referring to the below code.  So ideally this should be already
working if the relcaches are getting invalidated?

get_rel_sync_entry
{
entry = (RelationSyncEntry *) hash_search(RelationSyncCache,
(void *) &relid,
HASH_ENTER, &found);
Assert(entry != NULL);

.....

/* Validate the entry */
if (!entry->replicate_valid)
{
List *pubids = GetRelationPublications(relid);
ListCell *lc;
Oid publish_as_relid = relid;
... here we will rechek the pulication
}

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



On Wed, Jan 13, 2021 at 5:15 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Jan 13, 2021 at 3:08 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Wed, Jan 13, 2021 at 2:53 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > IIUC the logical replication only replicate the tables in publication, I think
> > > > when the tables that aren't in publication should not be replicated.
> > > >
> > > > Attached the patch that fixes it.  Thought?
> > > >
> > >
> > > Instead of doing this, I would expect that the RelationSyncCache entry
> > > should be removed when the relation is dropped from the publication.
> > > So if that is done then it will reload the publication and then it
> > > will not find that that relation as published and it will ignore the
> > > changes.  But the patch doesn't seem to be exactly on that line.  Am I
> > > missing something here?
> >
> > IIUC, it's not possible to remove the cache entry inside
> > rel_sync_cache_publication_cb, because we don't receive the relid of
> > the alter publication .. dropped relation in the invalidation
> > callback. See the below comment,
> >
> >     /*
> >      * There is no way to find which entry in our cache the hash belongs to so
> >      * mark the whole cache as invalid.
> >      */
> >     hash_seq_init(&status, RelationSyncCache);
> >     while ((entry = (RelationSyncEntry *) hash_seq_search(&status)) != NULL)
> >         entry->replicate_valid = false;
>
> So basically your point is that all the RelationSyncEntry are getting
> invalidated right?  And if so then in get_rel_sync_entry we will just
> check the updated publication no?
> I am referring to the below code.  So ideally this should be already
> working if the relcaches are getting invalidated?
>
> get_rel_sync_entry
> {
> entry = (RelationSyncEntry *) hash_search(RelationSyncCache,
> (void *) &relid,
> HASH_ENTER, &found);
> Assert(entry != NULL);
>
> .....
>
> /* Validate the entry */
> if (!entry->replicate_valid)
> {
> List *pubids = GetRelationPublications(relid);
> ListCell *lc;
> Oid publish_as_relid = relid;
> ... here we will rechek the pulication
> }

Yes, after the entries got invalidated in
rel_sync_cache_publication_cb, we get to the if
(!entry->replicate_valid) part of the code in get_rel_sync_entry, but
in below point, we don't mark the pubactions to false, which were
earlier set to true. Because of this, we will still have pubactions to
true because of which pgoutput_change publishes the changes.

            /*
             * Don't publish changes for partitioned tables, because
             * publishing those of its partitions suffices, unless partition
             * changes won't be published due to pubviaroot being set.
             */
            if (publish &&
                (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot))
            {
                entry->pubactions.pubinsert |= pub->pubactions.pubinsert;
                entry->pubactions.pubupdate |= pub->pubactions.pubupdate;
                entry->pubactions.pubdelete |= pub->pubactions.pubdelete;
                entry->pubactions.pubtruncate |= pub->pubactions.pubtruncate;
            }

            if (entry->pubactions.pubinsert && entry->pubactions.pubupdate &&
                entry->pubactions.pubdelete && entry->pubactions.pubtruncate)
                break;

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On Wed, Jan 13, 2021 at 3:16 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Jan 13, 2021 at 2:36 PM japin <japinli@hotmail.com> wrote:
> > > In summary, I feel we need to fix the publisher sending the inserts
> > > even though the table is dropped from the publication, that is the
> > > patch patch proposed by japin. This solves the bug reported in this
> > > thread.
> > >
> > > And also, it's good to have the LogicalRepRelMap invalidation fix as a
> > > 0002 patch in invalidate_syncing_table_states, subscription_change_cb
> > > and logicalrep_rel_open as proposed by me.
> > >
> > > Thoughts?
> > >
> >
> > I think invalidate the LogicalRepRelMap is necessary.  If the table isn't in
> > subscription, can we remove the LogicalRepRelMapEntry from LogicalRepRelMap?
>
> IIUC, it's not possible to know the relid of the alter
> publication...dropped table in the invalidation callbacks
> invalidate_syncing_table_states or subscription_change_cb, so it's
> better to set state of all the entries to SUBREL_STATE_UNKNOWN, so
> that, in the next logicalrep_rel_open call the function
> GetSubscriptionRelState will be called and the pg_subscription_rel
> will be read freshly.
>
> This is inline with the reasoning specified at [1].
>
> [1] - https://www.postgresql.org/message-id/CAFiTN-udwuc_h0TaFrSEKb-Yo1vVvkjz9TDRw7VE3P-KiPSGbQ%40mail.gmail.com

Attaching following two patches:

0001 - Makes publisher to not publish the changes for the alter
publication ... dropped tables. Original patch is by japin, I added
comments, changed the function name and adjusted the code a bit.
0002 - Invalidates the relation map cache in subscriber syscache
invalidation callbacks. Currently, I'm setting entry->state to
SUBREL_STATE_UNKNOWN in the new invalidation function that's
introduced logicalrep_relmap_invalidate, so that in the next call to
logicalrep_rel_open the entry's state will be read from the system
catalogues using GetSubscriptionRelState. If this is sound's bit
strange, I can add a boolean invalid to LogicalRepRelMapEntry
structure and set that here and in logicalrep_rel_open, I can have
something like:
    if (entry->state != SUBREL_STATE_READY || entry->invalid)
        entry->state = GetSubscriptionRelState(MySubscription->oid,
                                               entry->localreloid,
                                               &entry->statelsn);

   if (entry->invalid)
        entry->invalid = false;

make check and make check-world passes on both patches.

If okay with the fixes, I will try to add a test case and also a cf
entry so that these patches get a chance to be tested on all the
platforms.

Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment
On Wed, Jan 13, 2021 at 5:40 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Jan 13, 2021 at 3:16 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Wed, Jan 13, 2021 at 2:36 PM japin <japinli@hotmail.com> wrote:
> > > > In summary, I feel we need to fix the publisher sending the inserts
> > > > even though the table is dropped from the publication, that is the
> > > > patch patch proposed by japin. This solves the bug reported in this
> > > > thread.
> > > >
> > > > And also, it's good to have the LogicalRepRelMap invalidation fix as a
> > > > 0002 patch in invalidate_syncing_table_states, subscription_change_cb
> > > > and logicalrep_rel_open as proposed by me.
> > > >
> > > > Thoughts?
> > > >
> > >
> > > I think invalidate the LogicalRepRelMap is necessary.  If the table isn't in
> > > subscription, can we remove the LogicalRepRelMapEntry from LogicalRepRelMap?
> >
> > IIUC, it's not possible to know the relid of the alter
> > publication...dropped table in the invalidation callbacks
> > invalidate_syncing_table_states or subscription_change_cb, so it's
> > better to set state of all the entries to SUBREL_STATE_UNKNOWN, so
> > that, in the next logicalrep_rel_open call the function
> > GetSubscriptionRelState will be called and the pg_subscription_rel
> > will be read freshly.
> >
> > This is inline with the reasoning specified at [1].
> >
> > [1] - https://www.postgresql.org/message-id/CAFiTN-udwuc_h0TaFrSEKb-Yo1vVvkjz9TDRw7VE3P-KiPSGbQ%40mail.gmail.com
>
> Attaching following two patches:
>
> 0001 - Makes publisher to not publish the changes for the alter
> publication ... dropped tables. Original patch is by japin, I added
> comments, changed the function name and adjusted the code a bit.
>

Do we really need to access PUBLICATIONRELMAP in this patch? What if
we just set it to false in the else condition of (if (publish &&
(relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot)))

> 0002 - Invalidates the relation map cache in subscriber syscache
> invalidation callbacks. Currently, I'm setting entry->state to
> SUBREL_STATE_UNKNOWN in the new invalidation function that's
> introduced logicalrep_relmap_invalidate, so that in the next call to
> logicalrep_rel_open the entry's state will be read from the system
> catalogues using GetSubscriptionRelState. If this is sound's bit
> strange, I can add a boolean invalid to LogicalRepRelMapEntry
> structure and set that here and in logicalrep_rel_open, I can have
> something like:
>     if (entry->state != SUBREL_STATE_READY || entry->invalid)
>         entry->state = GetSubscriptionRelState(MySubscription->oid,
>                                                entry->localreloid,
>                                                &entry->statelsn);
>
>    if (entry->invalid)
>         entry->invalid = false;
>

So, the point of the second patch is that it good to have such a
thing, otherwise, we don't see any problem if we just use patch-0001,
right? I think if we fix the first-one, automatically, we will achieve
what we are trying to with the second-one because ultimately
logicalrep_relmap_update will be called due to Alter Pub... Drop table
.. step and it will mark the entry as SUBREL_STATE_UNKNOWN.

> make check and make check-world passes on both patches.
>
> If okay with the fixes, I will try to add a test case and also a cf
> entry so that these patches get a chance to be tested on all the
> platforms.
>

I think it is good to follow both the steps (adding a testcase and
registering it to CF).

-- 
With Regards,
Amit Kapila.




On Jan 14, 2021, at 1:25 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jan 13, 2021 at 5:40 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Wed, Jan 13, 2021 at 3:16 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Wed, Jan 13, 2021 at 2:36 PM japin <japinli@hotmail.com> wrote:
In summary, I feel we need to fix the publisher sending the inserts
even though the table is dropped from the publication, that is the
patch patch proposed by japin. This solves the bug reported in this
thread.

And also, it's good to have the LogicalRepRelMap invalidation fix as a
0002 patch in invalidate_syncing_table_states, subscription_change_cb
and logicalrep_rel_open as proposed by me.

Thoughts?


I think invalidate the LogicalRepRelMap is necessary.  If the table isn't in
subscription, can we remove the LogicalRepRelMapEntry from LogicalRepRelMap?

IIUC, it's not possible to know the relid of the alter
publication...dropped table in the invalidation callbacks
invalidate_syncing_table_states or subscription_change_cb, so it's
better to set state of all the entries to SUBREL_STATE_UNKNOWN, so
that, in the next logicalrep_rel_open call the function
GetSubscriptionRelState will be called and the pg_subscription_rel
will be read freshly.

This is inline with the reasoning specified at [1].

[1] - https://www.postgresql.org/message-id/CAFiTN-udwuc_h0TaFrSEKb-Yo1vVvkjz9TDRw7VE3P-KiPSGbQ%40mail.gmail.com

Attaching following two patches:

0001 - Makes publisher to not publish the changes for the alter
publication ... dropped tables. Original patch is by japin, I added
comments, changed the function name and adjusted the code a bit.


Do we really need to access PUBLICATIONRELMAP in this patch? What if
we just set it to false in the else condition of (if (publish &&
(relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot)))


Thank for you review. I agree with you, it doesn’t need to access PUBLICATIONRELMAP, since
We already get the publication oid in GetRelationPublications(relid) [1], which also access
PUBLICATIONRELMAP.  If the current pub->oid does not in pubids, the publish is false, so we
do not need publish the table.

I have another question, the data->publications is a list, when did it has more then one items?

0001 - change as you suggest.
0002 - does not change.

Please consider v2 for further review.

[1]
List *
GetRelationPublications(Oid relid)
{
    List       *result = NIL;
    CatCList   *pubrellist;
    int         i;

    /* Find all publications associated with the relation. */
    pubrellist = SearchSysCacheList1(PUBLICATIONRELMAP,
                                     ObjectIdGetDatum(relid));
    for (i = 0; i < pubrellist->n_members; i++)
    {
        HeapTuple   tup = &pubrellist->members[i]->tuple;
        Oid         pubid = ((Form_pg_publication_rel) GETSTRUCT(tup))->prpubid;

        result = lappend_oid(result, pubid);
    }

    ReleaseSysCacheList(pubrellist);

    return result;
}


-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

Attachment
On Thu, Jan 14, 2021 at 5:36 PM Li Japin <japinli@hotmail.com> wrote
> Do we really need to access PUBLICATIONRELMAP in this patch? What if
> we just set it to false in the else condition of (if (publish &&
> (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot)))
>
> Thank for you review. I agree with you, it doesn’t need to access PUBLICATIONRELMAP, since
> We already get the publication oid in GetRelationPublications(relid) [1], which also access
> PUBLICATIONRELMAP.  If the current pub->oid does not in pubids, the publish is false, so we
> do not need publish the table.

+1. This is enough.

> I have another question, the data->publications is a list, when did it has more then one items?

IIUC, when the single table is associated with multiple publications,
then data->publications will have multiple entries. Though I have not
tried, we can try having two or three publications for the same table
and verify that.

> 0001 - change as you suggest.
> 0002 - does not change.

Thanks for the updated patch. I will respond to Amit's previous
comment on the 0002 patch soon.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On Thu, 14 Jan 2021 at 20:19, Bharath Rupireddy wrote:
> On Thu, Jan 14, 2021 at 5:36 PM Li Japin <japinli@hotmail.com> wrote
>> Do we really need to access PUBLICATIONRELMAP in this patch? What if
>> we just set it to false in the else condition of (if (publish &&
>> (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot)))
>>
>> Thank for you review. I agree with you, it doesn’t need to access PUBLICATIONRELMAP, since
>> We already get the publication oid in GetRelationPublications(relid) [1], which also access
>> PUBLICATIONRELMAP.  If the current pub->oid does not in pubids, the publish is false, so we
>> do not need publish the table.
>
> +1. This is enough.
>
>> I have another question, the data->publications is a list, when did it has more then one items?
>
> IIUC, when the single table is associated with multiple publications,
> then data->publications will have multiple entries. Though I have not
> tried, we can try having two or three publications for the same table
> and verify that.
>

I try add one table into two publications, but the data->publications has only
one item.  Is there something I missed?

--
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.



On Thu, Jan 14, 2021 at 6:02 PM japin <japinli@hotmail.com> wrote:
>
> On Thu, 14 Jan 2021 at 20:19, Bharath Rupireddy wrote:
> > On Thu, Jan 14, 2021 at 5:36 PM Li Japin <japinli@hotmail.com> wrote
> >> Do we really need to access PUBLICATIONRELMAP in this patch? What if
> >> we just set it to false in the else condition of (if (publish &&
> >> (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot)))
> >>
> >> Thank for you review. I agree with you, it doesn’t need to access PUBLICATIONRELMAP, since
> >> We already get the publication oid in GetRelationPublications(relid) [1], which also access
> >> PUBLICATIONRELMAP.  If the current pub->oid does not in pubids, the publish is false, so we
> >> do not need publish the table.
> >
> > +1. This is enough.
> >
> >> I have another question, the data->publications is a list, when did it has more then one items?
> >
> > IIUC, when the single table is associated with multiple publications,
> > then data->publications will have multiple entries. Though I have not
> > tried, we can try having two or three publications for the same table
> > and verify that.
> >
>
> I try add one table into two publications, but the data->publications has only
> one item.  Is there something I missed?
>

I think you will have multiple publications in that list when the
subscriber has subscribed to multiple publications. For example,
Create Subscription ... Publication pub1, Publication pub2.

--
With Regards,
Amit Kapila.



> On Jan 14, 2021, at 8:44 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> On Thu, Jan 14, 2021 at 6:02 PM japin <japinli@hotmail.com> wrote:
>> 
>> On Thu, 14 Jan 2021 at 20:19, Bharath Rupireddy wrote:
>>> On Thu, Jan 14, 2021 at 5:36 PM Li Japin <japinli@hotmail.com> wrote
>>>> Do we really need to access PUBLICATIONRELMAP in this patch? What if
>>>> we just set it to false in the else condition of (if (publish &&
>>>> (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot)))
>>>> 
>>>> Thank for you review. I agree with you, it doesn’t need to access PUBLICATIONRELMAP, since
>>>> We already get the publication oid in GetRelationPublications(relid) [1], which also access
>>>> PUBLICATIONRELMAP.  If the current pub->oid does not in pubids, the publish is false, so we
>>>> do not need publish the table.
>>> 
>>> +1. This is enough.
>>> 
>>>> I have another question, the data->publications is a list, when did it has more then one items?
>>> 
>>> IIUC, when the single table is associated with multiple publications,
>>> then data->publications will have multiple entries. Though I have not
>>> tried, we can try having two or three publications for the same table
>>> and verify that.
>>> 
>> 
>> I try add one table into two publications, but the data->publications has only
>> one item.  Is there something I missed?
>> 
> 
> I think you will have multiple publications in that list when the
> subscriber has subscribed to multiple publications. For example,
> Create Subscription ... Publication pub1, Publication pub2.
> 

Thanks, you are right! When we create a subscription with multiple publications,
the data->publications has more then one items.

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.


On Thu, Jan 14, 2021 at 5:36 PM Li Japin <japinli@hotmail.com> wrote:
>
> On Jan 14, 2021, at 1:25 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Attaching following two patches:
>
> 0001 - Makes publisher to not publish the changes for the alter
> publication ... dropped tables. Original patch is by japin, I added
> comments, changed the function name and adjusted the code a bit.
>
>
> Do we really need to access PUBLICATIONRELMAP in this patch? What if
> we just set it to false in the else condition of (if (publish &&
> (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot)))
>
>
> Thank for you review. I agree with you, it doesn’t need to access PUBLICATIONRELMAP, since
> We already get the publication oid in GetRelationPublications(relid) [1], which also access
> PUBLICATIONRELMAP.  If the current pub->oid does not in pubids, the publish is false, so we
> do not need publish the table.
>
> I have another question, the data->publications is a list, when did it has more then one items?
>
> 0001 - change as you suggest.
>

Can we add a test case for this patch as this seems to be a
reproducible bug? We can add the test in
src/test/subscription/100_bugs.pl unless you find a better place to
add it.

--
With Regards,
Amit Kapila.



> On Thu, Jan 14, 2021 at 5:36 PM Li Japin <japinli@hotmail.com> wrote
> > Do we really need to access PUBLICATIONRELMAP in this patch? What if
> > we just set it to false in the else condition of (if (publish &&
> > (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot)))
> >
> > Thank for you review. I agree with you, it doesn’t need to access
> > PUBLICATIONRELMAP, since We already get the publication oid in
> > GetRelationPublications(relid) [1], which also access
> > PUBLICATIONRELMAP.  If the current pub->oid does not in pubids, the
> publish is false, so we do not need publish the table.
> 
> +1. This is enough.
> 
> > I have another question, the data->publications is a list, when did it
> has more then one items?
> 
> IIUC, when the single table is associated with multiple publications, then
> data->publications will have multiple entries. Though I have not tried,
> we can try having two or three publications for the same table and verify
> that.
> 
> > 0001 - change as you suggest.
> > 0002 - does not change.


Hi

In 0001 patch

The comments says " Relation is not associated with the publication anymore "
That comment was accurate when check is_relation_part_of_publication() in the last patch.

But when put the code in the else branch, not only the case in the comments(Relation is dropped),
Some other case such as "partitioned tables" will hit else branch too.

Do you think we need fix the comment a little to make it accurate?


And it seems we can add a "continue;" in else branch.
Of course this it not necessary.

Best regards,
houzj




On Thu, Jan 14, 2021 at 10:53 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > 0002 - Invalidates the relation map cache in subscriber syscache
> > invalidation callbacks. Currently, I'm setting entry->state to
> > SUBREL_STATE_UNKNOWN in the new invalidation function that's
> > introduced logicalrep_relmap_invalidate, so that in the next call to
> > logicalrep_rel_open the entry's state will be read from the system
> > catalogues using GetSubscriptionRelState. If this is sound's bit
> > strange, I can add a boolean invalid to LogicalRepRelMapEntry
> > structure and set that here and in logicalrep_rel_open, I can have
> > something like:
> >     if (entry->state != SUBREL_STATE_READY || entry->invalid)
> >         entry->state = GetSubscriptionRelState(MySubscription->oid,
> >                                                entry->localreloid,
> >                                                &entry->statelsn);
> >
> >    if (entry->invalid)
> >         entry->invalid = false;
> >
>
> So, the point of the second patch is that it good to have such a
> thing, otherwise, we don't see any problem if we just use patch-0001,
> right? I think if we fix the first-one, automatically, we will achieve
> what we are trying to with the second-one because ultimately
> logicalrep_relmap_update will be called due to Alter Pub... Drop table
> .. step and it will mark the entry as SUBREL_STATE_UNKNOWN.

On some further analysis, I found that logicalrep_relmap_update is
called in subscribers for inserts, delets, updates and truncate
statements on the dropped(from publication) relations in the
publisher.

If any alters to pg_subscription, then we invalidate the subscription
in subscription_change_cb, maybe_reread_subscription subscription will
take care of re-reading from the system catalogues, see
apply_handle_XXXX->ensure_transaction -> maybe_reread_subscription.
And we don't have any entry in LogicalRepRelMapEntry that gets
affected because of changes to pg_subscription, so we don't need to
invalidate the rel map cache entries in subscription_change_cb.

If any alters to pg_subscription_rel, then there are two parameters in
LogicalRepRelMapEntry structure, state and statelsn that may get
affected. Changes to statelsn column is taken care of with the
invalidation callback invalidate_syncing_table_states setting
table_states_valid to true and
process_syncing_tables->process_syncing_tables_for_apply. But, if
state column is changed somehow (although I'm not quite sure how it
can change to different states 'i', 'd', 's', 'r' after the initial
table sync phase in logical replication, but we can pretty much do
something like update pg_subscription_rel set srsubstate = 'd';), in
this case invalidate_syncing_table_states gets called, but if we don't
revalidation of relation map cache entries, they will have the old
state value.

So what I feel is at least we need the 0002 patch but with only
invalidating the entries in invalidate_syncing_table_states not in
subscription_change_cb, although I'm not able to back it up with any
use case or bug as such.

Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On Fri, Jan 15, 2021 at 11:33 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> > On Thu, Jan 14, 2021 at 5:36 PM Li Japin <japinli@hotmail.com> wrote
> > > Do we really need to access PUBLICATIONRELMAP in this patch? What if
> > > we just set it to false in the else condition of (if (publish &&
> > > (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot)))
> > >
> > > Thank for you review. I agree with you, it doesn’t need to access
> > > PUBLICATIONRELMAP, since We already get the publication oid in
> > > GetRelationPublications(relid) [1], which also access
> > > PUBLICATIONRELMAP.  If the current pub->oid does not in pubids, the
> > publish is false, so we do not need publish the table.
> >
> > +1. This is enough.
> >
> > > I have another question, the data->publications is a list, when did it
> > has more then one items?
> >
> > IIUC, when the single table is associated with multiple publications, then
> > data->publications will have multiple entries. Though I have not tried,
> > we can try having two or three publications for the same table and verify
> > that.
> >
> > > 0001 - change as you suggest.
> > > 0002 - does not change.
>
>
> Hi
>
> In 0001 patch
>
> The comments says " Relation is not associated with the publication anymore "
> That comment was accurate when check is_relation_part_of_publication() in the last patch.
>
> But when put the code in the else branch, not only the case in the comments(Relation is dropped),
> Some other case such as "partitioned tables" will hit else branch too.
>
> Do you think we need fix the comment a little to make it accurate?

Thanks, that comment needs a rework, in fact the else condition
also(because we may fall into else branch even when publish is true
and (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot) is
false, but our intention(to fix the bug reported here) is to have the
else condition only when publish is false. And also instead of just
relying on the publish variable which can get set even when the
relation is not in the publication but ancestor_published is true, we
can just have something like below to fix the bug reported here:

            if (publish &&
                (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot))
            {
                entry->pubactions.pubinsert |= pub->pubactions.pubinsert;
                entry->pubactions.pubupdate |= pub->pubactions.pubupdate;
                entry->pubactions.pubdelete |= pub->pubactions.pubdelete;
                entry->pubactions.pubtruncate |= pub->pubactions.pubtruncate;
            }
            else if (!list_member_oid(pubids, pub->oid))
            {
                /*
                 * Relation is not associated with the publication anymore i.e
                 * it would have been dropped from the publication. So we don't
                 * need to publish the changes for it.
                 */
                entry->pubactions.pubinsert = false;
                entry->pubactions.pubupdate = false;
                entry->pubactions.pubdelete = false;
                entry->pubactions.pubtruncate = false;
            }

So this way, the fix only focuses on what we have reported here and it
doesn't cause any regressions may be, and the existing comment becomes
appropriate.

Thoughts?

> And it seems we can add a "continue;" in else branch.
> Of course this it not necessary.

As you said, it's not necessary because the following if condition
will fail anyways. Having said that, if we add continue, then any
future code that gets added after the following if condition will
never get executed. Though the reasoning may sound silly, IMO let's
not add the continue in the else branch.

            if (entry->pubactions.pubinsert && entry->pubactions.pubupdate &&
                entry->pubactions.pubdelete && entry->pubactions.pubtruncate)
                break;

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On Fri, 15 Jan 2021 at 14:50, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> On Fri, Jan 15, 2021 at 11:33 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>>
>> > On Thu, Jan 14, 2021 at 5:36 PM Li Japin <japinli@hotmail.com> wrote
>> > > Do we really need to access PUBLICATIONRELMAP in this patch? What if
>> > > we just set it to false in the else condition of (if (publish &&
>> > > (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot)))
>> > >
>> > > Thank for you review. I agree with you, it doesn’t need to access
>> > > PUBLICATIONRELMAP, since We already get the publication oid in
>> > > GetRelationPublications(relid) [1], which also access
>> > > PUBLICATIONRELMAP.  If the current pub->oid does not in pubids, the
>> > publish is false, so we do not need publish the table.
>> >
>> > +1. This is enough.
>> >
>> > > I have another question, the data->publications is a list, when did it
>> > has more then one items?
>> >
>> > IIUC, when the single table is associated with multiple publications, then
>> > data->publications will have multiple entries. Though I have not tried,
>> > we can try having two or three publications for the same table and verify
>> > that.
>> >
>> > > 0001 - change as you suggest.
>> > > 0002 - does not change.
>>
>>
>> Hi
>>
>> In 0001 patch
>>
>> The comments says " Relation is not associated with the publication anymore "
>> That comment was accurate when check is_relation_part_of_publication() in the last patch.
>>
>> But when put the code in the else branch, not only the case in the comments(Relation is dropped),
>> Some other case such as "partitioned tables" will hit else branch too.
>>
>> Do you think we need fix the comment a little to make it accurate?
>
> Thanks, that comment needs a rework, in fact the else condition
> also(because we may fall into else branch even when publish is true
> and (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot) is
> false, but our intention(to fix the bug reported here) is to have the
> else condition only when publish is false. And also instead of just
> relying on the publish variable which can get set even when the
> relation is not in the publication but ancestor_published is true, we
> can just have something like below to fix the bug reported here:
>
>             if (publish &&
>                 (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot))
>             {
>                 entry->pubactions.pubinsert |= pub->pubactions.pubinsert;
>                 entry->pubactions.pubupdate |= pub->pubactions.pubupdate;
>                 entry->pubactions.pubdelete |= pub->pubactions.pubdelete;
>                 entry->pubactions.pubtruncate |= pub->pubactions.pubtruncate;
>             }
>             else if (!list_member_oid(pubids, pub->oid))
>             {
>                 /*
>                  * Relation is not associated with the publication anymore i.e
>                  * it would have been dropped from the publication. So we don't
>                  * need to publish the changes for it.
>                  */
>                 entry->pubactions.pubinsert = false;
>                 entry->pubactions.pubupdate = false;
>                 entry->pubactions.pubdelete = false;
>                 entry->pubactions.pubtruncate = false;
>             }
>
> So this way, the fix only focuses on what we have reported here and it
> doesn't cause any regressions may be, and the existing comment becomes
> appropriate.
>
> Thoughts?
>

I'm sorry, the 0001 patch is totally wrong.  If we have only one publication, it works,
however, this is a special case.  If we have multiple publication in one subscription,
it doesn't work.  Here is a usecase.

Step (1)
-- On publisher
CREATE TABLE t1 (a int);
CREATE TABLE t2 (a int);
CREATE PUBLICATION mypub1 FOR TABLE t1 WITH (publish='insert');
CREATE PUBLICATION mypub2 FOR TABLE t2;

Step (2)
-- On subscriber
CREATE TABLE t1 (a int);
CREATE TABLE t2 (a int);
CREATE SUBSCRIPTION mysub CONNECTION 'host=localhost port=5432 dbname=postgres' PUBLICATION mypub1, mypub2;

Step (3)
-- On publisher
INSERT INTO t1 VALUES (1);

Step (4)
-- On subscriber
SELECT * FROM t1;

In step (4), we cannot get the records, because there has two publications in
data->publications, so we will check one by one.
The first publication is "mypub1" and entry->pubactions.pubinsert will be set
true, however, in the second round, the publication is "mypub2", and we cannot
find pub->oid in pubids (only oid of "mypub1"), so it will set all entry->pubactions
to false, and nothing will be replicate to subscriber.

My apologies!

--
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.



On Fri, 15 Jan 2021 at 15:49, japin <japinli@hotmail.com> wrote:
> On Fri, 15 Jan 2021 at 14:50, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
>> On Fri, Jan 15, 2021 at 11:33 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>>>
>>> > On Thu, Jan 14, 2021 at 5:36 PM Li Japin <japinli@hotmail.com> wrote
>>> > > Do we really need to access PUBLICATIONRELMAP in this patch? What if
>>> > > we just set it to false in the else condition of (if (publish &&
>>> > > (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot)))
>>> > >
>>> > > Thank for you review. I agree with you, it doesn’t need to access
>>> > > PUBLICATIONRELMAP, since We already get the publication oid in
>>> > > GetRelationPublications(relid) [1], which also access
>>> > > PUBLICATIONRELMAP.  If the current pub->oid does not in pubids, the
>>> > publish is false, so we do not need publish the table.
>>> >
>>> > +1. This is enough.
>>> >
>>> > > I have another question, the data->publications is a list, when did it
>>> > has more then one items?
>>> >
>>> > IIUC, when the single table is associated with multiple publications, then
>>> > data->publications will have multiple entries. Though I have not tried,
>>> > we can try having two or three publications for the same table and verify
>>> > that.
>>> >
>>> > > 0001 - change as you suggest.
>>> > > 0002 - does not change.
>>>
>>>
>>> Hi
>>>
>>> In 0001 patch
>>>
>>> The comments says " Relation is not associated with the publication anymore "
>>> That comment was accurate when check is_relation_part_of_publication() in the last patch.
>>>
>>> But when put the code in the else branch, not only the case in the comments(Relation is dropped),
>>> Some other case such as "partitioned tables" will hit else branch too.
>>>
>>> Do you think we need fix the comment a little to make it accurate?
>>
>> Thanks, that comment needs a rework, in fact the else condition
>> also(because we may fall into else branch even when publish is true
>> and (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot) is
>> false, but our intention(to fix the bug reported here) is to have the
>> else condition only when publish is false. And also instead of just
>> relying on the publish variable which can get set even when the
>> relation is not in the publication but ancestor_published is true, we
>> can just have something like below to fix the bug reported here:
>>
>>             if (publish &&
>>                 (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot))
>>             {
>>                 entry->pubactions.pubinsert |= pub->pubactions.pubinsert;
>>                 entry->pubactions.pubupdate |= pub->pubactions.pubupdate;
>>                 entry->pubactions.pubdelete |= pub->pubactions.pubdelete;
>>                 entry->pubactions.pubtruncate |= pub->pubactions.pubtruncate;
>>             }
>>             else if (!list_member_oid(pubids, pub->oid))
>>             {
>>                 /*
>>                  * Relation is not associated with the publication anymore i.e
>>                  * it would have been dropped from the publication. So we don't
>>                  * need to publish the changes for it.
>>                  */
>>                 entry->pubactions.pubinsert = false;
>>                 entry->pubactions.pubupdate = false;
>>                 entry->pubactions.pubdelete = false;
>>                 entry->pubactions.pubtruncate = false;
>>             }
>>
>> So this way, the fix only focuses on what we have reported here and it
>> doesn't cause any regressions may be, and the existing comment becomes
>> appropriate.
>>
>> Thoughts?
>>
>
> I'm sorry, the 0001 patch is totally wrong.  If we have only one publication, it works,
> however, this is a special case.  If we have multiple publication in one subscription,
> it doesn't work.  Here is a usecase.
>
> Step (1)
> -- On publisher
> CREATE TABLE t1 (a int);
> CREATE TABLE t2 (a int);
> CREATE PUBLICATION mypub1 FOR TABLE t1 WITH (publish='insert');
> CREATE PUBLICATION mypub2 FOR TABLE t2;
>
> Step (2)
> -- On subscriber
> CREATE TABLE t1 (a int);
> CREATE TABLE t2 (a int);
> CREATE SUBSCRIPTION mysub CONNECTION 'host=localhost port=5432 dbname=postgres' PUBLICATION mypub1, mypub2;
>
> Step (3)
> -- On publisher
> INSERT INTO t1 VALUES (1);
>
> Step (4)
> -- On subscriber
> SELECT * FROM t1;
>
> In step (4), we cannot get the records, because there has two publications in
> data->publications, so we will check one by one.
> The first publication is "mypub1" and entry->pubactions.pubinsert will be set
> true, however, in the second round, the publication is "mypub2", and we cannot
> find pub->oid in pubids (only oid of "mypub1"), so it will set all entry->pubactions
> to false, and nothing will be replicate to subscriber.
>
> My apologies!

As I mentioned in previous thread, if there has multiple publications in
single subscription, it might lead data loss.

When the entry got invalidated in rel_sync_cache_publication_cb(), I think we
should mark the pubactions to false, and let get_rel_sync_entry() recalculate
the pubactions.

0001 - modify as above described.
0002 - do not change.

Any thought?

--
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.


Attachment
On Fri, Jan 15, 2021 at 2:46 PM japin <japinli@hotmail.com> wrote:
>
> >
> > I'm sorry, the 0001 patch is totally wrong.  If we have only one publication, it works,
> > however, this is a special case.  If we have multiple publication in one subscription,
> > it doesn't work.  Here is a usecase.
> >
> > Step (1)
> > -- On publisher
> > CREATE TABLE t1 (a int);
> > CREATE TABLE t2 (a int);
> > CREATE PUBLICATION mypub1 FOR TABLE t1 WITH (publish='insert');
> > CREATE PUBLICATION mypub2 FOR TABLE t2;
> >
> > Step (2)
> > -- On subscriber
> > CREATE TABLE t1 (a int);
> > CREATE TABLE t2 (a int);
> > CREATE SUBSCRIPTION mysub CONNECTION 'host=localhost port=5432 dbname=postgres' PUBLICATION mypub1, mypub2;
> >
> > Step (3)
> > -- On publisher
> > INSERT INTO t1 VALUES (1);
> >
> > Step (4)
> > -- On subscriber
> > SELECT * FROM t1;
> >
> > In step (4), we cannot get the records, because there has two publications in
> > data->publications, so we will check one by one.
> > The first publication is "mypub1" and entry->pubactions.pubinsert will be set
> > true, however, in the second round, the publication is "mypub2", and we cannot
> > find pub->oid in pubids (only oid of "mypub1"), so it will set all entry->pubactions
> > to false, and nothing will be replicate to subscriber.
> >
> > My apologies!
>
> As I mentioned in previous thread, if there has multiple publications in
> single subscription, it might lead data loss.
>
> When the entry got invalidated in rel_sync_cache_publication_cb(), I think we
> should mark the pubactions to false, and let get_rel_sync_entry() recalculate
> the pubactions.
>
> 0001 - modify as above described.
> 0002 - do not change.
>
> Any thought?
>

That sounds like a better way to fix and in fact, I was about to
suggest the same after reading your previous email. I'll think more on
this but in the meantime, can you add the test case in the patch as
requested earlier as well.

-- 
With Regards,
Amit Kapila.



On Fri, Jan 15, 2021 at 4:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> I feel it is better to not do anything for this because neither we
> have a test to reproduce the problem nor is it clear from theory if
> there is any problem to solve here.

+1 to ignore 0002 patch. Thanks Amit.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On Fri, Jan 15, 2021 at 11:48 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Jan 14, 2021 at 10:53 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > 0002 - Invalidates the relation map cache in subscriber syscache
> > > invalidation callbacks. Currently, I'm setting entry->state to
> > > SUBREL_STATE_UNKNOWN in the new invalidation function that's
> > > introduced logicalrep_relmap_invalidate, so that in the next call to
> > > logicalrep_rel_open the entry's state will be read from the system
> > > catalogues using GetSubscriptionRelState. If this is sound's bit
> > > strange, I can add a boolean invalid to LogicalRepRelMapEntry
> > > structure and set that here and in logicalrep_rel_open, I can have
> > > something like:
> > >     if (entry->state != SUBREL_STATE_READY || entry->invalid)
> > >         entry->state = GetSubscriptionRelState(MySubscription->oid,
> > >                                                entry->localreloid,
> > >                                                &entry->statelsn);
> > >
> > >    if (entry->invalid)
> > >         entry->invalid = false;
> > >
> >
> > So, the point of the second patch is that it good to have such a
> > thing, otherwise, we don't see any problem if we just use patch-0001,
> > right? I think if we fix the first-one, automatically, we will achieve
> > what we are trying to with the second-one because ultimately
> > logicalrep_relmap_update will be called due to Alter Pub... Drop table
> > .. step and it will mark the entry as SUBREL_STATE_UNKNOWN.
>
> On some further analysis, I found that logicalrep_relmap_update is
> called in subscribers for inserts, delets, updates and truncate
> statements on the dropped(from publication) relations in the
> publisher.
>
> If any alters to pg_subscription, then we invalidate the subscription
> in subscription_change_cb, maybe_reread_subscription subscription will
> take care of re-reading from the system catalogues, see
> apply_handle_XXXX->ensure_transaction -> maybe_reread_subscription.
> And we don't have any entry in LogicalRepRelMapEntry that gets
> affected because of changes to pg_subscription, so we don't need to
> invalidate the rel map cache entries in subscription_change_cb.
>
> If any alters to pg_subscription_rel, then there are two parameters in
> LogicalRepRelMapEntry structure, state and statelsn that may get
> affected. Changes to statelsn column is taken care of with the
> invalidation callback invalidate_syncing_table_states setting
> table_states_valid to true and
> process_syncing_tables->process_syncing_tables_for_apply.
>

I think you are saying the reverse here. The function
invalidate_syncing_table_states sets table_states_valid to false and
the other one sets it to true.

> But, if
> state column is changed somehow (although I'm not quite sure how it
> can change to different states 'i', 'd', 's', 'r' after the initial
> table sync phase in logical replication,
>

These states are for the initial copy-phase after that these won't change.

> but we can pretty much do
> something like update pg_subscription_rel set srsubstate = 'd';), in
> this case invalidate_syncing_table_states gets called, but if we don't
> revalidation of relation map cache entries, they will have the old
> state value.
>

Hmm, modifying state values as you are suggesting have much dire
consequences like in this case it can let tablesync worker to restart
and try to do perform initial-copy again or it can lead to missing
data in subscriber. We don't recommend to change system catalogs
manually due to such problems.

> So what I feel is at least we need the 0002 patch but with only
> invalidating the entries in invalidate_syncing_table_states not in
> subscription_change_cb, although I'm not able to back it up with any
> use case or bug as such.
>

I feel it is better to not do anything for this because neither we
have a test to reproduce the problem nor is it clear from theory if
there is any problem to solve here.

-- 
With Regards,
Amit Kapila.



On Fri, Jan 15, 2021 at 4:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> That sounds like a better way to fix and in fact, I was about to
> suggest the same after reading your previous email. I'll think more on
> this but in the meantime, can you add the test case in the patch as
> requested earlier as well.

@Li Japin Please let me know if you have already started to work on
the test case, otherwise I can make a 0002 patch for the test case and
post.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On Fri, 15 Jan 2021 at 18:57, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> On Fri, Jan 15, 2021 at 4:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>> That sounds like a better way to fix and in fact, I was about to
>> suggest the same after reading your previous email. I'll think more on
>> this but in the meantime, can you add the test case in the patch as
>> requested earlier as well.
>
> @Li Japin Please let me know if you have already started to work on
> the test case, otherwise I can make a 0002 patch for the test case and
> post.
>

Yeah, I'm working on the test case.  Since I'm not familair with the logical
replication test, it may take some times.

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.



On Fri, Jan 15, 2021 at 4:54 PM japin <japinli@hotmail.com> wrote:
> On Fri, 15 Jan 2021 at 18:57, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> > On Fri, Jan 15, 2021 at 4:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >> That sounds like a better way to fix and in fact, I was about to
> >> suggest the same after reading your previous email. I'll think more on
> >> this but in the meantime, can you add the test case in the patch as
> >> requested earlier as well.
> >
> > @Li Japin Please let me know if you have already started to work on
> > the test case, otherwise I can make a 0002 patch for the test case and
> > post.
> >
>
> Yeah, I'm working on the test case.  Since I'm not familair with the logical
> replication test, it may take some times.

Thanks a lot. I quickly added the initial use case where we saw the
bug, attached is 0002 patch. Please have a look and add further use
cases if necessary. If okay, it's better to make a cf entry.

I have one comment in v3-0001 patch,
+         * There might some relations dropped from the publication, we do

I think we should change it to - "There might be some relations".

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment
I made an entry in the commitfest[1], so that the patches will get a
chance to run on all the platforms.

Attaching v4 patch set i.e. 0001 - fix, 0002 - test case.

[1] - https://commitfest.postgresql.org/32/2944/

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment
On Sat, Jan 16, 2021 at 11:52 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> I made an entry in the commitfest[1], so that the patches will get a
> chance to run on all the platforms.
>
> Attaching v4 patch set i.e. 0001 - fix, 0002 - test case.
>

In the test, can we have multiple publications for the subscription
because that is how we discovered that the originally proposed patch
was not correct? Also, is it possible to extend some of the existing
tests in 001_rep_changes.pl or anywhere so that we can avoid some cost
of the replication setup.

-- 
With Regards,
Amit Kapila.



On Sat, 16 Jan 2021 at 14:21, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> I made an entry in the commitfest[1], so that the patches will get a
> chance to run on all the platforms.
>
> Attaching v4 patch set i.e. 0001 - fix, 0002 - test case.
>
> [1] - https://commitfest.postgresql.org/32/2944/
>

Thanks for the updated patch.

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.



On Sat, Jan 16, 2021 at 12:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sat, Jan 16, 2021 at 11:52 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > I made an entry in the commitfest[1], so that the patches will get a
> > chance to run on all the platforms.
> >
> > Attaching v4 patch set i.e. 0001 - fix, 0002 - test case.
> >
>
> In the test, can we have multiple publications for the subscription
> because that is how we discovered that the originally proposed patch
> was not correct? Also, is it possible to extend some of the existing
> tests in 001_rep_changes.pl or anywhere so that we can avoid some cost
> of the replication setup.

Sure, I will add the multiple publications use case provided by japin
and also see if I can move the tests from 100_bugs.pl to
0001_rep_changes.pl.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On Sat, Jan 16, 2021 at 12:21 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> > In the test, can we have multiple publications for the subscription
> > because that is how we discovered that the originally proposed patch
> > was not correct? Also, is it possible to extend some of the existing
> > tests in 001_rep_changes.pl or anywhere so that we can avoid some cost
> > of the replication setup.
>
> Sure, I will add the multiple publications use case provided by japin
> and also see if I can move the tests from 100_bugs.pl to
> 0001_rep_changes.pl.

Attaching v5 patch set. 0001 - No change. 0002 - I moved the tests to
001_rep_changes.pl so that we could avoid creation of subscriber,
publisher nodes and other set up. I also added the multiple
publication for the subscription test case which was failing with
v2-0001 patch. Note that I had to create subscriber,  publications for
this test case, because I couldn't find the regression (on v2-001
patch) with any of the existing test cases and also I'm dropping them
after the test so that it will not harm any existing following tests.
Hope that's okay. With v5-0002 and v2-0001, the multiple publication
for the subscription test case fails.

Please consider the v5 patch set for further review.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment
On Sat, Jan 16, 2021 at 6:08 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Sat, Jan 16, 2021 at 12:21 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> > > In the test, can we have multiple publications for the subscription
> > > because that is how we discovered that the originally proposed patch
> > > was not correct? Also, is it possible to extend some of the existing
> > > tests in 001_rep_changes.pl or anywhere so that we can avoid some cost
> > > of the replication setup.
> >
> > Sure, I will add the multiple publications use case provided by japin
> > and also see if I can move the tests from 100_bugs.pl to
> > 0001_rep_changes.pl.
>
> Attaching v5 patch set. 0001 - No change. 0002 - I moved the tests to
> 001_rep_changes.pl so that we could avoid creation of subscriber,
> publisher nodes and other set up.
>

Thanks for the patch. Few comments:

+
+# Test replication with multiple publications for subscription
+

While checking the existing tests in 001_rep_changes.pl, I came across
the below test which has multiple publications, won't that suffice the
need?

"CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr'
PUBLICATION tap_pub, tap_pub_ins_only"

BTW, have we tried to check if this problem exists in back-branches?
It seems to me the problem has been recently introduced by commit
69bd60672a. I am telling this by looking at code and have yet not
performed any testing so I could be wrong.

-- 
With Regards,
Amit Kapila.



On Fri, Jan 22, 2021 at 2:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sat, Jan 16, 2021 at 6:08 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Sat, Jan 16, 2021 at 12:21 PM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > > > In the test, can we have multiple publications for the subscription
> > > > because that is how we discovered that the originally proposed patch
> > > > was not correct? Also, is it possible to extend some of the existing
> > > > tests in 001_rep_changes.pl or anywhere so that we can avoid some cost
> > > > of the replication setup.
> > >
> > > Sure, I will add the multiple publications use case provided by japin
> > > and also see if I can move the tests from 100_bugs.pl to
> > > 0001_rep_changes.pl.
> >
> > Attaching v5 patch set. 0001 - No change. 0002 - I moved the tests to
> > 001_rep_changes.pl so that we could avoid creation of subscriber,
> > publisher nodes and other set up.
> >
>
> Thanks for the patch. Few comments:
>
> +
> +# Test replication with multiple publications for subscription
> +
>
> While checking the existing tests in 001_rep_changes.pl, I came across
> the below test which has multiple publications, won't that suffice the
> need?
>
> "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr'
> PUBLICATION tap_pub, tap_pub_ins_only"
>
> BTW, have we tried to check if this problem exists in back-branches?
> It seems to me the problem has been recently introduced by commit
> 69bd60672a. I am telling this by looking at code and have yet not
> performed any testing so I could be wrong.

Thanks for the comments Amit. I will update soon.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On Fri, Jan 22, 2021 at 2:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> Thanks for the patch. Few comments:
> +
> +# Test replication with multiple publications for subscription
> +
>
> While checking the existing tests in 001_rep_changes.pl, I came across
> the below test which has multiple publications, won't that suffice the
> need?
>
> "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr'
> PUBLICATION tap_pub, tap_pub_ins_only"

Yeah I saw that test case earlier, but we couldn't catch the error with it.

Our earlier fix was to set pubactions to false whenever publish is
false in get_rel_sync_entry

foreach(lc, data->publications)
{
    //publish is set to false when the given relation's publication is
not in the list of publications provided by subscription i.e.
data->publications
    if(!publish)
    {
        //set all entry->pubactions to false
    }
}

The existing use case:
The publication tap_pub has tables tab_rep, tab_full, tab_full2,
tab_mixed, tab_include, tab_nothing, tab_full_pk
The publication tap_pub_ins_only has table tab_ins
CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr'
PUBLICATION tap_pub, tap_pub_ins_only;
INSERT INTO tab_ins SELECT generate_series(1,50);
And the first insertion after the subscription is created happens into
the table tab_ins which is with tap_pub_ins_only i.e. 2nd publication
in the list of publications specified in the create subscription
statement.

The existing use case analysis:
given relation tab_ins's publication is tap_pub_ins_only
data->publications list contains - tap_pub, tap_pub_ins_only
loop 1: data->publications list 1st element is tap_pub and relation
tab_ins's publication is tap_pub_ins_only, so publish is false,
pubations are set to false.
loop 2: data->publications list 2nd element is tap_pub_ins_only and
relation tab_ins's publication is also tap_pub_ins_only, publish is
true, pubations are set to true,
loop is exited with entry->pubactions set to true, hence the
replication happens for the table.

japin's use case [1]:
But, in the japin use case[1] where we saw regression with the initial
version of the fix, the first insertion happens into the table t1
which is with publication mypub1 i.e. 1st in the list of publications
associated with the subscription.

japin's use case [1] analysis:
given relation t1's publication is mypub1
data->publications list contains - mypub1, mypub2
loop 1: data->publications list 1st element is mypub1 and relation
t1's publication is also mypub1, so publish is true, pubations.insert
is set to true, other actions are false because the publication mypub1
is created WITH (publish='insert');, so the loop can't get exited from
below break condition.
            if (entry->pubactions.pubinsert && entry->pubactions.pubupdate &&
                entry->pubactions.pubdelete && entry->pubactions.pubtruncate)
                break;
loop 2: data->publications list 2nd element is mypub2 and relation
t1's publication is mypub1, so publish is false, all pubations are set
to false.
loop is exited with all entry->pubactions set to false, hence the
replication doesn't happen for the table t1.

From the above analysis, it turns out that the existing use case
didn't catch the regression, that is why we had to add a new test
case.

[1]
Step (1)
-- On publisher
CREATE TABLE t1 (a int);
CREATE TABLE t2 (a int);
CREATE PUBLICATION mypub1 FOR TABLE t1 WITH (publish='insert');
CREATE PUBLICATION mypub2 FOR TABLE t2;

Step (2)
-- On subscriber
CREATE TABLE t1 (a int);
CREATE TABLE t2 (a int);
CREATE SUBSCRIPTION mysub1 CONNECTION 'host=localhost port=5432
dbname=postgres' PUBLICATION mypub1, mypub2;

Step (3)
-- On publisher
INSERT INTO t1 VALUES (1);

Step (4)
-- On subscriber
SELECT * FROM t1;

> BTW, have we tried to check if this problem exists in back-branches?
> It seems to me the problem has been recently introduced by commit
> 69bd60672a. I am telling this by looking at code and have yet not
> performed any testing so I could be wrong.

Yes you are right. Looks like the above commit is causing the issue. I
reverted that commit and did not see the drop publication bug, see use
case [1].

[1]
-- On publisher
CREATE TABLE t1 (a int);
INSERT INTO t1 VALUES (1);
CREATE PUBLICATION mypub1 FOR TABLE t1;

-- On subscriber
CREATE TABLE t1 (a int);
CREATE SUBSCRIPTION mysub1 CONNECTION 'host=localhost port=5432
dbname=postgres' PUBLICATION mypub1;

-- On publisher
INSERT INTO t1 VALUES (2);

-- On subscriber
SELECT * FROM t1;

-- On publisher
ALTER PUBLICATION mypub1 DROP TABLE t1;
INSERT INTO t1 VALUES (3);

-- On subscriber
SELECT * FROM t1;   --> we should not see value 3, if the alter
publication ... drop table worked correctly.

-- On publisher
INSERT INTO t1 SELECT * FROM generate_series(4, 88);

-- On subscriber
SELECT * FROM t1;

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On Fri, Jan 22, 2021 at 2:59 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> > BTW, have we tried to check if this problem exists in back-branches?
> > It seems to me the problem has been recently introduced by commit
> > 69bd60672a. I am telling this by looking at code and have yet not
> > performed any testing so I could be wrong.
>
> Yes you are right. Looks like the above commit is causing the issue. I
> reverted that commit and did not see the drop publication bug, see use
> case [1].

I think, to be precise the issue is because of not setting pubactions
to false when if (!entry->replicate_valid) is true by the commit
69bd60672. In fact, it removed the correct check. Since found(denotes
whether the cache entry exists or not) will be true even after the
alter publication ... drop table, the cache entry->replicate_valid is
the right check which will be set to false in
rel_sync_cache_publication_cb. But this commit removed it.

And also the reason for not catching this bug during the testing of
69bd60672 commit's patch is that we don't have a test case for alter
publication ... drop table behaviour.

-       if (!found || !entry->replicate_valid)
+       if (!found)
+       {
+               /* immediately make a new entry valid enough to
satisfy callbacks */
+               entry->schema_sent = false;
+               entry->streamed_txns = NIL;
+               entry->replicate_valid = false;
+               entry->pubactions.pubinsert = entry->pubactions.pubupdate =
+                       entry->pubactions.pubdelete =
entry->pubactions.pubtruncate = false;
+               entry->publish_as_relid = InvalidOid;
+       }
+
+       /* Validate the entry */
+       if (!entry->replicate_valid)
        {
                List       *pubids = GetRelationPublications(relid);
                ListCell   *lc;
@@ -977,9 +987,6 @@ get_rel_sync_entry(PGOutputData *data, Oid relid)
                 * relcache considers all publications given relation
is in, but here
                 * we only need to consider ones that the subscriber requested.
                 */
-               entry->pubactions.pubinsert = entry->pubactions.pubupdate =
-                       entry->pubactions.pubdelete =
entry->pubactions.pubtruncate = false;
-

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On Fri, Jan 22, 2021 at 2:59 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Fri, Jan 22, 2021 at 2:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Yes you are right. Looks like the above commit is causing the issue. I
> reverted that commit and did not see the drop publication bug, see use
> case [1].
>

Okay, thanks for confirming the same. I am planning to push the
attached where I made changes in few comments, combined the code and
test patch, and made a few changes in the commit message. Let me know
if you have any suggestions?

-- 
With Regards,
Amit Kapila.

Attachment
On Sat, Jan 23, 2021 at 11:50 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Jan 22, 2021 at 2:59 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Fri, Jan 22, 2021 at 2:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > Yes you are right. Looks like the above commit is causing the issue. I
> > reverted that commit and did not see the drop publication bug, see use
> > case [1].
> >
>
> Okay, thanks for confirming the same. I am planning to push the
> attached where I made changes in few comments, combined the code and
> test patch, and made a few changes in the commit message. Let me know
> if you have any suggestions?

Thanks Amit. Looks like there were some typos in the test case
description, I corrected them.

+# Drop the table from publicaiton
+# Add the table to publicaiton again
+# Rerfresh publication after table is added to publication

Attaching v7 patch. I'm fine with it.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment
On Sat, Jan 23, 2021 at 1:44 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Sat, Jan 23, 2021 at 11:50 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Jan 22, 2021 at 2:59 PM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > >
> > > On Fri, Jan 22, 2021 at 2:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > Yes you are right. Looks like the above commit is causing the issue. I
> > > reverted that commit and did not see the drop publication bug, see use
> > > case [1].
> > >
> >
> > Okay, thanks for confirming the same. I am planning to push the
> > attached where I made changes in few comments, combined the code and
> > test patch, and made a few changes in the commit message. Let me know
> > if you have any suggestions?
>
> Thanks Amit. Looks like there were some typos in the test case
> description, I corrected them.
>

Thanks, I have pushed the patch.

-- 
With Regards,
Amit Kapila.