Thread: BUG #17670: Logical Replication data may be lost on the subscription under certain scenarios

BUG #17670: Logical Replication data may be lost on the subscription under certain scenarios

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      17670
Logged by:          Yunhe Xu
Email address:      xyh@nvn.xyz
PostgreSQL version: 14.4
Operating system:   rhel 7
Description:

Logical Replication data may be lost on the subscription under certain
scenarios.
The following is review process.

* Logical Replication Information
t1=# select * from pg_publication_tables ;
  pubname   | schemaname |   tablename    
------------+------------+----------------
 pub1       | public     | t_test1
t2=# select srrelid::regclass,* from pg_subscription_rel ;
 srrelid | srsubid | srrelid | srsubstate | srsublsn  
---------+---------+---------+------------+-----------
 t_test1 |   57551 |   41170 | r          | 0/696E418
t1=# select application_name,state from pg_stat_replication where
application_name='test2_sub';
 application_name |   state   
------------------+-----------
 test2_sub        | streaming

* Verify the status is normal
t1=# insert into t_test1 values (1);
INSERT 0 1
t1=# select * from t_test1;
 id 
----
  1
t2=# select * from t_test1;
 id 
----
  1

* Then delete this table on subscription : 
t2=# alter table t_test1 rename TO t_test2;
ALTER TABLE

* Now,do DMLs
t1=# insert into t_test1 values (2);
INSERT 0 1
t1=# delete from t_test1 where id=1;
DELETE 1
t1=# 
* The log gives some errors
2022-10-28 15:14:07.919 CST,,,2600,,635b813f.a28,2,,2022-10-28 15:14:07
CST,4/12,0,ERROR,55000,"logical replication target relation
""public.t_test1"" does not exist",,,,,,,,,"","logical replication
worker",,0

* OK,Let me create this table
t2=# create table t_test1 (id int PRIMARY KEY);
CREATE TABLE

* At this point, the log is no longer showing errors.But the incremental
data is lost.
t2=# select * from t_test1;
 id 
----
(0 rows)

t1=# insert into t_test1 values (3);
INSERT 0 1
t1=# insert into t_test1 values (4);
INSERT 0 1

t2=# select * from t_test1;
 id 
----
(0 rows)
t2=# select * from t_test2 ;
 id 
----
  1
(1 row)

* Still no error.
* Now modify the previous table back

t2=# drop table t_test1 ;
DROP TABLE
t2=# alter table t_test2 rename TO t_test1;
ALTER TABLE

t1=# insert into t_test1 values (5);
INSERT 0 1
t1=# select * from t_test1;
 id 
----
  2
  3
  4
  5
(4 rows)

t2=# select * from t_test1;
 id 
----
  1
  5
(2 rows)

* The middle operation-delete id=1 and insert id=3,4-is loss.
*Please confirm if this is a BUG.

Thanks.
Yunhe Xu


On Fri, 28 Oct 2022 at 15:48, PG Bug reporting form <noreply@postgresql.org> wrote:
> The following bug has been logged on the website:
>
> Bug reference:      17670
> Logged by:          Yunhe Xu
> Email address:      xyh@nvn.xyz
> PostgreSQL version: 14.4
> Operating system:   rhel 7
> Description:        
>
> Logical Replication data may be lost on the subscription under certain
> scenarios.
> The following is review process.
>
> * Logical Replication Information
> t1=# select * from pg_publication_tables ;
>   pubname   | schemaname |   tablename    
> ------------+------------+----------------
>  pub1       | public     | t_test1
> t2=# select srrelid::regclass,* from pg_subscription_rel ;
>  srrelid | srsubid | srrelid | srsubstate | srsublsn  
> ---------+---------+---------+------------+-----------
>  t_test1 |   57551 |   41170 | r          | 0/696E418
> t1=# select application_name,state from pg_stat_replication where
> application_name='test2_sub';
>  application_name |   state   
> ------------------+-----------
>  test2_sub        | streaming
>
> * Verify the status is normal
> t1=# insert into t_test1 values (1);
> INSERT 0 1
> t1=# select * from t_test1;
>  id 
> ----
>   1
> t2=# select * from t_test1;
>  id 
> ----
>   1
>
> * Then delete this table on subscription : 
> t2=# alter table t_test1 rename TO t_test2;
> ALTER TABLE
>
> * Now,do DMLs
> t1=# insert into t_test1 values (2);
> INSERT 0 1
> t1=# delete from t_test1 where id=1;
> DELETE 1
> t1=# 
> * The log gives some errors
> 2022-10-28 15:14:07.919 CST,,,2600,,635b813f.a28,2,,2022-10-28 15:14:07
> CST,4/12,0,ERROR,55000,"logical replication target relation
> ""public.t_test1"" does not exist",,,,,,,,,"","logical replication
> worker",,0
>
> * OK,Let me create this table
> t2=# create table t_test1 (id int PRIMARY KEY);
> CREATE TABLE
>
> * At this point, the log is no longer showing errors.But the incremental
> data is lost.
> t2=# select * from t_test1;
>  id 
> ----
> (0 rows)
>
> t1=# insert into t_test1 values (3);
> INSERT 0 1
> t1=# insert into t_test1 values (4);
> INSERT 0 1
>
> t2=# select * from t_test1;
>  id 
> ----
> (0 rows)
> t2=# select * from t_test2 ;
>  id 
> ----
>   1
> (1 row)
>
> * Still no error.
> * Now modify the previous table back
>
> t2=# drop table t_test1 ;
> DROP TABLE
> t2=# alter table t_test2 rename TO t_test1;
> ALTER TABLE
>
> t1=# insert into t_test1 values (5);
> INSERT 0 1
> t1=# select * from t_test1;
>  id 
> ----
>   2
>   3
>   4
>   5
> (4 rows)
>
> t2=# select * from t_test1;
>  id 
> ----
>   1
>   5
> (2 rows)
>
> * The middle operation-delete id=1 and insert id=3,4-is loss.
> *Please confirm if this is a BUG.
>

I can reproduce it on HEAD.  Here is my analysis:

When we rename the t_test1 to t_test2, the subscriber doesn't have a
table matched publication table name, so the logical replication throw
an error.  Then, we create a new t_test1 on subscriber, the logical
replication worker can find the table that matches the published name.
However, the pg_subscription_rel hasn't updated, and when we try to get
the subscription state through GetSubscriptionRelState(), it cannot find
a matched subscription relation mapping, so the WAL cannot apply to the
new table t_test1.

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.



On Fri, Oct 28, 2022 at 8:07 PM Japin Li <japinli@hotmail.com> wrote:

> I can reproduce it on HEAD.  Here is my analysis:
>
> When we rename the t_test1 to t_test2, the subscriber doesn't have a
> table matched publication table name, so the logical replication throw
> an error.  Then, we create a new t_test1 on subscriber, the logical
> replication worker can find the table that matches the published name.
> However, the pg_subscription_rel hasn't updated, and when we try to get
> the subscription state through GetSubscriptionRelState(), it cannot find
> a matched subscription relation mapping, so the WAL cannot apply to the
> new table t_test1.

I am just wondering if it is correct behavior to allow renaming the
table used by a subscription, or should there be some dependency?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



On Sun, 30 Oct 2022 at 14:39, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> On Fri, Oct 28, 2022 at 8:07 PM Japin Li <japinli@hotmail.com> wrote:
>
>> I can reproduce it on HEAD.  Here is my analysis:
>>
>> When we rename the t_test1 to t_test2, the subscriber doesn't have a
>> table matched publication table name, so the logical replication throw
>> an error.  Then, we create a new t_test1 on subscriber, the logical
>> replication worker can find the table that matches the published name.
>> However, the pg_subscription_rel hasn't updated, and when we try to get
>> the subscription state through GetSubscriptionRelState(), it cannot find
>> a matched subscription relation mapping, so the WAL cannot apply to the
>> new table t_test1.
>
> I am just wondering if it is correct behavior to allow renaming the
> table used by a subscription, or should there be some dependency?

Maybe we can add a dependency to make the user know what they are doing.
I also want to know when we should add a dependency?

--
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.



On Sun, Oct 30, 2022 at 7:22 PM Japin Li <japinli@hotmail.com> wrote:

> > I am just wondering if it is correct behavior to allow renaming the
> > table used by a subscription, or should there be some dependency?
>
> Maybe we can add a dependency to make the user know what they are doing.
> I also want to know when we should add a dependency?

I haven't thought about it in deep, but I think whenever we add a new
entry to pg_subscription_rel, that time maybe we could add a
dependency on the respective subscription entry IMHO.  But I am just
wondering is there any reason why we are not already having such
dependency?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



 On Fri, Oct 28, 2022 at 2:51 PM PG Bug reporting form
<noreply@postgresql.org> wrote:
>
> The following bug has been logged on the website:
>
> Bug reference:      17670
> Logged by:          Yunhe Xu
> Email address:      xyh@nvn.xyz
> PostgreSQL version: 14.4
> Operating system:   rhel 7
> Description:
>
> Logical Replication data may be lost on the subscription under certain
> scenarios.
> The following is review process.
>
> * Logical Replication Information
> t1=# select * from pg_publication_tables ;
>   pubname   | schemaname |   tablename
> ------------+------------+----------------
>  pub1       | public     | t_test1
> t2=# select srrelid::regclass,* from pg_subscription_rel ;
>  srrelid | srsubid | srrelid | srsubstate | srsublsn
> ---------+---------+---------+------------+-----------
>  t_test1 |   57551 |   41170 | r          | 0/696E418
> t1=# select application_name,state from pg_stat_replication where
> application_name='test2_sub';
>  application_name |   state
> ------------------+-----------
>  test2_sub        | streaming
>
> * Verify the status is normal
> t1=# insert into t_test1 values (1);
> INSERT 0 1
> t1=# select * from t_test1;
>  id
> ----
>   1
> t2=# select * from t_test1;
>  id
> ----
>   1
>
> * Then delete this table on subscription :
> t2=# alter table t_test1 rename TO t_test2;
> ALTER TABLE
>
> * Now,do DMLs
> t1=# insert into t_test1 values (2);
> INSERT 0 1
> t1=# delete from t_test1 where id=1;
> DELETE 1
> t1=#
> * The log gives some errors
> 2022-10-28 15:14:07.919 CST,,,2600,,635b813f.a28,2,,2022-10-28 15:14:07
> CST,4/12,0,ERROR,55000,"logical replication target relation
> ""public.t_test1"" does not exist",,,,,,,,,"","logical replication
> worker",,0
>
> * OK,Let me create this table
> t2=# create table t_test1 (id int PRIMARY KEY);
> CREATE TABLE
>
> * At this point, the log is no longer showing errors.But the incremental
> data is lost.
>

If you do "Alter Subscription test2_sub Refresh Publication;" then the
data should be synced.

-- 
With Regards,
Amit Kapila.



On Mon, Oct 31, 2022 at 2:24 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Sun, Oct 30, 2022 at 7:22 PM Japin Li <japinli@hotmail.com> wrote:
>
> > > I am just wondering if it is correct behavior to allow renaming the
> > > table used by a subscription, or should there be some dependency?
> >
> > Maybe we can add a dependency to make the user know what they are doing.
> > I also want to know when we should add a dependency?
>
> I haven't thought about it in deep, but I think whenever we add a new
> entry to pg_subscription_rel, that time maybe we could add a
> dependency on the respective subscription entry IMHO.  But I am just
> wondering is there any reason why we are not already having such
> dependency?
>

As per our current implementation, users need to be careful with DDL
operations on tables involved in subscriptions to ensure smooth
replication. In the example shared in this thread, say if the user
wants to rename tables on both publisher and subscriber then the case
reported would have worked without hassles, and disallowing Alter
Table would have unnecessarily made that operation difficult.
Similarly, say if the user wants to add a primary key on the table on
both publisher and subscriber, then it would be quite easy with the
current system but adding dependencies could be additional work for
the user and some people can complain about the same.

-- 
With Regards,
Amit Kapila.