BUG #18267: Logical replication bug: data is not synchronized after Alter Publication. - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #18267: Logical replication bug: data is not synchronized after Alter Publication.
Date
Msg-id 18267-a1680726adf7c85d@postgresql.org
Responses Re: BUG #18267: Logical replication bug: data is not synchronized after Alter Publication.  (Amit Kapila <amit.kapila16@gmail.com>)
Re: BUG #18267: Logical replication bug: data is not synchronized after Alter Publication.  (Dilip Kumar <dilipbalaut@gmail.com>)
RE: BUG #18267: Logical replication bug: data is not synchronized after Alter Publication.  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      18267
Logged by:          song yutao
Email address:      sytoptimisprime@163.com
PostgreSQL version: 15.5
Operating system:   Linux
Description:

Hi hackers, I found that when a large amount of data is being inserted into
a table and the table is added to a publication (through ALTER PUBLICATION)
at the same time, the incremental data is likely not to be synchronized to
the subscriber. Here is my test method:

1. On both publisher and subscriber, create the test table:
CREATE TABLE tab_1 (a int);

2. Set up logical replication:
on publisher:
     SELECT pg_create_logical_replication_slot('slot1', 'pgoutput', false, false);
     CREATE PUBLICATION tap_pub;
on subscriber:
     CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr'
         PUBLICATION tap_pub
         WITH (enabled = true, create_slot = false, slot_name = 'slot1');
3. Perform the inserts:
     for (my $i = 1; $i <= 1000; $i++) {
         $node_publisher->safe_psql('postgres',
             "INSERT INTO tab_1 SELECT generate_series(1, 1000)");
     }
     Each transaction inserts 1000 rows, and there are 1000 transactions in
total.

4. While step 3 is running, add table tab_1 to the publication and refresh
the subscription:
on publisher:
     ALTER PUBLICATION tap_pub ADD TABLE tab_1;
on subscriber:
     ALTER SUBSCRIPTION tap_sub REFRESH PUBLICATION;

The root cause of the problem is as follows:
pgoutput relies on the invalidation mechanism to validate publications. When
the walsender decodes an ALTER PUBLICATION transaction, catalog caches are
invalidated at once. Furthermore, since pg_publication_rel is modified,
snapshot changes are added to all transactions currently being decoded. For
these other transactions, the catalog caches have already been invalidated,
but it is likely that their snapshot changes have not yet been decoded. In
the pgoutput implementation, these transactions query the system table
pg_publication_rel to determine whether to publish the changes made in them.
In this case, the catalog tuples are not found because the snapshot has not
been updated yet. As a result, the changes in these transactions are
considered not to be published, and subsequent data cannot be synchronized.

I think it's necessary to add invalidations to other transactions after
adding a snapshot change to them.
Therefore, I submitted a patch for this bug.
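
The ordering problem described above can be illustrated with a small toy
model. This is only a sketch under stated assumptions, not PostgreSQL code
and not the submitted patch: the names Decoder, publish_cache, CATALOG and
run are hypothetical, the catalog is reduced to two numbered snapshots, and
only the decision whether to stream a change is modelled (initial table
synchronization is ignored).

```python
from dataclasses import dataclass, field

# Toy catalog (assumption): snapshot number -> tables in the publication.
# Snapshot 0 predates ALTER PUBLICATION; snapshot 1 includes tab_1.
CATALOG = {0: set(), 1: {"tab_1"}}


@dataclass
class Decoder:
    """Hypothetical stand-in for a walsender running an output plugin."""
    publish_cache: dict = field(default_factory=dict)  # table -> published?
    sent: list = field(default_factory=list)           # rows streamed out

    def invalidate(self):
        # Forget every cached "is this table published?" answer.
        self.publish_cache.clear()

    def replay(self, base_snapshot, changes):
        # Replay one committed transaction, starting from its base snapshot.
        snapshot = base_snapshot
        for kind, payload in changes:
            if kind == "snapshot":
                snapshot = payload          # queued snapshot change
            elif kind == "invalidate":
                self.invalidate()           # queued invalidation (the fix)
            elif kind == "insert":
                table, row = payload
                if table not in self.publish_cache:
                    # Cache miss: consult the catalog with the CURRENT snapshot.
                    self.publish_cache[table] = table in CATALOG[snapshot]
                if self.publish_cache[table]:
                    self.sent.append(row)


def run(queue_invalidation):
    dec = Decoder()
    # T1 is still in progress when ALTER PUBLICATION commits.
    t1 = [("insert", ("tab_1", 1))]
    # Decoding the ALTER PUBLICATION commit: caches are invalidated at once
    # and a snapshot change is queued onto the in-progress transaction T1.
    dec.invalidate()
    t1.append(("snapshot", 1))
    if queue_invalidation:
        # Proposed idea: also queue an invalidation after the snapshot change.
        t1.append(("invalidate", None))
    t1.append(("insert", ("tab_1", 2)))
    dec.replay(0, t1)                           # T1 commits, old base snapshot
    dec.replay(1, [("insert", ("tab_1", 3))])   # T2 began after the ALTER
    return dec.sent


print(run(queue_invalidation=False))
print(run(queue_invalidation=True))
```

In this model, without the queued invalidation the stale "not published"
answer computed from the old snapshot stays in the cache, so neither the
rest of T1 nor the later transaction T2 is streamed. With the invalidation
queued right after the snapshot change, the next lookup uses the updated
snapshot and rows 2 and 3 are sent.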

