Re: BUG #18267: Logical replication bug: data is not synchronized after Alter Publication. - Mailing list pgsql-bugs

From Amit Kapila
Subject Re: BUG #18267: Logical replication bug: data is not synchronized after Alter Publication.
Msg-id CAA4eK1LDhzMy69s-ZaAMMenZNHsdzCfgOEN=VY9enGFRPu10Xg@mail.gmail.com
In response to BUG #18267: Logical replication bug: data is not synchronized after Alter Publication.  (PG Bug reporting form <noreply@postgresql.org>)
List pgsql-bugs
On Wed, Jan 3, 2024 at 9:51 PM PG Bug reporting form
<noreply@postgresql.org> wrote:
>
> The following bug has been logged on the website:
>
> Bug reference:      18267
> Logged by:          song yutao
> Email address:      sytoptimisprime@163.com
> PostgreSQL version: 15.5
> Operating system:   Linux
> Description:
>
> Hi hackers, I found that when inserting a large amount of data into a
> table while adding that table to a publication (through ALTER
> PUBLICATION), it's likely that the incremental data is not synchronized
> to the subscriber. Here is my test method:
>
> 1. On publisher and subscriber, create table for test:
> CREATE TABLE tab_1 (a int);
>
> 2. Set up logical replication:
> on publisher:
>      SELECT pg_create_logical_replication_slot('slot1', 'pgoutput', false, false);
>      CREATE PUBLICATION tap_pub;
> on subscriber:
>      CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr'
>          PUBLICATION tap_pub
>          WITH (enabled = true, create_slot = false, slot_name = 'slot1');
>
> 3. Perform inserts:
>      for (my $i = 1; $i <= 1000; $i++) {
>          $node_publisher->safe_psql('postgres',
>              "INSERT INTO tab_1 SELECT generate_series(1, 1000)");
>      }
> Each transaction inserts 1000 rows, and there are 1000 transactions in
> total.
>
> 4. While step 3 is running, add table tab_1 to the publication:
>      ALTER PUBLICATION tap_pub ADD TABLE tab_1;
>      ALTER SUBSCRIPTION tap_sub REFRESH PUBLICATION;
>
> The root cause of the problem is as follows:
> pgoutput relies on the invalidation mechanism to validate publications. When
> the walsender decodes an ALTER PUBLICATION transaction, catalog caches are
> invalidated at once. Furthermore, since pg_publication_rel is modified,
> snapshot changes are added to all transactions currently being decoded. For
> those other transactions, the catalog caches have already been invalidated,
> but it is likely that the snapshot changes have not yet been decoded. In the
> pgoutput implementation, these transactions query the system table
> pg_publication_rel to determine whether to publish their changes. In this
> case, the catalog tuples are not found because the snapshot has not been
> updated. As a result, the changes in these transactions are considered not
> to be published, and subsequent data is not synchronized.
>

As per my understanding, we distribute the snapshot to other
transactions at commit time (LSN), which in your case means at the
commit of "ALTER PUBLICATION tap_pub ADD TABLE tab_1". So any changes
decoded after that point should see the new entry in pg_publication_rel.
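
To make the expected ordering concrete, here is a toy sketch of it. This
is only a model of the behaviour described above, not the actual
reorderbuffer or pgoutput code; the Change class, the published_changes
function, and the LSN and xid values are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    lsn: int    # hypothetical WAL position of the INSERT
    xid: int    # hypothetical id of the transaction that made it
    table: str


def published_changes(changes, alter_commit_lsn, table):
    """Toy model: assume the snapshot containing the new
    pg_publication_rel row reaches every in-progress transaction at
    alter_commit_lsn, so only changes after that LSN are published."""
    return [c for c in changes
            if c.table == table and c.lsn > alter_commit_lsn]


changes = [
    Change(lsn=10, xid=700, table="tab_1"),  # before the ALTER commits
    Change(lsn=30, xid=700, table="tab_1"),  # same transaction, after it
    Change(lsn=40, xid=701, table="tab_1"),  # later transaction
]

visible = published_changes(changes, alter_commit_lsn=20, table="tab_1")
print([c.lsn for c in visible])  # prints [30, 40]
```

In this model the change at LSN 10 is skipped and everything after the
commit at LSN 20 is published, including the later change of a
transaction that was already in progress.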

> I think it's necessary to add invalidations to other transactions after
> adding a snapshot change to them.
> Therefore, I submitted a patch for this bug.
>

Sorry, I didn't understand your proposal, and I don't see the patch
that your last sentence says was attached.

--
With Regards,
Amit Kapila.


