Re: long-standing data loss bug in initial sync of logical replication - Mailing list pgsql-hackers

From Shlok Kyal
Subject Re: long-standing data loss bug in initial sync of logical replication
Date
Msg-id CANhcyEUjZ3+0Badk7hZHctzAWwCX3fhNDV3hTjvPhBbexJOLqA@mail.gmail.com
In response to Re: long-standing data loss bug in initial sync of logical replication  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: long-standing data loss bug in initial sync of logical replication
List pgsql-hackers
On Wed, 11 Dec 2024 at 12:37, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Oct 8, 2024 at 2:51 AM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> >
> > On Wed, 31 Jul 2024 at 03:27, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Wed, Jul 24, 2024 at 9:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Wed, Jul 17, 2024 at 5:25 PM vignesh C <vignesh21@gmail.com> wrote:
> > > > >
> > > > > On Wed, 17 Jul 2024 at 11:54, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > >
> > > > > > On Tue, Jul 16, 2024 at 6:54 PM vignesh C <vignesh21@gmail.com> wrote:
> > > > > >
> > > > > > BTW, I noticed that we don't take any table-level locks for Create
> > > > > > Publication .. For ALL TABLES (and Drop Publication). Can that create
> > > > > > a similar problem? I haven't tested so not sure but even if there is a
> > > > > > problem for the Create case, it should lead to some ERROR like missing
> > > > > > publication.
> > > > >
> > > > > I tested these scenarios, and as you expected, it throws an error for
> > > > > the create publication case:
> > > > > 2024-07-17 14:50:01.145 IST [481526] 481526  ERROR:  could not receive
> > > > > data from WAL stream: ERROR:  publication "pub1" does not exist
> > > > >         CONTEXT:  slot "sub1", output plugin "pgoutput", in the change
> > > > > callback, associated LSN 0/1510CD8
> > > > > 2024-07-17 14:50:01.147 IST [481450] 481450  LOG:  background worker
> > > > > "logical replication apply worker" (PID 481526) exited with exit code
> > > > > 1
> > > > >
> > > > > The steps for this process are as follows:
> > > > > 1) Create tables in both the publisher and subscriber.
> > > > > 2) On the publisher: Create a replication slot.
> > > > > 3) On the subscriber: Create a subscription using the slot created by
> > > > > the publisher.
> > > > > 4) On the publisher:
> > > > > 4.a) Session 1: BEGIN; INSERT INTO T1;
> > > > > 4.b) Session 2: CREATE PUBLICATION FOR ALL TABLES
> > > > > 4.c) Session 1: COMMIT;
> > > > >
> > > > > Since we are throwing out a "publication does not exist" error, there
> > > > > is no inconsistency issue here.
> > > > >
> > > > > However, an issue persists with DROP ALL TABLES publication, where
> > > > > data continues to replicate even after the publication is dropped.
> > > > > This happens because the open transaction consumes the invalidation,
> > > > > causing the publications to be revalidated using an old snapshot. As a
> > > > > result, both the open transactions and the subsequent transactions are
> > > > > getting replicated.
> > > > >
> > > > > We can reproduce this issue by following these steps in a logical
> > > > > replication setup with an "ALL TABLES" publication:
> > > > > On the publisher:
> > > > > Session 1: BEGIN; INSERT INTO T1 VALUES (val1);
> > > > > In another session on the publisher:
> > > > > Session 2: DROP PUBLICATION
> > > > > Back in Session 1 on the publisher:
> > > > > COMMIT;
> > > > > Finally, in Session 1 on the publisher:
> > > > > INSERT INTO T1 VALUES (val2);
> > > > >
> > > > > Even after dropping the publication, both val1 and val2 are still
> > > > > being replicated to the subscriber. This means that both the
> > > > > in-progress concurrent transaction and the subsequent transactions are
> > > > > being replicated.
> > > > >
> > > > > I don't think locking all tables is a viable solution in this case, as
> > > > > it would require asking the user to refrain from performing any
> > > > > operations on any of the tables in the database while creating a
> > > > > publication.
> > > > >
> > > >
> > > > Indeed, locking all tables in the database to prevent concurrent DMLs
> > > > for this scenario also looks odd to me. The other alternative
> > > > previously suggested by Andres is to distribute catalog modifying
> > > > transactions to all concurrent in-progress transactions [1] but as
> > > > mentioned this could add an overhead. One possibility to reduce
> > > > overhead is that we selectively distribute invalidations for
> > > > catalogs-related publications but I haven't analyzed the feasibility.
> > > >
> > > > We need more opinions to decide here, so let me summarize the problem
> > > > and solutions discussed. As explained with an example in an email [1],
> > > > the problem related to logical decoding is that it doesn't process
> > > > invalidations corresponding to DDLs for the already in-progress
> > > > transactions. We discussed preventing DMLs in the first place when
> > > > concurrent DDLs like ALTER PUBLICATION ... ADD TABLE ... are in
> > > > progress. The solution discussed was to acquire
> > > > ShareUpdateExclusiveLock for all the tables being added via such
> > > > commands. Further analysis revealed that the same handling is required
> > > > for ALTER PUBLICATION ... ADD TABLES IN SCHEMA which means locking all
> > > > the tables in the specified schemas. Then DROP PUBLICATION also seems
> > > > to have similar symptoms which means in the worst case (where
> > > > publication is for ALL TABLES) we have to lock all the tables in the
> > > > database. We are not sure if that is good so the other alternative we
> > > > can pursue is to distribute invalidations in logical decoding
> > > > infrastructure [1] which has its downsides.
> > > >
> > > > Thoughts?
> > >
> > > Thank you for summarizing the problem and solutions!
> > >
> > > I think it's worth trying the idea of distributing invalidation
> > > messages, and we will see if there could be overheads or any further
> > > obstacles. IIUC this approach would resolve another issue we discussed
> > > before too[1].
> > >
> > > Regards,
> > >
> > > [1] https://www.postgresql.org/message-id/CAD21AoAenVqiMjpN-PvGHL1N9DWnHSq673bfgr6phmBUzx=kLQ@mail.gmail.com
> > >
> >
> > Hi Sawada-san,
> >
> > I have tested the scenario shared by you on the thread [1]. And I
> > confirm that the latest patch [2] fixes this issue.
> >
>
> I confirmed that the proposed patch fixes these issues. I have one
> question about the patch:
>
> In the main loop in SnapBuildDistributeSnapshotAndInval(), we have the
> following code:
>
>        /*
>         * If we don't have a base snapshot yet, there are no changes in this
>         * transaction which in turn implies we don't yet need a snapshot at
>         * all. We'll add a snapshot when the first change gets queued.
>         *
>         * NB: This works correctly even for subtransactions because
>         * ReorderBufferAssignChild() takes care to transfer the base snapshot
>         * to the top-level transaction, and while iterating the changequeue
>         * we'll get the change from the subtxn.
>         */
>        if (!ReorderBufferXidHasBaseSnapshot(builder->reorder, txn->xid))
>            continue;
>
> Is there any case where we need to distribute inval messages to
> transactions that don't have the base snapshot yet but eventually need
> the inval messages?
>
> Overall, with this idea, we distribute invalidation messages to all
> concurrent decoded transactions. It could introduce performance
> regressions by several causes. For example, we could end up
> invalidating RelationSyncCache entries in more cases. While this is
> addressed by your selective cache invalidation patch, there is still a
> 5% regression. We might need to accept a certain amount of regressions
> for making it correct but it would be better to figure out where these
> regressions come from. Other than that, I think the performance
> regression could happen due to the costs of distributing invalidation
> messages. You've already observed there is 1~3% performance regression
> in cases where we distribute a large amount of invalidation messages
> to one concurrently decoded transaction[1]. I guess that the
> selective cache invalidation idea would not help this case. Also, I
> think we might want to test other cases like where we distribute a
> small amount of invalidation messages to many concurrently decoded
> transactions.
>

Hi Sawada-san,

I have done the performance testing for cases where we distribute a
small amount of invalidation messages to many concurrently decoded
transactions.
Here are results:

Concurrent Txn | Head (sec) | Patch (sec) | Degradation in %
---------------|------------|-------------|-----------------
50             | 0.2627734  | 0.2654608   | 1.022706256
100            | 0.4801048  | 0.4869254   | 1.420648158
500            | 2.2170336  | 2.2438656   | 1.210265825
1000           | 4.4957402  | 4.5282574   | 0.723289126
2000           | 9.2013082  | 9.21164     | 0.112286207
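The "Degradation in %" column above is simply (patch - head) / head * 100; a quick sketch to recompute it from the measured timings:

```python
# Recompute the degradation column from the Head/Patch timings above.
rows = [
    (50,   0.2627734, 0.2654608),
    (100,  0.4801048, 0.4869254),
    (500,  2.2170336, 2.2438656),
    (1000, 4.4957402, 4.5282574),
    (2000, 9.2013082, 9.21164),
]

def degradation_pct(head: float, patch: float) -> float:
    """Percent slowdown of the patched run relative to head."""
    return (patch - head) / head * 100.0

for txns, head, patch in rows:
    print(f"{txns:>5} concurrent txns: {degradation_pct(head, patch):.9f} %")
```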

The steps I followed are:
1. Set up logical replication.
2. Start 'n' concurrent transactions. Each txn looks like:
    BEGIN;
    INSERT INTO t1 VALUES (11);
3. Add two invalidation messages, which will be distributed to each
open transaction, by running:
    ALTER PUBLICATION regress_pub1 DROP TABLE t1;
    ALTER PUBLICATION regress_pub1 ADD TABLE t1;
4. Run an insert in each txn. It will rebuild the relation cache
entry in each txn.
5. Commit each transaction.

I have also attached the script.

Thanks and Regards,
Shlok Kyal

