Thread: Add an option to skip loading missing publication to avoid logical replication failure
Add an option to skip loading missing publication to avoid logical replication failure
From
vignesh C
Date:
Hi,

Currently ALTER SUBSCRIPTION ... SET PUBLICATION will break the
logical replication in certain cases. This can happen as the apply
worker will get restarted after SET PUBLICATION, the apply worker will
use the existing slot and replication origin corresponding to the
subscription. Now, it is possible that before restart the origin has
not been updated and the WAL start location points to a location prior
to where PUBLICATION pub exists which can lead to such an error. Once
this error occurs, apply worker will never be able to proceed and will
always return the same error.

There was discussion on this and Amit had posted a patch to handle
this at [2]. Amit's patch does continue using a historic snapshot but
ignores publications that are not found for the purpose of computing
RelSyncEntry attributes. We won't mark such an entry as valid till all
the publications are loaded without anything missing. This means we
won't publish operations on tables corresponding to that publication
till we found such a publication and that seems okay.
I have added an option skip_not_exist_publication to enable this
operation only when skip_not_exist_publication is specified as true.
There is no change in default behavior when skip_not_exist_publication
is specified as false.

But one thing to note with the patch (with skip_not_exist_publication
option) is that replication of a few WAL entries will be skipped until
the publication is loaded, as in the example below:

-- Create table in publisher and subscriber
create table t1(c1 int);
create table t2(c1 int);

-- Create publications
create publication pub1 for table t1;
create publication pub2 for table t2;

-- Create subscription
create subscription test1 connection 'dbname=postgres host=localhost port=5432' publication pub1, pub2;

-- Drop one publication
drop publication pub1;

-- Insert in the publisher
insert into t1 values(11);
insert into t2 values(21);

-- Select in subscriber
postgres=# select * from t1;
 c1
----
(0 rows)

postgres=# select * from t2;
 c1
----
 21
(1 row)

-- Create the dropped publication in publisher
create publication pub1 for table t1;

-- Insert in the publisher
insert into t1 values(12);
postgres=# select * from t1;
 c1
----
 11
 12
(2 rows)

-- Select data in subscriber
postgres=# select * from t1; -- record with value 11 will be missing in subscriber
 c1
----
 12
(1 row)

Thoughts?

[1] - https://www.postgresql.org/message-id/CAA4eK1%2BT-ETXeRM4DHWzGxBpKafLCp__5bPA_QZfFQp7-0wj4Q%40mail.gmail.com

Regards,
Vignesh
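
For readers who want to reason about the skipped row without setting up a publisher and subscriber, the behavior described in this message can be approximated with a small toy model. This is only a sketch under stated assumptions: `ToyWalsender`, `should_publish`, and `replay_example` are hypothetical names, not PostgreSQL code, the catalog is a plain dictionary rather than a historic snapshot, and cache invalidation is not modeled. It only mirrors the rule stated above: missing publications are ignored when the option is on, and a per-table entry is not treated as valid until every requested publication has been loaded.

```python
class ToyWalsender:
    """Toy model of the per-table publish decision (hypothetical, not PostgreSQL code)."""

    def __init__(self, catalog, requested_pubs, skip_missing):
        self.catalog = catalog                # publication name -> set of table names
        self.requested_pubs = requested_pubs  # publications named by the subscription
        self.skip_missing = skip_missing      # stands in for skip_not_exist_publication
        self.valid_entries = {}               # table -> bool, kept only when nothing was missing

    def should_publish(self, table):
        # A valid cached entry is reused; invalidation is not modeled here.
        if table in self.valid_entries:
            return self.valid_entries[table]
        published = False
        missing = False
        for name in self.requested_pubs:
            tables = self.catalog.get(name)
            if tables is None:
                if not self.skip_missing:
                    # Default behavior: a missing publication is an error.
                    raise LookupError(f'publication "{name}" does not exist')
                missing = True  # ignore it, but remember the entry is incomplete
                continue
            published = published or table in tables
        if not missing:
            # Only mark the entry valid once every publication was found.
            self.valid_entries[table] = published
        return published


def replay_example(skip_missing):
    """Replay the statements from the example and return the subscriber's rows."""
    catalog = {"pub1": {"t1"}, "pub2": {"t2"}}
    sender = ToyWalsender(catalog, ["pub1", "pub2"], skip_missing)
    subscriber = {"t1": [], "t2": []}

    def insert(table, value):
        if sender.should_publish(table):
            subscriber[table].append(value)

    del catalog["pub1"]        # drop publication pub1
    insert("t1", 11)           # skipped: pub1 is missing
    insert("t2", 21)           # sent: pub2 still covers t2
    catalog["pub1"] = {"t1"}   # create publication pub1 again
    insert("t1", 12)           # sent: all publications load
    return subscriber


if __name__ == "__main__":
    print(replay_example(skip_missing=True))
```

With `skip_missing=True` the model ends with only 12 in t1 and 21 in t2 on the subscriber side, matching the example. With `skip_missing=False` the first insert raises, which stands in for the repeated apply worker error described at the start of the message.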
Attachment
Re: Add an option to skip loading missing publication to avoid logical replication failure
From
vignesh C
Date:
On Mon, 19 Feb 2024 at 12:48, vignesh C <vignesh21@gmail.com> wrote:
>
> Hi,
>
> Currently ALTER SUBSCRIPTION ... SET PUBLICATION will break the
> logical replication in certain cases. This can happen as the apply
> worker will get restarted after SET PUBLICATION, the apply worker will
> use the existing slot and replication origin corresponding to the
> subscription. Now, it is possible that before restart the origin has
> not been updated and the WAL start location points to a location prior
> to where PUBLICATION pub exists which can lead to such an error. Once
> this error occurs, apply worker will never be able to proceed and will
> always return the same error.
>
> There was discussion on this and Amit had posted a patch to handle
> this at [2]. Amit's patch does continue using a historic snapshot but
> ignores publications that are not found for the purpose of computing
> RelSyncEntry attributes. We won't mark such an entry as valid till all
> the publications are loaded without anything missing. This means we
> won't publish operations on tables corresponding to that publication
> till we found such a publication and that seems okay.
> I have added an option skip_not_exist_publication to enable this
> operation only when skip_not_exist_publication is specified as true.
> There is no change in default behavior when skip_not_exist_publication
> is specified as false.

I have updated the patch to include changes for pg_dump, a few tests,
describe changes, and documentation changes. The attached v2 patch has
these changes.

Regards,
Vignesh
Attachment
Re: Add an option to skip loading missing publication to avoid logical replication failure
From
Amit Kapila
Date:
On Mon, Feb 19, 2024 at 12:49 PM vignesh C <vignesh21@gmail.com> wrote:
>
> Currently ALTER SUBSCRIPTION ... SET PUBLICATION will break the
> logical replication in certain cases. This can happen as the apply
> worker will get restarted after SET PUBLICATION, the apply worker will
> use the existing slot and replication origin corresponding to the
> subscription. Now, it is possible that before restart the origin has
> not been updated and the WAL start location points to a location prior
> to where PUBLICATION pub exists which can lead to such an error. Once
> this error occurs, apply worker will never be able to proceed and will
> always return the same error.
>
> There was discussion on this and Amit had posted a patch to handle
> this at [2]. Amit's patch does continue using a historic snapshot but
> ignores publications that are not found for the purpose of computing
> RelSyncEntry attributes. We won't mark such an entry as valid till all
> the publications are loaded without anything missing. This means we
> won't publish operations on tables corresponding to that publication
> till we found such a publication and that seems okay.
> I have added an option skip_not_exist_publication to enable this
> operation only when skip_not_exist_publication is specified as true.
> There is no change in default behavior when skip_not_exist_publication
> is specified as false.
>

Did you try to measure the performance impact of this change? We can
try a few cases where DDL and DMLs are involved, missing publication
(drop publication and recreate after a varying number of records to
check the impact).

The other names for the option could be:
skip_notexistant_publications, or ignore_nonexistant_publications. Can
we think of any others?

--
With Regards,
Amit Kapila.
Re: Add an option to skip loading missing publication to avoid logical replication failure
From
Amit Kapila
Date:
On Tue, Feb 18, 2025 at 2:24 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Fri, 14 Feb 2025 at 15:36, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > Did you try to measure the performance impact of this change? We can
> > try a few cases where DDL and DMLs are involved, missing publication
> > (drop publication and recreate after a varying number of records to
> > check the impact).
>
> Since we don't have an exact scenario to compare with the patch
> (because, in the current HEAD, when the publication is missing, an
> error is thrown and the walsender/worker restarts), I compared the
> positive case, where records are successfully replicated to the
> subscriber, as shown below. For the scenario with the patch, I ran the
> same test, where the publication is dropped before the insert,
> allowing the walsender to check whether the publication is present.
> The test results, which represent the median of 7 runs with the
> execution time in milliseconds, are provided below:
>
> Branch/records | 100   | 1000   | 10000   | 100000 | 1000000
> Head           | 1.214 | 2.548  | 10.823  | 90.3   | 951.833
> Patch          | 1.215 | 2.5485 | 10.8545 | 90.94  | 955.134
> % diff         | 0.082 | 0.020  | 0.291   | 0.704  | 0.347
>
> I noticed that the difference in the test runs with the patch is
> negligible. The scripts used for execution are attached.
>

You have used synchronous_standby_names to evaluate the performance,
which covers other parts of replication than the logical decoding. It
would be better to test using pg_recvlogical.

--
With Regards,
Amit Kapila.
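
The "% diff" row in the quoted table can be recomputed from the Head and Patch medians, assuming it was derived as (Patch - Head) / Head * 100. That formula is an assumption, since the scripts are only attached and not shown in the thread. The sketch below uses the rounded medians exactly as quoted; `percent_diff` and `overhead_table` are illustrative names.

```python
# Median execution times in milliseconds, copied from the quoted table.
head_ms = {100: 1.214, 1000: 2.548, 10000: 10.823, 100000: 90.3, 1000000: 951.833}
patch_ms = {100: 1.215, 1000: 2.5485, 10000: 10.8545, 100000: 90.94, 1000000: 955.134}


def percent_diff(head, patch):
    """Relative slowdown of the patched run over HEAD, in percent (assumed formula)."""
    return (patch - head) / head * 100.0


def overhead_table():
    """Map each record count to its percentage difference, rounded to 3 places."""
    return {n: round(percent_diff(head_ms[n], patch_ms[n]), 3) for n in head_ms}


if __name__ == "__main__":
    for records, pct in overhead_table().items():
        print(f"{records:>8} records: {pct:.3f}%")
```

On the rounded medians every column stays below 1%. The 100000 column comes out at roughly 0.709 rather than the 0.704 shown, which would be consistent with the table having been computed from unrounded medians, though the thread does not say so.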