Re: Add an option to skip loading missing publication to avoid logical replication failure - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Add an option to skip loading missing publication to avoid logical replication failure |
Date | |
Msg-id | CAA4eK1L02YLsBZjdEi0cxnkdj=vvPUerRZQ8Nx6TT=y24=9+Ng@mail.gmail.com Whole thread Raw |
In response to | Re: Add an option to skip loading missing publication to avoid logical replication failure (vignesh C <vignesh21@gmail.com>) |
List | pgsql-hackers |
On Tue, Mar 4, 2025 at 12:23 PM vignesh C <vignesh21@gmail.com> wrote: > > There is almost negligible dip with the above suggested way, the test > results for the same is given below(execution time is in milli > seconds): > Brach/records | 100 | 1000 | 10000 | 100000 | 1000000 > Head | 10.25 | 15.85 | 65.53 | 569.15 | 9194.19 > Patch | 10.25 | 15.84 | 65.91 | 571.75 | 9208.66 > % diff | 0.00 | 0.06 | -0.58 | -0.46 | -0.16 > > There is a performance dip in the range of 0 to 0.58 percent. > The attached patch has the changes for the same. The test script used > is also attached. > The patch still needs more review but the change has negligible performance impact. The next step is to get more opinions on whether we should add a new subscription option (say skip_not_existant_publication) for this work. See patch v1-0002-* in email [1]. The problem summary is explained in email [2] and in the commit message of the 0001 patch in this thread. But still, let me write briefly for the ease of others. The problem is that ALTER SUBSCRIPTION ... SET PUBLICATION ... will lead to restarting of apply worker, and after the restart, the apply worker will use the existing slot and replication origin corresponding to the subscription. Now, it is possible that before the restart, the origin has not been updated, and the WAL start location points to a location before where PUBLICATION pointed to by SET PUBLICATION exists. This leads to an error: "ERROR: publication "pub1" does not exist". Once this error occurs, apply worker will never be able to proceed and will always return the same error. For users, this is a problem because they would have created a publication before executing ALTER SUBSCRIPTION ... SET PUBLICATION .. and now they have no way to proceed. The solution we came up with is to skip loading the publication if the publication does not exist. We load the publication later and update the relation entry when the publication gets created. The two main concerns with this idea, as shared in email [3], are performance implications of this change and the possibility of current behaviour expectations from the users. We came up with a solution where the performance impact is negligible, as shown in the tests [4]. For that, we won't try to reload the skipped/missing publication for each change but will attempt it only when any new publication is created/dropped for a valid relation entry in RelationSyncCache (maintained by pgoutput). The new option skip_not_existant_publication is to address the second concern "Imagine you have a subscriber using two publications p1 and p2, and someone comes around and drops p1 by mistake. With the proposed patch, the subscription will notice this, but it'll continue sending data ignoring the missing publication. Yes, it will continue working, but it's quite possible this breaks the subscriber and it's be better to fail and stop replicating.". I see the point of adding such an option to avoid breaking the current applications (if there are any) that are relying on current behaviour. But OTOH, I am not sure if users expect us to fail explicitly in such scenarios. This is a long-standing behaviour for which we get reports from time to time, and once analyzing a failure, Tom also looked at it and agreed that we don't have much choice to avoid skipping non-existent publications [5]. But we never concluded as to whether skipping should be a default behavior or an optional one. So, we need more opinions on it. Thoughts? [1] - https://www.postgresql.org/message-id/CALDaNm0-n8FGAorM%2BbTxkzn%2BAOUyx5%3DL_XmnvOP6T24%2B-NcBKg%40mail.gmail.com [2] - https://www.postgresql.org/message-id/CAA4eK1Lc%3DNDV1HrY2gNasFK90MtysnA575a%2Brd0p%2BPOjXN%2BSpw%40mail.gmail.com [3] - https://www.postgresql.org/message-id/dc08add3-10a8-738b-983a-191c7406707b%40enterprisedb.com [4] - https://www.postgresql.org/message-id/CALDaNm2Xkm1M-ik2RLJZ9rMhW2zW2GRLL6ePyZJbXcAjOVwzXg%40mail.gmail.com [5] - https://www.postgresql.org/message-id/631312.1707251789%40sss.pgh.pa.us -- With Regards, Amit Kapila.
pgsql-hackers by date: