Re: Optionally automatically disable logical replication subscriptions on error - Mailing list pgsql-hackers

From vignesh C
Subject Re: Optionally automatically disable logical replication subscriptions on error
Date
Msg-id CALDaNm1O8iBHSjfkv2-Ld6B7C4ay8F3-O1Q2QHVrM8H7Zu40iA@mail.gmail.com
Whole thread Raw
In response to RE: Optionally automatically disable logical replication subscriptions on error  ("osumi.takamichi@fujitsu.com" <osumi.takamichi@fujitsu.com>)
Responses RE: Optionally automatically disable logical replication subscriptions on error
List pgsql-hackers
On Mon, Dec 6, 2021 at 4:22 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:
>
> On Monday, December 6, 2021 1:16 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > On Sat, Dec 4, 2021 at 12:20 AM osumi.takamichi@fujitsu.com
> > <osumi.takamichi@fujitsu.com> wrote:
> > >
> > > Hi, I've made a new patch v11 that incorporated suggestions described
> > above.
> > >
> >
> > Some review comments for the v11 patch:
> Thank you for your reviews !
>
> > doc/src/sgml/ref/create_subscription.sgml
> > (1) Possible wording improvement?
> >
> > BEFORE:
> > +  Specifies whether the subscription should be automatically disabled
> > + if replicating data from the publisher triggers errors. The default
> > + is <literal>false</literal>.
> > AFTER:
> > +  Specifies whether the subscription should be automatically disabled
> > + if any errors are detected by subscription workers during data
> > + replication from the publisher. The default is <literal>false</literal>.
> Fixed.
>
> > src/backend/replication/logical/worker.c
> > (2) WorkerErrorRecovery comments
> > Instead of:
> >
> > + * As a preparation for disabling the subscription, emit the error,
> > + * handle the transaction and clean up the memory context of
> > + * error. ErrorContext is reset by FlushErrorState.
> >
> > why not just say:
> >
> > + Worker error recovery processing, in preparation for disabling the
> > + subscription.
> >
> > And then comment the function's code lines:
> >
> > e.g.
> >
> > /* Emit the error */
> > ...
> > /* Abort any active transaction */
> > ...
> > /* Reset the ErrorContext */
> > ...
> Agreed. Fixed.
>
> > (3) DisableSubscriptionOnError return
> >
> > The "if (!subform->subdisableonerr)" block should probably first:
> >    heap_freetuple(tup);
> >
> > (regardless of the fact the only current caller will proc_exit anyway)
> Fixed.
>
> > (4) did_error flag
> >
> > I think perhaps the previously-used flag name "disable_subscription"
> > is better, or maybe "error_recovery_done".
> > Also, I think it would look better if it was set AFTER
> > WorkerErrorRecovery() was called.
> Adopted error_recovery_done
> and changed its places accordingly.
>
> > (5) DisableSubscriptionOnError LOG message
> >
> > This version of the patch removes the LOG message:
> >
> > + ereport(LOG,
> > + errmsg("logical replication subscription \"%s\" will be disabled due
> > to error: %s",
> > +    MySubscription->name, edata->message));
> >
> > Perhaps a similar error message could be logged prior to EmitErrorReport()?
> >
> > e.g.
> >  "logical replication subscription \"%s\" will be disabled due to an error"
> Added.
>
> I've attached the new version v12.

Thanks for the updated patch, few comments:
1) This is not required as it is not used in the caller.
+++ b/src/backend/replication/logical/launcher.c
@@ -132,6 +132,7 @@ get_subscription_list(void)
                sub->dbid = subform->subdbid;
                sub->owner = subform->subowner;
                sub->enabled = subform->subenabled;
+               sub->disableonerr = subform->subdisableonerr;
                sub->name = pstrdup(NameStr(subform->subname));
                /* We don't fill fields we are not interested in. */

2) Should this be changed:
+       <structfield>subdisableonerr</structfield> <type>bool</type>
+      </para>
+      <para>
+       If true, the subscription will be disabled when subscription
+       worker detects any errors
+      </para></entry>
+     </row>
To:
+       <structfield>subdisableonerr</structfield> <type>bool</type>
+      </para>
+      <para>
+       If true, the subscription will be disabled when subscription's
+       worker detects any errors
+      </para></entry>
+     </row>

3) The last line can be slightly adjusted to keep within 80 chars:
+          Specifies whether the subscription should be automatically disabled
+          if any errors are detected by subscription workers during data
+          replication from the publisher. The default is
<literal>false</literal>.

4) Similarly this too can be handled:
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1259,7 +1259,7 @@ REVOKE ALL ON pg_replication_origin_status FROM public;
 -- All columns of pg_subscription except subconninfo are publicly readable.
 REVOKE ALL ON pg_subscription FROM public;
 GRANT SELECT (oid, subdbid, subname, subowner, subenabled, subbinary,
-              substream, subtwophasestate, subslotname,
subsynccommit, subpublications)
+              substream, subtwophasestate, subdisableonerr,
subslotname, subsynccommit, subpublications)
     ON pg_subscription TO public;

5) Since disabling subscription code is common in and else, can we
move it below:
+                       if (MySubscription->disableonerr)
+                       {
+                               WorkerErrorRecovery();
+                               error_recovery_done = true;
+                       }
+                       else
+                       {
+                               /*
+                                * Some work in error recovery work is
done. Switch to the old
+                                * memory context and rethrow.
+                                */
+                               MemoryContextSwitchTo(ecxt);
+                               PG_RE_THROW();
+                       }
+               }
+               else
+               {
+                       /*
+                        * Don't miss any error, even when it's not
reported to stats
+                        * collector.
+                        */
+                       if (MySubscription->disableonerr)
+                       {
+                               WorkerErrorRecovery();
+                               error_recovery_done = true;
+                       }
+                       else
+                               /* Simply rethrow because of no recovery work */
+                               PG_RE_THROW();
+               }

6) Can we move LockSharedObject below the if condition.
+       subform = (Form_pg_subscription) GETSTRUCT(tup);
+       LockSharedObject(SubscriptionRelationId, subform->oid, 0,
AccessExclusiveLock);
+
+       /*
+        * We would not be here unless this subscription's
disableonerr field was
+        * true when our worker began applying changes, but check whether that
+        * field has changed in the interim.
+        */
+       if (!subform->subdisableonerr)
+       {
+               /*
+                * Disabling the subscription has been done already. No need of
+                * additional work.
+                */
+               heap_freetuple(tup);
+               table_close(rel, RowExclusiveLock);
+               CommitTransactionCommand();
+               return;
+       }
+

Regards,
Vignesh



pgsql-hackers by date:

Previous
From: Peter Smith
Date:
Subject: Re: row filtering for logical replication
Next
From: Amit Kapila
Date:
Subject: Re: Assertion failure with replication origins and PREPARE TRANSACTIOn