Re: Optionally automatically disable logical replication subscriptions on error - Mailing list pgsql-hackers
From | Mark Dilger |
---|---|
Subject | Re: Optionally automatically disable logical replication subscriptions on error |
Date | |
Msg-id | 38706E50-9190-4623-9FD7-4E9E3DD3F81A@enterprisedb.com Whole thread Raw |
In response to | Re: Optionally automatically disable logical replication subscriptions on error (Peter Smith <smithpb2250@gmail.com>) |
List | pgsql-hackers |
> On Jun 17, 2021, at 11:34 PM, Peter Smith <smithpb2250@gmail.com> wrote: > > I tried your patch. Thanks for the quick and thorough review! > (2) New column "disabled_by_error". > > I wondered if there was actually any need for this column. Isn't the > same information conveyed by just having "subenabled" = false, at same > time as as non-empty "suberrmsg"? This would remove any confusion for > having 2 booleans which both indicate disabled. Yeah, I wondered about that before posting v1. I removed the disabled_by_error field for v2. > (3) New columns "disabled_by_error", "disabled_on_error". > > All other columns of the pg_subscription have a "sub" prefix. I don't feel strongly about this. How about "subdisableonerr"? I used that in v2. > I did not find any code using that newly added member "errhint". Thanks for catching that. I had tried to remove all references to "errhint" before posting v1. The original idea was thatboth the message and hint of the error would be kept, but in testing I found the hint field was typically empty, so Iremoved it. Sorry that I left one mention of it lying around. > (5) dump.c I didn't bother getting pg_dump working before posting v1, and I still have not done so, as I mainly want to solicit feedbackon whether the basic direction I am going will work for the community. > (6) Patch only handles errors only from the Apply worker. > > Tablesync can give similar errors (e.g. PK violation during DATASYNC > phase) which will trigger re-launch forever regardless of the setting > of "disabled_on_error". > (confirmed by observations below) Yes, this is a good point, and also mentioned by Amit. I have fixed it in v2 and adjusted the regression test to triggeran automatic disabling for initial table sync as well as for change replication. > 2021-06-18 15:12:45.905 AEST [25904] LOG: edata is true for > subscription 'tap_sub': message = "duplicate key value violates unique > constraint "test_tab_pkey"", hint = "<NONE>" You didn't call this out, but FYI, I don't intend to leave this particular log message in the patch. It was for developmentonly. I have removed it for v2 and have added a different log message much sooner after catching the error, toavoid squashing the error in case some other action fails. The regression test shows this, if you open tmp_check/log/022_disable_on_error_subscriber.log: 2021-06-18 16:25:20.138 PDT [56926] LOG: logical replication subscription "s1" will be disabled due to error: duplicatekey value violates unique constraint "s1_tbl_unique" 2021-06-18 16:25:20.139 PDT [56926] ERROR: duplicate key value violates unique constraint "s1_tbl_unique" 2021-06-18 16:25:20.139 PDT [56926] DETAIL: Key (i)=(1) already exists. 2021-06-18 16:25:20.139 PDT [56926] CONTEXT: COPY tbl, line 2 The first line logs the error prior to attempting to disable the subscription, and the next three lines are due to rethrowingthe error after committing the successful disabling of the subscription. If the attempt to disable the subscriptionitself throws, these additional three lines won't show up, but the first one should. Amit mentioned this upthread. Do you think this will be ok, or would you like to also have a suberrdetail field so that the detail doesn't getlost? I haven't added such an extra field, and am inclined to think it would be excessive, but maybe others feel differently? > ====== > > Test: Cause a PK violation in the Tablesync copy (DATASYNC) phase. > (when disable_on_error = true) > Observation: This patch changes nothing for this case. The Tablesyn > re-launchs in a forever loop the same as current functionality. In v2, tablesync copy errors should also be caught. The test has been extended to cover this also. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
pgsql-hackers by date: