Re: Slow catchup of 2PC (twophase) transactions on replica in LR - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Date
Msg-id CAA4eK1LhYsNq_0XKspfWKx4oXZpXk_q-A1Oa-tbA-wqQUudhCQ@mail.gmail.com
Whole thread Raw
In response to RE: Slow catchup of 2PC (twophase) transactions on replica in LR  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
Responses RE: Slow catchup of 2PC (twophase) transactions on replica in LR
List pgsql-hackers
On Thu, Jul 4, 2024 at 1:34 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> > >
> > > It succeeds if force_alter is also expressly set. Prepared transactions will be
> > > aborted at that time.
> > >
> > > ```
> > > subscriber=# ALTER SUBSCRIPTION sub SET (two_phase = off, force_alter =
> > on);
> > > ALTER SUBSCRIPTION
> > >
> >
> > Isn't it better to give a Notice when force_alter option leads to the
> > rollback of already prepared transactions?
>
> Indeed. I think this can be added for 0003. For now, it says like:
>
> ```
> postgres=# ALTER SUBSCRIPTION sub SET (TWO_PHASE = off, FORCE_ALTER = on);
> WARNING:  requested altering to two_phase = false but there are prepared transactions done by the subscription
> DETAIL:  Such transactions are being rollbacked.
> ALTER SUBSCRIPTION
>

Is it possible to get a NOTICE instead of a WARNING?

>
> > I have another question on the latest 0001 patch:
> > + /*
> > + * Stop all the subscription workers, just in case.
> > + * Workers may still survive even if the subscription is
> > + * disabled.
> > + */
> > + logicalrep_workers_stop(subid);
> >
> > In which case the workers will survive when the subscription is disabled?
>
> I think both normal and tablesync worker can survive, because ALTER SUBSCRIPTION
> DISABLE command does not send signal to workers. It just change the system catalog.
> logicalrep_workers_stop() is added to ensure all workers are stopped.
>
> Actually, earlier version (-v3) did not have a mechanism but they sometimes got
> assertion failures in maybe_reread_subscription(). This was because the survived
> workers read pg_subscription catalog and failed below assertion:
>
> ```
>         /* two-phase cannot be altered while the worker exists */
>         Assert(newsub->twophasestate == MySubscription->twophasestate);
> ```
>

But that is not a good reason for this operation to stop workers
first. Instead, we should prohibit this operation if any worker is
present. The reason is that there is always a chance that if any
worker is alive, it can prepare a new transaction after we have
checked for the presence of any prepared transactions.

Comments:
=========
1.
There is no need to do something remarkable regarding
+ * the "false" to "true" case; the backend process alters
+ * subtwophase <funny_char> to LOGICALREP_TWOPHASE_STATE_PENDING once.
+ * After the subscription is enabled, a new logical
+ * replication worker requests to change the two_phase
+ * option of its slot from pending to true when the
+ * initial data synchronization is done. The code path is
+ * the same as the case in which two_phase <funny_char> is initially
+ * set <funny_char> to true.

The patch has some funny characters in the above comment at the places
highlighted by me. It seems you have copied from some editor that has
inserted such characters.

2.
/*
* Do not allow toggling of two_phase option. Doing so could cause
* missing of transactions and lead to an inconsistent replica.
* See comments atop worker.c
*
* Note: Unsupported twophase indicates that this call originated
* from AlterSubscription.
*/
if (!IsSet(supported_opts, SUBOPT_TWOPHASE_COMMIT))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("unrecognized subscription parameter: \"%s\"", defel->defname)));

This part of the code must either be removed or converted to an assert.

3. The tests added in 099_twophase_added.pl should be part of 021_twophase.pl

--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Bertrand Drouvot
Date:
Subject: Re: Pluggable cumulative statistics
Next
From: Heikki Linnakangas
Date:
Subject: Re: Make query cancellation keys longer