Re: Slow catchup of 2PC (twophase) transactions on replica in LR - Mailing list pgsql-hackers

From vignesh C
Subject Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Msg-id CALDaNm3YY+bzj+JWJbY+DsUgJ2mPk8OR1ttjVX2cywKr4BUgxw@mail.gmail.com
In response to Re: Slow catchup of 2PC (twophase) transactions on replica in LR  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Slow catchup of 2PC (twophase) transactions on replica in LR
List pgsql-hackers
On Thu, 25 Jul 2024 at 08:39, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Jul 24, 2024 at 9:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Amit Kapila <amit.kapila16@gmail.com> writes:
> > > I merged these changes, made a few other cosmetic changes, and pushed the patch.
> >
> > There is a CF entry pointing at this thread [1].  Should it be closed?
> >
>
> Yes, closed now. Thanks for the reminder.

I noticed an intermittent failure of the 021_twophase test in my environment:
[10:37:01.131](0.053s) ok 24 - should be no prepared transactions on subscriber
error running SQL: 'psql:<stdin>:2: ERROR:  cannot alter two_phase
when logical replication worker is still running
HINT:  Try again after some time.'

The issue can be reproduced by adding a delay in apply_worker_exit, as in
the attached Reproduce_random_021_twophase_test_failure.patch.

This is happening because the check here is wrong:
+$node_subscriber->poll_query_until('postgres',
+   "SELECT count(*) = 0 FROM pg_stat_activity WHERE backend_type =
'logical replication worker'"

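Here "logical replication worker" should be "logical replication apply worker".
No backend reports the former as its backend_type, so the poll succeeds
immediately instead of waiting for the apply worker to exit.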
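
To illustrate the mismatch, here is a minimal sketch that runs the same
query against a mock pg_stat_activity table. It makes some assumptions:
SQLite stands in for PostgreSQL, the table holds only the two columns used
here, and the pids and the no_worker_running helper are hypothetical. It
is not the actual test code.

```python
import sqlite3

# Mock of pg_stat_activity (assumption: SQLite stands in for PostgreSQL,
# and only the columns needed for the check are modeled).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pg_stat_activity (pid INTEGER, backend_type TEXT)")
conn.executemany(
    "INSERT INTO pg_stat_activity VALUES (?, ?)",
    [
        (101, "client backend"),
        # The apply worker has not exited yet (hypothetical pid).
        (102, "logical replication apply worker"),
    ],
)


def no_worker_running(backend_type):
    """Run the test's check: true when no backend has this backend_type."""
    row = conn.execute(
        "SELECT count(*) = 0 FROM pg_stat_activity WHERE backend_type = ?",
        (backend_type,),
    ).fetchone()
    return bool(row[0])


# Filter string used by the failing test: matches no row at all.
old_check = no_worker_running("logical replication worker")
# Corrected filter string: matches the apply worker that is still running.
new_check = no_worker_running("logical replication apply worker")

print("old check passes:", old_check)
print("new check passes:", new_check)
```

With the old string the check passes even though the apply worker is still
present, which is the window in which the later ALTER SUBSCRIPTION can fail.
With the corrected string the check stays false until the worker is gone.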

The attached patch fixes this.

Regards,
Vignesh

