Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop - Mailing list pgsql-bugs

From Amit Kapila
Subject Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
Msg-id CAA4eK1KhU1eCrOBCBMUXOEspzd0Jza7+jEuNyG-h28kzvvruHQ@mail.gmail.com
In response to Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
List pgsql-bugs
On Mon, Nov 23, 2020 at 10:51 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Sat, Nov 21, 2020 at 12:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > 2.
> > @@ -902,7 +906,9 @@ apply_handle_stream_abort(StringInfo s)
> >   {
> >   /* Cleanup the subxact info */
> >   cleanup_subxact_info();
> > - CommitTransactionCommand();
> > +
> > + if (!am_tablesync_worker())
> > + CommitTransactionCommand();
> >
> > Here, also you can add a comment: "/* The synchronization worker runs
> > in single transaction. */"
> >
>
> Done
>

Okay, thanks. I have slightly changed the comments and moved the newly
added function in the attached patch. I have tested the reported
scenario and additionally verified that the fix holds even if the
tablesync worker processes only part of a transaction due to streaming.
This does no harm because the apply worker will later replay the
entire transaction. It could be a problem if the apply worker also
tried to stream the transaction between the SUBREL_STATE_CATCHUP and
SUBREL_STATE_SYNCDONE states, because the apply worker might then
skip applying the partial transaction already processed by the
tablesync worker. However, I have checked that the apply worker waits
for the tablesync worker to complete its processing between these two
states; see process_syncing_tables_for_apply. Does this make sense?
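
As a rough illustration of the conditional commit discussed above, here
is a toy model in C. It is not PostgreSQL source: the Worker struct,
toy_commit and toy_handle_stream_abort are hypothetical names that only
mimic the shape of the patched tail of apply_handle_stream_abort, with
is_tablesync standing in for am_tablesync_worker().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of a replication worker, not PostgreSQL source. */
typedef struct Worker
{
    bool is_tablesync;   /* stands in for am_tablesync_worker() */
    bool in_transaction; /* is a transaction currently open? */
    int  commits;        /* number of commits issued so far */
} Worker;

/* Stand-in for CommitTransactionCommand(): closes the open transaction. */
static void
toy_commit(Worker *w)
{
    assert(w->in_transaction);
    w->in_transaction = false;
    w->commits++;
}

/* Models only the tail of the stream-abort handler after the patch. */
static void
toy_handle_stream_abort(Worker *w)
{
    /* Subtransaction cleanup would happen here in the real handler. */

    /* The synchronization worker runs in a single transaction. */
    if (!w->is_tablesync)
        toy_commit(w);
}
```

In this model, a worker with is_tablesync set keeps its transaction
open after a stream abort, while an apply worker commits as before.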

Peter, could you please also test the attached patch and see if it
fixes the problem for you as well?

-- 
With Regards,
Amit Kapila.

Attachment
