Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop - Mailing list pgsql-bugs

From: Amit Kapila
Subject: Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
Msg-id: CAA4eK1L6_MUmOTBKdkhfuSykj4Sx2-_fTeT_NaR0pDyzaCdb+A@mail.gmail.com
In response to: Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop (Alvaro Herrera <alvherre@alvh.no-ip.org>)
List: pgsql-bugs

On Sat, Nov 7, 2020 at 5:31 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2020-Nov-05, Amit Kapila wrote:
>
> > On Wed, Nov 4, 2020 at 7:19 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > >
> > > On 2020-Nov-04, Amit Kapila wrote:
> > >
> > > > On Thu, Oct 15, 2020 at 8:20 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > >
> > > > > * STREAM COMMIT bug?
> > > > >   In apply_handle_stream_commit, we do CommitTransactionCommand, but
> > > > >   apparently in a tablesync worker we shouldn't do it.
> > > >
> > > > In the tablesync stage, we don't allow streaming. See pgoutput_startup
> > > > where we disable streaming for the init phase. As far as I understand,
> > > > for tablesync we create the initial slot during which streaming will
> > > > be disabled then we will copy the table (here logical decoding won't
> > > > be used) and then allow the apply worker to get any other data which
> > > > is inserted in the meantime. Now, I might be missing something here
> > > > but if you can explain it a bit more or share some test to show how we
> > > > can reach here via tablesync worker then we can discuss the possible
> > > > solution.
> > >
> > > Hmm, okay, that sounds like there would be no bug then.  Maybe what we
> > > need is just an assert in apply_handle_stream_commit that
> > > !am_tablesync_worker(), as in the attached patch.  Passes tests.
> > >
> >
> > +1. But do we want to have this Assert only in stream_commit API or
> > all stream APIs as well?
>
> Well, the only reason I care about this is that apply_handle_commit
> contains a comment that we must not do CommitTransactionCommand in the
> syncworker case; so if you look at apply_handle_stream_commit and note
> that it doesn't concern itself with that, you become concerned that it
> might be broken.  I don't think the other routines handling the "stream"
> thing have that issue.
>

Fair enough. As mentioned in my previous email, I think we need to
confirm how decoding happens on the upstream, after the copy, for
transactions that occur while the tablesync worker is moving from
SUBREL_STATE_CATCHUP to SUBREL_STATE_SYNCDONE. I'll try to come up
with a test case in the next few days to debug and test this
particular scenario, and I'll share my findings.
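
To make the window under discussion concrete, here is a rough standalone
C model of the tablesync state progression and of the assertion proposed
upthread. It is only a sketch under stated assumptions, not PostgreSQL
source: the state letters follow pg_subscription_rel.h as far as I
recall, while ModelWorker, next_sync_state(), model_is_tablesync_worker()
and model_handle_stream_commit() are hypothetical names invented for
illustration, and the real transitions involve coordination between the
apply and tablesync workers that this linear model ignores.

```c
#include <assert.h>
#include <stdbool.h>

/* State letters, mirroring pg_subscription_rel.h (to the best of my knowledge). */
#define SUBREL_STATE_INIT      'i'
#define SUBREL_STATE_DATASYNC  'd'
#define SUBREL_STATE_SYNCWAIT  'w'
#define SUBREL_STATE_CATCHUP   'c'
#define SUBREL_STATE_SYNCDONE  's'
#define SUBREL_STATE_READY     'r'

/* Hypothetical worker descriptor; the real code uses global worker state. */
typedef struct ModelWorker
{
	bool		is_tablesync;
	char		relstate;
} ModelWorker;

/* Hypothetical helper: next state in a simplified, strictly linear progression. */
static char
next_sync_state(char state)
{
	switch (state)
	{
		case SUBREL_STATE_INIT:
			return SUBREL_STATE_DATASYNC;
		case SUBREL_STATE_DATASYNC:
			return SUBREL_STATE_SYNCWAIT;
		case SUBREL_STATE_SYNCWAIT:
			return SUBREL_STATE_CATCHUP;
		case SUBREL_STATE_CATCHUP:
			/* The window in question: changes decoded while catching up. */
			return SUBREL_STATE_SYNCDONE;
		default:
			return SUBREL_STATE_READY;
	}
}

/* Hypothetical stand-in for the worker-type check mentioned in the thread. */
static bool
model_is_tablesync_worker(const ModelWorker *w)
{
	return w->is_tablesync;
}

/*
 * Model of a stream-commit handler guarded as proposed: streaming is
 * assumed never to reach a tablesync worker, so we assert on that.
 */
static bool
model_handle_stream_commit(const ModelWorker *w)
{
	assert(!model_is_tablesync_worker(w));
	return true;				/* stands in for committing the transaction */
}
```

In this model, calling model_handle_stream_commit() for a worker with
is_tablesync set trips the assertion, which is the behaviour the
proposed Assert is meant to document; whether such a call can actually
happen is exactly what the test case should establish.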

-- 
With Regards,
Amit Kapila.


