Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop - Mailing list pgsql-bugs

From Amit Kapila
Subject Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop
Msg-id CAA4eK1LjXR+0zor0raDcp-0=9u_nkT5DZCyx7zudYKSEBehrLA@mail.gmail.com
In response to Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-bugs
On Thu, Nov 5, 2020 at 9:13 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Nov 4, 2020 at 7:19 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >
> > On 2020-Nov-04, Amit Kapila wrote:
> >
> > > On Thu, Oct 15, 2020 at 8:20 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >
> > > > * STREAM COMMIT bug?
> > > >   In apply_handle_stream_commit, we do CommitTransactionCommand, but
> > > >   apparently in a tablesync worker we shouldn't do it.
> > >
> > > In the tablesync stage, we don't allow streaming. See pgoutput_startup
> > > where we disable streaming for the init phase. As far as I understand,
> > > for tablesync we create the initial slot during which streaming will
> > > be disabled then we will copy the table (here logical decoding won't
> > > be used) and then allow the apply worker to get any other data which
> > > is inserted in the meantime. Now, I might be missing something here
> > > but if you can explain it a bit more or share some test to show how we
> > > can reach here via tablesync worker then we can discuss the possible
> > > solution.
> >
> > Hmm, okay, that sounds like there would be no bug then.  Maybe what we
> > need is just an assert in apply_handle_stream_commit that
> > !am_tablesync_worker(), as in the attached patch.  Passes tests.
> >
>
> +1. But do we want to have this Assert only in stream_commit API or
> all stream APIs as well?
>

One more point to look at here: at what point, if any, is the tablesync
worker involved in applying decoded transactions? Basically, I would
like to ensure the following. If it only uses the slot it initially
created (before the copy), then it is probably fine, because we don't
enable streaming with that slot during the initial phase. But if it
uses the slot to decode transactions after the copy, then we probably
need to check once more that streaming is still not enabled at that
point. I am not completely sure whether existing test cases cover any
such scenarios, so thinking a bit more about this might be helpful.
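The invariant discussed in this thread can be modeled in isolation: streaming is disabled while a tablesync worker runs, so no STREAM message should ever reach it, and the proposed Assert guards that. The sketch below is a standalone toy model, not PostgreSQL code. `ToyWorker`, `toy_am_tablesync_worker`, `toy_worker_start`, `toy_can_receive_stream_messages`, `toy_check_stream_allowed` and `toy_handle_stream_commit` are all hypothetical names, loosely mirroring `am_tablesync_worker()` and `apply_handle_stream_commit()`. It puts the check in one shared helper, which is one possible answer to the "only stream_commit, or all stream APIs?" question.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of a logical replication worker; NOT
 * PostgreSQL's LogicalRepWorker struct. */
typedef struct ToyWorker
{
    bool is_tablesync;      /* true for a tablesync worker */
    bool streaming_enabled; /* was streaming requested from the output plugin? */
    int  stream_commits;    /* streamed transactions applied so far */
} ToyWorker;

/* Hypothetical analogue of am_tablesync_worker(). */
static bool
toy_am_tablesync_worker(const ToyWorker *w)
{
    return w->is_tablesync;
}

/* Toy startup: models the claim in this thread that streaming is
 * disabled for the tablesync phase even if the subscription asks for it. */
static void
toy_worker_start(ToyWorker *w, bool is_tablesync, bool sub_streaming)
{
    w->is_tablesync = is_tablesync;
    w->streaming_enabled = sub_streaming && !is_tablesync;
    w->stream_commits = 0;
}

/* Whether this worker could ever be sent STREAM messages. */
static bool
toy_can_receive_stream_messages(const ToyWorker *w)
{
    return w->streaming_enabled;
}

/* Shared check so every stream handler (start/stop/abort/commit)
 * asserts the same invariant, not only stream commit. */
static void
toy_check_stream_allowed(const ToyWorker *w)
{
    assert(!toy_am_tablesync_worker(w));
    assert(toy_can_receive_stream_messages(w));
}

/* Toy analogue of apply_handle_stream_commit() with the proposed Assert. */
static void
toy_handle_stream_commit(ToyWorker *w)
{
    toy_check_stream_allowed(w);
    w->stream_commits++;
}
```

In this model an apply worker started with streaming requested can apply a streamed commit, while a tablesync worker never becomes eligible for STREAM messages. The open question above (a tablesync worker decoding with its slot after the copy) is exactly the case where the toy's startup rule would need to be re-checked against the real code. Like PostgreSQL's `Assert`, which is active only in assert-enabled builds, C `assert` compiles away under `NDEBUG`, so this guards development builds only.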


-- 
With Regards,
Amit Kapila.


