At Thu, 19 May 2022 16:42:31 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
> On Thu, May 19, 2022 at 3:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > This happens after "ALTER SUBSCRIPTION sub1 SET PUBLICATION pub9". The
> > probable theory is that ALTER SUBSCRIPTION will lead to restarting of
> > apply worker (which we can see in LOGS as well) and after the restart,
Yes.
> > the apply worker will use the existing slot and replication origin
> > corresponding to the subscription. Now, it is possible that before
> > restart the origin has not been updated and the WAL start location
> > points to a location prior to where PUBLICATION pub9 exists which can
> > lead to such an error. Once this error occurs, apply worker will never
> > be able to proceed and will always return the same error. Does this
> > make sense?
Wow. I didin't thought that line. That theory explains the silence and
makes sense even though I don't see LSN transistions that clearly
support it. I dimly remember a similar kind of problem..
> > Unless you or others see a different theory, this seems to be the
> > existing problem in logical replication which is manifested by this
> > test. If we just want to fix these test failures, we can create a new
> > subscription instead of altering the existing publication to point to
> > the new publication.
> >
>
> If the above theory is correct then I think allowing the publisher to
> catch up with "$node_publisher->wait_for_catchup('sub1');" before
> ALTER SUBSCRIPTION should fix this problem. Because if before ALTER
> both publisher and subscriber are in sync then the new publication
> should be visible to WALSender.
It looks right to me. That timetravel seems inintuitive but it's the
(current) way it works.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center