Re: Build-farm - intermittent error in 031_column_list.pl - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Build-farm - intermittent error in 031_column_list.pl
Msg-id 20220520.102819.126748780727079715.horikyota.ntt@gmail.com
In response to Re: Build-farm - intermittent error in 031_column_list.pl  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Build-farm - intermittent error in 031_column_list.pl
List pgsql-hackers
At Thu, 19 May 2022 16:42:31 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in 
> On Thu, May 19, 2022 at 3:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > This happens after "ALTER SUBSCRIPTION sub1 SET PUBLICATION pub9". The
> > probable theory is that ALTER SUBSCRIPTION will lead to restarting of
> > apply worker (which we can see in LOGS as well) and after the restart,

Yes.

> > the apply worker will use the existing slot and replication origin
> > corresponding to the subscription. Now, it is possible that before
> > restart the origin has not been updated and the WAL start location
> > points to a location prior to where PUBLICATION pub9 exists which can
> > lead to such an error. Once this error occurs, apply worker will never
> > be able to proceed and will always return the same error. Does this
> > make sense?

Wow. I didn't think along those lines. That theory explains the silence
and makes sense, even though I don't see LSN transitions that clearly
support it. I dimly remember a similar kind of problem.

> > Unless you or others see a different theory, this seems to be the
> > existing problem in logical replication which is manifested by this
> > test. If we just want to fix these test failures, we can create a new
> > subscription instead of altering the existing publication to point to
> > the new publication.
> >
> 
> If the above theory is correct then I think allowing the publisher to
> catch up with "$node_publisher->wait_for_catchup('sub1');" before
> ALTER SUBSCRIPTION should fix this problem. Because if before ALTER
> both publisher and subscriber are in sync then the new publication
> should be visible to WALSender.

It looks right to me. That time travel seems unintuitive, but it's the
(current) way it works.
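The sketch below is a toy model of the theory discussed above. It is a hypothetical illustration, not PostgreSQL code. The model makes these assumptions:

- LSNs are plain increasing integers.
- The "historic snapshot" for a record only sees publications created at or before that record's LSN.
- A restarted apply worker resumes from the last position it confirmed.
- The names `ToyWal`, `restart_apply_worker`, `build_wal` and `old_pub` are invented for the example.

It shows why resuming from a position before CREATE PUBLICATION pub9 fails on every retry. It also shows why catching up before ALTER SUBSCRIPTION avoids the failure.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWal:
    """Hypothetical toy model of a publisher's WAL (not PostgreSQL code)."""
    records: list = field(default_factory=list)  # (lsn, kind, payload) tuples

    def log(self, kind: str, payload: str) -> int:
        lsn = len(self.records) + 1  # LSNs are just increasing integers here
        self.records.append((lsn, kind, payload))
        return lsn

    def end_lsn(self) -> int:
        return len(self.records)

    def visible_publications(self, lsn: int) -> set:
        # Toy "historic snapshot": while decoding a record at `lsn`, only
        # publications created at or before `lsn` are visible.
        return {p for rec_lsn, kind, p in self.records
                if kind == "create_publication" and rec_lsn <= lsn}

    def decode(self, start_lsn: int, publication: str) -> list:
        """Toy walsender: send inserts after start_lsn for `publication`."""
        sent = []
        for lsn, kind, payload in self.records:
            if lsn <= start_lsn or kind != "insert":
                continue
            if publication not in self.visible_publications(lsn):
                raise LookupError(f"{publication} not visible at LSN {lsn}")
            sent.append(payload)
        return sent


def restart_apply_worker(wal: ToyWal, confirmed_lsn: int, publication: str) -> list:
    # After ALTER SUBSCRIPTION the worker restarts and resumes from the
    # position it last confirmed (standing in for the origin/slot position).
    return wal.decode(confirmed_lsn, publication)


def build_wal() -> ToyWal:
    wal = ToyWal()
    wal.log("create_publication", "old_pub")  # LSN 1 (hypothetical old publication)
    wal.log("insert", "row1")                 # LSN 2, published via old_pub
    wal.log("create_publication", "pub9")     # LSN 3
    return wal


if __name__ == "__main__":
    # Racy case: the subscriber had only confirmed LSN 1 when it restarted,
    # so decoding hits LSN 2, where pub9 does not exist yet. Retrying from
    # the same position fails the same way every time.
    wal = build_wal()
    try:
        restart_apply_worker(wal, 1, "pub9")
    except LookupError as e:
        print("apply worker error:", e)

    # Fixed case: catch up first, so the restart point is past CREATE PUBLICATION pub9.
    wal = build_wal()
    confirmed = wal.end_lsn()   # what waiting for catch-up buys in this model
    wal.log("insert", "row2")   # LSN 4, after ALTER SUBSCRIPTION
    print(restart_apply_worker(wal, confirmed, "pub9"))
```

The model deliberately ignores keepalives, table membership, and how the real origin and slot positions interact. Its only purpose is to show the ordering problem: the restart position must already be past the LSN at which the new publication became visible.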

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
