Re: CF Bot failure in wait_for_subscription_sync() - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: CF Bot failure in wait_for_subscription_sync()
Date
Msg-id CAA4eK1LRouxE3sv=u_u_9doDPzV9arMyJPNvF-7ynJZiQXg1tg@mail.gmail.com
Whole thread Raw
In response to RE: CF Bot failure in wait_for_subscription_sync()  ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
List pgsql-hackers
On Tue, Oct 18, 2022 at 2:57 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Tuesday, October 18, 2022 2:16 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > Hi,
> >
> > I have seen 2 patches registered in CF failing on Linux - Debian Bullseye in
> > wait_for_subscription_sync(). It seems like the tables aren't being synced. I
> > have not done any further analysis. I'm not sure if this issue is being discussed
> > elsewhere.
> >
> > # Postmaster PID for node "twoways" is 50208 Waiting for all subscriptions in
> > "twoways" to synchronize data
> > [14:12:43.092](198.391s) # poll_query_until timed out executing this query:
> > # SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN ('r',
> > 's'); # expecting this output:
> > # t
> > # last actual query output:
> > # f
> > # with stderr:
> > timed out waiting for subscriber to synchronize data at t/100_bugs.pl line 147.
> >
> > https://api.cirrus-ci.com/v1/artifact/task/6618623857917952/log/src/test/sub
> > scription/tmp_check/log/regress_log_100_bugs
> > https://cirrus-ci.com/task/6618623857917952
> > https://cirrus-ci.com/task/5764058174455808
>
> Thanks for reporting this. I am not sure about the root cause but just share
> some initial analysis here.
>
> This testcase waits for table sync to finish for both table "t" and table "t2".
> But from the log, I can only see the log[1] related to the table sync of table
> "t". So it seems that the table sync worker for table "t2" was never started
> due to some reason.
>

Yeah, the reason is not clear to me either. Let me state my
understanding of the issue. IIUC, the following test in 100_bugs.pl
has failed:
$node_twoways->safe_psql('d1', 'ALTER PUBLICATION testpub ADD TABLE t2');
$node_twoways->safe_psql('d2',
'ALTER SUBSCRIPTION testsub REFRESH PUBLICATION');
...
...
$node_twoways->wait_for_subscription_sync($node_twoways, 'testsub', 'd2');

After the REFRESH operation, the new table should have been added to
pg_subscription_rel. This is also visible from LOGS because one of
tables has synced and for table another worker is not started. Now,
ideally, after the new entry is created in pg_subscription_rel, the
apply worker should have got an invalidation message and refreshed the
'table_states_not_ready' list which should have let it start the new
table sync worker for table 't2' but that is not happening here.

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Perform streaming logical transactions by background workers and parallel apply
Next
From: Bharath Rupireddy
Date:
Subject: Re: thinko in basic_archive.c