Thread: CF Bot failure in wait_for_subscription_sync()
Hi, I have seen 2 patches registered in CF failing on Linux - Debian Bullseye in wait_for_subscription_sync(). It seems like the tables aren't being synced. I have not done any further analysis. I'm not sure if this issue is being discussed elsewhere. # Postmaster PID for node "twoways" is 50208 Waiting for all subscriptions in "twoways" to synchronize data [14:12:43.092](198.391s) # poll_query_until timed out executing this query: # SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN ('r', 's'); # expecting this output: # t # last actual query output: # f # with stderr: timed out waiting for subscriber to synchronize data at t/100_bugs.pl line 147. https://api.cirrus-ci.com/v1/artifact/task/6618623857917952/log/src/test/subscription/tmp_check/log/regress_log_100_bugs https://cirrus-ci.com/task/6618623857917952 https://cirrus-ci.com/task/5764058174455808 -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Tuesday, October 18, 2022 2:16 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > Hi, > > I have seen 2 patches registered in CF failing on Linux - Debian Bullseye in > wait_for_subscription_sync(). It seems like the tables aren't being synced. I > have not done any further analysis. I'm not sure if this issue is being discussed > elsewhere. > > # Postmaster PID for node "twoways" is 50208 Waiting for all subscriptions in > "twoways" to synchronize data > [14:12:43.092](198.391s) # poll_query_until timed out executing this query: > # SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN ('r', > 's'); # expecting this output: > # t > # last actual query output: > # f > # with stderr: > timed out waiting for subscriber to synchronize data at t/100_bugs.pl line 147. > > https://api.cirrus-ci.com/v1/artifact/task/6618623857917952/log/src/test/sub > scription/tmp_check/log/regress_log_100_bugs > https://cirrus-ci.com/task/6618623857917952 > https://cirrus-ci.com/task/5764058174455808 Thanks for reporting this. I am not sure about the root cause but just share some initial analysis here. This testcase waits for table sync to finish for both table "t" and table "t2". But from the log, I can only see the log[1] related to the table sync of table "t". So it seems that the table sync worker for table "t2" was never started due to some reason. I tried it locally but have not reproduced this yet. [1]--- 2022-10-17 10:16:37.216 UTC [48051][logical replication worker] LOG: logical replication table synchronization worker forsubscription "testsub", table "t" has finished --- Best regards, Hou zj
On Tue, Oct 18, 2022 at 2:57 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote: > > On Tuesday, October 18, 2022 2:16 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > Hi, > > > > I have seen 2 patches registered in CF failing on Linux - Debian Bullseye in > > wait_for_subscription_sync(). It seems like the tables aren't being synced. I > > have not done any further analysis. I'm not sure if this issue is being discussed > > elsewhere. > > > > # Postmaster PID for node "twoways" is 50208 Waiting for all subscriptions in > > "twoways" to synchronize data > > [14:12:43.092](198.391s) # poll_query_until timed out executing this query: > > # SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN ('r', > > 's'); # expecting this output: > > # t > > # last actual query output: > > # f > > # with stderr: > > timed out waiting for subscriber to synchronize data at t/100_bugs.pl line 147. > > > > https://api.cirrus-ci.com/v1/artifact/task/6618623857917952/log/src/test/sub > > scription/tmp_check/log/regress_log_100_bugs > > https://cirrus-ci.com/task/6618623857917952 > > https://cirrus-ci.com/task/5764058174455808 > > Thanks for reporting this. I am not sure about the root cause but just share > some initial analysis here. > > This testcase waits for table sync to finish for both table "t" and table "t2". > But from the log, I can only see the log[1] related to the table sync of table > "t". So it seems that the table sync worker for table "t2" was never started > due to some reason. > Yeah, the reason is not clear to me either. Let me state my understanding of the issue. IIUC, the following test in 100_bugs.pl has failed: $node_twoways->safe_psql('d1', 'ALTER PUBLICATION testpub ADD TABLE t2'); $node_twoways->safe_psql('d2', 'ALTER SUBSCRIPTION testsub REFRESH PUBLICATION'); ... ... $node_twoways->wait_for_subscription_sync($node_twoways, 'testsub', 'd2'); After the REFRESH operation, the new table should have been added to pg_subscription_rel. This is also visible from LOGS because one of tables has synced and for table another worker is not started. Now, ideally, after the new entry is created in pg_subscription_rel, the apply worker should have got an invalidation message and refreshed the 'table_states_not_ready' list which should have let it start the new table sync worker for table 't2' but that is not happening here. -- With Regards, Amit Kapila.