Re: Build-farm - intermittent error in 031_column_list.pl - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Build-farm - intermittent error in 031_column_list.pl
Date
Msg-id 20220519.155804.753270708308766360.horikyota.ntt@gmail.com
In response to Build-farm - intermittent error in 031_column_list.pl  (Peter Smith <smithpb2250@gmail.com>)
Responses Re: Build-farm - intermittent error in 031_column_list.pl
List pgsql-hackers
At Thu, 19 May 2022 14:26:56 +1000, Peter Smith <smithpb2250@gmail.com> wrote in 
> Hi hackers.
> 
> FYI, I saw that there was a recent Build-farm error on the "grison" machine [1]
> [1] https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=grison&br=HEAD
> 
> The error happened during "subscriptionCheck" phase in the TAP test
> t/031_column_list.pl
> This test file was added by this [2] commit.
> [2] https://github.com/postgres/postgres/commit/923def9a533a7d986acfb524139d8b9e5466d0a5

In all of these cases, it looks like a publication that CREATE
PUBLICATION created without reporting any failure is missing when a
walsender looks for it later. It seems that either CREATE PUBLICATION
can silently fail to create a publication, or the walsender somehow
failed to find an existing one.


> ~~
> 
> I checked the history of fails for that TAP test t/031_column_list.pl
> and found that this same error seems to have been happening
> intermittently for at least the last 50 days.
> 
> Details of similar previous errors from the BF are listed below.
> 
> ~~~
> 
> 1. Details for system "grison" failure at stage subscriptionCheck,
> snapshot taken 2022-05-18 18:11:45
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grison&dt=2022-05-18%2018%3A11%3A45
> 
> [22:02:08] t/029_on_error.pl .................. ok    25475 ms ( 0.01
> usr  0.00 sys + 15.39 cusr  5.59 csys = 20.99 CPU)
> # poll_query_until timed out executing this query:
> # SELECT '0/1530588' <= replay_lsn AND state = 'streaming'
> #          FROM pg_catalog.pg_stat_replication
> #          WHERE application_name IN ('sub1', 'walreceiver')
> # expecting this output:
> # t
> # last actual query output:
> #
> # with stderr:
> # Tests were run but no plan was declared and done_testing() was not seen.
> # Looks like your test exited with 29 just after 22.
> [22:09:25] t/031_column_list.pl ...............
> ...
> [22:02:47.887](1.829s) ok 22 - partitions with different replica
> identities not replicated correctly Waiting for replication conn
> sub1's replay_lsn to pass 0/1530588 on publisher
> [22:09:25.395](397.508s) # poll_query_until timed out executing this query:
> # SELECT '0/1530588' <= replay_lsn AND state = 'streaming'
> #          FROM pg_catalog.pg_stat_replication
> #          WHERE application_name IN ('sub1', 'walreceiver')
> # expecting this output:
> # t
> # last actual query output:
> #
> # with stderr:
> timed out waiting for catchup at t/031_column_list.pl line 728.
> ### Stopping node "publisher" using mode immediate

2022-04-17 00:16:04.278 CEST [293659][client backend][4/270:0][031_column_list.pl] LOG:  statement: CREATE PUBLICATION pub9 FOR TABLE test_part_d (a) WITH (publish_via_partition_root = true);
 
2022-04-17 00:16:04.279 CEST [293659][client backend][:0][031_column_list.pl] LOG:  disconnection: session time: 0:00:00.002 user=bf database=postgres host=[local]
 

"CREATE PUBLICATION pub9" is executed at 00:16:04.278 by process 293659,
and then the session is disconnected. But the following request for the
same publication fails because the publication does not exist.

2022-04-17 00:16:08.147 CEST [293856][walsender][3/0:0][sub1] STATEMENT:  START_REPLICATION SLOT "sub1" LOGICAL 0/153DB88 (proto_version '3', publication_names '"pub9"')
 
2022-04-17 00:16:08.148 CEST [293856][walsender][3/0:0][sub1] ERROR:  publication "pub9" does not exist
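
The pattern above (a publication is created, then later reported as
missing) could be searched for in other publisher logs with a small
script. The following is only a rough sketch: the function name
find_missing_after_create is hypothetical, and it assumes each log entry
sits on a single unwrapped line in the format shown above.

```python
import re

# Hypothetical helper, not part of the PostgreSQL tree. It assumes one log
# entry per line, formatted like the publisher log excerpts quoted above.
CREATE_RE = re.compile(r'statement: CREATE PUBLICATION\s+(\w+)')
MISSING_RE = re.compile(r'ERROR:\s+publication "([^"]+)" does not exist')


def find_missing_after_create(lines):
    """Return (name, create_index, error_index) for each publication that
    was created earlier in the log and later reported as nonexistent."""
    created = {}  # publication name -> index of the line that created it
    hits = []
    for i, line in enumerate(lines):
        m = CREATE_RE.search(line)
        if m:
            created[m.group(1)] = i
            continue
        m = MISSING_RE.search(line)
        if m and m.group(1) in created:
            hits.append((m.group(1), created[m.group(1)], i))
    return hits
```

Feeding it the lines of a publisher log (for example via
open(path).read().splitlines()) yields one tuple per suspicious pair.
It does not account for DROP PUBLICATION, so a publication that was
legitimately dropped in between would also be reported.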


> ~~~
> 
> 2. Details for system "xenodermus" failure at stage subscriptionCheck,
> snapshot taken 2022-04-16 21:00:04
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=xenodermus&dt=2022-04-16%2021%3A00%3A04

The same. pub9 is missing after creation.

> ~~~
> 
> 3. Details for system "phycodurus" failure at stage subscriptionCheck,
> snapshot taken 2022-04-05 17:30:04
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=phycodurus&dt=2022-04-05%2017%3A30%3A04

The same happens for pub7.

> 4. Details for system "phycodurus" failure at stage subscriptionCheck,
> snapshot taken 2022-04-05 17:30:04
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=phycodurus&dt=2022-04-05%2017%3A30%3A04

Same. pub7 is missing.

> 5. Details for system "grison" failure at stage subscriptionCheck,
> snapshot taken 2022-04-03 18:11:39
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grison&dt=2022-04-03%2018%3A11%3A39

Same. pub7 is missing.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


