Re: Build-farm - intermittent error in 031_column_list.pl - Mailing list pgsql-hackers

From Alexander Lakhin
Subject Re: Build-farm - intermittent error in 031_column_list.pl
Date
Msg-id 16d6d9cc-f97d-0b34-be65-425183ed3721@gmail.com
Whole thread Raw
In response to RE: Build-farm - intermittent error in 031_column_list.pl  ("osumi.takamichi@fujitsu.com" <osumi.takamichi@fujitsu.com>)
List pgsql-hackers
Hello hackers,

01.06.2022 05:06, osumi.takamichi@fujitsu.com wrote:
> FYI, I've noticed that after the last report by Peter-san
> we've gotten the same errors on Build Farm.
> We need to keep discussing to conclude this.
>
>
> 1. Details for system "xenodermus" failure at stage subscriptionCheck, snapshot taken 2022-05-31 13:00:04
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=xenodermus&dt=2022-05-31%2013%3A00%3A04
>
>
> 2. Details for system "phycodurus" failure at stage subscriptionCheck, snapshot taken 2022-05-26 17:30:04
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=phycodurus&dt=2022-05-26%2017%3A30%3A04

I think, I've discovered what causes that test failure.
When playing with bgwriter during researching [1], I made it more
aggressive with a dirty hack:
-#define LOG_SNAPSHOT_INTERVAL_MS 15000
+#define LOG_SNAPSHOT_INTERVAL_MS 1

          rc = WaitLatch(MyLatch,
                         WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
-                       BgWriterDelay /* ms */ , WAIT_EVENT_BGWRITER_MAIN);
+                       1 /* ms */ , WAIT_EVENT_BGWRITER_MAIN);

With this modification, I ran `make check -C src/test/subscription` in a
loop and observed the same failure of 031_column_list as discussed here.
With log_min_messages = DEBUG2, I see that in a failed case there is no
'"sub1" has now caught up' message (as Amit and Tomas guessed upthread)
(See full logs from one successful and two failed runs attached (I added
some extra logging and enabled wal_debug to better understand what's going
on here).)

If we look at the failures occurred in the buildfarm:
The first two from the past:
1)
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=xenodermus&dt=2022-05-31%2013%3A00%3A04

2022-05-26 20:39:23.828 CEST [276284][postmaster][:0][] LOG: starting PostgreSQL 15beta1 on x86_64-pc-linux-gnu, 
compiled by gcc-10 (Debian 10.3.0-15) 10.3.0, 64-bit
...
2022-05-26 20:39:39.768 CEST [277545][walsender][3/0:0][sub1] ERROR:  publication "pub6" does not exist

2)
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=phycodurus&dt=2022-05-26%2017%3A30%3A04

2022-05-31 16:33:25.506 CEST [3223685][postmaster][:0][] LOG: starting PostgreSQL 15beta1 on x86_64-pc-linux-gnu, 
compiled by clang version 6.0.1 , 64-bit
...
2022-05-31 16:33:41.114 CEST [3224511][walsender][3/0:0][sub1] ERROR:  publication "pub6" does not exist

The other two from the present:
3)

https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=kestrel&dt=2023-12-14%2009%3A14%3A52&stg=subscription-check

2023-12-14 09:26:12.523 UTC [1144979][postmaster][:0] LOG:  starting PostgreSQL 16.1 on x86_64-pc-linux-gnu, compiled
by
 
Debian clang version 13.0.1-11+b2, 64-bit
...
2023-12-14 09:26:28.663 UTC [1157936][walsender][3/0:0] ERROR: publication "pub6" does not exist

4)
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mylodon&dt=2023-11-17%2018%3A28%3A24

2023-11-17 18:31:13.594 UTC [200939] LOG:  starting PostgreSQL 17devel on x86_64-linux, compiled by clang-14.0.6,
64-bit
...
2023-11-17 18:31:29.292 UTC [222103] sub1 ERROR:  publication "pub6" does not exist

we can see that all the failures occurred approximately since 16 seconds
after the server start. And it's very close to predefined
LOG_SNAPSHOT_INTERVAL_MS.

[1] https://www.postgresql.org/message-id/6f85667e-5754-5d35-dbf1-c83fe08c1e48%40gmail.com

Best regards,
Alexander
Attachment

pgsql-hackers by date:

Previous
From: "David G. Johnston"
Date:
Subject: New Window Function: ROW_NUMBER_DESC() OVER() ?
Next
From: "Andrey M. Borodin"
Date:
Subject: Re: UUID v7