Re: Build-farm - intermittent error in 031_column_list.pl - Mailing list pgsql-hackers
From | Alexander Lakhin |
---|---|
Subject | Re: Build-farm - intermittent error in 031_column_list.pl |
Date | |
Msg-id | 16d6d9cc-f97d-0b34-be65-425183ed3721@gmail.com Whole thread Raw |
In response to | RE: Build-farm - intermittent error in 031_column_list.pl ("osumi.takamichi@fujitsu.com" <osumi.takamichi@fujitsu.com>) |
List | pgsql-hackers |
Hello hackers, 01.06.2022 05:06, osumi.takamichi@fujitsu.com wrote: > FYI, I've noticed that after the last report by Peter-san > we've gotten the same errors on Build Farm. > We need to keep discussing to conclude this. > > > 1. Details for system "xenodermus" failure at stage subscriptionCheck, snapshot taken 2022-05-31 13:00:04 > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=xenodermus&dt=2022-05-31%2013%3A00%3A04 > > > 2. Details for system "phycodurus" failure at stage subscriptionCheck, snapshot taken 2022-05-26 17:30:04 > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=phycodurus&dt=2022-05-26%2017%3A30%3A04 I think, I've discovered what causes that test failure. When playing with bgwriter during researching [1], I made it more aggressive with a dirty hack: -#define LOG_SNAPSHOT_INTERVAL_MS 15000 +#define LOG_SNAPSHOT_INTERVAL_MS 1 rc = WaitLatch(MyLatch, WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, - BgWriterDelay /* ms */ , WAIT_EVENT_BGWRITER_MAIN); + 1 /* ms */ , WAIT_EVENT_BGWRITER_MAIN); With this modification, I ran `make check -C src/test/subscription` in a loop and observed the same failure of 031_column_list as discussed here. With log_min_messages = DEBUG2, I see that in a failed case there is no '"sub1" has now caught up' message (as Amit and Tomas guessed upthread) (See full logs from one successful and two failed runs attached (I added some extra logging and enabled wal_debug to better understand what's going on here).) If we look at the failures occurred in the buildfarm: The first two from the past: 1) https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=xenodermus&dt=2022-05-31%2013%3A00%3A04 2022-05-26 20:39:23.828 CEST [276284][postmaster][:0][] LOG: starting PostgreSQL 15beta1 on x86_64-pc-linux-gnu, compiled by gcc-10 (Debian 10.3.0-15) 10.3.0, 64-bit ... 2022-05-26 20:39:39.768 CEST [277545][walsender][3/0:0][sub1] ERROR: publication "pub6" does not exist 2) https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=phycodurus&dt=2022-05-26%2017%3A30%3A04 2022-05-31 16:33:25.506 CEST [3223685][postmaster][:0][] LOG: starting PostgreSQL 15beta1 on x86_64-pc-linux-gnu, compiled by clang version 6.0.1 , 64-bit ... 2022-05-31 16:33:41.114 CEST [3224511][walsender][3/0:0][sub1] ERROR: publication "pub6" does not exist The other two from the present: 3) https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=kestrel&dt=2023-12-14%2009%3A14%3A52&stg=subscription-check 2023-12-14 09:26:12.523 UTC [1144979][postmaster][:0] LOG: starting PostgreSQL 16.1 on x86_64-pc-linux-gnu, compiled by Debian clang version 13.0.1-11+b2, 64-bit ... 2023-12-14 09:26:28.663 UTC [1157936][walsender][3/0:0] ERROR: publication "pub6" does not exist 4) https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mylodon&dt=2023-11-17%2018%3A28%3A24 2023-11-17 18:31:13.594 UTC [200939] LOG: starting PostgreSQL 17devel on x86_64-linux, compiled by clang-14.0.6, 64-bit ... 2023-11-17 18:31:29.292 UTC [222103] sub1 ERROR: publication "pub6" does not exist we can see that all the failures occurred approximately since 16 seconds after the server start. And it's very close to predefined LOG_SNAPSHOT_INTERVAL_MS. [1] https://www.postgresql.org/message-id/6f85667e-5754-5d35-dbf1-c83fe08c1e48%40gmail.com Best regards, Alexander
Attachment
pgsql-hackers by date: