Thread: The stats.sql test is failing sporadically in v14- on POWER7/AIX 7.1 buildfarm animals
From: Alexander Lakhin
Hello hackers,

Yesterday, the buildfarm animal sungazer was benevolent enough to
demonstrate a rare anomaly related to the old stats collector [1]:

test stats                        ... FAILED   469155 ms

========================
 1 of 212 tests failed.
========================

--- /home/nm/farm/gcc64/REL_14_STABLE/pgsql.build/src/test/regress/expected/stats.out 2022-03-30 01:18:17.000000000 +0000
+++ /home/nm/farm/gcc64/REL_14_STABLE/pgsql.build/src/test/regress/results/stats.out 2024-07-30 09:49:39.000000000 +0000
@@ -165,11 +165,11 @@
   WHERE relname like 'trunc_stats_test%' order by relname;
       relname      | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup
 -------------------+-----------+-----------+-----------+------------+------------
- trunc_stats_test  |         3 |         0 |         0 |          0 |          0
- trunc_stats_test1 |         4 |         2 |         1 |          1 |          0
- trunc_stats_test2 |         1 |         0 |         0 |          1 |          0
- trunc_stats_test3 |         4 |         0 |         0 |          2 |          2
- trunc_stats_test4 |         2 |         0 |         0 |          0 |          2
+ trunc_stats_test  |         0 |         0 |         0 |          0 |          0
+ trunc_stats_test1 |         0 |         0 |         0 |          0 |          0
+ trunc_stats_test2 |         0 |         0 |         0 |          0 |          0
+ trunc_stats_test3 |         0 |         0 |         0 |          0 |          0
+ trunc_stats_test4 |         0 |         0 |         0 |          0 |          0
...

inst/logfile contains:
2024-07-30 09:25:11.225 UTC [63307946:1] LOG:  using stale statistics instead of current ones because stats collector is not responding
2024-07-30 09:25:11.345 UTC [11206724:559] pg_regress/create_index LOG:  using stale statistics instead of current ones because stats collector is not responding
...

That's not the only failure of that kind that occurred on sungazer; there
were also [2] (REL_13_STABLE), [3] (REL_13_STABLE), [4] (REL_12_STABLE).
Moreover, such failures were produced by all the other POWER7/AIX 7.1
animals: hornet ([5], [6]), tern ([7], [8]), mandrill ([9], [10], ...).
But I could not find such failures coming from POWER8 animals: hoverfly
(running AIX 7200-04-03-2038), ayu, boa, chub, and I did not encounter such
anomalies on x86 or ARM platforms.

Thus, it looks like this stats collector issue is only happening on this
specific platform, and given [11], I think such failures perhaps should
be just ignored for the next two years (until v14 EOL) unless AIX 7.1
is upgraded and we see them on a vendor-supported OS version.

So I'm parking this information here just for reference.

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2024-07-30%2003%3A49%3A35
[2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2023-02-09%2009%3A29%3A10
[3] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2022-06-16%2009%3A52%3A47
[4] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2023-12-13%2003%3A40%3A42
[5] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2024-03-29%2005%3A27%3A09
[6] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2024-03-19%2002%3A09%3A07
[7] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tern&dt=2022-12-16%2009%3A17%3A38
[8] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tern&dt=2021-04-01%2003%3A09%3A38
[9] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2021-04-05%2004%3A22%3A17
[10] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2021-07-12%2004%3A31%3A37
[11] https://www.postgresql.org/message-id/3154146.1697661946%40sss.pgh.pa.us

Best regards,
Alexander

Re: The stats.sql test is failing sporadically in v14- on POWER7/AIX 7.1 buildfarm animals
From: Noah Misch
On Wed, Jul 31, 2024 at 02:00:00PM +0300, Alexander Lakhin wrote:
> --- /home/nm/farm/gcc64/REL_14_STABLE/pgsql.build/src/test/regress/expected/stats.out 2022-03-30 01:18:17.000000000 +0000
> +++ /home/nm/farm/gcc64/REL_14_STABLE/pgsql.build/src/test/regress/results/stats.out 2024-07-30 09:49:39.000000000 +0000
> @@ -165,11 +165,11 @@
>    WHERE relname like 'trunc_stats_test%' order by relname;
>        relname      | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup
>  -------------------+-----------+-----------+-----------+------------+------------
> - trunc_stats_test  |         3 |         0 |         0 |          0 |          0
> - trunc_stats_test1 |         4 |         2 |         1 |          1 |          0
> - trunc_stats_test2 |         1 |         0 |         0 |          1 |          0
> - trunc_stats_test3 |         4 |         0 |         0 |          2 |          2
> - trunc_stats_test4 |         2 |         0 |         0 |          0 |          2
> + trunc_stats_test  |         0 |         0 |         0 |          0 |          0
> + trunc_stats_test1 |         0 |         0 |         0 |          0 |          0
> + trunc_stats_test2 |         0 |         0 |         0 |          0 |          0
> + trunc_stats_test3 |         0 |         0 |         0 |          0 |          0
> + trunc_stats_test4 |         0 |         0 |         0 |          0 |          0
> ...
>
> inst/logfile contains:
> 2024-07-30 09:25:11.225 UTC [63307946:1] LOG:  using stale statistics
> instead of current ones because stats collector is not responding
> 2024-07-30 09:25:11.345 UTC [11206724:559] pg_regress/create_index LOG:
> using stale statistics instead of current ones because stats collector is
> not responding
> ...

> I could not find such failures coming from POWER8 animals: hoverfly
> (running AIX 7200-04-03-2038), ayu, boa, chub, and I did not encounter such
> anomalies on x86 or ARM platforms.

The animals you list as affected share a filesystem.  The failure arises from
the slow metadata operations of that filesystem.

> Thus, it looks like this stats collector issue is only happening on this
> specific platform, and given [11], I think such failures perhaps should
> be just ignored for the next two years (until v14 EOL) unless AIX 7.1
> is upgraded and we see them on a vendor-supported OS version.

This has happened on non-POWER, I/O-constrained machines.
Still, I have been ignoring these failures. The stats subsystem was designed to drop stats updates at times, which was always at odds with the need for stable tests. So the failures witness a defect of the test, not a defect of the backend. Stabilizing this test was a known benefit of the new stats implementation.
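
To make the last point concrete, here is a toy model, not PostgreSQL code. The names LossyCollector, drop_every and run_test are hypothetical, and the only behaviour modelled is "an update can be lost without the sender being told". Under that assumption, a test that expects exact counters passes against a healthy collector and fails against an overloaded one, even though the backend did the same work both times.

```python
from dataclasses import dataclass, field


@dataclass
class LossyCollector:
    """Toy stand-in for a stats collector that may drop updates under load."""

    drop_every: int = 0  # 0 = never drop; N = drop every Nth message
    counters: dict = field(default_factory=dict)
    _seen: int = 0

    def report(self, table: str, n_tup_ins: int) -> None:
        """Fire-and-forget update: the caller never learns whether it landed."""
        self._seen += 1
        if self.drop_every and self._seen % self.drop_every == 0:
            return  # update silently lost
        self.counters[table] = self.counters.get(table, 0) + n_tup_ins

    def read(self, table: str) -> int:
        """Return whatever the collector currently knows, possibly stale."""
        return self.counters.get(table, 0)


def run_test(collector: LossyCollector) -> bool:
    """Mimic a regression test that expects an exact counter value."""
    collector.report("trunc_stats_test", 3)
    return collector.read("trunc_stats_test") == 3


healthy = LossyCollector(drop_every=0)
overloaded = LossyCollector(drop_every=1)  # extreme case: every update lost

print("healthy collector passes:", run_test(healthy))
print("overloaded collector passes:", run_test(overloaded))
```

In this model the overloaded collector reports zero inserts for the table, which has the same shape as the all-zero rows in the diff above. The mechanism in the real v14 collector is more involved than this sketch.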