Hi,
On 2025-03-12 20:41:29 -0400, Tom Lane wrote:
> I happened to notice these entries in a log file on a
> buildfarm member [1]:
>
> 2025-03-12 15:39:53.265 UTC [7296] WARNING: found incorrect redo LSN 0/159FB60 (expected 0/40000028)
> 2025-03-12 15:39:53.265 UTC [7296] LOG: corrupted statistics file "pg_stat/pgstat.stat"
>
> (this is near the end of the pg_upgrade_server.log log file).
> I don't think these are related to that run's subsequent test failure,
> which looks to be good old Windows randomness. I then looked into the
> logs of a local BF instance that also runs xversion-upgrade tests, and
> darned if I didn't find
>
> 2025-03-12 14:59:15.792 EDT [2216647] LOG: database system was shut down at 2025-03-12 14:59:13 EDT
> 2025-03-12 14:59:15.794 EDT [2216647] WARNING: found incorrect redo LSN 0/46F73F18 (expected 0/47000028)
> 2025-03-12 14:59:15.794 EDT [2216647] LOG: corrupted statistics file "pg_stat/pgstat.stat"
> 2025-03-12 14:59:15.795 EDT [2216644] LOG: database system is ready to accept connections
>
> despite that run having completed with no report of trouble.
> So this may have been going on for quite some time without our
> noticing. The "corrupted statistics file" whine is most likely
> caused by pg_upgrade copying the old system's pgstat.stat file
> into the new installation --- is that a good idea? I have
> no idea what's causing the redo LSN complaint, but it seems
> like that might deserve closer investigation.

I think the two issues are closely related: both messages come from code that
was introduced in b860848232aa, as part of the work to make pgstats somewhat
crash-safe.
Greetings,
Andres Freund