Re: strange git problems on turaco - Mailing list buildfarm-members
From | Noah Misch |
---|---|
Subject | Re: strange git problems on turaco |
Date | |
Msg-id | 20241202034623.39@rfd.leadboat.com |
In response to | strange git problems on turaco (Tomas Vondra <tomas@vondra.me>) |
Responses |
Re: strange git problems on turaco
Re: strange git problems on turaco |
List | buildfarm-members |
On Mon, Dec 02, 2024 at 02:20:35AM +0100, Tomas Vondra wrote:
> turaco seems to be having some strange git issues - some of the
> buildfarm runs fail like this:
>
>
> turaco:REL_16_STABLE [22:41:11] OK
> Sun Dec 1 22:41:27 2024: buildfarm run for turaco:REL_17_STABLE starting
> turaco:REL_17_STABLE [22:41:27] checking out source ...
> Missing checked out branch bf_REL_17_STABLE:
> fatal: not a git repository (or any parent up to mount point /mnt)
> Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
> turaco:REL_17_STABLE [22:41:32] failed at stage pgsql-Git
> Sun Dec 1 22:41:33 2024: buildfarm run for turaco:HEAD starting
> turaco:HEAD [22:41:33] checking out source ...
>
>
> I initially suspected this might be due to aging storage (SD card on
> rpi), but I replaced that, and there's nothing strange in dmesg. Also,
> other branches seem to be working fine ...
>
> Any ideas what could be causing this?

I had this happen ~9 times on the host of my AIX buildfarm members. Example:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2024-07-10%2019%3A51%3A28

I figured it was some system problem, so I didn't root-cause it. I carry the
following workaround in my fork of the buildfarm client code. The unknown
problem caused failure reports and work stoppage ~4 times before I installed
this workaround, then logs show the workaround prevented damage 5 times. The
last "removed intruder .git" log message appeared on 2024-07-23. There was no
kernel reboot, and logs don't point to buildfarm client processes getting
involuntary termination, either.

diff --git a/PGBuild/SCM.pm b/PGBuild/SCM.pm
index dcfd180..2cd610a 100644
--- a/PGBuild/SCM.pm
+++ b/PGBuild/SCM.pm
@@ -1059,9 +1059,19 @@ sub _update_target
 	my @gitlog;
 
 	# If a run crashed during copy_source(), repair.
-	if (-d "./git-save" && !-d "$target/.git")
+	if (-d "./git-save")
 	{
+		# As of 2024-07-13, the following has happened about four times in the
+		# last month, to different gcc111 animals.  Despite no known crash,
+		# there's a git-save directory containing the proper git repo, and
+		# there's a bogus .git missing most content.  Remove the bogus one.
+		# This is deeply hacky, but it beats buildfarm report noise and manual
+		# intervention.
+		if (rmtree("$target/.git") > 0) {
+			print "removed intruder .git\n" if $verbose;
+		}
 		move "./git-save", "$target/.git";
+		print "restored git-save\n" if $verbose;
 	}
 
 	chdir $target;
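
To try the repair logic outside the buildfarm client, the following is a minimal, self-contained sketch of the patched behavior. It is an illustration under stated assumptions, not code from PGBuild/SCM.pm: `repair_git_dir` is a hypothetical helper name, it returns its log messages instead of printing them under `$verbose`, and the scratch directory layout (`git-save/objects`, `pgsql/.git`) is invented to mimic the state described above, where a good `git-save` sits next to a bogus `.git`.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Copy qw(move);
use File::Path qw(make_path rmtree);
use File::Temp qw(tempdir);

# Hypothetical stand-in for the patched block in _update_target.
# Returns the log messages the real code would print when $verbose is set.
sub repair_git_dir
{
	my ($target) = @_;
	my @actions;
	if (-d "./git-save")
	{
		# rmtree returns the number of files and directories it deleted,
		# so a result of 0 means there was no .git in the way.
		push @actions, "removed intruder .git"
		  if rmtree("$target/.git") > 0;
		move("./git-save", "$target/.git")
		  or die "could not restore git-save: $!";
		push @actions, "restored git-save";
	}
	return @actions;
}

# Build the failure state in a scratch directory: a saved repository
# (git-save) plus a nearly empty .git inside the checkout.
my $origdir = getcwd();
my $scratch = tempdir(CLEANUP => 1);
chdir $scratch or die "chdir $scratch: $!";
make_path("git-save/objects", "pgsql/.git");

my @actions = repair_git_dir("pgsql");
print "$_\n" for @actions;

# Leave the scratch directory so it can be cleaned up at exit.
chdir $origdir or die "chdir $origdir: $!";
```

Unlike the diff, this sketch dies if the `move` fails. That is a design choice for the demonstration only; the workaround above does not check the result of `move`.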