Thread: Possible infinite loop on buildfarm animals
Between approximately 11:05 UTC and 13:25 UTC today (15 Mar 2024),
the Postgres git repo contained a buggy test recipe that caused an
infinite loop that will eventually exhaust disk space. If you have
any animals that might have launched a test run on HEAD in that
interval, you might want to check up on them. A manual kill of the
process that's consuming 100% CPU should be enough to get out of it.

Another idea to consider is to set the wait_timeout parameter
in your animals' configuration files, to put an upper bound
on the total elapsed time for a run. By default that's
infinite, since it's really hard to select a one-size-fits-all
value ... but it's a good backstop if you don't mind picking
machine-specific limits.

regards, tom lane
On Fri, Mar 15, 2024 at 02:42:14PM -0400, Tom Lane wrote:
> Another idea to consider is to set the wait_timeout parameter
> in your animals' configuration files, to put an upper bound
> on the total elapsed time for a run. By default that's
> infinite, since it's really hard to select a one-size-fits-all
> value ... but it's a good backstop if you don't mind picking
> machine-specific limits.

Other than CLOBBER_CACHE animals, the server rejects results older than 24h:
https://github.com/PGBuildFarm/server-code/blob/8572ac7/cgi-bin/pgstatus.pl#L273

The same 24h should probably be the default wait_timeout. One might use a
longer timeout if wanting to attach a debugger to a process of a stuck run.
If one just wants an intervention-free buildfarm animal, 24h is good.
On 2024-Mar-17, Noah Misch wrote:
> Other than CLOBBER_CACHE animals, the server rejects results older than 24h:
> https://github.com/PGBuildFarm/server-code/blob/8572ac7/cgi-bin/pgstatus.pl#L273
> The same 24h should probably be the default wait_timeout. One might use a
> longer timeout if wanting to attach a debugger to a process of a stuck run.
> If one just wants an intervention-free buildfarm animal, 24h is good.

Maybe that should be the default value embedded in the buildfarm client
script, which can be overridden for specific purposes such as
CLOBBER_CACHE animals?

-- 
Álvaro Herrera        PostgreSQL Developer — https://www.EnterpriseDB.com/
"No hay hombre que no aspire a la plenitud, es decir,
la suma de experiencias de que un hombre es capaz"
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On 2024-Mar-17, Noah Misch wrote:
>> Other than CLOBBER_CACHE animals, the server rejects results older than 24h:
>> https://github.com/PGBuildFarm/server-code/blob/8572ac7/cgi-bin/pgstatus.pl#L273
>> The same 24h should probably be the default wait_timeout. One might use a
>> longer timeout if wanting to attach a debugger to a process of a stuck run.
>> If one just wants an intervention-free buildfarm animal, 24h is good.

> Maybe that should be the default value embedded in the buildfarm client
> script, which can be overridden for specific purposes such as
> CLOBBER_CACHE animals?

We don't normally hard-wire such choices in the script, but it could be
plausible to change build-farm.conf.sample, perhaps like:

     # max time in seconds allowed for a single branch run
     # undef/0 means unlimited
-    wait_timeout => undef,
+    wait_timeout => 24 * 60 * 60,

regards, tom lane
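
For animal owners who want to pick a machine-specific limit rather than wait for a change to the sample file, the setting might be written along these lines. This is only a sketch: the wait_timeout key, its unit (seconds), and the undef/0 meaning come from the messages above, while the alternative values and the machines they are attributed to are hypothetical.

```perl
# Fragment of an animal's build-farm.conf (a sketch, not a complete file).
# max time in seconds allowed for a single branch run
# undef/0 means unlimited
wait_timeout => 24 * 60 * 60,    # 86400 s, matching the server's 24h cutoff

# Hypothetical alternatives; use at most one wait_timeout line per file:
# wait_timeout => 6 * 60 * 60,   # 21600 s, for a machine whose runs are short
# wait_timeout => undef,         # unlimited, e.g. while debugging a stuck run
```

A value longer than 24h, or no limit at all, only makes sense for the cases mentioned upthread: CLOBBER_CACHE animals and runs where someone intends to attach a debugger.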