Thread: Possible infinite loop on buildfarm animals

Possible infinite loop on buildfarm animals

From: Tom Lane
Between approximately 11:05 UTC and 13:25 UTC today (15 Mar 2024),
the Postgres git repo contained a buggy test recipe that causes
an infinite loop, which will eventually exhaust disk space.
If you have any animals that might have launched a test run on
HEAD in that interval, you might want to check up on them.
A manual kill of the process that's consuming 100% CPU should
be enough to get out of it.

Another idea to consider is to set the wait_timeout parameter
in your animals' configuration files, to put an upper bound
on the total elapsed time for a run.  By default that's
infinite, since it's really hard to select a one-size-fits-all
value ... but it's a good backstop if you don't mind picking
machine-specific limits.

            regards, tom lane
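
[Editorial sketch] To illustrate what an upper bound on elapsed time means
in practice, here is a minimal Perl example of a wall-clock limit built on
alarm(). It is not taken from the buildfarm client and makes no claim about
how the client implements wait_timeout; the helper name run_with_timeout is
hypothetical. It only mirrors the convention that undef or 0 means no limit.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code, giving up after $timeout seconds.
# A $timeout of undef or 0 means "no limit".
# Returns 1 if $code finished, 0 if the time limit was hit.
sub run_with_timeout {
    my ($timeout, $code) = @_;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timed out\n" };
        alarm($timeout) if $timeout;
        $code->();
        alarm(0);
        1;
    };
    alarm(0);    # make sure no alarm is left pending
    return 1 if $ok;
    die $@ unless $@ eq "timed out\n";    # re-throw unrelated errors
    return 0;
}

print run_with_timeout(5, sub { sleep 1 })  ? "finished\n" : "timed out\n";
print run_with_timeout(1, sub { sleep 10 }) ? "finished\n" : "timed out\n";
```

Note that this only interrupts the Perl process itself. A real watchdog
would also have to terminate any child processes that are still spinning.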



Re: Possible infinite loop on buildfarm animals

From: Noah Misch
On Fri, Mar 15, 2024 at 02:42:14PM -0400, Tom Lane wrote:
> Another idea to consider is to set the wait_timeout parameter
> in your animals' configuration files, to put an upper bound
> on the total elapsed time for a run.  By default that's
> infinite, since it's really hard to select a one-size-fits-all
> value ... but it's a good backstop if you don't mind picking
> machine-specific limits.

Other than CLOBBER_CACHE animals, the server rejects results older than 24h:
https://github.com/PGBuildFarm/server-code/blob/8572ac7/cgi-bin/pgstatus.pl#L273

The same 24h should probably be the default wait_timeout.  One might use a
longer timeout if wanting to attach a debugger to a process of a stuck run.
If one just wants an intervention-free buildfarm animal, 24h is good.
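
[Editorial sketch] The 24h rule described above can be pictured roughly as
follows. This is not the code at the linked pgstatus.pl line; the helper
name, its arguments, and the choice of which timestamp is compared are all
assumptions made for illustration.

```perl
use strict;
use warnings;

# 24 hours in seconds; the limit comes from the thread, the rest is illustrative.
my $max_age = 24 * 60 * 60;

# Hypothetical helper, not the server's actual code: returns 1 if a result
# stamped $result_epoch should be rejected as too old at time $now_epoch.
sub result_too_old {
    my ($result_epoch, $now_epoch, $is_clobber_cache) = @_;
    return 0 if $is_clobber_cache;    # CLOBBER_CACHE animals are exempt
    return ($now_epoch - $result_epoch) > $max_age ? 1 : 0;
}

my $now = time;
print result_too_old($now - 25 * 3600, $now, 0), "\n";    # prints 1
print result_too_old($now - 25 * 3600, $now, 1), "\n";    # prints 0
print result_too_old($now - 1 * 3600,  $now, 0), "\n";    # prints 0
```

Under this reading, a run that is allowed to continue past 24 hours can only
produce a result the server would refuse, which is the argument for using
the same figure as the timeout.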



Re: Possible infinite loop on buildfarm animals

From: Alvaro Herrera
On 2024-Mar-17, Noah Misch wrote:

> Other than CLOBBER_CACHE animals, the server rejects results older than 24h:
> https://github.com/PGBuildFarm/server-code/blob/8572ac7/cgi-bin/pgstatus.pl#L273

> The same 24h should probably be the default wait_timeout.  One might use a
> longer timeout if wanting to attach a debugger to a process of a stuck run.
> If one just wants an intervention-free buildfarm animal, 24h is good.

Maybe that should be the default value embedded in the buildfarm client
script, which can be overridden for specific purposes such as
CLOBBER_CACHE animals?

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"No hay hombre que no aspire a la plenitud, es decir,
la suma de experiencias de que un hombre es capaz"



Re: Possible infinite loop on buildfarm animals

From: Tom Lane
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On 2024-Mar-17, Noah Misch wrote:
>> Other than CLOBBER_CACHE animals, the server rejects results older than 24h:
>> https://github.com/PGBuildFarm/server-code/blob/8572ac7/cgi-bin/pgstatus.pl#L273

>> The same 24h should probably be the default wait_timeout.  One might use a
>> longer timeout if wanting to attach a debugger to a process of a stuck run.
>> If one just wants an intervention-free buildfarm animal, 24h is good.

> Maybe that should be the default value embedded in the buildfarm client
> script, which can be overridden for specific purposes such as
> CLOBBER_CACHE animals?

We don't normally hard-wire such choices in the script, but it could
be plausible to change build-farm.conf.sample, perhaps like:

    # max time in seconds allowed for a single branch run
    # undef/0 means unlimited
-    wait_timeout => undef,
+    wait_timeout => 24 * 60 * 60,

            regards, tom lane
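
[Editorial sketch] A possible way to combine the 24-hour value with an
exemption for CLOBBER_CACHE animals in a local configuration file. The
variable $clobber_cache_animal is hypothetical and not part of
build-farm.conf.sample, and the %conf hash below is a standalone stand-in
that carries only the wait_timeout key discussed in this thread.

```perl
use strict;
use warnings;

# Hypothetical per-animal switch; set to 1 on a CLOBBER_CACHE animal.
my $clobber_cache_animal = 0;

# Stand-in for the configuration hash; only wait_timeout is shown.
my %conf = (
    # max time in seconds allowed for a single branch run
    # undef/0 means unlimited
    wait_timeout => $clobber_cache_animal ? undef : 24 * 60 * 60,
);

print defined $conf{wait_timeout}
  ? "wait_timeout = $conf{wait_timeout} seconds\n"
  : "wait_timeout is unlimited\n";
```

With the switch left at 0 the limit works out to 86400 seconds. An owner who
wants to attach a debugger to a stuck run could substitute a larger number
instead of undef.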