I wrote:
> Tom Lane wrote:
>
>> "Andrew Dunstan" <andrew@dunslane.net> writes:
>>
>>> It could certainly be done. In general, I have taken the view
>>> that owners have the responsibility for monitoring their own machines.
>>>
>> Sure, but providing them tools to do that seems within buildfarm's
>> purview.
>>
>> For some types of failure, the buildfarm script could make a local
>> notification without bothering the server --- but a timeout on the
>> server side would cover a wider variety of failures, including "this
>> machine is dead and ought to be removed from the farm".
>>
>>
>
> Nothing gets removed. If a machine does not report on a branch for 30 days
> it drops off the dashboard, but apart from that it is a retained historic
> artifact. This buildup in history has been gradually slowing down the
> dashboard, in fact, but Ian Barwick tells me that he has rewritten my
> lousy SQL to make it fast again, so we'll soon get that working better.
>
> Anyway, I think we can do something fairly simply for these alarms. We'll
> just have a special stanza in the config file, and a cron job that checks,
> say, once a day, to see if we have exceeded the alarm period on any
> machine/branch combination.
>
>
OK, I have a gadget to do this in place.
It looks at the config of the last build registered on each branch for a
stanza called 'alerts' that would look like this:

    alerts => {
        HEAD          => { alert_after => 24,  alert_every => 48 },
        REL8_1_STABLE => { alert_after => 168, alert_every => 48 },
    }

The settings are in hours, so this says that if we haven't seen a HEAD
build in 1 day or a stable branch build in 1 week, alert the owner by
email, and keep repeating the alert in each case every 2 days.
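
To make the timing rule concrete, here is a rough sketch in Python of how such a check could work. It is an illustration only, not the actual buildfarm code: the function name, its arguments, and the ALERTS dictionary are hypothetical, and it assumes that the first alert is measured from the last build while repeats are measured from the previous alert.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical mirror of the 'alerts' stanza above; all values are in hours.
ALERTS = {
    "HEAD": {"alert_after": 24, "alert_every": 48},
    "REL8_1_STABLE": {"alert_after": 168, "alert_every": 48},
}


def should_alert(
    branch: str,
    last_build: datetime,
    last_alert: Optional[datetime],
    now: datetime,
    alerts: dict = ALERTS,
) -> bool:
    """Decide whether the owner should be emailed for this branch right now."""
    conf = alerts.get(branch)
    if conf is None:
        # No settings for this branch, so never alert on it.
        return False

    # Not overdue yet: the last build is more recent than alert_after hours.
    if now - last_build < timedelta(hours=conf["alert_after"]):
        return False

    # Overdue and no alert sent since the last build: send the first one.
    # (Assumption: an alert older than the last build belongs to an earlier outage.)
    if last_alert is None or last_alert < last_build:
        return True

    # Already alerted for this outage: repeat only every alert_every hours.
    return now - last_alert >= timedelta(hours=conf["alert_every"])
```

With these settings, a HEAD build last seen 25 hours ago with no alert sent yet would trigger an email, and the next one would follow 48 hours after that if the machine stays silent.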
If some intrepid buildfarm owner wants to test this out by using settings
low enough to trigger an alert, that would be good. The cron job runs
every hour.
cheers
andrew