Re: Buildfarm alarms - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: Buildfarm alarms
Date
Msg-id 451A82FA.4060708@dunslane.net
In response to Re: Buildfarm alarms  ("Andrew Dunstan" <andrew@dunslane.net>)
Responses Re: Buildfarm alarms  ("Dave Page" <dpage@vale-housing.co.uk>)
Re: Buildfarm alarms  (Kris Jurka <books@ejurka.com>)
List pgsql-hackers
I wrote:
> Tom Lane wrote:
>   
>> "Andrew Dunstan" <andrew@dunslane.net> writes:
>>     
>>> It could certainly be done. In general, I have taken the view
>>> that owners have the responsibility for monitoring their own machines.
>>>       
>> Sure, but providing them tools to do that seems within buildfarm's
>> purview.
>>
>> For some types of failure, the buildfarm script could make a local
>> notification without bothering the server --- but a timeout on the
>> server side would cover a wider variety of failures, including "this
>> machine is dead and ought to be removed from the farm".
>>
>>     
>
> Nothing gets removed. If a machine does not report on a branch for 30 days
> it drops off the dashboard, but apart from that it is a retained historic
> artifact. This buildup in history has been gradually slowing down the
> dashboard, in fact, but Ian Barwick tells me that he has rewritten my
> lousy SQL to make it fast again, so we'll soon get that working better.
>
> Anyway, I think we can do something fairly simply for these alarms. We'll
> just have a special stanza in the config file, and a cron job that checks,
> say, once a day, to see if we have exceeded the alarm period on any
> machine/branch combination.
>
>   

OK, I have a gadget to do this in place.


It looks at the config of the last build registered on each branch for a 
stanza called 'alerts' that would look like this:
    alerts => {
        HEAD          => { alert_after => 24,  alert_every => 48 },
        REL8_1_STABLE => { alert_after => 168, alert_every => 48 },
    },
 

The settings are in hours, so this says that if we haven't seen a HEAD 
build in 1 day or a REL8_1_STABLE build in 1 week, alert the owner by 
email, and keep repeating the alert in each case every 2 days.
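
To make the alert_after / alert_every semantics concrete, here is a rough sketch of the decision the check might make for one machine/branch combination. It is written in Python purely for illustration and is not the actual buildfarm code (which is Perl). The function name alert_due, the ALERTS dict, and the idea of tracking a last_alert timestamp are all assumptions, as is the choice to treat the thresholds as inclusive.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical mirror of the 'alerts' stanza; all values are in hours.
ALERTS = {
    "HEAD": {"alert_after": 24, "alert_every": 48},
    "REL8_1_STABLE": {"alert_after": 168, "alert_every": 48},
}


def alert_due(
    branch: str,
    last_build: datetime,
    last_alert: Optional[datetime],
    now: datetime,
    alerts: dict = ALERTS,
) -> bool:
    """Return True if the owner should be emailed about this branch now."""
    cfg = alerts.get(branch)
    if cfg is None:
        # No alert settings for this branch: never alert.
        return False

    if now - last_build < timedelta(hours=cfg["alert_after"]):
        # A build was seen within the alert_after window.
        return False

    if last_alert is None or last_alert < last_build:
        # First alert since the most recent build (assumed behaviour).
        return True

    # Otherwise repeat only once alert_every hours have passed.
    return now - last_alert >= timedelta(hours=cfg["alert_every"])
```

With the stanza above, a HEAD build last seen 25 hours ago and no alert sent yet would give True, while the same silence on REL8_1_STABLE would give False until 168 hours have passed.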

If some intrepid buildfarm owner wants to test this out by using low 
settings that would trigger an alert, that would be good. The cron job 
runs every hour.

cheers

andrew


