Re: Buildfarm feature request: some way to track/classify failures - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: Buildfarm feature request: some way to track/classify failures
Date
Msg-id 45FF4388.6060406@dunslane.net
In response to Re: Buildfarm feature request: some way to track/classify failures  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Buildfarm feature request: some way to track/classify failures  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>   
>> Tom Lane wrote:
>>     
>>> Actually what I *really* want is something closer to "show me all the
>>> unexplained failures", but unless Andrew is willing to support some way
>>> of tagging failures in the master database, I suppose that won't happen.
>
>> Who would do the tagging, and how?
>
> Well, that's the hard part isn't it?  I was sort of envisioning a group
> of users who'd be authorized to log in and set tags on database entries
> somehow.  I'm not sure about details.  One issue is that the majority
> of failures come in batches (when one of us commits a bad patch).
> With the current web interface it would be real tedious to verify which
> of the failures in a particular time interval matched the symptoms of
> a failure.  What I did for my experiment this weekend was to download
> the last-stage-log of each failed build, which required an hour or so
> of setup time; then I could use grep to confirm which logs matched a
> failure that I'd identified.  Doing that through the current webpage
> would involve lots of clicking and waiting.  If we could expose a
> text-search-style API for grepping the stage logs, it'd be a lot easier
> to collect related failures.  Then maybe a few widgets to let authorized
> users apply a tag to the search results ...
>
> I'm not entirely sure that this infrastructure would pay for itself,
> though.  Without some users willing to take the time to separate
> explained from unexplained failures, it'd be a waste of effort.
> But we've already had a couple of cases of interesting failures going
> unnoticed because of the noise level.  Between duplicate reports about
> busted patches and transient problems on particular build machines
> (out of disk space, misconfiguration, etc) it's pretty hard to not miss
> the once-in-a-while failures.  Is there some other way we could attack
> that problem?
>

I'm not too sanguine about having a team of eager taggers.

I think we probably need to work on a usable API for extracting data in 
small or large amounts, and maybe some good text search facilities.
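
As a rough illustration of the kind of text search being discussed (not an existing buildfarm facility), here is a minimal sketch in Python. It assumes the last-stage logs of the failed builds have already been downloaded into one local directory, one "*.log" file per build; the directory layout, the function names, and the tag names are all hypothetical.

```python
import re
from pathlib import Path


def find_matching_logs(log_dir, pattern):
    """Return the sorted names of *.log files in log_dir that contain
    at least one line matching the regular expression `pattern`."""
    regex = re.compile(pattern)
    matches = []
    for path in sorted(Path(log_dir).glob("*.log")):
        # errors="replace" keeps stray bytes in a log from aborting the scan
        with path.open(encoding="utf-8", errors="replace") as fh:
            if any(regex.search(line) for line in fh):
                matches.append(path.name)
    return matches


def split_explained(log_dir, known_signatures):
    """Partition the logs into explained and unexplained failures.

    known_signatures maps a tag (hypothetical, chosen by whoever runs
    the script) to a regular expression describing a known failure.
    Returns (explained, unexplained): a dict of tag -> matching file
    names, and a sorted list of files that matched no signature.
    """
    explained = {
        tag: find_matching_logs(log_dir, pattern)
        for tag, pattern in known_signatures.items()
    }
    tagged = {name for names in explained.values() for name in names}
    all_logs = sorted(p.name for p in Path(log_dir).glob("*.log"))
    unexplained = [name for name in all_logs if name not in tagged]
    return explained, unexplained
```

Calling split_explained("logs", {"disk-full": r"No space left on device"}) would then list the logs explained by that one signature and leave the rest as candidates for "unexplained" failures. The signature shown is only an example; deciding which patterns count as an explanation is still the manual step.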

The real issue is the one you identify of stuff getting lost in the 
noise. But I'm not sure there's any realistic cure for that.

cheers

andrew



