Re: Buildfarm feature request: some way to track/classify failures - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Buildfarm feature request: some way to track/classify failures
Date
Msg-id 18456.1174345108@sss.pgh.pa.us
In response to Re: Buildfarm feature request: some way to track/classify failures  (Andrew Dunstan <andrew@dunslane.net>)
Responses Re: Buildfarm feature request: some way to track/classify failures  (Andrew Dunstan <andrew@dunslane.net>)
List pgsql-hackers
Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> Actually what I *really* want is something closer to "show me all the
>> unexplained failures", but unless Andrew is willing to support some way
>> of tagging failures in the master database, I suppose that won't happen.

> Who would do the tagging, and how?

Well, that's the hard part, isn't it?  I was sort of envisioning a group
of users who'd be authorized to log in and set tags on database entries
somehow.  I'm not sure about details.  One issue is that the majority
of failures come in batches (when one of us commits a bad patch).
With the current web interface it would be real tedious to verify which
of the failures in a particular time interval matched the symptoms of
a given failure.  What I did for my experiment this weekend was to download
the last-stage-log of each failed build, which required an hour or so
of setup time; then I could use grep to confirm which logs matched a
failure that I'd identified.  Doing that through the current webpage
would involve lots of clicking and waiting.  If we could expose a
text-search-style API for grepping the stage logs, it'd be a lot easier
to collect related failures.  Then maybe a few widgets to let authorized
users apply a tag to the search results ...
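
As a rough illustration of the download-then-grep workflow described above,
here is a minimal sketch. It assumes the last-stage logs have already been
saved as *.log files in a local directory; the function names, the directory
layout, and the in-memory tag store are all hypothetical and are not part of
the buildfarm code.

```python
import re
from pathlib import Path


def find_matching_logs(log_dir, pattern):
    """Return names of *.log files in log_dir whose text matches pattern.

    log_dir is assumed to hold one downloaded last-stage log per failed
    build (a hypothetical layout, not the buildfarm's own storage).
    """
    regex = re.compile(pattern)
    matches = []
    for path in sorted(Path(log_dir).glob("*.log")):
        # Logs can contain stray bytes; replace them rather than fail.
        text = path.read_text(encoding="utf-8", errors="replace")
        if regex.search(text):
            matches.append(path.name)
    return matches


def tag_failures(log_dir, pattern, tag, tags=None):
    """Attach tag to every log matching pattern; return the tag mapping.

    The mapping (log name -> set of tags) stands in for whatever the
    master database would actually store.
    """
    tags = {} if tags is None else tags
    for name in find_matching_logs(log_dir, pattern):
        tags.setdefault(name, set()).add(tag)
    return tags
```

Anything left without a tag after the known symptoms have been searched for
would be a candidate "unexplained" failure.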

I'm not entirely sure that this infrastructure would pay for itself,
though.  Without some users willing to take the time to separate
explained from unexplained failures, it'd be a waste of effort.
But we've already had a couple of cases of interesting failures going
unnoticed because of the noise level.  Between duplicate reports about
busted patches and transient problems on particular build machines
(out of disk space, misconfiguration, etc) it's pretty hard to not miss
the once-in-a-while failures.  Is there some other way we could attack
that problem?
        regards, tom lane

