Thread: Buildfarm feature request: some way to track/classify failures

Buildfarm feature request: some way to track/classify failures

From
Tom Lane
Date:
The current buildfarm webpages make it easy to see when a branch tip
is seriously broken, but it's not very easy to investigate transient
failures, such as a regression test race condition that only
materializes once in awhile.  I would like to have a way of seeing
just the failed build attempts across all machines running a given
branch.  Ideally it would be possible to tag failures as to the cause
(if known) and/or symptom pattern, and then be able to examine just
the ones without known cause or having similar symptoms.

I'm not sure how much of this is reasonable to try to do with webpages
similar to what we've got.  But the data is all in a database AIUI,
so another possibility is to do this work via SQL.  That'd require
having the ability to pull the information from the buildfarm database
so someone else could manipulate it.

So I guess the first question is can you make the build data available,
and the second is whether you're interested in building more flexible
views or just want to let someone else do that.  Also, if anyone does
make an effort to tag failures, it'd be good to somehow push that data
back into the master database, so that we don't end up duplicating such
work.
        regards, tom lane


Re: Buildfarm feature request: some way to track/classify failures

From
"Joshua D. Drake"
Date:
Tom Lane wrote:
> The current buildfarm webpages make it easy to see when a branch tip
> is seriously broken, but it's not very easy to investigate transient
> failures, such as a regression test race condition that only
> materializes once in awhile.  I would like to have a way of seeing
> just the failed build attempts across all machines running a given
> branch.  Ideally it would be possible to tag failures as to the cause
> (if known) and/or symptom pattern, and then be able to examine just
> the ones without known cause or having similar symptoms.
> 
> I'm not sure how much of this is reasonable to try to do with webpages
> similar to what we've got.  But the data is all in a database AIUI,
> so another possibility is to do this work via SQL.  That'd require
> having the ability to pull the information from the buildfarm database
> so someone else could manipulate it.
> 
> So I guess the first question is can you make the build data available,
> and the second is whether you're interested in building more flexible
> views or just want to let someone else do that.  Also, if anyone does
> make an effort to tag failures, it'd be good to somehow push that data
> back into the master database, so that we don't end up duplicating such
> work.

If the data is already there and just not represented, just let me know
exactly what you want and I will implement pages for that data happily.

Joshua D. Drake


> 
>             regards, tom lane
> 


-- 
     === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997            http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/



Re: Buildfarm feature request: some way to track/classify failures

From
Andrew Dunstan
Date:
Tom Lane wrote:
> The current buildfarm webpages make it easy to see when a branch tip
> is seriously broken, but it's not very easy to investigate transient
> failures, such as a regression test race condition that only
> materializes once in awhile.  I would like to have a way of seeing
> just the failed build attempts across all machines running a given
> branch.  Ideally it would be possible to tag failures as to the cause
> (if known) and/or symptom pattern, and then be able to examine just
> the ones without known cause or having similar symptoms.
>
> I'm not sure how much of this is reasonable to try to do with webpages
> similar to what we've got.  But the data is all in a database AIUI,
> so another possibility is to do this work via SQL.  That'd require
> having the ability to pull the information from the buildfarm database
> so someone else could manipulate it.
>
> So I guess the first question is can you make the build data available,
> and the second is whether you're interested in building more flexible
> views or just want to let someone else do that.  Also, if anyone does
> make an effort to tag failures, it'd be good to somehow push that data
> back into the master database, so that we don't end up duplicating such
> work.
>
>             
>   

Well, the db is currently running around 13Gb, so that's not something 
to be exported lightly ;-)

If we upgraded from Postgres 8.0.x to 8.2.x we could make use of some 
features, like dynamic partitioning and copy from queries, that might 
make life easier (CP people: that's a hint :-) )

I don't want to fragment effort, but I also know CP don't want open 
access, for obvious reasons.

We can also look at a safe API that we could make available freely. I've 
already done this over SOAP (see example client at 
http://people.planetpostgresql.org/andrew/index.php?/archives/14-SOAP-server-for-Buildfarm-dashboard.html 
). Doing updates is a whole other matter, of course.

Lastly, note that some buildfarm enhancements are on the SOC project 
list. I have no idea if anyone will express any interest in that, of 
course. It's not very glamorous work.

cheers

andrew



Re: Buildfarm feature request: some way to track/classify failures

From
"Joshua D. Drake"
Date:
> Well, the db is currently running around 13Gb, so that's not something
> to be exported lightly ;-)
> 
> If we upgraded from Postgres 8.0.x to 8.2.x we could make use of some
> features, like dynamic partitioning and copy from queries, that might
> make life easier (CP people: that's a hint :-) )

Yeah, Yeah... I need to get you off that machine as a whole :) Which is
on the list but I am waiting for 8.3 *badda bing*.

Sincerely,

Joshua D. Drake



-- 
     === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997            http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/



Re: Buildfarm feature request: some way to track/classify failures

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Well, the db is currently running around 13Gb, so that's not something 
> to be exported lightly ;-)

Yeah.  I would assume though that the vast bulk of that is captured log
files.  For the purposes I'm imagining, it'd be sufficient to export
only the rest of the database --- or ideally, records including all the
other fields and a URL for each log file.  For the small number of log
files you actually need to examine, you'd chase the URL.
        regards, tom lane


Re: Buildfarm feature request: some way to track/classify failures

From
Andrew Dunstan
Date:
Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>   
>> Well, the db is currently running around 13Gb, so that's not something 
>> to be exported lightly ;-)
>>     
>
> Yeah.  I would assume though that the vast bulk of that is captured log
> files.  For the purposes I'm imagining, it'd be sufficient to export
> only the rest of the database --- or ideally, records including all the
> other fields and a URL for each log file.  For the small number of log
> files you actually need to examine, you'd chase the URL.
>
>   

OK, for anyone that wants to play, I have created an extract that 
contains a summary of every non-CVS-related failure we've had. It's a 
single table looking like this:

CREATE TABLE mfailures (
    sysname text,
    snapshot timestamp without time zone,
    stage text,
    conf_sum text,
    branch text,
    changed_this_run text,
    changed_since_success text,
    log_archive_filenames text[],
    build_flags text[]
);


The dump is just under 1Mb and can be downloaded from 
http://www.pgbuildfarm.org/mfailures.dump

If this is useful we can create it or something like it on a regular 
basis (say nightly).

The summary log for a given build can be got from: 
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=<sysname>&dt=<snapshot>

To look at the log for a given run stage select 
http://www.pgbuildfarm.org/cgi-bin/show_stage_log.pl?nm=<sysname>&dt=<snapshot>&stg=<stagename> 
- the stage names available (if any) are the entries in 
log_archive_filenames, stripped of the ".log" suffix.
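
As a rough illustration of that URL scheme, here is the sort of query one 
could run against the restored mfailures extract to get the last-stage log 
URL for each failure (just a sketch; it simply mirrors the URL format above):

select sysname, snapshot, branch, stage,
       'http://www.pgbuildfarm.org/cgi-bin/show_stage_log.pl'
       || '?nm=' || sysname
       || '&dt=' || replace(snapshot::text, ' ', '%20')
       || '&stg=' || replace(log_archive_filenames[array_upper(log_archive_filenames,1)],
                             '.log', '') as last_stage_log
from mfailures
where log_archive_filenames is not null
order by snapshot desc;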

We can make these available over an API that isn't plain HTTP if people 
want. Or we can provide a version of the build log that is stripped of the 
HTML.

cheers

andrew






Re: Buildfarm feature request: some way to track/classify failures

From
Jeremy Drake
Date:
On Fri, 16 Mar 2007, Andrew Dunstan wrote:

> OK, for anyone that wants to play, I have created an extract that contains a
> summary of every non-CVS-related failure we've had. It's a single table
> looking like this:
>
> CREATE TABLE mfailures (
>    sysname text,
>    snapshot timestamp without time zone,
>    stage text,
>    conf_sum text,
>    branch text,
>    changed_this_run text,
>    changed_since_success text,
>    log_archive_filenames text[],
>    build_flags text[]
> );

Sweet.  Should be interesting to look at.

>
>
> The dump is just under 1Mb and can be downloaded from
> http://www.pgbuildfarm.org/mfailures.dump

Sure about that?

--14:45:45--  http://www.pgbuildfarm.org/mfailures.dump
           => `mfailures.dump'
Resolving www.pgbuildfarm.org... 207.173.203.146
Connecting to www.pgbuildfarm.org|207.173.203.146|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9,184,142 (8.8M) [text/plain]



-- 
BOO!  We changed Coke again!  BLEAH!  BLEAH!


Re: Buildfarm feature request: some way to track/classify failures

From
"Andrew Dunstan"
Date:
Jeremy Drake wrote:
>>
>>
>> The dump is just under 1Mb and can be downloaded from
>> http://www.pgbuildfarm.org/mfailures.dump
>
> Sure about that?
>
> HTTP request sent, awaiting response... 200 OK
> Length: 9,184,142 (8.8M) [text/plain]
>


Damn these new specs. They made me skip a digit.

cheers

andrew




Re: Buildfarm feature request: some way to track/classify failures

From
Josh Berkus
Date:
Andrew,

> Lastly, note that some buildfarm enhancements are on the SOC project 
> list. I have no idea if anyone will express any interest in that, of 
> course. It's not very glamorous work.

On the other hand, I think there are a lot more student perl hackers and 
web people than there are folks with the potential to do backend stuff.  So who knows?

--Josh



Re: Buildfarm feature request: some way to track/classify failures

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> OK, for anyone that wants to play, I have created an extract that
> contains a summary of every non-CVS-related failure we've had. It's a
> single table looking like this:

I did some analysis on this data.  Attached is a text dump of a table
declared as

CREATE TABLE mreasons (
    sysname text,
    snapshot timestamp without time zone,
    branch text,
    reason text,
    known boolean
);

where the sysname/snapshot/branch data is taken from your table,
"reason" is a brief sketch of the failure, and "known" indicates
whether the cause is known ... although as I went along it sort
of evolved into "does this seem worthy of more investigation?".

I looked at every failure back through early December.  I'd intended to
go back further, but decided I'd hit a point of diminishing returns.
However, failures back to the beginning of July that matched grep
searches for recent symptoms are classified in the table.

The gross stats are: 2231 failures classified, 71 distinct reason
codes, 81 failures (with 18 reasons) that seem worthy of closer
investigation:

bfarm=# select reason,branch,max(snapshot) as latest, count(*) from mreasons where not known group by 1,2 order by 1,2;
                              reason                              |    branch     |       latest        | count
------------------------------------------------------------------+---------------+---------------------+-------
 Input/output error - possible hardware problem                   | HEAD          | 2007-03-06 10:30:01 |     1
 No rule to make target                                           | HEAD          | 2007-02-08 15:30:01 |     6
 No rule to make target                                           | REL8_0_STABLE | 2007-02-28 03:15:02 |     9
 No rule to make target                                           | REL8_2_STABLE | 2006-12-17 20:00:01 |     1
 could not open relation with OID                                 | HEAD          | 2007-03-16 16:45:01 |     2
 could not open relation with OID                                 | REL8_1_STABLE | 2006-08-29 23:30:07 |     2
 createlang not found?                                            | REL8_1_STABLE | 2007-02-28 02:50:00 |     1
 irreproducible contrib/sslinfo build failure, likely not our bug | HEAD          | 2007-02-03 07:03:02 |     1
 irreproducible opr_sanity failure                                | HEAD          | 2006-12-18 19:15:02 |     2
 libintl.h rejected by configure                                  | HEAD          | 2007-01-11 20:35:00 |     3
 libintl.h rejected by configure                                  | REL8_0_STABLE | 2007-03-01 20:28:04 |    22
 postmaster failed to start                                       | REL7_4_STABLE | 2007-02-28 22:23:20 |     1
 postmaster failed to start                                       | REL8_0_STABLE | 2007-02-28 22:30:44 |     1
 random Solaris configure breakage                                | HEAD          | 2007-01-14 05:30:00 |     1
 random Windows breakage                                          | HEAD          | 2007-03-16 09:48:31 |     3
 random Windows breakage                                          | REL8_0_STABLE | 2007-03-15 03:15:09 |     7
 segfault during bootstrap                                        | HEAD          | 2007-03-12 23:03:03 |     1
 server does not shut down                                        | HEAD          | 2007-01-08 03:03:03 |     3
 tablespace is not empty                                          | HEAD          | 2007-02-24 15:00:10 |     6
 tablespace is not empty                                          | REL8_1_STABLE | 2007-01-25 02:30:01 |     2
 unexpected statement_timeout failure                             | HEAD          | 2007-01-25 05:05:06 |     1
 unexplained tsearch2 crash                                       | HEAD          | 2007-01-10 22:05:02 |     1
 weird DST-transition-like timestamp test failure                 | HEAD          | 2007-02-04 07:25:04 |     1
 weird assembler failure, likely not our bug                      | HEAD          | 2006-12-26 17:02:01 |     1
 weird assembler failure, likely not our bug                      | REL8_2_STABLE | 2007-02-03 23:47:01 |     1
 weird install failure                                            | HEAD          | 2007-01-25 12:35:00 |     1
(26 rows)

I think I know the cause of the recent 'could not open relation with
OID' failures in HEAD, but the rest of these maybe need a look.
Any volunteers?

Also, for completeness, the causes I wrote off as not interesting
(anymore, in some cases):

bfarm=# select reason,max(snapshot) as latest, count(*) from mreasons where known group by 1 order by 1 ;
                                reason                                |       latest        | count
----------------------------------------------------------------------+---------------------+-------
 DST transition test failure                                          | 2007-03-13 04:04:47 |    26
 ISO-week-patch regression test breakage                              | 2007-02-16 15:00:08 |    23
 No rule to make Makefile.port                                        | 2007-03-02 12:30:02 |    40
 Out of disk space                                                    | 2007-02-16 22:30:01 |    67
 Out of semaphores                                                    | 2007-02-20 02:03:31 |    14
 Python not installed                                                 | 2007-02-19 22:45:05 |     2
 Solaris random conn-refused bug                                      | 2007-03-06 01:20:00 |    37
 TCP socket already in use                                            | 2007-01-09 07:03:04 |    13
 Too many clients                                                     | 2007-02-26 06:06:02 |    90
 Too many open files in system                                        | 2007-02-27 20:30:59 |    17
 another icc crash                                                    | 2007-02-03 10:50:01 |     1
 apparently a malloc bug                                              | 2007-03-04 23:00:20 |    27
 bogus system clock setting                                           | 1997-12-21 15:20:11 |     6
 breakage from changing := to = in makefiles                          | 2007-02-10 02:15:01 |     4
 broken GUC patch                                                     | 2007-03-13 15:15:01 |    92
 broken float8 hacking                                                | 2007-01-06 20:00:09 |   120
 broken fsync-revoke patch                                            | 2007-01-17 16:21:01 |    77
 broken inet hacking                                                  | 2007-01-03 00:05:01 |     4
 broken log_error patch                                               | 2007-01-28 08:15:01 |    15
 broken money patch                                                   | 2007-01-03 19:05:01 |    78
 broken pg_regress change for msvc support                            | 2007-01-19 22:03:00 |    46
 broken plpython patch                                                | 2007-01-25 14:21:00 |    22
 broken sys_siglist patch                                             | 2007-01-28 06:06:02 |    18
 bug in btree page split patch                                        | 2007-02-08 11:35:03 |     7
 buildfarm pilot error                                                | 2007-01-19 03:28:07 |    69
 cache flush bug in operator-family patch                             | 2006-12-31 10:30:03 |     8
 ccache failure                                                       | 2007-01-25 23:00:34 |     2
 could not create shared memory                                       | 2007-02-13 07:00:05 |    32
 ecpg regression test teething pains                                  | 2007-02-03 13:30:02 |   516
 failure to update PL expected files for may/can/might rewording      | 2007-02-01 20:15:01 |     8
 failure to update contrib expected files for may/can/might rewording | 2007-02-01 21:15:02 |    11
 failure to update expected files for may/can/might rewording         | 2007-02-01 19:35:02 |     3
 icc "internal error"                                                 | 2007-03-16 16:30:01 |    29
 image not found (possibly related to too-many-open-files)            | 2006-10-25 08:05:02 |     1
 largeobject test bugs                                                | 2007-02-17 23:35:03 |     4
 ld segfaulted                                                        | 2007-03-16 15:30:02 |     3
 missing BYTE_ORDER definition for Solaris                            | 2007-01-10 14:18:23 |     1
 pg_regress patch breakage                                            | 2007-02-08 18:30:01 |     1
 plancache test race condition                                        | 2007-03-16 11:15:01 |     5
 pltcl regression test broken by ORDER BY semantics tightening        | 2007-01-09 03:15:01 |     9
 previous contrib test still running                                  | 2007-02-13 20:49:33 |    21
 random Solaris breakage                                              | 2007-01-05 17:20:01 |     1
 random Windows breakage                                              | 2006-12-27 03:15:07 |     1
 random Windows permission-denied failures                            | 2007-02-12 11:00:09 |     5
 random ccache breakage                                               | 2007-01-04 01:34:33 |     1
 readline misconfiguration                                            | 2007-02-12 17:19:41 |    33
 row-ordering discrepancy in rowtypes test                            | 2007-02-10 03:00:02 |     3
 stats test failed                                                    | 2007-03-14 13:00:02 |   319
 threaded Python library                                              | 2007-01-10 04:05:02 |     6
 undefined symbol pg_mic2ascii                                        | 2007-02-03 01:13:40 |   101
 unexpected signal 9                                                  | 2006-12-31 06:30:02 |    15
 unportable uuid patch                                                | 2007-01-31 17:30:01 |    16
 use of // comment                                                    | 2007-02-16 09:23:02 |     1
 xml code teething problems                                           | 2007-02-16 16:01:05 |    79
(54 rows)

Some of these might possibly be interesting to other people ...

            regards, tom lane



Re: Buildfarm feature request: some way to track/classify failures

From
"Joshua D. Drake"
Date:
>  unportable uuid patch                                                | 2007-01-31 17:30:01 |    16
>  use of // comment                                                    | 2007-02-16 09:23:02 |     1
>  xml code teething problems                                           | 2007-02-16 16:01:05 |    79
> (54 rows)
> 
> Some of these might possibly be interesting to other people ...

If you provide the various greps, etc... I will put it into the website 
proper...

Joshua D. Drake

> 
>             regards, tom lane
> 



Re: Buildfarm feature request: some way to track/classify failures

From
Jeremy Drake
Date:
On Sun, 18 Mar 2007, Tom Lane wrote:

>  another icc crash                                                    | 2007-02-03 10:50:01 |     1
>  icc "internal error"                                                 | 2007-03-16 16:30:01 |    29

These on mongoose are most likely a result of flaky hardware.  They tend
to occur most often when either
a) I am doing something else on the box when the build runs, or
b) the ambient temperature in the room is > ~72degF

I need to bring down this box at some point and try to figure out if it is
bad memory or what.

Anyway, ICC seems to be one of the few things that are really susceptible
to hardware issues (on this box at least, it is mostly ICC and firefox),
and I apologize for the noise this caused in the buildfarm logs...

-- 
American business long ago gave up on demanding that prospective
employees be honest and hardworking.  It has even stopped hoping for
employees who are educated enough that they can tell the difference
between the men's room and the women's room without having little
pictures on the doors.    -- Dave Barry, "Urine Trouble, Mister"


Re: Buildfarm feature request: some way to track/classify failures

From
Tom Lane
Date:
"Joshua D. Drake" <jd@commandprompt.com> writes:
>> Some of these might possibly be interesting to other people ...

> If you provide the various greps, etc... I will put it into the website 
> proper...

Unfortunately I didn't keep notes on exactly what I searched for in each
case.  Some of them were not based on grep at all, but rather "this
failure looks similar to those others and happened in the period between
a known bad patch commit and its fix".  The goal was essentially to
group together failures that probably arose from the same cause --- I
may have made a mistake or two along the way ...
        regards, tom lane


Re: Buildfarm feature request: some way to track/classify failures

From
Tom Lane
Date:
Jeremy Drake <pgsql@jdrake.com> writes:
> These on mongoose are most likely a result of flaky hardware.

Yeah, I saw a pretty fair number of irreproducible issues that are
probably hardware flake-outs.  Of course you can't tell which are those
and which are low-probability software bugs for many moons...

I believe that a large fraction of the buildfarm consists of
semi-retired equipment that is probably more prone to this sort of
problem than newer stuff would be.  But that's the price we must pay
for building such a large test farm on a shoestring.  What we need to do
to deal with it, I think, is institutionalize some kind of long-term
tracking so that we can tell the recurrent from the non-recurrent
issues.  I don't quite know how to do that; what I did over this past
weekend was labor-intensive and not scalable.

SoC project perhaps?
        regards, tom lane


Re: Buildfarm feature request: some way to track/classify failures

From
Tom Lane
Date:
BTW, before I forget, this little project turned up a couple of
small improvements for the current buildfarm infrastructure:

1.  There are half a dozen entries with obviously bogus timestamps:

bfarm=# select sysname,snapshot,branch from mfailures where snapshot < '2004-01-01';
  sysname   |      snapshot       | branch 
------------+---------------------+--------
 corgi      | 1997-10-14 14:20:10 | HEAD
 kookaburra | 1970-01-01 01:23:00 | HEAD
 corgi      | 1997-09-30 11:47:08 | HEAD
 corgi      | 1997-10-17 14:20:11 | HEAD
 corgi      | 1997-12-21 15:20:11 | HEAD
 corgi      | 1997-10-15 14:20:10 | HEAD
 corgi      | 1997-09-28 11:47:09 | HEAD
 corgi      | 1997-09-28 11:47:08 | HEAD
(8 rows)

indicating wrong system clock settings on these buildfarm machines.
(Indeed, IIRC these failures were actually caused by the ridiculous
clock settings --- we have at least one regression test that checks
century >= 21 ...)  Perhaps the buildfarm server should bounce
reports with timestamps more than a day in the past or a few minutes in
the future.  I think though that a more useful answer would be to
include "time of receipt of report" in the permanent record, and then
subsequent analysis could make its own decisions about whether to
believe the snapshot timestamp --- plus we could track elapsed times for
builds, which could be interesting in itself.
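
As a sketch of what that would buy us, assuming a hypothetical report_time 
column recording when the server received each report, both checks become 
trivial queries:

-- snapshots that are implausible relative to when the report arrived
select sysname, snapshot, report_time
from mfailures
where snapshot > report_time
   or snapshot < report_time - interval '1 day';

-- rough per-member build duration, as snapshot-to-receipt elapsed time
select sysname, branch, avg(report_time - snapshot) as approx_elapsed
from mfailures
group by sysname, branch
order by approx_elapsed desc;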

2. I was annoyed repeatedly that some buildfarm members weren't
reporting log_archive_filenames entries, which forced going the long
way round in the process I was using.  Seems like we need some more
proactive means for getting buildfarm owners to keep their script
versions up-to-date.  Not sure what that should look like exactly,
as long as it's not "you can run an ancient version as long as you
please".
        regards, tom lane


Re: Buildfarm feature request: some way to track/classify failures

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Also, for completeness, the causes I wrote off as not interesting
> (anymore, in some cases):
>
>  missing BYTE_ORDER definition for Solaris                            | 2007-01-10 14:18:23 |     1

What is this BYTE_ORDER macro? Should I be using it instead of the
AC_C_BIGENDIAN test in configure for the packed varlena patch?

>  row-ordering discrepancy in rowtypes test                            | 2007-02-10 03:00:02 |     3

Is this because the test is fixed or unfixable? If not shouldn't the test get
an ORDER BY clause so that it will reliably pass on future versions? 

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


Re: Buildfarm feature request: some way to track/classify failures

From
Stefan Kaltenbrunner
Date:
Gregory Stark wrote:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
> 
>> Also, for completeness, the causes I wrote off as not interesting
>> (anymore, in some cases):
>>
>>  missing BYTE_ORDER definition for Solaris                            | 2007-01-10 14:18:23 |     1
> 
> What is this BYTE_ORDER macro? Should I be using it instead of the
> AC_C_BIGENDIAN test in configure for the packed varlena patch?

FYI: this is the relevant commit (the affected buildfarm member was 
clownfish) 
http://archives.postgresql.org/pgsql-committers/2007-01/msg00154.php


Stefan


Re: Buildfarm feature request: some way to track/classify failures

From
Andrew Dunstan
Date:
Tom Lane wrote:
> BTW, before I forget, this little project turned up a couple of
> small improvements for the current buildfarm infrastructure:
>
> 1.  There are half a dozen entries with obviously bogus timestamps:
>
> bfarm=# select sysname,snapshot,branch from mfailures where snapshot < '2004-01-01';
>   sysname   |      snapshot       | branch 
> ------------+---------------------+--------
>  corgi      | 1997-10-14 14:20:10 | HEAD
>  kookaburra | 1970-01-01 01:23:00 | HEAD
>  corgi      | 1997-09-30 11:47:08 | HEAD
>  corgi      | 1997-10-17 14:20:11 | HEAD
>  corgi      | 1997-12-21 15:20:11 | HEAD
>  corgi      | 1997-10-15 14:20:10 | HEAD
>  corgi      | 1997-09-28 11:47:09 | HEAD
>  corgi      | 1997-09-28 11:47:08 | HEAD
> (8 rows)
>
> indicating wrong system clock settings on these buildfarm machines.
> (Indeed, IIRC these failures were actually caused by the ridiculous
> clock settings --- we have at least one regression test that checks
> century >= 21 ...)  Perhaps the buildfarm server should bounce
> reports with timestamps more than a day in the past or a few minutes in
> the future.  I think though that a more useful answer would be to
> include "time of receipt of report" in the permanent record, and then
> subsequent analysis could make its own decisions about whether to
> believe the snapshot timestamp --- plus we could track elapsed times for
> builds, which could be interesting in itself.
>   


We actually do timestamp the reports - I just didn't include that in the 
extract. I will alter the view it's based on. We started doing this in 
Nov 2005, so I'm going to restrict the view to cases where the 
report_time is not null - I doubt we're interested in ancient history.

A revised extract is available at 
http://www.pgbuildfarm.org/mfailures2.dump

We already reject snapshot times that are in the future.

Use of NTP is highly recommended to buildfarm members, but I'm reluctant 
to make it mandatory, as they might not have it available. I think we 
can do this: alter the client script to report its idea of current time 
at the time it makes the web transaction. If it's off from the server 
time by more than some small value (say 60 secs), adjust the snapshot 
time accordingly. If they don't report it then we can reject insane 
dates (more than 24 hours ago seems about right).
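
In rough terms, something like this on the server side (a sketch only, with 
$1 standing for the member-reported snapshot and $2 for the member's idea of 
the current time; the name and thresholds aren't settled):

CREATE FUNCTION adjusted_snapshot(timestamp, timestamptz) RETURNS timestamp AS $$
  select case
           -- no client clock reported: accept only reasonably recent snapshots
           when $2 is null then
             case when $1 < localtimestamp - interval '24 hours'
                  then null            -- i.e. reject the report
                  else $1 end
           -- client clock off by more than a minute: shift by the observed skew
           when abs(extract(epoch from (now() - $2))) > 60 then
             $1 + (now() - $2)
           else $1
         end
$$ LANGUAGE sql;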

So I agree with both your suggestions ;-)



> 2. I was annoyed repeatedly that some buildfarm members weren't
> reporting log_archive_filenames entries, which forced going the long
> way round in the process I was using.  Seems like we need some more
> proactive means for getting buildfarm owners to keep their script
> versions up-to-date.  Not sure what that should look like exactly,
> as long as it's not "you can run an ancient version as long as you
> please".
>
>             
>   

Modern clients report the versions of the two scripts involved (see 
script_version and web_script_version in reported config) so we could 
easily enforce a minimum version on these.

cheers

andrew



Re: Buildfarm feature request: some way to track/classify failures

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> missing BYTE_ORDER definition for Solaris                            | 2007-01-10 14:18:23 |     1

> What is this BYTE_ORDER macro? Should I be using it instead of the
> AC_C_BIGENDIAN test in configure for the packed varlena patch?

Actually, if we start to rely on AC_C_BIGENDIAN, I'd prefer to see us
get rid of direct usages of BYTE_ORDER.  It looks like only
contrib/pgcrypto is depending on it today, but we've got lots of
cruft in the include/port/ files supporting that.

>> row-ordering discrepancy in rowtypes test                            | 2007-02-10 03:00:02 |     3

> Is this because the test is fixed or unfixable?

It's fixed.
http://archives.postgresql.org/pgsql-committers/2007-02/msg00228.php
        regards, tom lane


Re: Buildfarm feature request: some way to track/classify failures

From
Gregory Stark
Date:
"Gregory Stark" <stark@enterprisedb.com> writes:

> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>
>>  row-ordering discrepancy in rowtypes test                            | 2007-02-10 03:00:02 |     3
>
> Is this because the test is fixed or unfixable? If not shouldn't the test get
> an ORDER BY clause so that it will reliably pass on future versions? 

Hm, I took a quick look at this test and while there are a couple tests
missing ORDER BY clauses I can't see how they could possibly generate results
that are out of order. Perhaps the ones that do have ORDER BY clauses only
recently acquired them?

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


Re: Buildfarm feature request: some way to track/classify failures

From
Tom Lane
Date:
"Joshua D. Drake" <jd@commandprompt.com> writes:
> Tom Lane wrote:
>> The current buildfarm webpages make it easy to see when a branch tip
>> is seriously broken, but it's not very easy to investigate transient
>> failures, such as a regression test race condition that only
>> materializes once in awhile.

> If the data is already there and just not represented, just let me know
> exactly what you want and I will implement pages for that data happily.

I think what would be nice is some way to view all the failures for a
given branch, extending back not-sure-how-far.  Right now the only way
to see past failures is to look at individual machines' histories, which
is not real satisfactory when you want a broader view.

Actually what I *really* want is something closer to "show me all the
unexplained failures", but unless Andrew is willing to support some way
of tagging failures in the master database, I suppose that won't happen.
        regards, tom lane


Re: Buildfarm feature request: some way to track/classify failures

From
Andrew Dunstan
Date:
Tom Lane wrote:
> I think what would be nice is some way to view all the failures for a
> given branch, extending back not-sure-how-far.  Right now the only way
> to see past failures is to look at individual machines' histories, which
> is not real satisfactory when you want a broader view.
>
> Actually what I *really* want is something closer to "show me all the
> unexplained failures", but unless Andrew is willing to support some way
> of tagging failures in the master database, I suppose that won't happen.
>
>             
>   

Well, if I understood how it might work, it might happen.

Who would do the tagging, and how?

cheers

andrew



Re: Buildfarm feature request: some way to track/classify failures

From
Andrew Dunstan
Date:
I wrote:
>> 2. I was annoyed repeatedly that some buildfarm members weren't
>> reporting log_archive_filenames entries, which forced going the long
>> way round in the process I was using.  Seems like we need some more
>> proactive means for getting buildfarm owners to keep their script
>> versions up-to-date.  Not sure what that should look like exactly,
>> as long as it's not "you can run an ancient version as long as you
>> please".
>>
>>            
>>   
>
> Modern clients report the versions of the two scripts involved (see 
> script_version and web_script_version in reported config) so we could 
> easily enforce a minimum version on these.
>

Meanwhile, the owner of the two main offending machines has said he will 
upgrade them.

cheers

andrew


Re: Buildfarm feature request: some way to track/classify failures

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> Actually what I *really* want is something closer to "show me all the
>> unexplained failures", but unless Andrew is willing to support some way
>> of tagging failures in the master database, I suppose that won't happen.

> Who would do the tagging, and how?

Well, that's the hard part isn't it?  I was sort of envisioning a group
of users who'd be authorized to log in and set tags on database entries
somehow.  I'm not sure about details.  One issue is that the majority
of failures come in batches (when one of us commits a bad patch).
With the current web interface it would be real tedious to verify which
of the failures in a particular time interval matched the symptoms of
a failure.  What I did for my experiment this weekend was to download
the last-stage-log of each failed build, which required an hour or so
of setup time; then I could use grep to confirm which logs matched a
failure that I'd identified.  Doing that through the current webpage
would involve lots of clicking and waiting.  If we could expose a
text-search-style API for grepping the stage logs, it'd be a lot easier
to collect related failures.  Then maybe a few widgets to let authorized
users apply a tag to the search results ...
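
To make that concrete, the search-then-tag step could be as simple as this 
(a sketch with made-up names: failure_tags holds the tags, the extract is 
assumed to also carry the raw last-stage log text in a "log" column, and the 
date window and pattern are purely illustrative):

create table failure_tags (
    sysname  text,
    snapshot timestamp without time zone,
    branch   text,
    tag      text
);

-- tag every failure in the suspect window whose log shows the symptom
insert into failure_tags (sysname, snapshot, branch, tag)
select sysname, snapshot, branch, 'broken GUC patch'
from mfailures
where branch = 'HEAD'
  and snapshot between '2007-03-12' and '2007-03-14'
  and log ~ 'unrecognized configuration parameter';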

I'm not entirely sure that this infrastructure would pay for itself,
though.  Without some users willing to take the time to separate
explained from unexplained failures, it'd be a waste of effort.
But we've already had a couple of cases of interesting failures going
unnoticed because of the noise level.  Between duplicate reports about
busted patches and transient problems on particular build machines
(out of disk space, misconfiguration, etc) it's pretty hard to not miss
the once-in-a-while failures.  Is there some other way we could attack
that problem?
        regards, tom lane


Re: Buildfarm feature request: some way to track/classify failures

From
Andrew Dunstan
Date:
Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>   
>> Tom Lane wrote:
>>     
>>> Actually what I *really* want is something closer to "show me all the
>>> unexplained failures", but unless Andrew is willing to support some way
>>> of tagging failures in the master database, I suppose that won't happen.
>>>       
>
>   
>> Who would do the tagging, and how?
>>     
>
> Well, that's the hard part isn't it?  I was sort of envisioning a group
> of users who'd be authorized to log in and set tags on database entries
> somehow.  I'm not sure about details.  One issue is that the majority
> of failures come in batches (when one of us commits a bad patch).
> With the current web interface it would be real tedious to verify which
> of the failures in a particular time interval matched the symptoms of
> a failure.  What I did for my experiment this weekend was to download
> the last-stage-log of each failed build, which required an hour or so
> of setup time; then I could use grep to confirm which logs matched a
> failure that I'd identified.  Doing that through the current webpage
> would involve lots of clicking and waiting.  If we could expose a
> text-search-style API for grepping the stage logs, it'd be a lot easier
> to collect related failures.  Then maybe a few widgets to let authorized
> users apply a tag to the search results ...
>
> I'm not entirely sure that this infrastructure would pay for itself,
> though.  Without some users willing to take the time to separate
> explained from unexplained failures, it'd be a waste of effort.
> But we've already had a couple of cases of interesting failures going
> unnoticed because of the noise level.  Between duplicate reports about
> busted patches and transient problems on particular build machines
> (out of disk space, misconfiguration, etc) it's pretty hard to not miss
> the once-in-a-while failures.  Is there some other way we could attack
> that problem?
>
>     

I'm not too sanguine about having a team of eager taggers.

I think we probably need to work on a usable API for extracting data in 
small or large amounts, and maybe some good text search facilities.

The real issue is the one you identify of stuff getting lost in the 
noise. But I'm not sure there's any realistic cure for that.

cheers

andrew




Re: Buildfarm feature request: some way to track/classify failures

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> But we've already had a couple of cases of interesting failures going
>> unnoticed because of the noise level.  Between duplicate reports about
>> busted patches and transient problems on particular build machines
>> (out of disk space, misconfiguration, etc) it's pretty hard to not miss
>> the once-in-a-while failures.  Is there some other way we could attack
>> that problem?

> The real issue is the one you identify of stuff getting lost in the 
> noise. But I'm not sure there's any realistic cure for that.

Maybe we should think about filtering the noise.  Like, say, discarding
every report from mongoose that involves an icc core dump ...
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=mongoose&dt=2007-03-20%2006:30:01

That's only semi-serious, but I do think that it's getting harder to
pluck the wheat from the chaff.  My investigations over the weekend
showed that we have got basically three categories of reports:

1. genuine code breakage from unportable patches: normally multiple
reports over a short period until we fix or revert the cause.
2. failures on a single buildfarm member due to misconfiguration,
hardware flakiness, etc.  These are sometimes repeatable and sometimes
not.
3. all the rest, of which some fraction represents bugs we need to fix,
only we don't know they're there.

In category 1 the buildfarm certainly pays for itself, but we'd hoped
that it would help us spot less-reproducible errors too.  The problem
I'm seeing is that category 2 is overwhelming our ability to recognize
patterns within category 3.  How can we dial down the noise level?
        regards, tom lane


Re: Buildfarm feature request: some way to track/classify failures

From
Martijn van Oosterhout
Date:
On Tue, Mar 20, 2007 at 02:57:13AM -0400, Tom Lane wrote:
> Maybe we should think about filtering the noise.  Like, say, discarding
> every report from mongoose that involves an icc core dump ...
> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=mongoose&dt=2007-03-20%2006:30:01

Maybe a simple compromise would be being able to set up a set of regexes
that search the output and set a flag if that string is found. If you
find the string, it gets marked with a flag, which means that when you
look at mongoose, any failures that don't have the flag become easier
to spot.

It also means that once you've found a common failure, you can create
the regex and then any other failures with the same string get tagged
also, making unexplained ones easier to spot.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: Buildfarm feature request: some way to track/classify failures

From
Andrew Dunstan
Date:
Martijn van Oosterhout wrote:
> On Tue, Mar 20, 2007 at 02:57:13AM -0400, Tom Lane wrote:
>   
>> Maybe we should think about filtering the noise.  Like, say, discarding
>> every report from mongoose that involves an icc core dump ...
>> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=mongoose&dt=2007-03-20%2006:30:01
>>     
>
> Maybe a simple compromise would be being able to set up a set of regexes
> that search the output and set a flag if that string is found. If you
> find the string, it gets marked with a flag, which means that when you
> look at mongoose, any failures that don't have the flag become easier
> to spot.
>
> It also means that once you've found a common failure, you can create
> the regex and then any other failures with the same string get tagged
> also, making unexplained ones easier to spot.
>
>
>   

You need to show first that this is an adequate tagging mechanism, both 
in tagging things adequately and in not picking up false positives, 
which would make things worse, not better. And even then you need 
someone to do the analysis to create the regex.

The buildfarm works because it leverages our strength, namely automating 
things. But all the tagging suggestions I've seen will involve regular, 
repetitive and possibly boring work, precisely the thing we are not good 
at as a group.

If we had some staff they could be given this task (among others), 
assuming we show that it actually works. We don't, so they can't.

cheers

andrew


Re: Buildfarm feature request: some way to track/classify failures

From
Stefan Kaltenbrunner
Date:
Andrew Dunstan wrote:
> Martijn van Oosterhout wrote:
>> On Tue, Mar 20, 2007 at 02:57:13AM -0400, Tom Lane wrote:
>>  
>>> Maybe we should think about filtering the noise.  Like, say, discarding
>>> every report from mongoose that involves an icc core dump ...
>>> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=mongoose&dt=2007-03-20%2006:30:01 
>>>
>>>     
>>
>> Maybe a simple compromise would be being able to set up a set of regexes
>> that search the output and set a flag if that string is found. If you
>> find the string, it gets marked with a flag, which means that when you
>> look at mongoose, any failures that don't have the flag become easier
>> to spot.
>>
>> It also means that once you've found a common failure, you can create
>> the regex and then any other failures with the same string get tagged
>> also, making unexplained ones easier to spot.
>>
>>
>>   
> 
> You need to show first that this is an adequate tagging mechanism, both 
> in tagging things adequately and in not picking up false positives, 
> which would make things worse, not better. And even then you need 
> someone to do the analysis to create the regex.
> 
> The buildfarm works because it leverages our strength, namely automating 
> things. But all the tagging suggestions I've seen will involve regular, 
> repetitive and possibly boring work, precisely the thing we are not good 
> at as a group.

this is probably true - however as a buildfarm admin I occasionally 
wished I had a way to invalidate reports generated from my boxes to 
prevent someone wasting time investigating them (like errors caused by 
system upgrades, configuration problems or other local issues).

But I agree that it might be difficult to make that "manual tagging" 
process scalable and reliable enough so that it really is an improvement 
over what we have now.

Stefan


Re: Buildfarm feature request: some way to track/classify failures

From
Andrew Dunstan
Date:
Stefan Kaltenbrunner wrote:
> however as a buildfarm admin I occasionally wished I had a way to 
> invalidate reports generated from my boxes to prevent someone wasting 
> time investigating them (like errors caused by system 
> upgrades, configuration problems or other local issues).
>

It would be extremely simple to provide a 'revoke report' API and 
client. Good idea.

But that's quite different from what we have been discussing.

cheers

andrew



Re: Buildfarm feature request: some way to track/classify failures

From
Alvaro Herrera
Date:
Andrew Dunstan wrote:

> The buildfarm works because it leverages our strength, namely automating 
> things. But all the tagging suggestions I've seen will involve regular, 
> repetitive and possibly boring work, precisely the thing we are not good 
> at as a group.

You may be forgetting that Martijn and others tagged the
scan.coverity.com database.  Now, there are some untagged errors, but
I'd say that that's because we don't control the tool, so we cannot fix
it if there are false positives.  We do control the buildfarm however,
so we can develop systematic solutions for widespread problems (instead
of forcing us to check and tag every single occurrence of
widespread problems).

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Buildfarm feature request: some way to track/classify failures

From
Andrew Dunstan
Date:
Alvaro Herrera wrote:
> Andrew Dunstan wrote:
>
>   
>> The buildfarm works because it leverages our strength, namely automating 
>> things. But all the tagging suggestions I've seen will involve regular, 
>> repetitive and possibly boring work, precisely the thing we are not good 
>> at as a group.
>>     
>
> You may be forgetting that Martijn and others tagged the
> scan.coverity.com database.  Now, there are some untagged errors, but
> I'd say that that's because we don't control the tool, so we cannot fix
> it if there are false positives.  We do control the buildfarm however,
> so we can develop systematic solutions for widespread problems (instead
> of forcing us to check and tag every single occurrence of
> widespread problems).
>
>   

Well, I'm sure we can provide appropriate access or data for anyone who 
wants to do research in this area and prove me wrong.

cheers

andrew




Re: Buildfarm feature request: some way to track/classify failures

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Martijn van Oosterhout wrote:
>> Maybe a simple compromise would be being able to set up a set of regexes
>> that search the output and set a flag if that string is found. If you
>> find the string, it gets marked with a flag, which means that when you
>> look at mongoose, any failures that don't have the flag become easier
>> to spot.
>> 
>> It also means that once you've found a common failure, you can create
>> the regex and then any other failures with the same string get tagged
>> also, making unexplained ones easier to spot.

> You need to show first that this is an adequate tagging mechanism, both 
> in tagging things adequately and in not picking up false positives, 
> which would make things worse, not better. And even then you need 
> someone to do the analysis to create the regex.

Well, my experiment over the weekend with doing exactly that convinced
me that regexes could be used successfully to identify common-mode
failures.  So I think Martijn has a fine idea here.  And I don't see a
problem with lack of motivation, at least for those of us who try to pay
attention to buildfarm results --- once you've looked at a couple of
reports of the same issue, you really don't want to have to repeat the
analysis over and over.  But just assuming that every report on a
particular day reflects the same breakage is exactly the risk I wish
we didn't have to take.

For a lot of cases there is not a need for an ongoing filter: we break
something, we get a pile of reports, we fix it, and then we want to tag
all the reports of that something so that we can see if anything else
happened in the same interval.  So for this, something based on an
interactive search API would work fine.  You could even use that for
repetitive problems such as buildfarm misconfigurations, though having
to repeat the search every so often would get old in the end.  The main
thing though is for the database to remember the tags once made.

> The buildfarm works because it leverages our strength, namely automating 
> things. But all the tagging suggestions I've seen will involve regular, 
> repetitive and possibly boring work, precisely the thing we are not good 
> at as a group.

Well, responding to bug reports could be called regular and repetitive
work --- in reality I don't find it so, because every bug is different.
The point I think you are missing is that having something like this
will *eliminate* repetitive, boring work, namely recognizing multiple
reports of the same problem.  The buildfarm has gotten big enough that
some way of dealing with that is desperately needed, else our ability
to spot infrequently-reported issues will disappear entirely.
        regards, tom lane


Re: Buildfarm feature request: some way to track/classify failures

From
Andrew Dunstan
Date:
Tom Lane wrote:
> The point I think you are missing is that having something like this
> will *eliminate* repetitive, boring work, namely recognizing multiple
> reports of the same problem.  The buildfarm has gotten big enough that
> some way of dealing with that is desperately needed, else our ability
> to spot infrequently-reported issues will disappear entirely.
>
>     


OK. How about if we have a table of <branch, failure_stage, regex, tag, 
description, start_date> plus some webby transactions for approved users 
to edit this?
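
As a bare-bones sketch (failure_tags being the sort of tag table mentioned 
upthread, tfailures standing for an extract that also carries the last-stage 
log text, and all names being placeholders rather than a settled schema):

create table failure_filters (
    branch        text,
    failure_stage text,
    regex         text,
    tag           text,
    description   text,
    start_date    date
);

-- what the background daemon might run: apply any approved filter whose
-- regex matches a failure's last-stage log, recording the tag just once
insert into failure_tags (sysname, snapshot, branch, tag)
select f.sysname, f.snapshot, f.branch, ff.tag
from tfailures f
     join failure_filters ff
       on ff.branch = f.branch
      and ff.failure_stage = f.stage
      and f.snapshot >= ff.start_date
      and f.log ~ ff.regex
where not exists (
    select 1 from failure_tags t
    where t.sysname = f.sysname
      and t.snapshot = f.snapshot
      and t.branch = f.branch
      and t.tag = ff.tag
);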

The wrinkle is that applying the tags on the fly is probably not a great 
idea - the status page query is already in desperate need of overhauling 
because it's too slow. So we'd need a daemon to set up the tags in the 
background. But that's an implementation detail. Screen real estate on 
the dashboard page is also in very short supply. Maybe we could play 
with the background colour, so that a tagged failure had, say, a blue 
background, as opposed to the red/pink/yellow we use for failures now. 
Again - an implementation detail.

My biggest worry apart from maintenance (which doesn't matter that much 
- if people don't enter the regexes they don't get the tags they want) 
is that the regexes will not be specific enough, and so give false 
positives on the tags. Then if you're looking for things that aren't 
tagged you'd be even more likely than today to miss the outliers. Lord 
knows that regexes are hard to get right - I've been using them for a 
couple of decades and they've earned me lots of money, and I still get 
them wrong regularly (including several cases on the buildfarm). But 
maybe we need to take the plunge and see how it works.

This would be a fine SOC project - I at least won't have time to develop 
it for quite some time.

cheers

andrew


Re: Buildfarm feature request: some way to track/classify failures

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> The wrinkle is that applying the tags on the fly is probably not a great 
> idea - the status page query is already in desperate need of overhauling 
> because it's too slow. So we'd need a daemon to set up the tags in the 
> background. But that's an implementation detail. Screen real estate on 
> the dashboard page is also in very short supply. Maybe we could play 
> with the background colour, so that a tagged failure had, say, a blue 
> background, as opposed to the red/pink/yellow we use for failures now. 
> Again - an implementation detail.

I'm not sure that the current status dashboard needs to pay any attention
to the tags.  The view that I would like to have of "recent failures
across all machines in a branch" is the one that needs to be tag-aware,
and perhaps also the existing display of a given machine's branch history.

> My biggest worry apart from maintenance (which doesn't matter that much 
> - if people don't enter the regexes they don't get the tags they want) 
> is that the regexes will not be specific enough, and so give false 
> positives on the tags.

True.  I strongly suggest that we want an interactive search-and-tag
capability *before* worrying about automatic tagging --- one of the
reasons for that is to provide a way to test a regex that you might
then consider adding to the automatic filter for future reports.

> This would be a fine SOC project - I at least won't have time to develop 
> it for quite some time.

Agreed.  Who's maintaining the SOC project list page?
        regards, tom lane


Re: Buildfarm feature request: some way to track/classify failures

From
Arturo Perez
Date:
I don't know if this has come up yet but....

In terms of tagging errors we might be able to use some machine  
learning techniques.


There are NLP/learning systems that interpret logs.  They learn over  
time what is normal and what isn't and can flag things that are  
abnormal.

For example, people are using support vector machine (SVM) analysis
on log files to do intrusion detection.  Here's a paper on intrusion
detection called "Robust Anomaly Detection Using Support Vector Machines":
http://wwwcsif.cs.ucdavis.edu/~liaoy/research/RSVM_Anomaly_journal.pdf

This paper from IBM gives some more background information on how
such a thing might work:
http://www.research.ibm.com/journal/sj/413/johnson.html

I have previously used an open source toolkit from CMU called rainbow  
to do these types of analysis.

-arturo



Re: Buildfarm feature request: some way to track/classify failures

From
Andrew Dunstan
Date:
Arturo Perez wrote:
> I don't know if this has come up yet but....
>
> In terms of tagging errors we might be able to use some machine 
> learning techniques.
>
>
> There are NLP/learning systems that interpret logs.  They learn over 
> time what is normal and what isn't and can flag things that are abnormal.
>
>

We can make extracts of the database (including the log data) available 
to anyone who wants to do research using any learning technique that 
appeals to them.

cheers

andrew



Re: Buildfarm feature request: some way to track/classify failures

From
Martijn van Oosterhout
Date:
On Tue, Mar 20, 2007 at 11:36:09AM -0400, Andrew Dunstan wrote:
> My biggest worry apart from maintenance (which doesn't matter that much
> - if people don't enter the regexes they don't get the tags they want)
> is that the regexes will not be specific enough, and so give false
> positives on the tags. Then if you're looking for things that aren't
> tagged you be even more likely than today to miss the outliers. Lord

I think you could solve that by displaying the text that matched the
regex. If it starts matching odd things it'd be visible.

But I'm just sprouting ideas here, the proof is in the pudding. If the
logs are easily available (or a subset of, say the last month) then
people could play with that and see what happens...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: Buildfarm feature request: some way to track/classify failures

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> But I'm just sprouting ideas here, the proof is in the pudding. If the
> logs are easily available (or a subset of, say the last month) then
> people could play with that and see what happens...

Anyone who wants to play around can replicate what I did, which was to
download the table that Andrew made available upthread, and then pull
the log files matching interesting rows.  I used the attached functions
to generate URLs for the failing stage logs, and then a shell script
looping over lwp-download ...

CREATE FUNCTION lastfile(mfailures) RETURNS text AS $$
  select replace(
    'show_stage_log.pl?nm=' || $1.sysname || '&dt=' || $1.snapshot || '&stg=' ||
    replace($1.log_archive_filenames[array_upper($1.log_archive_filenames,1)],
            '.log', ''),
    ' ', '%20')
$$ LANGUAGE sql;

CREATE FUNCTION lastlog(mfailures) RETURNS text AS $$
  select 'http://www.pgbuildfarm.org/cgi-bin/' || lastfile($1)
$$ LANGUAGE sql;
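
For instance, to produce the URL list for the lwp-download loop (a sketch; 
the branch, date cutoff and output file name are arbitrary):

\copy (select lastlog(m) from mfailures m where m.branch = 'HEAD' and m.snapshot >= '2006-12-01') to 'urls.txt'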

        regards, tom lane


Re: Buildfarm feature request: some way to track/classify failures

From
Andrew Dunstan
Date:
Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
>   
>> But I'm just sprouting ideas here, the proof is in the pudding. If the
>> logs are easily available (or a subset of, say the last month) then
>> people could play with that and see what happens...
>>     
>
> Anyone who wants to play around can replicate what I did, which was to
> download the table that Andrew made available upthread, and then pull
> the log files matching interesting rows.  
>   
[snip]


To save people this trouble, I have made an extract for the last 3 
months, augmented by a log field, which is pretty much the last stage log. 
The dump is 27Mb and can be got at

http://www.pgbuildfarm.org/tfailures.dmp
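
Once that's restored, the sort of grep discussed above becomes a one-line 
query, e.g. (assuming the table comes in as tfailures with the log text in 
a "log" column; the pattern is just one of the symptoms mentioned upthread):

select sysname, snapshot, branch, stage
from tfailures
where log ~ 'could not open relation with OID'
order by snapshot;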

cheers

andrew





Re: Buildfarm feature request: some way to track/classify failures

From
"Joshua D. Drake"
Date:
Andrew Dunstan wrote:
> Tom Lane wrote:
>> Martijn van Oosterhout <kleptog@svana.org> writes:
>>  
>>> But I'm just sprouting ideas here, the proof is in the pudding. If the
>>> logs are easily available (or a subset of, say the last month) then
>>> people could play with that and see what happens...
>>>     
>>
>> Anyone who wants to play around can replicate what I did, which was to
>> download the table that Andrew made available upthread, and then pull
>> the log files matching interesting rows.    
> [snip]
> 
> 
> To save people this trouble, I have made an extract for the last 3
> months, augmented by log field, which is pretty much the last stage log.
> The dump is 27Mb and can be got at
> 
> http://www.pgbuildfarm.org/tfailures.dmp

Should we just automate this and make it a weekly?

> 
> cheers
> 
> andrew
> 
> 
> 


-- 
     === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997            http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/



Re: Buildfarm feature request: some way to track/classify failures

From
Andrew Dunstan
Date:
Joshua D. Drake wrote:
> Andrew Dunstan wrote:
>   
>> Tom Lane wrote:
>>     
>>> Martijn van Oosterhout <kleptog@svana.org> writes:
>>>  
>>>       
>>>> But I'm just sprouting ideas here, the proof is in the pudding. If the
>>>> logs are easily available (or a subset of, say the last month) then
>>>> people could play with that and see what happens...
>>>>     
>>>>         
>>> Anyone who wants to play around can replicate what I did, which was to
>>> download the table that Andrew made available upthread, and then pull
>>> the log files matching interesting rows.    
>>>       
>> [snip]
>>
>>
>> To save people this trouble, I have made an extract for the last 3
>> months, augmented by log field, which is pretty much the last stage log.
>> The dump is 27Mb and can be got at
>>
>> http://www.pgbuildfarm.org/tfailures.dmp
>>     
>
> Should we just automate this and make it a weekly?
>
>   

Sure. Talk to me offline about it - very simple to do.

cheers

andrew