Thread: Buildfarm feature request: some way to track/classify failures
The current buildfarm webpages make it easy to see when a branch tip is seriously broken, but it's not very easy to investigate transient failures, such as a regression test race condition that only materializes once in a while. I would like to have a way of seeing just the failed build attempts across all machines running a given branch. Ideally it would be possible to tag failures as to the cause (if known) and/or symptom pattern, and then be able to examine just the ones without known cause or having similar symptoms.

I'm not sure how much of this is reasonable to try to do with webpages similar to what we've got. But the data is all in a database AIUI, so another possibility is to do this work via SQL. That'd require having the ability to pull the information from the buildfarm database so someone else could manipulate it.

So I guess the first question is can you make the build data available, and the second is whether you're interested in building more flexible views or just want to let someone else do that. Also, if anyone does make an effort to tag failures, it'd be good to somehow push that data back into the master database, so that we don't end up duplicating such work.

regards, tom lane
Tom Lane wrote:
> So I guess the first question is can you make the build data available,
> and the second is whether you're interested in building more flexible
> views or just want to let someone else do that. Also, if anyone does
> make an effort to tag failures, it'd be good to somehow push that data
> back into the master database, so that we don't end up duplicating such
> work.

If the data is already there and just not represented, just let me know exactly what you want and I will implement pages for that data happily.

Joshua D. Drake
Tom Lane wrote:
> So I guess the first question is can you make the build data available,
> and the second is whether you're interested in building more flexible
> views or just want to let someone else do that. Also, if anyone does
> make an effort to tag failures, it'd be good to somehow push that data
> back into the master database, so that we don't end up duplicating such
> work.

Well, the db is currently running around 13Gb, so that's not something to be exported lightly ;-)

If we upgraded from Postgres 8.0.x to 8.2.x we could make use of some features, like dynamic partitioning and copy from queries, that might make life easier (CP people: that's a hint :-) )

I don't want to fragment effort, but I also know CP don't want open access, for obvious reasons. We can also look at a safe API that we could make available freely. I've already done this over SOAP (see the example client at http://people.planetpostgresql.org/andrew/index.php?/archives/14-SOAP-server-for-Buildfarm-dashboard.html ). Doing updates is a whole other matter, of course.

Lastly, note that some buildfarm enhancements are on the SOC project list. I have no idea if anyone will express any interest in that, of course. It's not very glamorous work.

cheers

andrew
Andrew Dunstan wrote:
> Well, the db is currently running around 13Gb, so that's not something
> to be exported lightly ;-)
>
> If we upgraded from Postgres 8.0.x to 8.2.x we could make use of some
> features, like dynamic partitioning and copy from queries, that might
> make life easier (CP people: that's a hint :-) )

Yeah, yeah... I need to get you off that machine as a whole :) Which is on the list, but I am waiting for 8.3 *badda bing*.

Sincerely,

Joshua D. Drake
Andrew Dunstan <andrew@dunslane.net> writes:
> Well, the db is currently running around 13Gb, so that's not something
> to be exported lightly ;-)

Yeah. I would assume though that the vast bulk of that is captured log files. For the purposes I'm imagining, it'd be sufficient to export only the rest of the database --- or ideally, records including all the other fields and a URL for each log file. For the small number of log files you actually need to examine, you'd chase the URL.

regards, tom lane
Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> Well, the db is currently running around 13Gb, so that's not something
>> to be exported lightly ;-)
>
> Yeah. I would assume though that the vast bulk of that is captured log
> files. For the purposes I'm imagining, it'd be sufficient to export
> only the rest of the database --- or ideally, records including all the
> other fields and a URL for each log file. For the small number of log
> files you actually need to examine, you'd chase the URL.

OK, for anyone that wants to play, I have created an extract that contains a summary of every non-CVS-related failure we've had. It's a single table looking like this:

CREATE TABLE mfailures (
    sysname text,
    snapshot timestamp without time zone,
    stage text,
    conf_sum text,
    branch text,
    changed_this_run text,
    changed_since_success text,
    log_archive_filenames text[],
    build_flags text[]
);

The dump is just under 1Mb and can be downloaded from

http://www.pgbuildfarm.org/mfailures.dump

If this is useful we can create it or something like it on a regular basis (say nightly).

The summary log for a given build can be got from:

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=<sysname>&dt=<snapshot>

To look at the log for a given run stage, select

http://www.pgbuildfarm.org/cgi-bin/show_stage_log.pl?nm=<sysname>&dt=<snapshot>&stg=<stagename>

- the stage names available (if any) are the entries in log_archive_filenames, stripped of the ".log" suffix.

We can make these available over an API that isn't plain http if people want. Or we can provide a version of the build log that is stripped of the html.

cheers

andrew
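A minimal sketch of the kind of query this extract enables, assuming the dump has been restored into a local table named mfailures as above (the restore step itself and the branch/stage values shown are left to the reader):

-- Which stages fail most often on each branch, and when did each last fail?
SELECT branch, stage, count(*) AS failures, max(snapshot) AS latest
FROM mfailures
GROUP BY branch, stage
ORDER BY branch, failures DESC;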
On Fri, 16 Mar 2007, Andrew Dunstan wrote:
> OK, for anyone that wants to play, I have created an extract that contains a
> summary of every non-CVS-related failure we've had. It's a single table
> looking like this:
>
> CREATE TABLE mfailures (
>     sysname text,
>     snapshot timestamp without time zone,
>     stage text,
>     conf_sum text,
>     branch text,
>     changed_this_run text,
>     changed_since_success text,
>     log_archive_filenames text[],
>     build_flags text[]
> );

Sweet. Should be interesting to look at.

> The dump is just under 1Mb and can be downloaded from
> http://www.pgbuildfarm.org/mfailures.dump

Sure about that?

--14:45:45--  http://www.pgbuildfarm.org/mfailures.dump
           => `mfailures.dump'
Resolving www.pgbuildfarm.org... 207.173.203.146
Connecting to www.pgbuildfarm.org|207.173.203.146|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9,184,142 (8.8M) [text/plain]
Jeremy Drake wrote:
>> The dump is just under 1Mb and can be downloaded from
>> http://www.pgbuildfarm.org/mfailures.dump
>
> Sure about that?
>
> HTTP request sent, awaiting response... 200 OK
> Length: 9,184,142 (8.8M) [text/plain]

Damn these new specs. They made me skip a digit.

cheers

andrew
Andrew,

> Lastly, note that some buildfarm enhancements are on the SOC project
> list. I have no idea if anyone will express any interest in that, of
> course. It's not very glamorous work.

On the other hand, I think there are a lot more student perl hackers and web people than there are folks with the potential to do backend stuff. So who knows?

--Josh
Andrew Dunstan <andrew@dunslane.net> writes:
> OK, for anyone that wants to play, I have created an extract that
> contains a summary of every non-CVS-related failure we've had. It's a
> single table looking like this:

I did some analysis on this data. Attached is a text dump of a table declared as

CREATE TABLE mreasons (
    sysname text,
    snapshot timestamp without time zone,
    branch text,
    reason text,
    known boolean
);

where the sysname/snapshot/branch data is taken from your table, "reason" is a brief sketch of the failure, and "known" indicates whether the cause is known ... although as I went along it sort of evolved into "does this seem worthy of more investigation?".

I looked at every failure back through early December. I'd intended to go back further, but decided I'd hit a point of diminishing returns. However, failures back to the beginning of July that matched grep searches for recent symptoms are classified in the table. The gross stats are: 2231 failures classified, 71 distinct reason codes, 81 failures (with 18 reasons) that seem worthy of closer investigation:

bfarm=# select reason, branch, max(snapshot) as latest, count(*)
        from mreasons where not known
        group by 1,2 order by 1,2;

                              reason                              |    branch     |       latest        | count
------------------------------------------------------------------+---------------+---------------------+-------
 Input/output error - possible hardware problem                   | HEAD          | 2007-03-06 10:30:01 |     1
 No rule to make target                                           | HEAD          | 2007-02-08 15:30:01 |     6
 No rule to make target                                           | REL8_0_STABLE | 2007-02-28 03:15:02 |     9
 No rule to make target                                           | REL8_2_STABLE | 2006-12-17 20:00:01 |     1
 could not open relation with OID                                 | HEAD          | 2007-03-16 16:45:01 |     2
 could not open relation with OID                                 | REL8_1_STABLE | 2006-08-29 23:30:07 |     2
 createlang not found?                                            | REL8_1_STABLE | 2007-02-28 02:50:00 |     1
 irreproducible contrib/sslinfo build failure, likely not our bug | HEAD          | 2007-02-03 07:03:02 |     1
 irreproducible opr_sanity failure                                | HEAD          | 2006-12-18 19:15:02 |     2
 libintl.h rejected by configure                                  | HEAD          | 2007-01-11 20:35:00 |     3
 libintl.h rejected by configure                                  | REL8_0_STABLE | 2007-03-01 20:28:04 |    22
 postmaster failed to start                                       | REL7_4_STABLE | 2007-02-28 22:23:20 |     1
 postmaster failed to start                                       | REL8_0_STABLE | 2007-02-28 22:30:44 |     1
 random Solaris configure breakage                                | HEAD          | 2007-01-14 05:30:00 |     1
 random Windows breakage                                          | HEAD          | 2007-03-16 09:48:31 |     3
 random Windows breakage                                          | REL8_0_STABLE | 2007-03-15 03:15:09 |     7
 segfault during bootstrap                                        | HEAD          | 2007-03-12 23:03:03 |     1
 server does not shut down                                        | HEAD          | 2007-01-08 03:03:03 |     3
 tablespace is not empty                                          | HEAD          | 2007-02-24 15:00:10 |     6
 tablespace is not empty                                          | REL8_1_STABLE | 2007-01-25 02:30:01 |     2
 unexpected statement_timeout failure                             | HEAD          | 2007-01-25 05:05:06 |     1
 unexplained tsearch2 crash                                       | HEAD          | 2007-01-10 22:05:02 |     1
 weird DST-transition-like timestamp test failure                 | HEAD          | 2007-02-04 07:25:04 |     1
 weird assembler failure, likely not our bug                      | HEAD          | 2006-12-26 17:02:01 |     1
 weird assembler failure, likely not our bug                      | REL8_2_STABLE | 2007-02-03 23:47:01 |     1
 weird install failure                                            | HEAD          | 2007-01-25 12:35:00 |     1
(26 rows)

I think I know the cause of the recent 'could not open relation with OID' failures in HEAD, but the rest of these maybe need a look. Any volunteers?
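A sketch of how the two extracts could be combined once both are loaded locally (the join keys follow from the column lists above; nothing here is part of the official buildfarm tooling):

-- Pull the full failure records for the rows still marked "not known",
-- so someone can chase the corresponding logs.
SELECT f.sysname, f.snapshot, f.branch, f.stage, r.reason
FROM mreasons r
JOIN mfailures f USING (sysname, snapshot, branch)
WHERE NOT r.known
ORDER BY r.reason, f.snapshot;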
Also, for completeness, the causes I wrote off as not interesting (anymore, in some cases):

bfarm=# select reason, max(snapshot) as latest, count(*)
        from mreasons where known
        group by 1 order by 1;

                                reason                                 |       latest        | count
-----------------------------------------------------------------------+---------------------+-------
 DST transition test failure                                           | 2007-03-13 04:04:47 |    26
 ISO-week-patch regression test breakage                               | 2007-02-16 15:00:08 |    23
 No rule to make Makefile.port                                         | 2007-03-02 12:30:02 |    40
 Out of disk space                                                     | 2007-02-16 22:30:01 |    67
 Out of semaphores                                                     | 2007-02-20 02:03:31 |    14
 Python not installed                                                  | 2007-02-19 22:45:05 |     2
 Solaris random conn-refused bug                                       | 2007-03-06 01:20:00 |    37
 TCP socket already in use                                             | 2007-01-09 07:03:04 |    13
 Too many clients                                                      | 2007-02-26 06:06:02 |    90
 Too many open files in system                                         | 2007-02-27 20:30:59 |    17
 another icc crash                                                     | 2007-02-03 10:50:01 |     1
 apparently a malloc bug                                               | 2007-03-04 23:00:20 |    27
 bogus system clock setting                                            | 1997-12-21 15:20:11 |     6
 breakage from changing := to = in makefiles                           | 2007-02-10 02:15:01 |     4
 broken GUC patch                                                      | 2007-03-13 15:15:01 |    92
 broken float8 hacking                                                 | 2007-01-06 20:00:09 |   120
 broken fsync-revoke patch                                             | 2007-01-17 16:21:01 |    77
 broken inet hacking                                                   | 2007-01-03 00:05:01 |     4
 broken log_error patch                                                | 2007-01-28 08:15:01 |    15
 broken money patch                                                    | 2007-01-03 19:05:01 |    78
 broken pg_regress change for msvc support                             | 2007-01-19 22:03:00 |    46
 broken plpython patch                                                 | 2007-01-25 14:21:00 |    22
 broken sys_siglist patch                                              | 2007-01-28 06:06:02 |    18
 bug in btree page split patch                                         | 2007-02-08 11:35:03 |     7
 buildfarm pilot error                                                 | 2007-01-19 03:28:07 |    69
 cache flush bug in operator-family patch                              | 2006-12-31 10:30:03 |     8
 ccache failure                                                        | 2007-01-25 23:00:34 |     2
 could not create shared memory                                        | 2007-02-13 07:00:05 |    32
 ecpg regression test teething pains                                   | 2007-02-03 13:30:02 |   516
 failure to update PL expected files for may/can/might rewording       | 2007-02-01 20:15:01 |     8
 failure to update contrib expected files for may/can/might rewording  | 2007-02-01 21:15:02 |    11
 failure to update expected files for may/can/might rewording          | 2007-02-01 19:35:02 |     3
 icc "internal error"                                                  | 2007-03-16 16:30:01 |    29
 image not found (possibly related to too-many-open-files)             | 2006-10-25 08:05:02 |     1
 largeobject test bugs                                                 | 2007-02-17 23:35:03 |     4
 ld segfaulted                                                         | 2007-03-16 15:30:02 |     3
 missing BYTE_ORDER definition for Solaris                             | 2007-01-10 14:18:23 |     1
 pg_regress patch breakage                                             | 2007-02-08 18:30:01 |     1
 plancache test race condition                                         | 2007-03-16 11:15:01 |     5
 pltcl regression test broken by ORDER BY semantics tightening         | 2007-01-09 03:15:01 |     9
 previous contrib test still running                                   | 2007-02-13 20:49:33 |    21
 random Solaris breakage                                               | 2007-01-05 17:20:01 |     1
 random Windows breakage                                               | 2006-12-27 03:15:07 |     1
 random Windows permission-denied failures                             | 2007-02-12 11:00:09 |     5
 random ccache breakage                                                | 2007-01-04 01:34:33 |     1
 readline misconfiguration                                             | 2007-02-12 17:19:41 |    33
 row-ordering discrepancy in rowtypes test                             | 2007-02-10 03:00:02 |     3
 stats test failed                                                     | 2007-03-14 13:00:02 |   319
 threaded Python library                                               | 2007-01-10 04:05:02 |     6
 undefined symbol pg_mic2ascii                                         | 2007-02-03 01:13:40 |   101
 unexpected signal 9                                                   | 2006-12-31 06:30:02 |    15
 unportable uuid patch                                                 | 2007-01-31 17:30:01 |    16
 use of // comment                                                     | 2007-02-16 09:23:02 |     1
 xml code teething problems                                            | 2007-02-16 16:01:05 |    79
(54 rows)

Some of these might possibly be interesting to other people ...

regards, tom lane
Tom Lane wrote:
> Some of these might possibly be interesting to other people ...

If you provide the various greps, etc... I will put it into the website proper...

Joshua D. Drake
On Sun, 18 Mar 2007, Tom Lane wrote:
> another icc crash                        | 2007-02-03 10:50:01 |     1
> icc "internal error"                     | 2007-03-16 16:30:01 |    29

These on mongoose are most likely a result of flaky hardware. They tend to occur most often when either a) I am doing something else on the box when the build runs, or b) the ambient temperature in the room is > ~72degF.

I need to bring down this box at some point and try to figure out if it is bad memory or what. Anyway, ICC seems to be one of the few things that are really susceptible to hardware issues (on this box at least, it is mostly ICC and firefox), and I apologize for the noise this caused in the buildfarm logs...

"Joshua D. Drake" <jd@commandprompt.com> writes:
>> Some of these might possibly be interesting to other people ...

> If you provide the various greps, etc... I will put it into the website
> proper...

Unfortunately I didn't keep notes on exactly what I searched for in each case. Some of them were not based on grep at all, but rather "this failure looks similar to those others and happened in the period between a known bad patch commit and its fix". The goal was essentially to group together failures that probably arose from the same cause --- I may have made a mistake or two along the way ...

regards, tom lane
Jeremy Drake <pgsql@jdrake.com> writes:
> These on mongoose are most likely a result of flaky hardware.

Yeah, I saw a pretty fair number of irreproducible issues that are probably hardware flake-outs. Of course you can't tell which are those and which are low-probability software bugs for many moons...

I believe that a large fraction of the buildfarm consists of semi-retired equipment that is probably more prone to this sort of problem than newer stuff would be. But that's the price we must pay for building such a large test farm on a shoestring. What we need to do to deal with it, I think, is institutionalize some kind of long-term tracking so that we can tell the recurrent from the non-recurrent issues. I don't quite know how to do that; what I did over this past weekend was labor-intensive and not scalable. SoC project perhaps?

regards, tom lane
BTW, before I forget, this little project turned up a couple of small improvements for the current buildfarm infrastructure:

1. There are half a dozen entries with obviously bogus timestamps:

bfarm=# select sysname,snapshot,branch from mfailures where snapshot < '2004-01-01';
  sysname   |      snapshot       | branch
------------+---------------------+--------
 corgi      | 1997-10-14 14:20:10 | HEAD
 kookaburra | 1970-01-01 01:23:00 | HEAD
 corgi      | 1997-09-30 11:47:08 | HEAD
 corgi      | 1997-10-17 14:20:11 | HEAD
 corgi      | 1997-12-21 15:20:11 | HEAD
 corgi      | 1997-10-15 14:20:10 | HEAD
 corgi      | 1997-09-28 11:47:09 | HEAD
 corgi      | 1997-09-28 11:47:08 | HEAD
(8 rows)

indicating wrong system clock settings on these buildfarm machines. (Indeed, IIRC these failures were actually caused by the ridiculous clock settings --- we have at least one regression test that checks century >= 21 ...) Perhaps the buildfarm server should bounce reports with timestamps more than a day in the past or a few minutes in the future. I think though that a more useful answer would be to include "time of receipt of report" in the permanent record, and then subsequent analysis could make its own decisions about whether to believe the snapshot timestamp --- plus we could track elapsed times for builds, which could be interesting in itself.

2. I was annoyed repeatedly that some buildfarm members weren't reporting log_archive_filenames entries, which forced going the long way round in the process I was using. Seems like we need some more proactive means for getting buildfarm owners to keep their script versions up-to-date. Not sure what that should look like exactly, as long as it's not "you can run an ancient version as long as you please".

regards, tom lane

"Tom Lane" <tgl@sss.pgh.pa.us> writes:
> Also, for completeness, the causes I wrote off as not interesting
> (anymore, in some cases):
>
> missing BYTE_ORDER definition for Solaris     | 2007-01-10 14:18:23 |     1

What is this BYTE_ORDER macro? Should I be using it instead of the AC_C_BIGENDIAN test in configure for the packed varlena patch?

> row-ordering discrepancy in rowtypes test     | 2007-02-10 03:00:02 |     3

Is this because the test is fixed or unfixable? If not, shouldn't the test get an ORDER BY clause so that it will reliably pass on future versions?

--
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Gregory Stark wrote:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> missing BYTE_ORDER definition for Solaris    | 2007-01-10 14:18:23 |     1
>
> What is this BYTE_ORDER macro? Should I be using it instead of the
> AC_C_BIGENDIAN test in configure for the packed varlena patch?

FYI: this is the relevant commit (the affected buildfarm member was clownfish):

http://archives.postgresql.org/pgsql-committers/2007-01/msg00154.php

Stefan
Tom Lane wrote:
> 1. There are half a dozen entries with obviously bogus timestamps ...
> indicating wrong system clock settings on these buildfarm machines.
> Perhaps the buildfarm server should bounce reports with timestamps
> more than a day in the past or a few minutes in the future. I think
> though that a more useful answer would be to include "time of receipt
> of report" in the permanent record, and then subsequent analysis could
> make its own decisions about whether to believe the snapshot timestamp
> --- plus we could track elapsed times for builds, which could be
> interesting in itself.

We actually do timestamp the reports - I just didn't include that in the extract. I will alter the view it's based on. We started doing this in Nov 2005, so I'm going to restrict the view to cases where the report_time is not null - I doubt we're interested in ancient history. A revised extract is available at

http://www.pgbuildfarm.org/mfailures2.dump

We already reject snapshot times that are in the future. Use of NTP is highly recommended to buildfarm members, but I'm reluctant to make it mandatory, as they might not have it available. I think we can do this: alter the client script to report its idea of current time at the time it makes the web transaction. If it's off from the server time by more than some small value (say 60 secs), adjust the snapshot time accordingly. If they don't report it then we can reject insane dates (more than 24 hours ago seems about right).

So I agree with both your suggestions ;-)

> 2. I was annoyed repeatedly that some buildfarm members weren't
> reporting log_archive_filenames entries, which forced going the long
> way round in the process I was using. Seems like we need some more
> proactive means for getting buildfarm owners to keep their script
> versions up-to-date. Not sure what that should look like exactly,
> as long as it's not "you can run an ancient version as long as you
> please".

Modern clients report the versions of the two scripts involved (see script_version and web_script_version in reported config) so we could easily enforce a minimum version on these.

cheers

andrew
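As a rough illustration of the sanity check being discussed (a sketch only; it assumes the revised extract exposes the server-side receipt timestamp as a column named report_time, which is an assumed name, not something stated above):

-- Flag reports whose snapshot timestamp disagrees wildly with the time
-- the server actually received them.
SELECT sysname, snapshot, report_time, branch
FROM mfailures
WHERE report_time IS NOT NULL
  AND (snapshot < report_time - interval '24 hours'
       OR snapshot > report_time + interval '10 minutes');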
Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> missing BYTE_ORDER definition for Solaris    | 2007-01-10 14:18:23 |     1

> What is this BYTE_ORDER macro? Should I be using it instead of the
> AC_C_BIGENDIAN test in configure for the packed varlena patch?

Actually, if we start to rely on AC_C_BIGENDIAN, I'd prefer to see us get rid of direct usages of BYTE_ORDER. It looks like only contrib/pgcrypto is depending on it today, but we've got lots of cruft in the include/port/ files supporting that.

>> row-ordering discrepancy in rowtypes test    | 2007-02-10 03:00:02 |     3

> Is this because the test is fixed or unfixable?

It's fixed.
http://archives.postgresql.org/pgsql-committers/2007-02/msg00228.php

regards, tom lane

"Gregory Stark" <stark@enterprisedb.com> writes:
>> row-ordering discrepancy in rowtypes test    | 2007-02-10 03:00:02 |     3
>
> Is this because the test is fixed or unfixable? If not, shouldn't the test
> get an ORDER BY clause so that it will reliably pass on future versions?

Hm, I took a quick look at this test and while there are a couple of tests missing ORDER BY clauses, I can't see how they could possibly generate results that are out of order. Perhaps the ones that do have ORDER BY clauses only recently acquired them?

--
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com

"Joshua D. Drake" <jd@commandprompt.com> writes:
> Tom Lane wrote:
>> The current buildfarm webpages make it easy to see when a branch tip
>> is seriously broken, but it's not very easy to investigate transient
>> failures, such as a regression test race condition that only
>> materializes once in a while.

> If the data is already there and just not represented, just let me know
> exactly what you want and I will implement pages for that data happily.

I think what would be nice is some way to view all the failures for a given branch, extending back not-sure-how-far. Right now the only way to see past failures is to look at individual machines' histories, which is not real satisfactory when you want a broader view.

Actually what I *really* want is something closer to "show me all the unexplained failures", but unless Andrew is willing to support some way of tagging failures in the master database, I suppose that won't happen.

regards, tom lane
Tom Lane wrote:
> I think what would be nice is some way to view all the failures for a
> given branch, extending back not-sure-how-far. Right now the only way
> to see past failures is to look at individual machines' histories, which
> is not real satisfactory when you want a broader view.
>
> Actually what I *really* want is something closer to "show me all the
> unexplained failures", but unless Andrew is willing to support some way
> of tagging failures in the master database, I suppose that won't happen.

Well, if I understood how it might work it might happen. Who would do the tagging, and how?

cheers

andrew
I wrote:
>> 2. I was annoyed repeatedly that some buildfarm members weren't
>> reporting log_archive_filenames entries, which forced going the long
>> way round in the process I was using. Seems like we need some more
>> proactive means for getting buildfarm owners to keep their script
>> versions up-to-date. Not sure what that should look like exactly,
>> as long as it's not "you can run an ancient version as long as you
>> please".
>
> Modern clients report the versions of the two scripts involved (see
> script_version and web_script_version in reported config) so we could
> easily enforce a minimum version on these.

Meanwhile, the owner of the main 2 offending machines has said he will upgrade them.

cheers

andrew
Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> Actually what I *really* want is something closer to "show me all the
>> unexplained failures", but unless Andrew is willing to support some way
>> of tagging failures in the master database, I suppose that won't happen.

> Who would do the tagging, and how?

Well, that's the hard part isn't it? I was sort of envisioning a group of users who'd be authorized to log in and set tags on database entries somehow. I'm not sure about details.

One issue is that the majority of failures come in batches (when one of us commits a bad patch). With the current web interface it would be real tedious to verify which of the failures in a particular time interval matched the symptoms of a failure. What I did for my experiment this weekend was to download the last-stage-log of each failed build, which required an hour or so of setup time; then I could use grep to confirm which logs matched a failure that I'd identified. Doing that through the current webpage would involve lots of clicking and waiting. If we could expose a text-search-style API for grepping the stage logs, it'd be a lot easier to collect related failures. Then maybe a few widgets to let authorized users apply a tag to the search results ...

I'm not entirely sure that this infrastructure would pay for itself, though. Without some users willing to take the time to separate explained from unexplained failures, it'd be a waste of effort. But we've already had a couple of cases of interesting failures going unnoticed because of the noise level. Between duplicate reports about busted patches and transient problems on particular build machines (out of disk space, misconfiguration, etc) it's pretty hard to not miss the once-in-a-while failures. Is there some other way we could attack that problem?

regards, tom lane
Tom Lane wrote:
> I'm not entirely sure that this infrastructure would pay for itself,
> though. Without some users willing to take the time to separate
> explained from unexplained failures, it'd be a waste of effort.
> But we've already had a couple of cases of interesting failures going
> unnoticed because of the noise level. Between duplicate reports about
> busted patches and transient problems on particular build machines
> (out of disk space, misconfiguration, etc) it's pretty hard to not miss
> the once-in-a-while failures. Is there some other way we could attack
> that problem?

I'm not too sanguine about having a team of eager taggers. I think we probably need to work on a usable API for extracting data in small or large amounts, and maybe some good text search facilities.

The real issue is the one you identify of stuff getting lost in the noise. But I'm not sure there's any realistic cure for that.

cheers

andrew
Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> But we've already had a couple of cases of interesting failures going
>> unnoticed because of the noise level. Between duplicate reports about
>> busted patches and transient problems on particular build machines
>> (out of disk space, misconfiguration, etc) it's pretty hard to not miss
>> the once-in-a-while failures. Is there some other way we could attack
>> that problem?

> The real issue is the one you identify of stuff getting lost in the
> noise. But I'm not sure there's any realistic cure for that.

Maybe we should think about filtering the noise. Like, say, discarding every report from mongoose that involves an icc core dump ...

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=mongoose&dt=2007-03-20%2006:30:01

That's only semi-serious, but I do think that it's getting harder to pluck the wheat from the chaff. My investigations over the weekend showed that we have got basically three categories of reports:

1. genuine code breakage from unportable patches: normally multiple reports over a short period until we fix or revert the cause.

2. failures on a single buildfarm member due to misconfiguration, hardware flakiness, etc. These are sometimes repeatable and sometimes not.

3. all the rest, of which some fraction represents bugs we need to fix, only we don't know they're there.

In category 1 the buildfarm certainly pays for itself, but we'd hoped that it would help us spot less-reproducible errors too. The problem I'm seeing is that category 2 is overwhelming our ability to recognize patterns within category 3. How can we dial down the noise level?

regards, tom lane
On Tue, Mar 20, 2007 at 02:57:13AM -0400, Tom Lane wrote:
> Maybe we should think about filtering the noise. Like, say, discarding
> every report from mongoose that involves an icc core dump ...
> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=mongoose&dt=2007-03-20%2006:30:01

Maybe a simple compromise would be being able to set up a set of regexes that search the output and set a flag if that string is found. If you find the string, it gets marked with a flag, which means that when you look at mongoose, any failures that don't have the flag become easier to spot.

It also means that once you've found a common failure, you can create the regex and then any other failures with the same string get tagged also, making unexplained ones easier to spot.

Have a nice day,

Martijn van Oosterhout <kleptog@svana.org>
Martijn van Oosterhout wrote:
> Maybe a simple compromise would be being able to set up a set of regexes
> that search the output and set a flag if that string is found. If you
> find the string, it gets marked with a flag, which means that when you
> look at mongoose, any failures that don't have the flag become easier
> to spot.
>
> It also means that once you've found a common failure, you can create
> the regex and then any other failures with the same string get tagged
> also, making unexplained ones easier to spot.

You need to show first that this is an adequate tagging mechanism, both in tagging things adequately and in not picking up false positives, which would make things worse, not better. And even then you need someone to do the analysis to create the regex.

The buildfarm works because it leverages our strength, namely automating things. But all the tagging suggestions I've seen will involve regular, repetitive and possibly boring work, precisely the thing we are not good at as a group. If we had some staff they could be given this task (among others), assuming we show that it actually works. We don't, so they can't.

cheers

andrew
Andrew Dunstan wrote:
> The buildfarm works because it leverages our strength, namely automating
> things. But all the tagging suggestions I've seen will involve regular,
> repetitive and possibly boring work, precisely the thing we are not good
> at as a group.

This is probably true - however, as a buildfarm admin I occasionally wished I had a way to invalidate reports generated from my boxes, to prevent someone wasting time investigating them (like errors caused by system upgrades, configuration problems or other local issues). But I agree that it might be difficult to make that "manual tagging" process scalable and reliable enough that it really is an improvement over what we have now.

Stefan
Stefan Kaltenbrunner wrote:
> however, as a buildfarm admin I occasionally wished I had a way to
> invalidate reports generated from my boxes, to prevent someone wasting
> time investigating them (like errors caused by system upgrades,
> configuration problems or other local issues).

It would be extremely simple to provide a 'revoke report' API and client. Good idea. But that's quite different from what we have been discussing.

cheers

andrew
Andrew Dunstan wrote:
> The buildfarm works because it leverages our strength, namely automating
> things. But all the tagging suggestions I've seen will involve regular,
> repetitive and possibly boring work, precisely the thing we are not good
> at as a group.

You may be forgetting that Martijn and others tagged the scan.coverity.com database. Now, there are some untagged errors, but I'd say that that's because we don't control the tool, so we cannot fix it if there are false positives. We do control the buildfarm however, so we can develop systematic solutions for widespread problems (instead of forcing us to check and tag every single occurrence of widespread problems).

Alvaro Herrera
Alvaro Herrera wrote:
> You may be forgetting that Martijn and others tagged the
> scan.coverity.com database. Now, there are some untagged errors, but
> I'd say that that's because we don't control the tool, so we cannot fix
> it if there are false positives. We do control the buildfarm however,
> so we can develop systematic solutions for widespread problems (instead
> of forcing us to check and tag every single occurrence of widespread
> problems).

Well, I'm sure we can provide appropriate access or data for anyone who wants to do research in this area and prove me wrong.

cheers

andrew
Andrew Dunstan <andrew@dunslane.net> writes:
> Martijn van Oosterhout wrote:
>> Maybe a simple compromise would be being able to set up a set of regexes
>> that search the output and set a flag if that string is found. ...
>>
>> It also means that once you've found a common failure, you can create
>> the regex and then any other failures with the same string get tagged
>> also, making unexplained ones easier to spot.

> You need to show first that this is an adequate tagging mechanism, both
> in tagging things adequately and in not picking up false positives,
> which would make things worse, not better. And even then you need
> someone to do the analysis to create the regex.

Well, my experiment over the weekend with doing exactly that convinced me that regexes could be used successfully to identify common-mode failures. So I think Martijn has a fine idea here. And I don't see a problem with lack of motivation, at least for those of us who try to pay attention to buildfarm results --- once you've looked at a couple of reports of the same issue, you really don't want to have to repeat the analysis over and over. But just assuming that every report on a particular day reflects the same breakage is exactly the risk I wish we didn't have to take.

For a lot of cases there is not a need for an ongoing filter: we break something, we get a pile of reports, we fix it, and then we want to tag all the reports of that something so that we can see if anything else happened in the same interval. So for this, something based on an interactive search API would work fine. You could even use that for repetitive problems such as buildfarm misconfigurations, though having to repeat the search every so often would get old in the end. The main thing though is for the database to remember the tags once made.

> The buildfarm works because it leverages our strength, namely automating
> things. But all the tagging suggestions I've seen will involve regular,
> repetitive and possibly boring work, precisely the thing we are not good
> at as a group.

Well, responding to bug reports could be called regular and repetitive work --- in reality I don't find it so, because every bug is different. The point I think you are missing is that having something like this will *eliminate* repetitive, boring work, namely recognizing multiple reports of the same problem. The buildfarm has gotten big enough that some way of dealing with that is desperately needed, else our ability to spot infrequently-reported issues will disappear entirely.

regards, tom lane
Tom Lane wrote:
> The point I think you are missing is that having something like this
> will *eliminate* repetitive, boring work, namely recognizing multiple
> reports of the same problem. The buildfarm has gotten big enough that
> some way of dealing with that is desperately needed, else our ability
> to spot infrequently-reported issues will disappear entirely.

OK. How about if we have a table of <branch, failure_stage, regex, tag, description, start_date> plus some webby transactions for approved users to edit this?

The wrinkle is that applying the tags on the fly is probably not a great idea - the status page query is already in desperate need of overhauling because it's too slow. So we'd need a daemon to set up the tags in the background. But that's an implementation detail. Screen real estate on the dashboard page is also in very short supply. Maybe we could play with the background colour, so that a tagged failure had, say, a blue background, as opposed to the red/pink/yellow we use for failures now. Again - an implementation detail.

My biggest worry apart from maintenance (which doesn't matter that much - if people don't enter the regexes they don't get the tags they want) is that the regexes will not be specific enough, and so give false positives on the tags. Then if you're looking for things that aren't tagged you'll be even more likely than today to miss the outliers. Lord knows that regexes are hard to get right - I've been using them for a couple of decades and they've earned me lots of money, and I still get them wrong regularly (including several cases on the buildfarm). But maybe we need to take the plunge and see how it works.

This would be a fine SOC project - I at least won't have time to develop it for quite some time.

cheers

andrew
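A minimal sketch of the rule table Andrew describes, plus the kind of query a background tagging daemon might run. Every table and column name here is illustrative only - failures, failure_tags and failure_tag_rules are stand-ins, not the real buildfarm schema:

-- Hypothetical stand-in for the master database's failure records
-- (columns mirror the mfailures extract, plus the last-stage log text).
CREATE TABLE failures (
    sysname  text,
    snapshot timestamp,
    branch   text,
    stage    text,
    log      text
);

-- Tags already applied to failures.
CREATE TABLE failure_tags (
    sysname  text,
    snapshot timestamp,
    branch   text,
    tag      text
);

-- The rule table, following Andrew's
-- <branch, failure_stage, regex, tag, description, start_date> list.
CREATE TABLE failure_tag_rules (
    branch        text,
    failure_stage text,
    regex         text,
    tag           text,
    description   text,
    start_date    timestamp
);

-- What the background daemon might do: attach a tag to any untagged
-- failure whose log matches a rule for that branch and stage.
INSERT INTO failure_tags (sysname, snapshot, branch, tag)
SELECT f.sysname, f.snapshot, f.branch, r.tag
FROM failures f
JOIN failure_tag_rules r
  ON f.branch = r.branch
 AND f.stage = r.failure_stage
 AND f.snapshot >= r.start_date
 AND f.log ~ r.regex
WHERE NOT EXISTS (
    SELECT 1 FROM failure_tags t
    WHERE t.sysname = f.sysname AND t.snapshot = f.snapshot
      AND t.branch = f.branch AND t.tag = r.tag
);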
Andrew Dunstan <andrew@dunslane.net> writes:
> The wrinkle is that applying the tags on the fly is probably not a great
> idea - the status page query is already in desperate need of overhauling
> because it's too slow. So we'd need a daemon to set up the tags in the
> background. But that's an implementation detail. Screen real estate on
> the dashboard page is also in very short supply. Maybe we could play
> with the background colour, so that a tagged failure had, say, a blue
> background, as opposed to the red/pink/yellow we use for failures now.
> Again - an implementation detail.

I'm not sure that the current status dashboard needs to pay any attention to the tags. The view that I would like to have of "recent failures across all machines in a branch" is the one that needs to be tag-aware, and perhaps also the existing display of a given machine's branch history.

> My biggest worry apart from maintenance (which doesn't matter that much
> - if people don't enter the regexes they don't get the tags they want)
> is that the regexes will not be specific enough, and so give false
> positives on the tags.

True. I strongly suggest that we want an interactive search-and-tag capability *before* worrying about automatic tagging --- one of the reasons for that is to provide a way to test a regex that you might then consider adding to the automatic filter for future reports.

> This would be a fine SOC project - I at least won't have time to develop
> it for quite some time.

Agreed. Who's maintaining the SOC project list page?

regards, tom lane
I don't know if this has come up yet but....

In terms of tagging errors we might be able to use some machine learning techniques. There are NLP/learning systems that interpret logs. They learn over time what is normal and what isn't, and can flag things that are abnormal. For example, people are using support vector machine (SVM) analysis on log files to do intrusion detection. Here's a link for intrusion detection called Robust Anomaly Detection Using Support Vector Machines:

http://wwwcsif.cs.ucdavis.edu/~liaoy/research/RSVM_Anomaly_journal.pdf

This paper from IBM gives some more background information on how such a thing might work:

http://www.research.ibm.com/journal/sj/413/johnson.html

I have previously used an open source toolkit from CMU called rainbow to do these types of analysis.

-arturo
Arturo Perez wrote:
> In terms of tagging errors we might be able to use some machine
> learning techniques. There are NLP/learning systems that interpret
> logs. They learn over time what is normal and what isn't, and can flag
> things that are abnormal.

We can make extracts of the database (including the log data) available to anyone who wants to do research using any learning technique that appeals to them.

cheers

andrew
On Tue, Mar 20, 2007 at 11:36:09AM -0400, Andrew Dunstan wrote:
> My biggest worry apart from maintenance (which doesn't matter that much
> - if people don't enter the regexes they don't get the tags they want)
> is that the regexes will not be specific enough, and so give false
> positives on the tags. Then if you're looking for things that aren't
> tagged you'll be even more likely than today to miss the outliers.

I think you could solve that by displaying the text that matched the regex. If it starts matching odd things it'd be visible.

But I'm just sprouting ideas here, the proof is in the pudding. If the logs are easily available (or a subset of, say, the last month) then people could play with that and see what happens...

Have a nice day,

Martijn van Oosterhout <kleptog@svana.org>
Martijn van Oosterhout <kleptog@svana.org> writes:
> But I'm just sprouting ideas here, the proof is in the pudding. If the
> logs are easily available (or a subset of, say, the last month) then
> people could play with that and see what happens...

Anyone who wants to play around can replicate what I did, which was to download the table that Andrew made available upthread, and then pull the log files matching interesting rows. I used the attached functions to generate URLs for the failing stage logs, and then a shell script looping over lwp-download ...

CREATE FUNCTION lastfile(mfailures) RETURNS text AS $$
select replace(
    'show_stage_log.pl?nm=' || $1.sysname ||
    '&dt=' || $1.snapshot ||
    '&stg=' || replace($1.log_archive_filenames[array_upper($1.log_archive_filenames,1)],
                       '.log', ''),
    ' ', '%20')
$$ LANGUAGE sql;

CREATE FUNCTION lastlog(mfailures) RETURNS text AS $$
select 'http://www.pgbuildfarm.org/cgi-bin/' || lastfile($1)
$$ LANGUAGE sql;

regards, tom lane
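A usage sketch for those functions against the mfailures extract (not from the thread; it just lists the generated URLs so they can be fed to lwp-download or wget):

-- Last-stage log URLs for recent HEAD failures.
SELECT m.sysname, m.snapshot, m.stage, lastlog(m) AS log_url
FROM mfailures m
WHERE m.branch = 'HEAD'
ORDER BY m.snapshot DESC
LIMIT 20;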
Tom Lane wrote:
> Anyone who wants to play around can replicate what I did, which was to
> download the table that Andrew made available upthread, and then pull
> the log files matching interesting rows.
> [snip]

To save people this trouble, I have made an extract for the last 3 months, augmented by a log field, which is pretty much the last stage log. The dump is 27Mb and can be got at

http://www.pgbuildfarm.org/tfailures.dmp

cheers

andrew
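With the log text included, the grep-style searching discussed above can be done directly in SQL. A sketch, assuming the dump restores into a table named tfailures with a text column named log (both names are assumptions based on Andrew's description):

-- Find every failure in the 3-month extract whose last-stage log mentions
-- a given symptom, in the spirit of Tom's weekend analysis.
SELECT sysname, snapshot, branch, stage
FROM tfailures
WHERE log ~ 'could not open relation with OID'
ORDER BY snapshot DESC;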
Andrew Dunstan wrote: > Tom Lane wrote: >> Martijn van Oosterhout <kleptog@svana.org> writes: >> >>> But I'm just sprouting ideas here, the proof is in the pudding. If the >>> logs are easily available (or a subset of, say the last month) then >>> people could play with that and see what happens... >>> >> >> Anyone who wants to play around can replicate what I did, which was to >> download the table that Andrew made available upthread, and then pull >> the log files matching interesting rows. > [snip] > > > To save people this trouble, I have made an extract for the last 3 > months, augmented by log field, which is pretty much the last stage log. > The dump is 27Mb and can be got at > > http://www.pgbuildfarm.org/tfailures.dmp Should we just automate this and make it a weekly? > > cheers > > andrew > > > -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/
Joshua D. Drake wrote:
> Should we just automate this and make it a weekly?

Sure. Talk to me offline about it - very simple to do.

cheers

andrew