Thread: Buildfarm TAP testing is useless as currently implemented
I challenge anybody to figure out what happened here: http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2015-07-27%2010%3A25%3A17 or here: http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster&dt=2015-07-04%2016%3A00%3A23 or here: http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2015-07-07%2016%3A35%3A06 With no visibility of pg_ctl's output, and no copy of the postmaster log, there is no chance of debugging intermittent failures like this one. This isn't entirely the buildfarm's fault --- AFAICS, prove-based testing has inadequate error reporting by design. If "not ok" isn't enough information for you, tough beans. (It might help if the farm script captured the postmaster log after a failure, but that would do nothing for prove's unwillingness to pass through client-side messages.) I think we should disable TAP testing in the buildfarm until there is some credible form of error reporting for it. I've grown tired of looking into buildfarm failure reports only to meet a dead end. Aside from the wasted investigation time, which admittedly isn't huge, there's an opportunity cost in that subsequent test steps didn't get run. regards, tom lane
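The parenthetical suggestion above, that the farm script capture the postmaster log after a failure, can be sketched roughly as follows. This is a minimal illustration in Python and not the buildfarm client's code; `run_step`, its parameters, and the 50-line tail are hypothetical names and values chosen for the example.

```python
import subprocess
import sys
from pathlib import Path


def run_step(cmd, log_paths, tail_lines=50):
    """Run one test step; on failure, append the tail of each log file.

    Hypothetical helper: returns (exit status, report text).
    """
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # keep client-side messages with the output
        text=True,
    )
    report = [proc.stdout]
    if proc.returncode != 0:
        for path in map(Path, log_paths):
            if path.is_file():
                lines = path.read_text(errors="replace").splitlines()
                report.append(f"==== {path} (last {tail_lines} lines) ====")
                report.extend(lines[-tail_lines:])
    return proc.returncode, "\n".join(report)
```

A caller would pass the test command and the paths of the logs it cares about, then attach the returned report to the failure record. As the message notes, this alone does nothing about messages that prove itself swallows.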
On 07/27/2015 05:06 PM, Tom Lane wrote: > I challenge anybody to figure out what happened here: > http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2015-07-27%2010%3A25%3A17 > or here: > http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster&dt=2015-07-04%2016%3A00%3A23 > or here: > http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2015-07-07%2016%3A35%3A06 > > With no visibility of pg_ctl's output, and no copy of the postmaster log, > there is no chance of debugging intermittent failures like this one. > This isn't entirely the buildfarm's fault --- AFAICS, prove-based testing > has inadequate error reporting by design. If "not ok" isn't enough > information for you, tough beans. (It might help if the farm script > captured the postmaster log after a failure, but that would do nothing > for prove's unwillingness to pass through client-side messages.) Yep. > I think we should disable TAP testing in the buildfarm until there is > some credible form of error reporting for it. I've grown tired of > looking into buildfarm failure reports only to meet a dead end. > Aside from the wasted investigation time, which admittedly isn't huge, > there's an opportunity cost in that subsequent test steps didn't get run. Commit 1ea06203b - Improve logging of TAP tests - made it a lot better. The pg_ctl log should be in the log file now. The buildfarm doesn't seem to capture those logs at the moment, but that should be easy to fix. - Heikki
On 07/27/2015 10:06 AM, Tom Lane wrote: > I challenge anybody to figure out what happened here: > http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2015-07-27%2010%3A25%3A17 > or here: > http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster&dt=2015-07-04%2016%3A00%3A23 > or here: > http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2015-07-07%2016%3A35%3A06 > > With no visibility of pg_ctl's output, and no copy of the postmaster log, > there is no chance of debugging intermittent failures like this one. > This isn't entirely the buildfarm's fault --- AFAICS, prove-based testing > has inadequate error reporting by design. If "not ok" isn't enough > information for you, tough beans. (It might help if the farm script > captured the postmaster log after a failure, but that would do nothing > for prove's unwillingness to pass through client-side messages.) > > I think we should disable TAP testing in the buildfarm until there is > some credible form of error reporting for it. I've grown tired of > looking into buildfarm failure reports only to meet a dead end. > Aside from the wasted investigation time, which admittedly isn't huge, > there's an opportunity cost in that subsequent test steps didn't get run. Well, it does create a lot of files that we don't pick up. An example list is shown below, and I am attaching their contents in a single gzipped attachment. However, these are in the wrong location. This was a vpath build and yet these tmp_check directories are all created in the source tree. Let's fix that and then I'll set about having the buildfarm collect them. That should get us further down the track.
cheers andrew

/home/andrew/pgl/pg_head/src/bin/pg_controldata/tmp_check/log/regress_log_001_pg_controldata
/home/andrew/pgl/pg_head/src/bin/pg_basebackup/tmp_check/log/regress_log_020_pg_receivexlog
/home/andrew/pgl/pg_head/src/bin/pg_basebackup/tmp_check/log/regress_log_010_pg_basebackup
/home/andrew/pgl/pg_head/src/bin/pg_rewind/regress_log
/home/andrew/pgl/pg_head/src/bin/pg_rewind/tmp_check/log/regress_log_003_extrafiles
/home/andrew/pgl/pg_head/src/bin/pg_rewind/tmp_check/log/regress_log_001_basic
/home/andrew/pgl/pg_head/src/bin/pg_rewind/tmp_check/log/regress_log_002_databases
/home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_100_vacuumdb
/home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_091_reindexdb_all
/home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_050_dropdb
/home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_070_dropuser
/home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_020_createdb
/home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_102_vacuumdb_stages
/home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_030_createlang
/home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_060_droplang
/home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_040_createuser
/home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_010_clusterdb
/home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_011_clusterdb_all
/home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_101_vacuumdb_all
/home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_090_reindexdb
/home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_080_pg_isready
/home/andrew/pgl/pg_head/src/bin/pg_config/tmp_check/log/regress_log_001_pg_config
/home/andrew/pgl/pg_head/src/bin/pg_ctl/tmp_check/log/regress_log_001_start_stop
/home/andrew/pgl/pg_head/src/bin/pg_ctl/tmp_check/log/regress_log_002_status
/home/andrew/pgl/pg_head/src/bin/initdb/tmp_check/log/regress_log_001_initdb
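For illustration, collecting the per-test log files listed above could look roughly like the sketch below. It is written in Python for brevity and is not the buildfarm client's implementation; `collect_tap_logs` and `bundle_logs` are hypothetical names. It matches only files named `regress_log_*` inside a `tmp_check/log` directory, so a path such as `src/bin/pg_rewind/regress_log` would not be picked up.

```python
from pathlib import Path


def collect_tap_logs(root):
    """Return the sorted regress_log_* files found in any tmp_check/log
    directory below root (hypothetical helper)."""
    root = Path(root)
    pattern = "tmp_check/log/regress_log_*"
    return sorted(p for p in root.rglob(pattern) if p.is_file())


def bundle_logs(root):
    """Concatenate the collected logs into one text, each preceded by a
    header line giving its path relative to root."""
    parts = []
    for path in collect_tap_logs(root):
        parts.append(f"==== {path.relative_to(root)} ====")
        parts.append(path.read_text(errors="replace"))
    return "\n".join(parts)
```

Pointing `root` at the build tree instead of the source tree is exactly what depends on fixing the vpath location problem described in the message.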
On Tue, Jul 28, 2015 at 1:15 AM, Andrew Dunstan <andrew@dunslane.net> wrote: > Well, it does create a lot of files that we don't pick up. An example list > is show below, and I am attaching their contents in a single gzipped > attachment. However, these are in the wrong location. This was a vpath build > and yet these tmp_check directories are all created in the source tree. > Let's fix that and then I'll set about having the buildfarm collect them. > That should get us further down the track. > > [log list] The patch attached fixes that. I suggest that we use env{TESTDIR}/log as the location for the logs, so that even a vpath build locates the log files correctly. -- Michael
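The idea of deriving the log location from TESTDIR can be sketched as below. The attached patch is not reproduced in this thread, so this is only an illustration in Python, not the patch itself. The function name `tap_log_dir` and the fallback to the current directory when TESTDIR is unset are assumptions made for the example.

```python
import os
from pathlib import Path


def tap_log_dir(environ=None):
    """Return the directory where test logs should go: $TESTDIR/log.

    Hypothetical helper. Falling back to the current working directory
    when TESTDIR is unset is an assumption made for this sketch.
    """
    environ = os.environ if environ is None else environ
    base = Path(environ.get("TESTDIR", os.getcwd()))
    return base / "log"
```

If the test run sets TESTDIR to a directory in the build tree, the logs follow it there instead of landing in the source tree, which is the point of the suggestion.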
On 07/27/2015 12:15 PM, Andrew Dunstan wrote: > > On 07/27/2015 10:06 AM, Tom Lane wrote: >> I challenge anybody to figure out what happened here: >> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2015-07-27%2010%3A25%3A17 >> >> or here: >> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster&dt=2015-07-04%2016%3A00%3A23 >> >> or here: >> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2015-07-07%2016%3A35%3A06 >> >> >> With no visibility of pg_ctl's output, and no copy of the postmaster >> log, >> there is no chance of debugging intermittent failures like this one. >> This isn't entirely the buildfarm's fault --- AFAICS, prove-based >> testing >> has inadequate error reporting by design. If "not ok" isn't enough >> information for you, tough beans. (It might help if the farm script >> captured the postmaster log after a failure, but that would do nothing >> for prove's unwillingness to pass through client-side messages.) >> >> I think we should disable TAP testing in the buildfarm until there is >> some credible form of error reporting for it. I've grown tired of >> looking into buildfarm failure reports only to meet a dead end. >> Aside from the wasted investigation time, which admittedly isn't huge, >> there's an opportunity cost in that subsequent test steps didn't get >> run. > > > > Well, it does create a lot of files that we don't pick up. An example > list is show below, and I am attaching their contents in a single > gzipped attachment. However, these are in the wrong location. This was > a vpath build and yet these tmp_check directories are all created in > the source tree. Let's fix that and then I'll set about having the > buildfarm collect them. That should get us further down the track. > The situation should now be substantially improved. This buildfarm change <https://github.com/PGBuildFarm/client-code/commit/e684baacf9cb9f9d821be5088b15b336dc6aae07> uses today's core changes to pick up log files. 
See <http://www.pgbuildfarm.org/cgi-bin/show_stage_log.pl?nm=crake&dt=2015-07-28%2023%3A08%3A54&stg=bin-check> for an example. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: >> On 07/27/2015 10:06 AM, Tom Lane wrote: >>> I think we should disable TAP testing in the buildfarm until there is >>> some credible form of error reporting for it. > The situation should now be substantially improved. Hm, I was just thinking we weren't there yet, because: http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=axolotl&dt=2015-07-28%2023%3A03%3A39 > This buildfarm change > <https://github.com/PGBuildFarm/client-code/commit/e684baacf9cb9f9d821be5088b15b336dc6aae07> > uses today's core changes to pick up log files. Ah, so we need a new buildfarm script release? regards, tom lane
On 07/28/2015 08:58 PM, Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: >>> On 07/27/2015 10:06 AM, Tom Lane wrote: >>>> I think we should disable TAP testing in the buildfarm until there is >>>> some credible form of error reporting for it. >> The situation should now be substantially improved. > Hm, I was just thinking we weren't there yet, because: > http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=axolotl&dt=2015-07-28%2023%3A03%3A39 > >> This buildfarm change >> <https://github.com/PGBuildFarm/client-code/commit/e684baacf9cb9f9d821be5088b15b336dc6aae07> >> uses today's core changes to pick up log files. > Ah, so we need a new buildfarm script release? > > Yeah. I'll push one in the next couple of days. cheers andrew