Re: The real reason why TAP testing isn't ready for prime time - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: The real reason why TAP testing isn't ready for prime time
Date
Msg-id CAB7nPqQ-=Cve1xB0OQpQJPm+3RSjz=0BuR-eRerUZK83GkC13A@mail.gmail.com
In response to The real reason why TAP testing isn't ready for prime time  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: The real reason why TAP testing isn't ready for prime time  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
On Mon, Jun 15, 2015 at 3:37 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Buildfarm member hamster has failed a pretty significant fraction of
> its recent runs in the BinInstallCheck step:
> http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=hamster&br=HEAD
>
> Since other critters aren't equally distressed, it seems likely that
> this is just an out-of-disk-space type of problem.  But maybe it's
> trying to tell us that there's a genuine platform-specific bug there.
> In any case, I challenge anyone to figure out what's happening from
> the information available from the buildfarm logs.
>
> I don't know whether this is just that the buildfarm script isn't
> collecting data that it should be.  But my experiences with the
> TAP test scripts haven't been very positive.  When they fail, it
> takes a lot of digging to find out why.  Basically, that entire
> mechanism sucks as far as debuggability is concerned.

Indeed. I think that one step in the right direction would be to
replace all the calls to system and system_or_bail with a wrapper
routine that runs the command through IPC::Run, captures its output,
and stores those logs in each test's base path. The same applies to
the pg_rewind tests.
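
As a rough illustration of the wrapper idea, here is a minimal sketch.
It is written in Python rather than the Perl of the actual test scripts,
purely to show the shape. The names run_logged and run_logged_or_bail
and the log file name regress_log are hypothetical, not existing
routines or files:

```python
import subprocess
import sys
from pathlib import Path


def run_logged(cmd, base_path, log_name="regress_log"):
    """Run cmd, appending its stdout and stderr to a log under base_path.

    Hypothetical helper: returns True when the command exits with status 0.
    """
    log_dir = Path(base_path)
    log_dir.mkdir(parents=True, exist_ok=True)
    log_file = log_dir / log_name
    with open(log_file, "a") as log:
        # Record which command ran, so the log is readable on its own.
        log.write("# Running: " + " ".join(cmd) + "\n")
        log.flush()  # keep the header ahead of the child's output
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
        log.write(f"# Exit status: {result.returncode}\n")
    return result.returncode == 0


def run_logged_or_bail(cmd, base_path):
    """Like run_logged, but abort the whole test run on failure."""
    if not run_logged(cmd, base_path):
        sys.exit("Bail out!  command failed: " + " ".join(cmd))
```

The point is only that every command's output ends up in a file next to
the test that ran it, so a failure leaves something to look at.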

> I think there is a good argument for turning this off in the buildfarm
> until there is a better way of identifying and solving problems.  It is
> not helping us that hamster is red half the time for undiscoverable
> reasons.  That just conditions people to ignore it, and it may well be
> masking real problems that the machine could be finding if it weren't
> failing at this step.

hamster is legendarily slow and has a slow disk, which improves its
chances of catching race conditions, and it is the only slow buildfarm
machine with the TAP tests enabled (by comparison, dangomushi has never
failed in the TAP tests), so I would prefer to think that the
problem is not specific to ArchLinux ARM. In this case the failure
seems to be related to the timing of the test servers stopping and
starting, even though the -w switch is used with pg_ctl, particularly
since PGPORT is set to the same value for all servers... Still, for the
time being I don't mind disabling them, and I just did so. I will try to
investigate further on the machine itself.
-- 
Michael


