Thread: Testing "workers launched" in expected output? Really?

Testing "workers launched" in expected output? Really?

From

Tom Lane

Date:

03 March 2018, 01:28:54

So buildfarm member piculet just fell over like this:

================== pgsql.build/src/test/regress/regression.diffs ==================
*** /home/andres/build/buildfarm-piculet/HEAD/pgsql.build/../pgsql/src/test/regress/expected/select_parallel.out
2018-02-2816:10:01.986941733 +0000 
--- /home/andres/build/buildfarm-piculet/HEAD/pgsql.build/src/test/regress/results/select_parallel.out    2018-03-02
19:13:57.843939790+0000 
***************
*** 485,495 ****
                                  QUERY PLAN
  --------------------------------------------------------------------------
   Aggregate (actual rows=1 loops=1)
!    ->  Nested Loop (actual rows=98000 loops=1)
           ->  Seq Scan on tenk2 (actual rows=10 loops=1)
                 Filter: (thousand = 0)
                 Rows Removed by Filter: 9990
!          ->  Gather (actual rows=9800 loops=10)
                 Workers Planned: 4
                 Workers Launched: 4
                 ->  Parallel Seq Scan on tenk1 (actual rows=1960 loops=50)
--- 485,495 ----
                                  QUERY PLAN
  --------------------------------------------------------------------------
   Aggregate (actual rows=1 loops=1)
!    ->  Nested Loop (actual rows=97836 loops=1)
           ->  Seq Scan on tenk2 (actual rows=10 loops=1)
                 Filter: (thousand = 0)
                 Rows Removed by Filter: 9990
!          ->  Gather (actual rows=9784 loops=10)
                 Workers Planned: 4
                 Workers Launched: 4
                 ->  Parallel Seq Scan on tenk1 (actual rows=1960 loops=50)

======================================================================

and now I am on the warpath.  I have no idea whether or not the diff
here is significant --- maybe it is --- but I am desperately unhappy
that we have expected-output files that will fail if fewer than the
expected number of workers launched.  I find that absolutely
unacceptable.  It reminds me entirely too much of when I had to package
MySQL for Red Hat, and half the time the package builds failed in
Red Hat's buildfarm, because their tests weren't robust about passing
on heavily loaded machines.  I won't stand for our tests becoming
like that.

Perhaps we could deal with this by suppressing the Workers Planned/
Launched lines when we are suppressing costs?

            regards, tom lane

Re: Testing "workers launched" in expected output? Really?

From

Robert Haas

Date:

03 March 2018, 01:58:07

On Fri, Mar 2, 2018 at 2:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So buildfarm member piculet just fell over like this:
>
> ================== pgsql.build/src/test/regress/regression.diffs ==================
> *** /home/andres/build/buildfarm-piculet/HEAD/pgsql.build/../pgsql/src/test/regress/expected/select_parallel.out
 2018-02-28 16:10:01.986941733 +0000
 
> --- /home/andres/build/buildfarm-piculet/HEAD/pgsql.build/src/test/regress/results/select_parallel.out  2018-03-02
19:13:57.843939790+0000
 
> ***************
> *** 485,495 ****
>                                   QUERY PLAN
>   --------------------------------------------------------------------------
>    Aggregate (actual rows=1 loops=1)
> !    ->  Nested Loop (actual rows=98000 loops=1)
>            ->  Seq Scan on tenk2 (actual rows=10 loops=1)
>                  Filter: (thousand = 0)
>                  Rows Removed by Filter: 9990
> !          ->  Gather (actual rows=9800 loops=10)
>                  Workers Planned: 4
>                  Workers Launched: 4
>                  ->  Parallel Seq Scan on tenk1 (actual rows=1960 loops=50)
> --- 485,495 ----
>                                   QUERY PLAN
>   --------------------------------------------------------------------------
>    Aggregate (actual rows=1 loops=1)
> !    ->  Nested Loop (actual rows=97836 loops=1)
>            ->  Seq Scan on tenk2 (actual rows=10 loops=1)
>                  Filter: (thousand = 0)
>                  Rows Removed by Filter: 9990
> !          ->  Gather (actual rows=9784 loops=10)
>                  Workers Planned: 4
>                  Workers Launched: 4
>                  ->  Parallel Seq Scan on tenk1 (actual rows=1960 loops=50)
>
> ======================================================================
>
> and now I am on the warpath.  I have no idea whether or not the diff
> here is significant --- maybe it is --- but I am desperately unhappy
> that we have expected-output files that will fail if fewer than the
> expected number of workers launched.  I find that absolutely
> unacceptable.  It reminds me entirely too much of when I had to package
> MySQL for Red Hat, and half the time the package builds failed in
> Red Hat's buildfarm, because their tests weren't robust about passing
> on heavily loaded machines.  I won't stand for our tests becoming
> like that.
>
> Perhaps we could deal with this by suppressing the Workers Planned/
> Launched lines when we are suppressing costs?

/me blinks.

So you're not upset about the thing that actually caused the failure,
which looks very likely to be a bug in the commits I pushed today, but
are instead upset about something that didn't cause a failure but
which is shiny and nearby?

Unless this is causing actual failures I don't think we should change
it.  It would be very sad if we started routinely getting Workers
Launched < Workers Planned due to some newly-introduced bug and had no
idea it was happening because we'd hidden it to avoid imaginary
buildfarm failures.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Testing "workers launched" in expected output? Really?

From

Tom Lane

Date:

03 March 2018, 02:13:13

Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Mar 2, 2018 at 2:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> ... now I am on the warpath.  I have no idea whether or not the diff
>> here is significant --- maybe it is --- but I am desperately unhappy
>> that we have expected-output files that will fail if fewer than the
>> expected number of workers launched.

> Unless this is causing actual failures I don't think we should change
> it.  It would be very sad if we started routinely getting Workers
> Launched < Workers Planned due to some newly-introduced bug and had no
> idea it was happening because we'd hidden it to avoid imaginary
> buildfarm failures.

My point is that just because it isn't falling over on
relatively-lightly-loaded buildfarm machines doesn't mean that it won't
fall over in other environments.  You don't want packagers cursing your
name while trying to push out an emergency security fix because the
regression tests are unstable in their environment.  Or worse, they might
adopt the expedient of skipping "make check" in their package recipes,
which is not a good idea from anyone's standpoint.

I should think that your recent experience with postgres_fdw (which is
evidently still broken, BTW, see rhinoceros) would have convinced you
that environmentally dependent regression test results are something to
studiously avoid.  We have enough trouble with unforeseen test deltas;
putting in tests that have a blatantly obvious failure mechanism is
just taunting the software gods.  Who tend to take their revenge in
unpleasant ways.

If you insist on actual failures, perhaps I'll go set up about four
concurrently-scheduled animals on prairiedog's host, and wait to see
what happens ...

            regards, tom lane

Re: Testing "workers launched" in expected output? Really?

From

Robert Haas

Date:

03 March 2018, 02:27:57

On Fri, Mar 2, 2018 at 3:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> My point is that just because it isn't falling over on
> relatively-lightly-loaded buildfarm machines doesn't mean that it won't
> fall over in other environments.

It doesn't mean it will, either.

> I should think that your recent experience with postgres_fdw (which is
> evidently still broken, BTW, see rhinoceros) would have convinced you
> that environmentally dependent regression test results are something to
> studiously avoid.

I suspect the problem with the test is that it's testing for a plan
that is only slightly better than the next-best plan, and so it takes
very little to push it over the edge.  That's sort of an environment
issue, but it's not exactly the same thing you're worried about here.

> We have enough trouble with unforeseen test deltas;
> putting in tests that have a blatantly obvious failure mechanism is
> just taunting the software gods.  Who tend to take their revenge in
> unpleasant ways.
>
> If you insist on actual failures, perhaps I'll go set up about four
> concurrently-scheduled animals on prairiedog's host, and wait to see
> what happens ...

I will be curious to see the results of that experiment.  Unless you
press the machine so hard that fork() starts returning EAGAIN, I don't
see why that should cause this to fail.  And if you do that, then it's
going to fail far beyond the ability of suppressing Workers Launched
to cover the problem up.  However, it's certainly possible that you've
thought of some failure mode that I've missed, or that there is one
which neither of us has considered.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Testing "workers launched" in expected output? Really?

From

Craig Ringer

Date:

03 March 2018, 05:13:26

On 3 March 2018 at 04:27, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Mar 2, 2018 at 3:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> My point is that just because it isn't falling over on
> relatively-lightly-loaded buildfarm machines doesn't mean that it won't
> fall over in other environments.

It doesn't mean it will, either.

> I should think that your recent experience with postgres_fdw (which is
> evidently still broken, BTW, see rhinoceros) would have convinced you
> that environmentally dependent regression test results are something to
> studiously avoid.

I suspect the problem with the test is that it's testing for a plan
that is only slightly better than the next-best plan, and so it takes
very little to push it over the edge. That's sort of an environment
issue, but it's not exactly the same thing you're worried about here.

> We have enough trouble with unforeseen test deltas;
> putting in tests that have a blatantly obvious failure mechanism is
> just taunting the software gods. Who tend to take their revenge in
> unpleasant ways.
>
> If you insist on actual failures, perhaps I'll go set up about four
> concurrently-scheduled animals on prairiedog's host, and wait to see
> what happens ...

I will be curious to see the results of that experiment. Unless you
press the machine so hard that fork() starts returning EAGAIN, I don't
see why that should cause this to fail. And if you do that, then it's
going to fail far beyond the ability of suppressing Workers Launched
to cover the problem up. However, it's certainly possible that you've
thought of some failure mode that I've missed, or that there is one
which neither of us has considered.

We already perpetrate all sorts of horrid things to work around pg_regress's literal whole-file expected output matching. Hard-to-maintain and impossible-to-document alternates files being high on my list.

Rather than adding to the list of things we hide and suppress to cover the deficiencies in our regression test results pattern matching I wonder if it's time to fix the pattern matching. Provide for some limited wildcard features in pg_regress output pattern files, for example.

Easier said than done without requiring a total change in how all expected files are written, unfortunately. "magic tags" to open a regexp section of expected file are all well and good, but then we can't just use diff anymore...

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services