Thread: Testing "workers launched" in expected output? Really?
So buildfarm member piculet just fell over like this: ================== pgsql.build/src/test/regress/regression.diffs ================== *** /home/andres/build/buildfarm-piculet/HEAD/pgsql.build/../pgsql/src/test/regress/expected/select_parallel.out 2018-02-2816:10:01.986941733 +0000 --- /home/andres/build/buildfarm-piculet/HEAD/pgsql.build/src/test/regress/results/select_parallel.out 2018-03-02 19:13:57.843939790+0000 *************** *** 485,495 **** QUERY PLAN -------------------------------------------------------------------------- Aggregate (actual rows=1 loops=1) ! -> Nested Loop (actual rows=98000 loops=1) -> Seq Scan on tenk2 (actual rows=10 loops=1) Filter: (thousand = 0) Rows Removed by Filter: 9990 ! -> Gather (actual rows=9800 loops=10) Workers Planned: 4 Workers Launched: 4 -> Parallel Seq Scan on tenk1 (actual rows=1960 loops=50) --- 485,495 ---- QUERY PLAN -------------------------------------------------------------------------- Aggregate (actual rows=1 loops=1) ! -> Nested Loop (actual rows=97836 loops=1) -> Seq Scan on tenk2 (actual rows=10 loops=1) Filter: (thousand = 0) Rows Removed by Filter: 9990 ! -> Gather (actual rows=9784 loops=10) Workers Planned: 4 Workers Launched: 4 -> Parallel Seq Scan on tenk1 (actual rows=1960 loops=50) ====================================================================== and now I am on the warpath. I have no idea whether or not the diff here is significant --- maybe it is --- but I am desperately unhappy that we have expected-output files that will fail if fewer than the expected number of workers launched. I find that absolutely unacceptable. It reminds me entirely too much of when I had to package MySQL for Red Hat, and half the time the package builds failed in Red Hat's buildfarm, because their tests weren't robust about passing on heavily loaded machines. I won't stand for our tests becoming like that. Perhaps we could deal with this by suppressing the Workers Planned/ Launched lines when we are suppressing costs? regards, tom lane
On Fri, Mar 2, 2018 at 2:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > So buildfarm member piculet just fell over like this: > > ================== pgsql.build/src/test/regress/regression.diffs ================== > *** /home/andres/build/buildfarm-piculet/HEAD/pgsql.build/../pgsql/src/test/regress/expected/select_parallel.out 2018-02-28 16:10:01.986941733 +0000 > --- /home/andres/build/buildfarm-piculet/HEAD/pgsql.build/src/test/regress/results/select_parallel.out 2018-03-02 19:13:57.843939790+0000 > *************** > *** 485,495 **** > QUERY PLAN > -------------------------------------------------------------------------- > Aggregate (actual rows=1 loops=1) > ! -> Nested Loop (actual rows=98000 loops=1) > -> Seq Scan on tenk2 (actual rows=10 loops=1) > Filter: (thousand = 0) > Rows Removed by Filter: 9990 > ! -> Gather (actual rows=9800 loops=10) > Workers Planned: 4 > Workers Launched: 4 > -> Parallel Seq Scan on tenk1 (actual rows=1960 loops=50) > --- 485,495 ---- > QUERY PLAN > -------------------------------------------------------------------------- > Aggregate (actual rows=1 loops=1) > ! -> Nested Loop (actual rows=97836 loops=1) > -> Seq Scan on tenk2 (actual rows=10 loops=1) > Filter: (thousand = 0) > Rows Removed by Filter: 9990 > ! -> Gather (actual rows=9784 loops=10) > Workers Planned: 4 > Workers Launched: 4 > -> Parallel Seq Scan on tenk1 (actual rows=1960 loops=50) > > ====================================================================== > > and now I am on the warpath. I have no idea whether or not the diff > here is significant --- maybe it is --- but I am desperately unhappy > that we have expected-output files that will fail if fewer than the > expected number of workers launched. I find that absolutely > unacceptable. It reminds me entirely too much of when I had to package > MySQL for Red Hat, and half the time the package builds failed in > Red Hat's buildfarm, because their tests weren't robust about passing > on heavily loaded machines. I won't stand for our tests becoming > like that. > > Perhaps we could deal with this by suppressing the Workers Planned/ > Launched lines when we are suppressing costs? /me blinks. So you're not upset about the thing that actually caused the failure, which looks very likely to be a bug in the commits I pushed today, but are instead upset about something that didn't cause a failure but which is shiny and nearby? Unless this is causing actual failures I don't think we should change it. It would be very sad if we started routinely getting Workers Launched < Workers Planned due to some newly-introduced bug and had no idea it was happening because we'd hidden it to avoid imaginary buildfarm failures. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Mar 2, 2018 at 2:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> ... now I am on the warpath. I have no idea whether or not the diff >> here is significant --- maybe it is --- but I am desperately unhappy >> that we have expected-output files that will fail if fewer than the >> expected number of workers launched. > Unless this is causing actual failures I don't think we should change > it. It would be very sad if we started routinely getting Workers > Launched < Workers Planned due to some newly-introduced bug and had no > idea it was happening because we'd hidden it to avoid imaginary > buildfarm failures. My point is that just because it isn't falling over on relatively-lightly-loaded buildfarm machines doesn't mean that it won't fall over in other environments. You don't want packagers cursing your name while trying to push out an emergency security fix because the regression tests are unstable in their environment. Or worse, they might adopt the expedient of skipping "make check" in their package recipes, which is not a good idea from anyone's standpoint. I should think that your recent experience with postgres_fdw (which is evidently still broken, BTW, see rhinoceros) would have convinced you that environmentally dependent regression test results are something to studiously avoid. We have enough trouble with unforeseen test deltas; putting in tests that have a blatantly obvious failure mechanism is just taunting the software gods. Who tend to take their revenge in unpleasant ways. If you insist on actual failures, perhaps I'll go set up about four concurrently-scheduled animals on prairiedog's host, and wait to see what happens ... regards, tom lane
On Fri, Mar 2, 2018 at 3:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > My point is that just because it isn't falling over on > relatively-lightly-loaded buildfarm machines doesn't mean that it won't > fall over in other environments. It doesn't mean it will, either. > I should think that your recent experience with postgres_fdw (which is > evidently still broken, BTW, see rhinoceros) would have convinced you > that environmentally dependent regression test results are something to > studiously avoid. I suspect the problem with the test is that it's testing for a plan that is only slightly better than the next-best plan, and so it takes very little to push it over the edge. That's sort of an environment issue, but it's not exactly the same thing you're worried about here. > We have enough trouble with unforeseen test deltas; > putting in tests that have a blatantly obvious failure mechanism is > just taunting the software gods. Who tend to take their revenge in > unpleasant ways. > > If you insist on actual failures, perhaps I'll go set up about four > concurrently-scheduled animals on prairiedog's host, and wait to see > what happens ... I will be curious to see the results of that experiment. Unless you press the machine so hard that fork() starts returning EAGAIN, I don't see why that should cause this to fail. And if you do that, then it's going to fail far beyond the ability of suppressing Workers Launched to cover the problem up. However, it's certainly possible that you've thought of some failure mode that I've missed, or that there is one which neither of us has considered. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 3 March 2018 at 04:27, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Mar 2, 2018 at 3:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> My point is that just because it isn't falling over on
> relatively-lightly-loaded buildfarm machines doesn't mean that it won't
> fall over in other environments.
It doesn't mean it will, either.
> I should think that your recent experience with postgres_fdw (which is
> evidently still broken, BTW, see rhinoceros) would have convinced you
> that environmentally dependent regression test results are something to
> studiously avoid.
I suspect the problem with the test is that it's testing for a plan
that is only slightly better than the next-best plan, and so it takes
very little to push it over the edge. That's sort of an environment
issue, but it's not exactly the same thing you're worried about here.
> We have enough trouble with unforeseen test deltas;
> putting in tests that have a blatantly obvious failure mechanism is
> just taunting the software gods. Who tend to take their revenge in
> unpleasant ways.
>
> If you insist on actual failures, perhaps I'll go set up about four
> concurrently-scheduled animals on prairiedog's host, and wait to see
> what happens ...
I will be curious to see the results of that experiment. Unless you
press the machine so hard that fork() starts returning EAGAIN, I don't
see why that should cause this to fail. And if you do that, then it's
going to fail far beyond the ability of suppressing Workers Launched
to cover the problem up. However, it's certainly possible that you've
thought of some failure mode that I've missed, or that there is one
which neither of us has considered.
We already perpetrate all sorts of horrid things to work around pg_regress's literal whole-file expected output matching. Hard-to-maintain and impossible-to-document alternates files being high on my list.
Rather than adding to the list of things we hide and suppress to cover the deficiencies in our regression test results pattern matching I wonder if it's time to fix the pattern matching. Provide for some limited wildcard features in pg_regress output pattern files, for example.
Easier said than done without requiring a total change in how all expected files are written, unfortunately. "magic tags" to open a regexp section of expected file are all well and good, but then we can't just use diff anymore...