Thread: New regression test time

New regression test time

From
Josh Berkus
Date:
Hackers,

Per discussion on these tests, I ran "make check" against 9.4 head,
applied all of the regression tests other than DISCARD.

Time for 3 "make check" runs without new tests: 65.9s

Time for 3 "make check runs with new tests: 71.7s

So that's an increase of about 10% in test runtime (or 2 seconds per run
on my laptop), in order to greatly improve regression test coverage.
I'd say that splitting the tests is not warranted, and that we should go
ahead with these tests on their testing merits, not based on any extra
check time they might add.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: New regression test time

From
Andres Freund
Date:
On 2013-06-28 14:01:23 -0700, Josh Berkus wrote:
> Per discussion on these tests, I ran "make check" against 9.4 head,
> applied all of the regression tests other than DISCARD.
> 
> Time for 3 "make check" runs without new tests: 65.9s
> 
> Time for 3 "make check runs with new tests: 71.7s
> 
> So that's an increase of about 10% in test runtime (or 2 seconds per run
> on my laptop), in order to greatly improve regression test coverage.

How did you evaluate that coverage increased "greatly"? I am not
generally against these tests but I'd be surprised if the overall test
coverage improved noticeably by this. Which makes 10% runtime overhead
pretty hefty if the goal is to actually achieve a high coverage.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: New regression test time

From
Josh Berkus
Date:
> How did you evaluate that coverage increased "greatly"? I am not
> generally against these tests but I'd be surprised if the overall test
> coverage improved noticeably by this. Which makes 10% runtime overhead
> pretty hefty if the goal is to actually achieve a high coverage.

I was relying on Robins' numbers of coverage:

Patch to add more regression tests to ASYNC (/src/backend/commands/async).
Take code-coverage from 39% to 75%.

Please find attached a patch to take code-coverage of ALTER OPERATOR
FAMILY.. ADD / DROP (src/backend/commands/opclasscmds.c) from 50% to 87%.

Please find attached a patch to take code-coverage of LOCK TABLE (
src/backend/commands/lockcmds.c) from 57% to 84%.

... etc.

If we someday add so many tests that "make check" takes over a minute on
a modern laptop, then maybe it'll be worth talking about splitting the
test suite into "regular" and "extended".  However, it would require 15
more patch sets the size of Robins' to get there, so I don't see that
it's worth the trouble any time soon.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: New regression test time

From
Andres Freund
Date:
On 2013-06-28 14:46:10 -0700, Josh Berkus wrote:
> 
> > How did you evaluate that coverage increased "greatly"? I am not
> > generally against these tests but I'd be surprised if the overall test
> > coverage improved noticeably by this. Which makes 10% runtime overhead
> > pretty hefty if the goal is to actually achieve a high coverage.
> 
> I was relying on Robins' numbers of coverage:

Those improvements rather likely end up being an improvement a good bit
less than one percent for the whole binary.

> If we someday add so many tests that "make check" takes over a minute on
> a modern laptop, then maybe it'll be worth talking about splitting the
> test suite into "regular" and "extended".  However, it would require 15
> more patch sets the size of Robins' to get there, so I don't see that
> it's worth the trouble any time soon.

Was it actually an assert enabled build that you tested?

We currently can run make check with stuff like CLOBBER_CACHE_ALWAYS
which finds bugs pretty regularly. If we achieve a high coverage we
quite possibly can't anymore, at least not regularly.
So I actually think having two modes makes sense. Then we could also
link stuff like isolationtester automatically into the longer running
mode...

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: New regression test time

From
Stephen Frost
Date:
* Josh Berkus (josh@agliodbs.com) wrote:
> So that's an increase of about 10% in test runtime (or 2 seconds per run
> on my laptop), in order to greatly improve regression test coverage.
> I'd say that splitting the tests is not warranted, and that we should go
> ahead with these tests on their testing merits, not based on any extra
> check time they might add.

For my 2c, +1 on this in general, in spite of the concerns.  Covering
cases that we don't is valuable in general and if we get a bit more
coverage for a few more seconds, it's worth it.

Also, if someone wants to split the test up, then they need to provide a
patch which does so.  I'm not against that, but I do not feel this
addition should be held up waiting for someone to implement such a
seperation- if anything, having the two things done independently would
probably be cleaner anyway.
Thanks,
    Stephen

Re: New regression test time

From
Fabien COELHO
Date:
>>> How did you evaluate that coverage increased "greatly"? I am not
>>> generally against these tests but I'd be surprised if the overall test
>>> coverage improved noticeably by this. Which makes 10% runtime overhead
>>> pretty hefty if the goal is to actually achieve a high coverage.
>>
>> I was relying on Robins' numbers of coverage:
>
> Those improvements rather likely end up being an improvement a good bit
> less than one percent for the whole binary.

Yes, but it is a valuable percent nevertheless.

As I understand it, the coverage is about the tested command logic. A lot 
this logic is dedicated to check permissions (can you add an operator to 
this database? ...) and to verify required conditions (is the function 
proposed for operator has the right signature? does the operator overwrite 
an existing one? ...).

-- 
Fabien.



Re: New regression test time

From
Alvaro Herrera
Date:
Josh Berkus escribió:
> Hackers,
> 
> Per discussion on these tests, I ran "make check" against 9.4 head,
> applied all of the regression tests other than DISCARD.
> 
> Time for 3 "make check" runs without new tests: 65.9s
> 
> Time for 3 "make check runs with new tests: 71.7s
> 
> So that's an increase of about 10% in test runtime (or 2 seconds per run
> on my laptop),

I see two problems with this report:
1. it creates a new installation for each run,
2. it only uses the serial schedule.

I care more about the parallel schedule than the serial one, because
since it obviously runs in less time, then I can run it often and not
worry about how long it takes.  On the other hand, the cost of the extra
initdb obviously means that the percentage is a bit lower than if you
were to compare test run time without considering the initdb step.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: New regression test time

From
Robert Haas
Date:
On Jun 29, 2013, at 12:36 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> I see two problems with this report:
> 1. it creates a new installation for each run,

But that's the normal way of running the tests anyway, isn't it?

> 2. it only uses the serial schedule.

make check uses the parallel schedule - did Josh indicate he did anything else?

...Robert


Re: New regression test time

From
Josh Berkus
Date:
> I see two problems with this report:
> 1. it creates a new installation for each run,

Yes, I'm running "make check"

> 2. it only uses the serial schedule.

Um, no:

parallel group (19 tests):  limit prepare copy2 plancache xml returning
conversion rowtypes largeobject temp truncate polymorphism with
without_oid sequence domain rangefuncs alter_table plpgsql

Out of curiosity, I tried a serial run (MAX_CONNECTIONS=1), which took
about 39s (with patches).

> I care more about the parallel schedule than the serial one, because
> since it obviously runs in less time, then I can run it often and not
> worry about how long it takes.  On the other hand, the cost of the extra
> initdb obviously means that the percentage is a bit lower than if you
> were to compare test run time without considering the initdb step.

Possibly, but I know I run "make check" not "make installcheck" when I'm
testing new code.  And the buildfarm, afaik, runs "make check".  And,
for that matter, who the heck cares?

The important thing is that "make check" still runs in < 30 seconds on a
standard laptop (an ultraportable, even), so it's not holding up a
change-test-change-test cycle.  If you were looking to prove something
else, then go for it.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: New regression test time

From
Andrew Dunstan
Date:
On 06/29/2013 03:57 PM, Josh Berkus wrote:
>> I see two problems with this report:
>> 1. it creates a new installation for each run,
> Yes, I'm running "make check"
>
>> 2. it only uses the serial schedule.
> Um, no:
>
> parallel group (19 tests):  limit prepare copy2 plancache xml returning
> conversion rowtypes largeobject temp truncate polymorphism with
> without_oid sequence domain rangefuncs alter_table plpgsql
>
> Out of curiosity, I tried a serial run (MAX_CONNECTIONS=1), which took
> about 39s (with patches).
>
>> I care more about the parallel schedule than the serial one, because
>> since it obviously runs in less time, then I can run it often and not
>> worry about how long it takes.  On the other hand, the cost of the extra
>> initdb obviously means that the percentage is a bit lower than if you
>> were to compare test run time without considering the initdb step.
> Possibly, but I know I run "make check" not "make installcheck" when I'm
> testing new code.  And the buildfarm, afaik, runs "make check".  And,
> for that matter, who the heck cares?

It runs both :-) We run "make check" early in the process to make sure 
we can at least get that far. We run "make installcheck" later, among 
other things to check that the tests work in different locales.


I think we need to have a better understanding of just what our standard 
regression tests do.

AIUI: They do test feature use and errors that have cropped up in the 
past that we need to beware of. They don't test every bug we've ever 
had, nor do they exercise every piece of code.

Maybe there is a good case for these last two in a different set of tests.

cheers

andrew




Re: New regression test time

From
Josh Berkus
Date:
On 06/29/2013 02:14 PM, Andrew Dunstan wrote:
> AIUI: They do test feature use and errors that have cropped up in the
> past that we need to beware of. They don't test every bug we've ever
> had, nor do they exercise every piece of code.

If we don't have a test for it, then we can break it in the future and
not know we've broken it until .0 is released.  Is that really a
direction we're happy going in?

> 
> Maybe there is a good case for these last two in a different set of tests.

If we had a different set of tests, that would be a valid argument.  But
we don't, so it's not.  And nobody has offered to write a feature to
split our tests either.

I have to say, I'm really surprised at the level of resistance people on
this list are showing to the idea of increasing test coverage. I thought
that Postgres was all about reliability?   For a project as mature as we
are, our test coverage is abysmal, and I think I'm starting to see why.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: New regression test time

From
Dann Corbit
Date:
-----Original Message-----
From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Josh Berkus
Sent: Saturday, June 29, 2013 3:00 PM
To: Andrew Dunstan
Cc: Alvaro Herrera; pgsql-hackers@postgresql.org; Robins Tharakan
Subject: Re: [HACKERS] New regression test time

On 06/29/2013 02:14 PM, Andrew Dunstan wrote:
> AIUI: They do test feature use and errors that have cropped up in the 
> past that we need to beware of. They don't test every bug we've ever 
> had, nor do they exercise every piece of code.

If we don't have a test for it, then we can break it in the future and not know we've broken it until .0 is released.
Isthat really a direction we're happy going in?
 

> 
> Maybe there is a good case for these last two in a different set of tests.

If we had a different set of tests, that would be a valid argument.  But we don't, so it's not.  And nobody has offered
towrite a feature to split our tests either.
 

I have to say, I'm really surprised at the level of resistance people on this list are showing to the idea of
increasingtest coverage. I thought
 
that Postgres was all about reliability?   For a project as mature as we
are, our test coverage is abysmal, and I think I'm starting to see why.
>>
An ounce of prevention is worth a pound of cure.
The cost of a bug rises exponentially, starting at requirements gathering and ending at the customer.
Where I work, we have two computer  rooms full of machines that run tests around the clock, 24x365.
Even so, a full regression takes well over a week because we perform hundreds of thousands of tests.
All choices of this kind are trade-offs.  But in such situations, my motto is:
"Do whatever will make the customer prosper the most."
IMO-YMMV
<<

Re: New regression test time

From
Andrew Dunstan
Date:
On 06/29/2013 05:59 PM, Josh Berkus wrote:

>> Maybe there is a good case for these last two in a different set of tests.
> If we had a different set of tests, that would be a valid argument.  But
> we don't, so it's not.  And nobody has offered to write a feature to
> split our tests either.
>
> I have to say, I'm really surprised at the level of resistance people on
> this list are showing to the idea of increasing test coverage. I thought
> that Postgres was all about reliability?   For a project as mature as we
> are, our test coverage is abysmal, and I think I'm starting to see why.
>


Dividing the tests into different sections is as simple as creating one 
schedule file per section.

I'm not at all resistant to it. In fact, of someone wants to set up 
separate sections and add new tests to the different sections I'll be 
more than happy to provide buildfarm support for it. Obvious candidates 
could include:
 * code coverage * bugs * tests too big to run in everyday developer use


cheers

andrew




Re: New regression test time

From
Josh Berkus
Date:
> 
> Dividing the tests into different sections is as simple as creating one
> schedule file per section.

Oh?  Huh.  I'd thought it would be much more complicated.  Well, by all
means, let's do it then.

I'm not personally convinced that the existing regression tests all
belong in the "default" section -- I think a lot of what we've chosen to
test or not test to date has been fairly arbitrary -- but we can tinker
with that later.

> I'm not at all resistant to it. In fact, of someone wants to set up
> separate sections and add new tests to the different sections I'll be
> more than happy to provide buildfarm support for it. Obvious candidates
> could include:
> 
>  * code coverage
>  * bugs
>  * tests too big to run in everyday developer use

So we'd want codecoverage and codecoverage-parallel then, presumably,
for Robins tests?  Is there really a reason to supply a non-parallel
version?  Would we want to include the core tests in those?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: New regression test time

From
Claudio Freire
Date:
On Sat, Jun 29, 2013 at 7:58 PM, Josh Berkus <josh@agliodbs.com> wrote:
>
>>
>> Dividing the tests into different sections is as simple as creating one
>> schedule file per section.
>
> Oh?  Huh.  I'd thought it would be much more complicated.  Well, by all
> means, let's do it then.

I think I should point out, since it seems to have gone unnoticed,
that there's a thread with a patch about exactly this (splitting
tests), the "big test separation POC".



Re: New regression test time

From
Stephen Frost
Date:
Josh,

* Josh Berkus (josh@agliodbs.com) wrote:
> If we don't have a test for it, then we can break it in the future and
> not know we've broken it until .0 is released.  Is that really a
> direction we're happy going in?

To be fair, AIUI anyway, certain companies have much larger regression
suites that they run new versions of PG against.  I'm sure they don't
have the platform coverage that the buildfarm does, but I expect no
small number of build farm animals would fall over under the strain of
that regression suite (or at least, they'd have trouble keeping up with
the commit rate).

I'm definitely pro-more-tests when they add more code/feature coverage
and are nominal in terms of additional time taken during 'make check',
but I seriously doubt we're ever going to have anything close to
complete code coverage in such a suite of tests.

> I have to say, I'm really surprised at the level of resistance people on
> this list are showing to the idea of increasing test coverage. I thought
> that Postgres was all about reliability?   For a project as mature as we
> are, our test coverage is abysmal, and I think I'm starting to see why.

I do think having these tests split into different groups would be
valuable.  It might even be good to split them into groups based on what
code they exercise and then devs might be able to decide "I'm working in
this area right now, I want the tests that are applicable to that code".
Might be a bit too much effort though too, dunno.  Would be really neat
if it was automated in some way. ;)
Thanks,        Stephen

Re: New regression test time

From
Fabien COELHO
Date:
> If we had a different set of tests, that would be a valid argument.  But
> we don't, so it's not.  And nobody has offered to write a feature to
> split our tests either.

I have done a POC. See:

https://commitfest.postgresql.org/action/patch_view?id=1170

What I have not done is to decide how to split tests in two buckets.

-- 
Fabien.



Re: New regression test time

From
Amit kapila
Date:
On Sunday, June 30, 2013 11:37 AM Fabien COELHO wrote:
>> If we had a different set of tests, that would be a valid argument.  But
>> we don't, so it's not.  And nobody has offered to write a feature to
>> split our tests either.

>I have done a POC. See:

> https://commitfest.postgresql.org/action/patch_view?id=1170

I think it is better to submit for next commit fest which is at below link:

https://commitfest.postgresql.org/action/commitfest_view?id=19


With Regards,
Amit Kapila.


Re: New regression test time

From
Fabien COELHO
Date:
>> https://commitfest.postgresql.org/action/patch_view?id=1170
>
> I think it is better to submit for next commit fest which is at below link:
>
> https://commitfest.postgresql.org/action/commitfest_view?id=19

I put it there as the discussion whether to accept or not Robins patches 
because of their possible impact on non-regression test time is right now, 
so it may make sense to look at it now, and it is a rather small patch. 
Otherwise, next commit fest is fine.

-- 
Fabien.



Re: New regression test time

From
Robins Tharakan
Date:

On 30 June 2013 02:33, Amit kapila <amit.kapila@huawei.com> wrote:

On Sunday, June 30, 2013 11:37 AM Fabien COELHO wrote:
>> If we had a different set of tests, that would be a valid argument.  But
>> we don't, so it's not.  And nobody has offered to write a feature to
>> split our tests either.

>I have done a POC. See:

> https://commitfest.postgresql.org/action/patch_view?id=1170

I think it is better to submit for next commit fest which is at below link:

https://commitfest.postgresql.org/action/commitfest_view?id=19


Hi,

- There is a certain value in having separate tests, just that for the big-tests to be any meaningful, if the buildfarm could run on a periodic (daily?) basis and send some kind of automated bug-reports. Without an automatic feedback, most may not inclined to run all tests before submitting a patch and there'd be a big pile up near a release.

- For now, the new tests that I submit for review (for next CF) would be for 'make check', until a 'make bigcheck' or whatever is up and running.

--
Robins Tharakan

Re: New regression test time

From
Jeff Janes
Date:
On Sat, Jun 29, 2013 at 3:43 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

On 06/29/2013 05:59 PM, Josh Berkus wrote:

Maybe there is a good case for these last two in a different set of tests.
If we had a different set of tests, that would be a valid argument.  But
we don't, so it's not.  And nobody has offered to write a feature to
split our tests either.

I have to say, I'm really surprised at the level of resistance people on
this list are showing to the idea of increasing test coverage. I thought
that Postgres was all about reliability?   For a project as mature as we
are, our test coverage is abysmal, and I think I'm starting to see why.



Dividing the tests into different sections is as simple as creating one schedule file per section.

I'm not at all resistant to it. In fact, of someone wants to set up separate sections and add new tests to the different sections I'll be more than happy to provide buildfarm support for it. Obvious candidates could include:

 * code coverage
 * bugs
 * tests too big to run in everyday developer use


I don't really see a difference in the first two.  If we were sure the uncovered code had no bugs, we wouldn't need to cover it.  At least if you consider unintended behavior changes to be bugs.  I think it would make more sense to split them up by what computers it makes sense to run them on.

Tests that take too much RAM to be run by everyone.
Tests that take too many CPUs (in order to be meaningful) to run by everyone most of the time.
Tests that take too much disk space...
Tests that take too much wall-clock time....
And maybe that tests that take too much wall-clock time specifically under CLOBBER_CACHE_ALWAYS...

Some of these sets would probably be empty currently, because candidates that belong in them were not committed at all since they were not wanted in the default and they there was no other place to add them.

If we are very worried about how long the tests take, we should probably also spend some time trying to make the existing ones faster.  Parallelization does not cut the test time very much (~20% with 8 CPUs), because the tests are poorly packed.  In a parallel group all the tests finish fast except one, and the whole group is then dominated by that one test.  (The main goal of parallelization is probably not to make the test faster, but to make them more realistic from a concurrency perspective, but if there is little actual parallelism, it doesn't achieve that very well, either).  I don't know how much freedom there is to re-order the tests without breaking dependencies, though.  I think prepared_xacts and stats could be usefully run together, as both take a long time sleeping but impose little real load that would interfere with each other.  Perhaps prepared_xacts could be re-written to get what it needs without the long statement_timeouts.  Testing the timeout itself doesn't seem to be the goal.

Cheers,

Jeff

Re: New regression test time

From
Josh Berkus
Date:
On 06/30/2013 12:33 AM, Amit kapila wrote:
> 
> On Sunday, June 30, 2013 11:37 AM Fabien COELHO wrote:
>>> If we had a different set of tests, that would be a valid argument.  But
>>> we don't, so it's not.  And nobody has offered to write a feature to
>>> split our tests either.
> 
>> I have done a POC. See:
> 
>> https://commitfest.postgresql.org/action/patch_view?id=1170
> 
> I think it is better to submit for next commit fest which is at below link:
> 
> https://commitfest.postgresql.org/action/commitfest_view?id=19

I would argue for doing this in this CF, just so that we can have the
benefit of the extra tests for the next 3 months, and so that Andrew can
work on the buildfarm additions.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: New regression test time

From
Amit kapila
Date:
On Monday, July 01, 2013 8:37 AM Josh Berkus wrote:
On 06/30/2013 12:33 AM, Amit kapila wrote:
>
> On Sunday, June 30, 2013 11:37 AM Fabien COELHO wrote:
>>>> If we had a different set of tests, that would be a valid argument.  But
>>>> we don't, so it's not.  And nobody has offered to write a feature to
>>>> split our tests either.
>
>>> I have done a POC. See:
>
>>> https://commitfest.postgresql.org/action/patch_view?id=1170
>
>> I think it is better to submit for next commit fest which is at below link:
>>
>> https://commitfest.postgresql.org/action/commitfest_view?id=19

> I would argue for doing this in this CF, just so that we can have the
> benefit of the extra tests for the next 3 months, and so that Andrew can
> work on the buildfarm additions.

My mail was just a suggestion, as in general after start of CF all new patch submissions for review
are done in next CF.
However as for this particular patch, there are quite a few other dependent patches, so it will be good
if this patch gets committed in this CF only.
Also as a CFM, you are better person to decide about it.


With Regards,
Amit Kapila.


Re: New regression test time

From
David Fetter
Date:
On Sat, Jun 29, 2013 at 02:59:35PM -0700, Josh Berkus wrote:
> On 06/29/2013 02:14 PM, Andrew Dunstan wrote:
> > AIUI: They do test feature use and errors that have cropped up in the
> > past that we need to beware of. They don't test every bug we've ever
> > had, nor do they exercise every piece of code.
> 
> If we don't have a test for it, then we can break it in the future and
> not know we've broken it until .0 is released.  Is that really a
> direction we're happy going in?
> 
> > Maybe there is a good case for these last two in a different set of tests.
> 
> If we had a different set of tests, that would be a valid argument.  But
> we don't, so it's not.  And nobody has offered to write a feature to
> split our tests either.

With utmost respect, this just isn't true.  There is a "make coverage"
target that probably doesn't get enough exercise, but it's just the
kind of infrastructure you're describing.

> I have to say, I'm really surprised at the level of resistance
> people on this list are showing to the idea of increasing test
> coverage. I thought that Postgres was all about reliability?   For a
> project as mature as we are, our test coverage is abysmal, and I
> think I'm starting to see why.

Burdening hackers with extra time in ordinary compile cycles is the
wrong direction.  If anything, we should probably look at what tests
only routinely get run by our CI system--currently the buildfarm--and
which ones developers could reasonably be expected to wait until
post-push to run in day-to-day development.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: New regression test time

From
Andres Freund
Date:
On 2013-07-01 07:14:23 -0700, David Fetter wrote:
> > If we had a different set of tests, that would be a valid argument.  But
> > we don't, so it's not.  And nobody has offered to write a feature to
> > split our tests either.

> With utmost respect, this just isn't true.  There is a "make coverage"
> target that probably doesn't get enough exercise, but it's just the
> kind of infrastructure you're describing.

Uh? Isn't make coverage a target for collecting the generated coverage
data? Afaik it itself does *NOT* depend on any checks being run. And it
only does something sensible if --enable-coverage is passed to
./configure anyway.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: New regression test time

From
Robert Haas
Date:
Reviewing this thread, I think that the following people are in favor
of adding the tests to the existing schedule: Josh Berkus, Stephen
Frost, Fabien Coelho, Dann Corbit, and Jeff Janes.  And I think that
the following people are in favor of a new schedule: Andres Freund,
Andrew Dunstan, David Fetter, and possibly Alvaro Herrera.  I believe
Tom Lane has also expressed objections, though not on this thread.

So I think the first question we need to answer is: Should we just
reject Robins' patches en masse?  If we do that, then the rest of this
is moot.  If we don't do that, then the second question is whether we
should try to introduce a new schedule, and if so, whether we should
split out that new schedule before or after committing these patches
as they stand.

Here are my opinions, for what they are worth.  First, I think that
rejecting these new tests is a bad idea, although I've looked them
over a bit and I think there might be individual things we might want
to take out.  Second, I think that creating a new schedule is likely
to cost developers more time than it saves them.  Sure, you'll be able
to run the tests slightly faster, but when you commit something, break
the buildfarm (which does run those tests), and then have to go back
and fix the tests (or your patch), you'll waste more time doing that
than you saved by avoiding those few extra seconds of runtime.  Third,
I think if we do want to create a new schedule, it makes more sense to
commit the tests first and then split out the new schedule according
to some criteria that we devise.  There should be a principled reason
for putting tests in one schedule or the other; "all the tests that
Robins Tharakan wrote" is not a good filter criteria.

I'm willing to put effort into going through these patches and
figuring out which parts are worth committing, and commit them.
However, I don't want to (and should not) do that if the consensus is
to reject the patches altogether; or if people are not OK with the
approach proposed above, namely, commit it first and then, if it
causes problems, decide how to fix it.  Please help me understand what
the way forward is for this patch set.

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: New regression test time

From
Andrew Dunstan
Date:
On 07/02/2013 10:17 AM, Robert Haas wrote:
> Reviewing this thread, I think that the following people are in favor
> of adding the tests to the existing schedule: Josh Berkus, Stephen
> Frost, Fabien Coelho, Dann Corbit, and Jeff Janes.  And I think that
> the following people are in favor of a new schedule: Andres Freund,
> Andrew Dunstan, David Fetter, and possibly Alvaro Herrera.  I believe
> Tom Lane has also expressed objections, though not on this thread.
>
> So I think the first question we need to answer is: Should we just
> reject Robins' patches en masse?  If we do that, then the rest of this
> is moot.  If we don't do that, then the second question is whether we
> should try to introduce a new schedule, and if so, whether we should
> split out that new schedule before or after committing these patches
> as they stand.
>
> Here are my opinions, for what they are worth.  First, I think that
> rejecting these new tests is a bad idea, although I've looked them
> over a bit and I think there might be individual things we might want
> to take out.  Second, I think that creating a new schedule is likely
> to cost developers more time than it saves them.  Sure, you'll be able
> to run the tests slightly faster, but when you commit something, break
> the buildfarm (which does run those tests), and then have to go back
> and fix the tests (or your patch), you'll waste more time doing that
> than you saved by avoiding those few extra seconds of runtime.  Third,
> I think if we do want to create a new schedule, it makes more sense to
> commit the tests first and then split out the new schedule according
> to some criteria that we devise.  There should be a principled reason
> for putting tests in one schedule or the other; "all the tests that
> Robins Tharakan wrote" is not a good filter criteria.
>
> I'm willing to put effort into going through these patches and
> figuring out which parts are worth committing, and commit them.
> However, I don't want to (and should not) do that if the consensus is
> to reject the patches altogether; or if people are not OK with the
> approach proposed above, namely, commit it first and then, if it
> causes problems, decide how to fix it.  Please help me understand what
> the way forward is for this patch set.
>


I think I'm probably a bit misrepresented here. The question of what 
tests we have is distinct from the question of what schedule(s) they are 
in. We already have tests that are in NO schedule, IIRC. What is more, 
it's entirely possibly to invoke pg_regress with multiple --schedule 
arguments, so we could, for example, have a makefile target that would 
run both the check and some other schedule of longer running tests.

So my $0.02 says you should assess these tests on their own merits, and 
we can debate the schedule arrangements later.

cheers

andrew



Re: New regression test time

From
Fabien COELHO
Date:
> What is more, it's entirely possibly to invoke pg_regress with multiple 
> --schedule arguments, so we could, for example, have a makefile target 
> that would run both the check and some other schedule of longer running 
> tests.

I missed this fact, because I've not seen any example of multiple schedule 
option in regress, so in the split test POC I concatenated files for the 
same purpose, but multiple schedule options looks like a much better 
alternative.

-- 
Fabien.



Re: New regression test time

From
Noah Misch
Date:
On Tue, Jul 02, 2013 at 10:17:08AM -0400, Robert Haas wrote:
> So I think the first question we need to answer is: Should we just
> reject Robins' patches en masse?  If we do that, then the rest of this
> is moot.  If we don't do that, then the second question is whether we
> should try to introduce a new schedule, and if so, whether we should
> split out that new schedule before or after committing these patches
> as they stand.

It's sad to simply reject meaningful automated tests on the basis of doubt
that they're important enough to belong in every human-in-the-loop test run.
I share the broader vision for automated testing represented by these patches.

> Here are my opinions, for what they are worth.  First, I think that
> rejecting these new tests is a bad idea, although I've looked them
> over a bit and I think there might be individual things we might want
> to take out.  Second, I think that creating a new schedule is likely
> to cost developers more time than it saves them.  Sure, you'll be able
> to run the tests slightly faster, but when you commit something, break
> the buildfarm (which does run those tests), and then have to go back
> and fix the tests (or your patch), you'll waste more time doing that
> than you saved by avoiding those few extra seconds of runtime.  Third,
> I think if we do want to create a new schedule, it makes more sense to
> commit the tests first and then split out the new schedule according
> to some criteria that we devise.  There should be a principled reason
> for putting tests in one schedule or the other; "all the tests that
> Robins Tharakan wrote" is not a good filter criteria.

+1 for that plan.  I don't know whether placing certain tests outside the main
test sequence would indeed cost more time than it saves.  I definitely agree
that if these new tests should appear elsewhere, some of our existing tests
should also move there.

Thanks,
nm

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: New regression test time

From
Simon Riggs
Date:
On 2 July 2013 18:43, Noah Misch <noah@leadboat.com> wrote:
On Tue, Jul 02, 2013 at 10:17:08AM -0400, Robert Haas wrote:
> So I think the first question we need to answer is: Should we just
> reject Robins' patches en masse?  If we do that, then the rest of this
> is moot.  If we don't do that, then the second question is whether we
> should try to introduce a new schedule, and if so, whether we should
> split out that new schedule before or after committing these patches
> as they stand.

It's sad to simply reject meaningful automated tests on the basis of doubt
that they're important enough to belong in every human-in-the-loop test run.
I share the broader vision for automated testing represented by these patches.

+1  We should be encouraging people to submit automated tests. 

I share the annoyance of increasing the length of the automated test runs, and have watched them get longer and longer over time. But I run the tests because I want to see whether I broke anything and I stopped sitting and watching them run long ago.

Automated testing is about x10-100 faster than manual testing, so I see new tests as saving me time not wasting it.
 
> Here are my opinions, for what they are worth.  First, I think that
> rejecting these new tests is a bad idea, although I've looked them
> over a bit and I think there might be individual things we might want
> to take out.  Second, I think that creating a new schedule is likely
> to cost developers more time than it saves them.  Sure, you'll be able
> to run the tests slightly faster, but when you commit something, break
> the buildfarm (which does run those tests), and then have to go back
> and fix the tests (or your patch), you'll waste more time doing that
> than you saved by avoiding those few extra seconds of runtime.  Third,
> I think if we do want to create a new schedule, it makes more sense to
> commit the tests first and then split out the new schedule according
> to some criteria that we devise.  There should be a principled reason
> for putting tests in one schedule or the other; "all the tests that
> Robins Tharakan wrote" is not a good filter criteria.

+1 for that plan.  I don't know whether placing certain tests outside the main
test sequence would indeed cost more time than it saves.  I definitely agree
that if these new tests should appear elsewhere, some of our existing tests
should also move there.

Let's have a new schedule called minute-check with the objective to run the common tests in 60 secs.

We can continue to expand the normal schedules from here.

Anybody that wants short tests can run that, everyone else can run the full test suite.

We should be encouraging people to run the full test suite, not the fast one. 

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: New regression test time

From
Robert Haas
Date:
On Wed, Jul 3, 2013 at 2:28 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> It's sad to simply reject meaningful automated tests on the basis of doubt
>> that they're important enough to belong in every human-in-the-loop test
>> run.
>> I share the broader vision for automated testing represented by these
>> patches.
>
> +1  We should be encouraging people to submit automated tests.

It seems like there is now a clear consensus to proceed here, so I'll
start looking at committing some of these tests.

> Automated testing is about x10-100 faster than manual testing, so I see new
> tests as saving me time not wasting it.

+1.

> Let's have a new schedule called minute-check with the objective to run the
> common tests in 60 secs.
>
> We can continue to expand the normal schedules from here.
>
> Anybody that wants short tests can run that, everyone else can run the full
> test suite.
>
> We should be encouraging people to run the full test suite, not the fast
> one.

I like this idea, or some variant of it (fastcheck?).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: New regression test time

From
Simon Riggs
Date:
On 3 July 2013 15:43, Robert Haas <robertmhaas@gmail.com> wrote:
> Let's have a new schedule called minute-check with the objective to run the
> common tests in 60 secs.
>
> We can continue to expand the normal schedules from here.
>
> Anybody that wants short tests can run that, everyone else can run the full
> test suite.
>
> We should be encouraging people to run the full test suite, not the fast
> one.

I like this idea, or some variant of it (fastcheck?).

I thought about that, but then people will spend time trying to tune it.

If its called minute-check and it runs in 60 secs then they'll leave it alone... 

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: New regression test time

From
Josh Berkus
Date:
On 07/03/2013 07:43 AM, Robert Haas wrote:
> Let's have a new schedule called minute-check with the objective to run the
>> common tests in 60 secs.

Note that we're below 60s even with assert and CLOBBER_CACHE_ALWAYS, at
least on my laptop.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: New regression test time

From
Andrew Dunstan
Date:
On 07/03/2013 02:50 PM, Josh Berkus wrote:
> On 07/03/2013 07:43 AM, Robert Haas wrote:
>> Let's have a new schedule called minute-check with the objective to run the
>>> common tests in 60 secs.
> Note that we're below 60s even with assert and CLOBBER_CACHE_ALWAYS, at
> least on my laptop.
>

I find that hard to believe. On friarbird, which has 
CLOBBER_CACHE_ALWAYS turned on, "make check" takes 110 minutes. The same 
machine runs nightjar which takes 34 seconds.

cheers

andrew