Thread: New regression test time
Hackers, Per discussion on these tests, I ran "make check" against 9.4 head with all of the new regression tests applied, other than DISCARD. Time for 3 "make check" runs without new tests: 65.9s Time for 3 "make check" runs with new tests: 71.7s So that's an increase of about 10% in test runtime (or 2 seconds per run on my laptop), in order to greatly improve regression test coverage. I'd say that splitting the tests is not warranted, and that we should go ahead with these tests on their testing merits, not based on any extra check time they might add. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
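[A minimal sketch of one way to reproduce this kind of timing, assuming a configured 9.4 source tree; the three-run count matches the post, but the build step and the use of GNU time are assumptions:]

    # Build first so the timing covers only the test runs themselves.
    make -j4 >/dev/null
    # Three timed "make check" runs; "/usr/bin/time -f" is GNU-specific.
    for i in 1 2 3; do
        /usr/bin/time -f '%es' make check >/dev/null
    done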
On 2013-06-28 14:01:23 -0700, Josh Berkus wrote: > Per discussion on these tests, I ran "make check" against 9.4 head > with all of the new regression tests applied, other than DISCARD. > > Time for 3 "make check" runs without new tests: 65.9s > > Time for 3 "make check" runs with new tests: 71.7s > > So that's an increase of about 10% in test runtime (or 2 seconds per run > on my laptop), in order to greatly improve regression test coverage. How did you evaluate that coverage increased "greatly"? I am not generally against these tests, but I'd be surprised if the overall test coverage improved noticeably by this. That makes a 10% runtime overhead pretty hefty if the goal is to actually achieve high coverage. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
> How did you evaluate that coverage increased "greatly"? I am not > generally against these tests, but I'd be surprised if the overall test > coverage improved noticeably by this. That makes a 10% runtime overhead > pretty hefty if the goal is to actually achieve high coverage. I was relying on Robins' coverage numbers: Patch to add more regression tests to ASYNC (src/backend/commands/async). Takes code coverage from 39% to 75%. Please find attached a patch to take code coverage of ALTER OPERATOR FAMILY .. ADD / DROP (src/backend/commands/opclasscmds.c) from 50% to 87%. Please find attached a patch to take code coverage of LOCK TABLE (src/backend/commands/lockcmds.c) from 57% to 84%. ... etc. If we someday add so many tests that "make check" takes over a minute on a modern laptop, then maybe it'll be worth talking about splitting the test suite into "regular" and "extended". However, it would require 15 more patch sets the size of Robins' to get there, so I don't see that it's worth the trouble any time soon. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 2013-06-28 14:46:10 -0700, Josh Berkus wrote: > > How did you evaluate that coverage increased "greatly"? I am not > > generally against these tests, but I'd be surprised if the overall test > > coverage improved noticeably by this. That makes a 10% runtime overhead > > pretty hefty if the goal is to actually achieve high coverage. > > I was relying on Robins' coverage numbers: Those improvements quite likely amount to a good bit less than one percent for the whole binary. > If we someday add so many tests that "make check" takes over a minute on > a modern laptop, then maybe it'll be worth talking about splitting the > test suite into "regular" and "extended". However, it would require 15 > more patch sets the size of Robins' to get there, so I don't see that > it's worth the trouble any time soon. Was it actually an assert-enabled build that you tested? We currently can run make check with stuff like CLOBBER_CACHE_ALWAYS, which finds bugs pretty regularly. If we achieve high coverage we quite possibly can't anymore, at least not regularly. So I actually think having two modes makes sense. Then we could also link stuff like isolationtester automatically into the longer-running mode... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
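[For reference, a sketch of the kind of build being asked about here: --enable-cassert is the standard configure switch for assertions, and defining CLOBBER_CACHE_ALWAYS via CPPFLAGS is one common way to enable cache clobbering; the parallelism flag is illustrative:]

    # Assert-enabled build with cache clobbering: cached catalog
    # entries are discarded and rebuilt at every opportunity, which
    # is what slows "make check" from seconds to hours and makes
    # every extra test expensive in this configuration.
    ./configure --enable-cassert CPPFLAGS='-DCLOBBER_CACHE_ALWAYS'
    make -j4
    make check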
* Josh Berkus (josh@agliodbs.com) wrote: > So that's an increase of about 10% in test runtime (or 2 seconds per run > on my laptop), in order to greatly improve regression test coverage. > I'd say that splitting the tests is not warranted, and that we should go > ahead with these tests on their testing merits, not based on any extra > check time they might add. For my 2c, +1 on this in general, in spite of the concerns. Covering cases that we don't is valuable in general, and if we get a bit more coverage for a few more seconds, it's worth it. Also, if someone wants to split the test up, then they need to provide a patch which does so. I'm not against that, but I do not feel this addition should be held up waiting for someone to implement such a separation - if anything, having the two things done independently would probably be cleaner anyway. Thanks, Stephen
>>> How did you evaluate that coverage increased "greatly"? I am not >>> generally against these tests, but I'd be surprised if the overall test >>> coverage improved noticeably by this. That makes a 10% runtime overhead >>> pretty hefty if the goal is to actually achieve high coverage. >> >> I was relying on Robins' coverage numbers: > > Those improvements quite likely amount to a good bit less than one percent > for the whole binary. Yes, but it is a valuable percent nevertheless. As I understand it, the coverage is about the tested command logic. A lot of this logic is dedicated to checking permissions (can you add an operator to this database? ...) and to verifying required conditions (does the function proposed for the operator have the right signature? does the operator overwrite an existing one? ...). -- Fabien.
Josh Berkus wrote: > Hackers, > > Per discussion on these tests, I ran "make check" against 9.4 head > with all of the new regression tests applied, other than DISCARD. > > Time for 3 "make check" runs without new tests: 65.9s > > Time for 3 "make check" runs with new tests: 71.7s > > So that's an increase of about 10% in test runtime (or 2 seconds per run > on my laptop), I see two problems with this report: 1. it creates a new installation for each run, 2. it only uses the serial schedule. I care more about the parallel schedule than the serial one, because since it obviously runs in less time, then I can run it often and not worry about how long it takes. On the other hand, the cost of the extra initdb obviously means that the percentage is a bit lower than if you were to compare test run time without considering the initdb step. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Jun 29, 2013, at 12:36 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > I see two problems with this report: > 1. it creates a new installation for each run, But that's the normal way of running the tests anyway, isn't it? > 2. it only uses the serial schedule. make check uses the parallel schedule - did Josh indicate he did anything else? ...Robert
> I see two problems with this report: > 1. it creates a new installation for each run, Yes, I'm running "make check". > 2. it only uses the serial schedule. Um, no: parallel group (19 tests): limit prepare copy2 plancache xml returning conversion rowtypes largeobject temp truncate polymorphism with without_oid sequence domain rangefuncs alter_table plpgsql Out of curiosity, I tried a serial run (MAX_CONNECTIONS=1), which took about 39s (with patches). > I care more about the parallel schedule than the serial one, because > since it obviously runs in less time, then I can run it often and not > worry about how long it takes. On the other hand, the cost of the extra > initdb obviously means that the percentage is a bit lower than if you > were to compare test run time without considering the initdb step. Possibly, but I know I run "make check", not "make installcheck", when I'm testing new code. And the buildfarm, afaik, runs "make check". And, for that matter, who the heck cares? The important thing is that "make check" still runs in < 30 seconds on a standard laptop (an ultraportable, even), so it's not holding up a change-test-change-test cycle. If you were looking to prove something else, then go for it. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
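[For reference, the two invocations compared above; MAX_CONNECTIONS is the documented knob that caps how many regression tests run concurrently. The timings come from the post; the rest is a sketch:]

    make check                    # parallel_schedule; groups run concurrently
    make check MAX_CONNECTIONS=1  # same schedule, one test at a time (~39s above)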
On 06/29/2013 03:57 PM, Josh Berkus wrote: >> I see two problems with this report: >> 1. it creates a new installation for each run, > Yes, I'm running "make check" > >> 2. it only uses the serial schedule. > Um, no: > > parallel group (19 tests): limit prepare copy2 plancache xml returning > conversion rowtypes largeobject temp truncate polymorphism with > without_oid sequence domain rangefuncs alter_table plpgsql > > Out of curiosity, I tried a serial run (MAX_CONNECTIONS=1), which took > about 39s (with patches). > >> I care more about the parallel schedule than the serial one, because >> since it obviously runs in less time, then I can run it often and not >> worry about how long it takes. On the other hand, the cost of the extra >> initdb obviously means that the percentage is a bit lower than if you >> were to compare test run time without considering the initdb step. > Possibly, but I know I run "make check" not "make installcheck" when I'm > testing new code. And the buildfarm, afaik, runs "make check". And, > for that matter, who the heck cares? It runs both :-) We run "make check" early in the process to make sure we can at least get that far. We run "make installcheck" later, among other things to check that the tests work in different locales. I think we need to have a better understanding of just what our standard regression tests do. AIUI: They do test feature use and errors that have cropped up in the past that we need to beware of. They don't test every bug we've ever had, nor do they exercise every piece of code. Maybe there is a good case for these last two in a different set of tests. cheers andrew
On 06/29/2013 02:14 PM, Andrew Dunstan wrote: > AIUI: They do test feature use and errors that have cropped up in the > past that we need to beware of. They don't test every bug we've ever > had, nor do they exercise every piece of code. If we don't have a test for it, then we can break it in the future and not know we've broken it until .0 is released. Is that really a direction we're happy going in? > > Maybe there is a good case for these last two in a different set of tests. If we had a different set of tests, that would be a valid argument. But we don't, so it's not. And nobody has offered to write a feature to split our tests either. I have to say, I'm really surprised at the level of resistance people on this list are showing to the idea of increasing test coverage. I thought that Postgres was all about reliability? For a project as mature as we are, our test coverage is abysmal, and I think I'm starting to see why. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
-----Original Message----- From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Josh Berkus Sent: Saturday, June 29, 2013 3:00 PM To: Andrew Dunstan Cc: Alvaro Herrera; pgsql-hackers@postgresql.org; Robins Tharakan Subject: Re: [HACKERS] New regression test time On 06/29/2013 02:14 PM, Andrew Dunstan wrote: > AIUI: They do test feature use and errors that have cropped up in the > past that we need to beware of. They don't test every bug we've ever > had, nor do they exercise every piece of code. If we don't have a test for it, then we can break it in the future and not know we've broken it until .0 is released. Is that really a direction we're happy going in? > > Maybe there is a good case for these last two in a different set of tests. If we had a different set of tests, that would be a valid argument. But we don't, so it's not. And nobody has offered to write a feature to split our tests either. I have to say, I'm really surprised at the level of resistance people on this list are showing to the idea of increasing test coverage. I thought that Postgres was all about reliability? For a project as mature as we are, our test coverage is abysmal, and I think I'm starting to see why. >> An ounce of prevention is worth a pound of cure. The cost of a bug rises exponentially, starting at requirements gathering and ending at the customer. Where I work, we have two computer rooms full of machines that run tests around the clock, 24x365. Even so, a full regression takes well over a week because we perform hundreds of thousands of tests. All choices of this kind are trade-offs. But in such situations, my motto is: "Do whatever will make the customer prosper the most." IMO-YMMV <<
On 06/29/2013 05:59 PM, Josh Berkus wrote: >> Maybe there is a good case for these last two in a different set of tests. > If we had a different set of tests, that would be a valid argument. But > we don't, so it's not. And nobody has offered to write a feature to > split our tests either. > > I have to say, I'm really surprised at the level of resistance people on > this list are showing to the idea of increasing test coverage. I thought > that Postgres was all about reliability? For a project as mature as we > are, our test coverage is abysmal, and I think I'm starting to see why. > Dividing the tests into different sections is as simple as creating one schedule file per section. I'm not at all resistant to it. In fact, if someone wants to set up separate sections and add new tests to the different sections I'll be more than happy to provide buildfarm support for it. Obvious candidates could include: * code coverage * bugs * tests too big to run in everyday developer use cheers andrew
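[A minimal sketch of the schedule-file format pg_regress reads, which is why one file per section is all it takes; the test names below are taken from the existing parallel_schedule:]

    # Lines starting with "#" are comments. Several names on one
    # "test:" line form a parallel group; a single name runs alone.
    test: tablespace
    test: boolean char name varchar text int2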
> > Dividing the tests into different sections is as simple as creating one > schedule file per section. Oh? Huh. I'd thought it would be much more complicated. Well, by all means, let's do it then. I'm not personally convinced that the existing regression tests all belong in the "default" section -- I think a lot of what we've chosen to test or not test to date has been fairly arbitrary -- but we can tinker with that later. > I'm not at all resistant to it. In fact, if someone wants to set up > separate sections and add new tests to the different sections I'll be > more than happy to provide buildfarm support for it. Obvious candidates > could include: > > * code coverage > * bugs > * tests too big to run in everyday developer use So we'd want codecoverage and codecoverage-parallel then, presumably, for Robins' tests? Is there really a reason to supply a non-parallel version? Would we want to include the core tests in those? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Sat, Jun 29, 2013 at 7:58 PM, Josh Berkus <josh@agliodbs.com> wrote: > >> >> Dividing the tests into different sections is as simple as creating one >> schedule file per section. > > Oh? Huh. I'd thought it would be much more complicated. Well, by all > means, let's do it then. I think I should point out, since it seems to have gone unnoticed, that there's a thread with a patch about exactly this (splitting tests), the "big test separation POC".
Josh, * Josh Berkus (josh@agliodbs.com) wrote: > If we don't have a test for it, then we can break it in the future and > not know we've broken it until .0 is released. Is that really a > direction we're happy going in? To be fair, AIUI anyway, certain companies have much larger regression suites that they run new versions of PG against. I'm sure they don't have the platform coverage that the buildfarm does, but I expect no small number of build farm animals would fall over under the strain of that regression suite (or at least, they'd have trouble keeping up with the commit rate). I'm definitely pro-more-tests when they add more code/feature coverage and are nominal in terms of additional time taken during 'make check', but I seriously doubt we're ever going to have anything close to complete code coverage in such a suite of tests. > I have to say, I'm really surprised at the level of resistance people on > this list are showing to the idea of increasing test coverage. I thought > that Postgres was all about reliability? For a project as mature as we > are, our test coverage is abysmal, and I think I'm starting to see why. I do think having these tests split into different groups would be valuable. It might even be good to split them into groups based on what code they exercise and then devs might be able to decide "I'm working in this area right now, I want the tests that are applicable to that code". Might be a bit too much effort though too, dunno. Would be really neat if it was automated in some way. ;) Thanks, Stephen
> If we had a different set of tests, that would be a valid argument. But > we don't, so it's not. And nobody has offered to write a feature to > split our tests either. I have done a POC. See: https://commitfest.postgresql.org/action/patch_view?id=1170 What I have not done is decide how to split the tests into two buckets. -- Fabien.
On Sunday, June 30, 2013 11:37 AM Fabien COELHO wrote: >> If we had a different set of tests, that would be a valid argument. But >> we don't, so it's not. And nobody has offered to write a feature to >> split our tests either. > I have done a POC. See: > https://commitfest.postgresql.org/action/patch_view?id=1170 I think it is better to submit it for the next commit fest, which is at the link below: https://commitfest.postgresql.org/action/commitfest_view?id=19 With Regards, Amit Kapila.
>> https://commitfest.postgresql.org/action/patch_view?id=1170 > > I think it is better to submit it for the next commit fest, which is at the link below: > > https://commitfest.postgresql.org/action/commitfest_view?id=19 I put it there because the discussion about whether or not to accept Robins' patches, given their possible impact on regression test time, is happening right now, so it may make sense to look at it now; it is also a rather small patch. Otherwise, the next commit fest is fine. -- Fabien.
On 30 June 2013 02:33, Amit kapila <amit.kapila@huawei.com> wrote:
> On Sunday, June 30, 2013 11:37 AM Fabien COELHO wrote:
>>> If we had a different set of tests, that would be a valid argument. But
>>> we don't, so it's not. And nobody has offered to write a feature to
>>> split our tests either.
>> I have done a POC. See:
>> https://commitfest.postgresql.org/action/patch_view?id=1170
>
> I think it is better to submit it for the next commit fest, which is at the link below:
> https://commitfest.postgresql.org/action/commitfest_view?id=19
Hi,
- There is a certain value in having separate tests, but for the big tests to be meaningful, the buildfarm would need to run them on a periodic (daily?) basis and send some kind of automated bug reports. Without automatic feedback, most people may not be inclined to run all the tests before submitting a patch, and there'd be a big pile-up near a release.
- For now, the new tests that I submit for review (for next CF) would be for 'make check', until a 'make bigcheck' or whatever is up and running.
--
Robins Tharakan
On Sat, Jun 29, 2013 at 3:43 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
> On 06/29/2013 05:59 PM, Josh Berkus wrote:
>>> Maybe there is a good case for these last two in a different set of tests.
>> If we had a different set of tests, that would be a valid argument. But
>> we don't, so it's not. And nobody has offered to write a feature to
>> split our tests either.
>>
>> I have to say, I'm really surprised at the level of resistance people on
>> this list are showing to the idea of increasing test coverage. I thought
>> that Postgres was all about reliability? For a project as mature as we
>> are, our test coverage is abysmal, and I think I'm starting to see why.
>
> Dividing the tests into different sections is as simple as creating one
> schedule file per section.
>
> I'm not at all resistant to it. In fact, if someone wants to set up separate
> sections and add new tests to the different sections I'll be more than happy
> to provide buildfarm support for it. Obvious candidates could include:
>
> * code coverage
> * bugs
> * tests too big to run in everyday developer use
I don't really see a difference in the first two. If we were sure the uncovered code had no bugs, we wouldn't need to cover it. At least if you consider unintended behavior changes to be bugs. I think it would make more sense to split them up by what computers it makes sense to run them on.
Tests that take too much RAM to be run by everyone.
Tests that take too many CPUs (in order to be meaningful) to run by everyone most of the time.
Tests that take too much disk space...
Tests that take too much wall-clock time....
And maybe that tests that take too much wall-clock time specifically under CLOBBER_CACHE_ALWAYS...
Some of these sets would probably be empty currently, because candidates that belong in them were not committed at all, since they were not wanted in the default schedule and there was no other place to add them.
If we are very worried about how long the tests take, we should probably also spend some time trying to make the existing ones faster. Parallelization does not cut the test time very much (~20% with 8 CPUs), because the tests are poorly packed: in a parallel group all the tests finish fast except one, and the whole group is then dominated by that one test. (The main goal of parallelization is probably not to make the tests faster, but to make them more realistic from a concurrency perspective; but if there is little actual parallelism, it doesn't achieve that very well, either.) I don't know how much freedom there is to re-order the tests without breaking dependencies, though. I think prepared_xacts and stats could be usefully run together (sketched below), as both take a long time sleeping but impose little real load that would interfere with each other. Perhaps prepared_xacts could be re-written to get what it needs without the long statement_timeouts. Testing the timeout itself doesn't seem to be the goal.
Cheers,
Jeff
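[A sketch of the regrouping suggested above, in schedule-file form; whether the two tests' dependencies actually permit this ordering is an open question:]

    # Hypothetical: pack the two long-sleeping tests into one parallel
    # group so their idle time overlaps instead of adding up.
    test: prepared_xacts stats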
On 06/30/2013 12:33 AM, Amit kapila wrote: > > On Sunday, June 30, 2013 11:37 AM Fabien COELHO wrote: >>> If we had a different set of tests, that would be a valid argument. But >>> we don't, so it's not. And nobody has offered to write a feature to >>> split our tests either. > >> I have done a POC. See: > >> https://commitfest.postgresql.org/action/patch_view?id=1170 > > I think it is better to submit it for the next commit fest, which is at the link below: > > https://commitfest.postgresql.org/action/commitfest_view?id=19 I would argue for doing this in this CF, just so that we can have the benefit of the extra tests for the next 3 months, and so that Andrew can work on the buildfarm additions. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Monday, July 01, 2013 8:37 AM Josh Berkus wrote: On 06/30/2013 12:33 AM, Amit kapila wrote: > > On Sunday, June 30, 2013 11:37 AM Fabien COELHO wrote: >>>> If we had a different set of tests, that would be a valid argument. But >>>> we don't, so it's not. And nobody has offered to write a feature to >>>> split our tests either. > >>> I have done a POC. See: > >>> https://commitfest.postgresql.org/action/patch_view?id=1170 > >> I think it is better to submit it for the next commit fest, which is at the link below: >> >> https://commitfest.postgresql.org/action/commitfest_view?id=19 > I would argue for doing this in this CF, just so that we can have the > benefit of the extra tests for the next 3 months, and so that Andrew can > work on the buildfarm additions. My mail was just a suggestion; in general, after a CF has started, new patch submissions for review go to the next CF. However, this particular patch has quite a few other patches depending on it, so it would be good if it gets committed in this CF. Also, as the CFM, you are the better person to decide about it. With Regards, Amit Kapila.
On Sat, Jun 29, 2013 at 02:59:35PM -0700, Josh Berkus wrote: > On 06/29/2013 02:14 PM, Andrew Dunstan wrote: > > AIUI: They do test feature use and errors that have cropped up in the > > past that we need to beware of. They don't test every bug we've ever > > had, nor do they exercise every piece of code. > > If we don't have a test for it, then we can break it in the future and > not know we've broken it until .0 is released. Is that really a > direction we're happy going in? > > > Maybe there is a good case for these last two in a different set of tests. > > If we had a different set of tests, that would be a valid argument. But > we don't, so it's not. And nobody has offered to write a feature to > split our tests either. With utmost respect, this just isn't true. There is a "make coverage" target that probably doesn't get enough exercise, but it's just the kind of infrastructure you're describing. > I have to say, I'm really surprised at the level of resistance > people on this list are showing to the idea of increasing test > coverage. I thought that Postgres was all about reliability? For a > project as mature as we are, our test coverage is abysmal, and I > think I'm starting to see why. Burdening hackers with extra time in ordinary compile cycles is the wrong direction. If anything, we should probably look at which tests routinely get run only by our CI system--currently the buildfarm--and which ones developers could reasonably be expected to defer until post-push in day-to-day development. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On 2013-07-01 07:14:23 -0700, David Fetter wrote: > > If we had a different set of tests, that would be a valid argument. But > > we don't, so it's not. And nobody has offered to write a feature to > > split our tests either. > With utmost respect, this just isn't true. There is a "make coverage" > target that probably doesn't get enough exercise, but it's just the > kind of infrastructure you're describing. Uh? Isn't make coverage a target for collecting the generated coverage data? Afaik it itself does *NOT* depend on any checks being run. And it only does something sensible if --enable-coverage is passed to ./configure anyway. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
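[For reference, a sketch of the coverage workflow being discussed; --enable-coverage and the coverage-html target are the documented pieces (gcov/lcov must be installed), the parallelism flag is illustrative:]

    ./configure --enable-coverage
    make -j4
    make check            # any test run accumulates coverage counts
    make coverage-html    # builds an lcov HTML report under coverage/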
Reviewing this thread, I think that the following people are in favor of adding the tests to the existing schedule: Josh Berkus, Stephen Frost, Fabien Coelho, Dann Corbit, and Jeff Janes. And I think that the following people are in favor of a new schedule: Andres Freund, Andrew Dunstan, David Fetter, and possibly Alvaro Herrera. I believe Tom Lane has also expressed objections, though not on this thread. So I think the first question we need to answer is: Should we just reject Robins' patches en masse? If we do that, then the rest of this is moot. If we don't do that, then the second question is whether we should try to introduce a new schedule, and if so, whether we should split out that new schedule before or after committing these patches as they stand. Here are my opinions, for what they are worth. First, I think that rejecting these new tests is a bad idea, although I've looked them over a bit and I think there might be individual things we might want to take out. Second, I think that creating a new schedule is likely to cost developers more time than it saves them. Sure, you'll be able to run the tests slightly faster, but when you commit something, break the buildfarm (which does run those tests), and then have to go back and fix the tests (or your patch), you'll waste more time doing that than you saved by avoiding those few extra seconds of runtime. Third, I think if we do want to create a new schedule, it makes more sense to commit the tests first and then split out the new schedule according to some criteria that we devise. There should be a principled reason for putting tests in one schedule or the other; "all the tests that Robins Tharakan wrote" is not a good filter criterion. I'm willing to put effort into going through these patches and figuring out which parts are worth committing, and commit them. However, I don't want to (and should not) do that if the consensus is to reject the patches altogether; or if people are not OK with the approach proposed above, namely, commit it first and then, if it causes problems, decide how to fix it. Please help me understand what the way forward is for this patch set. Thanks, -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 07/02/2013 10:17 AM, Robert Haas wrote: > Reviewing this thread, I think that the following people are in favor > of adding the tests to the existing schedule: Josh Berkus, Stephen > Frost, Fabien Coelho, Dann Corbit, and Jeff Janes. And I think that > the following people are in favor of a new schedule: Andres Freund, > Andrew Dunstan, David Fetter, and possibly Alvaro Herrera. I believe > Tom Lane has also expressed objections, though not on this thread. > > So I think the first question we need to answer is: Should we just > reject Robins' patches en masse? If we do that, then the rest of this > is moot. If we don't do that, then the second question is whether we > should try to introduce a new schedule, and if so, whether we should > split out that new schedule before or after committing these patches > as they stand. > > Here are my opinions, for what they are worth. First, I think that > rejecting these new tests is a bad idea, although I've looked them > over a bit and I think there might be individual things we might want > to take out. Second, I think that creating a new schedule is likely > to cost developers more time than it saves them. Sure, you'll be able > to run the tests slightly faster, but when you commit something, break > the buildfarm (which does run those tests), and then have to go back > and fix the tests (or your patch), you'll waste more time doing that > than you saved by avoiding those few extra seconds of runtime. Third, > I think if we do want to create a new schedule, it makes more sense to > commit the tests first and then split out the new schedule according > to some criteria that we devise. There should be a principled reason > for putting tests in one schedule or the other; "all the tests that > Robins Tharakan wrote" is not a good filter criterion. > > I'm willing to put effort into going through these patches and > figuring out which parts are worth committing, and commit them. > However, I don't want to (and should not) do that if the consensus is > to reject the patches altogether; or if people are not OK with the > approach proposed above, namely, commit it first and then, if it > causes problems, decide how to fix it. Please help me understand what > the way forward is for this patch set. > I think I'm probably a bit misrepresented here. The question of what tests we have is distinct from the question of what schedule(s) they are in. We already have tests that are in NO schedule, IIRC. What is more, it's entirely possible to invoke pg_regress with multiple --schedule arguments, so we could, for example, have a makefile target that would run both the check and some other schedule of longer-running tests. So my $0.02 says you should assess these tests on their own merits, and we can debate the schedule arrangements later. cheers andrew
> What is more, it's entirely possible to invoke pg_regress with multiple > --schedule arguments, so we could, for example, have a makefile target > that would run both the check and some other schedule of longer-running > tests. I missed this fact because I had not seen any example of multiple schedule options in regress; in the split-test POC I concatenated files for the same purpose, but multiple --schedule options look like a much better alternative. -- Fabien.
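[A sketch of the multiple-schedule invocation described above, run from src/test/regress after a build; pg_regress does accept repeated --schedule options, while "big_schedule" is a hypothetical file name used only for illustration:]

    # Hypothetical: a "make bigcheck" target could simply pass the normal
    # schedule plus a longer one in a single pg_regress invocation.
    ./pg_regress --schedule=parallel_schedule --schedule=big_schedule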
On Tue, Jul 02, 2013 at 10:17:08AM -0400, Robert Haas wrote: > So I think the first question we need to answer is: Should we just > reject Robins' patches en masse? If we do that, then the rest of this > is moot. If we don't do that, then the second question is whether we > should try to introduce a new schedule, and if so, whether we should > split out that new schedule before or after committing these patches > as they stand. It's sad to simply reject meaningful automated tests on the basis of doubt that they're important enough to belong in every human-in-the-loop test run. I share the broader vision for automated testing represented by these patches. > Here are my opinions, for what they are worth. First, I think that > rejecting these new tests is a bad idea, although I've looked them > over a bit and I think there might be individual things we might want > to take out. Second, I think that creating a new schedule is likely > to cost developers more time than it saves them. Sure, you'll be able > to run the tests slightly faster, but when you commit something, break > the buildfarm (which does run those tests), and then have to go back > and fix the tests (or your patch), you'll waste more time doing that > than you saved by avoiding those few extra seconds of runtime. Third, > I think if we do want to create a new schedule, it makes more sense to > commit the tests first and then split out the new schedule according > to some criteria that we devise. There should be a principled reason > for putting tests in one schedule or the other; "all the tests that > Robins Tharakan wrote" is not a good filter criteria. +1 for that plan. I don't know whether placing certain tests outside the main test sequence would indeed cost more time than it saves. I definitely agree that if these new tests should appear elsewhere, some of our existing tests should also move there. Thanks, nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com
On 2 July 2013 18:43, Noah Misch <noah@leadboat.com> wrote:
> On Tue, Jul 02, 2013 at 10:17:08AM -0400, Robert Haas wrote:
>> So I think the first question we need to answer is: Should we just
>> reject Robins' patches en masse? If we do that, then the rest of this
>> is moot. If we don't do that, then the second question is whether we
>> should try to introduce a new schedule, and if so, whether we should
>> split out that new schedule before or after committing these patches
>> as they stand.
>
> It's sad to simply reject meaningful automated tests on the basis of doubt
> that they're important enough to belong in every human-in-the-loop test run.
> I share the broader vision for automated testing represented by these patches.

+1. We should be encouraging people to submit automated tests.

I share the annoyance of increasing the length of the automated test runs, and have watched them get longer and longer over time. But I run the tests because I want to see whether I broke anything, and I stopped sitting and watching them run long ago.

Automated testing is about x10-100 faster than manual testing, so I see new tests as saving me time, not wasting it.

>> Here are my opinions, for what they are worth. First, I think that
>> rejecting these new tests is a bad idea, although I've looked them
>> over a bit and I think there might be individual things we might want
>> to take out. Second, I think that creating a new schedule is likely
>> to cost developers more time than it saves them. Sure, you'll be able
>> to run the tests slightly faster, but when you commit something, break
>> the buildfarm (which does run those tests), and then have to go back
>> and fix the tests (or your patch), you'll waste more time doing that
>> than you saved by avoiding those few extra seconds of runtime. Third,
>> I think if we do want to create a new schedule, it makes more sense to
>> commit the tests first and then split out the new schedule according
>> to some criteria that we devise. There should be a principled reason
>> for putting tests in one schedule or the other; "all the tests that
>> Robins Tharakan wrote" is not a good filter criterion.
>
> +1 for that plan. I don't know whether placing certain tests outside the main
> test sequence would indeed cost more time than it saves. I definitely agree
> that if these new tests should appear elsewhere, some of our existing tests
> should also move there.

Let's have a new schedule called minute-check with the objective to run the common tests in 60 secs.

We can continue to expand the normal schedules from here.

Anybody that wants short tests can run that, everyone else can run the full test suite.

We should be encouraging people to run the full test suite, not the fast one.

Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Jul 3, 2013 at 2:28 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> It's sad to simply reject meaningful automated tests on the basis of doubt >> that they're important enough to belong in every human-in-the-loop test >> run. >> I share the broader vision for automated testing represented by these >> patches. > > +1 We should be encouraging people to submit automated tests. It seems like there is now a clear consensus to proceed here, so I'll start looking at committing some of these tests. > Automated testing is about x10-100 faster than manual testing, so I see new > tests as saving me time not wasting it. +1. > Let's have a new schedule called minute-check with the objective to run the > common tests in 60 secs. > > We can continue to expand the normal schedules from here. > > Anybody that wants short tests can run that, everyone else can run the full > test suite. > > We should be encouraging people to run the full test suite, not the fast > one. I like this idea, or some variant of it (fastcheck?). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 3 July 2013 15:43, Robert Haas <robertmhaas@gmail.com> wrote:
>> Let's have a new schedule called minute-check with the objective to run the
>> common tests in 60 secs.
>>
>> We can continue to expand the normal schedules from here.
>>
>> Anybody that wants short tests can run that, everyone else can run the full
>> test suite.
>>
>> We should be encouraging people to run the full test suite, not the fast
>> one.
>
> I like this idea, or some variant of it (fastcheck?).

I thought about that, but then people will spend time trying to tune it. If it's called minute-check and it runs in 60 secs then they'll leave it alone...

Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 07/03/2013 07:43 AM, Robert Haas wrote: >> Let's have a new schedule called minute-check with the objective to run the >> common tests in 60 secs. Note that we're below 60s even with assert and CLOBBER_CACHE_ALWAYS, at least on my laptop. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 07/03/2013 02:50 PM, Josh Berkus wrote: > On 07/03/2013 07:43 AM, Robert Haas wrote: >> Let's have a new schedule called minute-check with the objective to run the >> common tests in 60 secs. > Note that we're below 60s even with assert and CLOBBER_CACHE_ALWAYS, at > least on my laptop. > I find that hard to believe. On friarbird, which has CLOBBER_CACHE_ALWAYS turned on, "make check" takes 110 minutes. The same machine runs nightjar, which takes 34 seconds. cheers andrew