Thread: Test code is worth the space
We've too often criticized patches for carrying many lines/bytes of test case additions. Let's continue to demand debuggable, secure tests that fail only when something is wrong, but let's stop pushing for test minimalism. Such objections may improve the individual patch, but that doesn't make up for the chilling effect on test contributions. I remember clearly the first time I submitted thorough test coverage with a feature. Multiple committers attacked it in the name of brevity. PostgreSQL would be better off with 10x its current test bulk, even if the average line of test code were considerably less important than we expect today.

Thanks,
nm
On Sat, Aug 8, 2015 at 9:47 AM, Noah Misch <noah@leadboat.com> wrote:
> We've too often criticized patches for carrying many lines/bytes of test case
> additions. Let's continue to demand debuggable, secure tests that fail only
> when something is wrong, but let's stop pushing for test minimalism. Such
> objections may improve the individual patch, but that doesn't make up for the
> chilling effect on test contributions. I remember clearly the first time I
> submitted thorough test coverage with a feature. Multiple committers attacked
> it in the name of brevity. PostgreSQL would be better off with 10x its
> current test bulk, even if the average line of test code were considerably
> less important than we expect today.

I strongly agree. I am fond of the example of external sorting, which at present has exactly zero test coverage, even though in production those code paths are exercised all the time.

I think that there needs to be a way of running an extended set of regression tests. I could definitely respect the desire for minimalism when it comes to adding tests to the regression tests proper if there was an extended set of tests that could be run during development less frequently. I thought about doing the extended set as a satellite project, but that may not be workable.

--
Peter Geoghegan
On 08/08/2015 12:24 PM, Peter Geoghegan wrote:
> I think that there needs to be a way of running an extended set of
> regression tests. I could definitely respect the desire for minimalism
> when it comes to adding tests to the regression tests proper if there
> was an extended set of tests that could be run during development less
> frequently. I thought about doing the extended set as a satellite
> project, but that may not be workable.

There already is, isn't there? All of those named sets of regression tests which aren't run by default.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 08/08/2015 05:09 PM, Josh Berkus wrote:
> On 08/08/2015 12:24 PM, Peter Geoghegan wrote:
>> I think that there needs to be a way of running an extended set of
>> regression tests. I could definitely respect the desire for minimalism
>> when it comes to adding tests to the regression tests proper if there
>> was an extended set of tests that could be run during development less
>> frequently. I thought about doing the extended set as a satellite
>> project, but that may not be workable.
> There already is, isn't there? All of those named sets of regression
> tests which aren't run by default.

Exactly. And there is nothing to stop us expanding those. For example, it might be useful to have a suite that provably touched every code path, or something close to it.

Incidentally, making the buildfarm client run extra sets of tests in the main tree is almost completely trivial. Making it run tests from somewhere else, such as a different git repo, is only slightly harder.

cheers

andrew
On Sat, Aug 8, 2015 at 8:24 PM, Peter Geoghegan <pg@heroku.com> wrote:
> I think that there needs to be a way of running an extended set of
> regression tests. I could definitely respect the desire for minimalism

The larger expense in having extensive test suites is the cost to maintain them. With our current test framework, tests have to be fairly carefully written to produce output that isn't very fragile and sensitive to minor changes in the code. In practice this is what really drives the push towards minimal tests. If we tried testing every code path right now -- which I tried to do once for the TOAST code -- what you would find is that you're really testing not that the code is correct but that it does exactly what it does today. At that point the test failures become entirely noise meaning "something changed" rather than signal meaning "something's broken".

A better test framework can go a long way towards fixing this. It would be much less of a problem if we had a unit test framework rather than only black box integration tests. That would allow us to test that every function in tuplestore.c meets its contract, not just that some SQL query produces output that's correct and we think happened to go through some code path we were interested in. There are many code paths that will be hard to arrange to reach from SQL and impossible to have any confidence will continue to be reached in the future when the behaviour is intentionally changed.

That said, I don't think anyone would object to adding some regression tests that at least test basic correctness of the sorting code. That's a pretty embarrassing gap and iirc the only reason for it is that it would make the regression tests sensitive to collation locale settings.

--
greg
On 8 August 2015 at 17:47, Noah Misch <noah@leadboat.com> wrote:
> We've too often criticized patches for carrying many lines/bytes of test case
> additions. Let's continue to demand debuggable, secure tests that fail only
> when something is wrong, but let's stop pushing for test minimalism. Such
> objections may improve the individual patch, but that doesn't make up for the
> chilling effect on test contributions. I remember clearly the first time I
> submitted thorough test coverage with a feature. Multiple committers attacked
> it in the name of brevity. PostgreSQL would be better off with 10x its
> current test bulk, even if the average line of test code were considerably
> less important than we expect today.
Almost every patch I review has either zero or insufficient tests.
If we care about robustness, then we must discuss tests. Here are my two recent experiences:
I agree we could do with x10 as many tests, but that doesn't mean all tests make sense, so there need to be some limiting principles to avoid adding pointless test cases. I recently objected to adding a test case to the isolation tester for the latest ALTER TABLE patch because in that case there is no concurrent behavior to test. There is already a regression test that tests lock levels for those statements, so in my view it is more sensible to modify the existing test than to add a whole new isolation test that does nothing more than demonstrate the lock levels work as described, which we already knew.
On another review I suggested we add a function to core to allow it to be used in regression tests. A long debate ensued, deciding that we must be consistent and put diagnostic functions in contrib. My understanding is that we are not allowed to write regression tests that use contrib modules, yet the consistent place to put diagnostic functions is contrib - therefore, we are never allowed to write tests utilizing diagnostic functions. We are allowed to put modules for testing in another directory, but these are not distributed to users so cannot be used for production diagnosis. Systemic fail, advice needed.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Aug 10, 2015 at 2:02 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Almost every patch I review has either zero or insufficient tests.
>
> If we care about robustness, then we must discuss tests. Here are my two
> recent experiences:
>
> I agree we could do with x10 as many tests, but that doesn't mean all tests
> make sense, so there needs to be some limiting principles to avoid adding
> pointless test cases. I recently objected to adding a test case to Isolation
> tester for the latest ALTER TABLE patch because in that case there is no
> concurrent behavior to test. There is already a regression test that tests
> lock levels for those statements, so in my view it is more sensible to
> modify the existing test than to add a whole new isolation test that does
> nothing more than demonstrate the lock levels work as described, which we
> already knew.
>
> On another review I suggested we add a function to core to allow it to be
> used in regression tests. A long debate ensued, deciding that we must be
> consistent and put diagnostic functions in contrib. My understanding is that
> we are not allowed to write regression tests that use contrib modules, yet
> the consistent place to put diagnostic functions is contrib - therefore, we
> are never allowed to write tests utilizing diagnostic functions. We are
> allowed to put modules for testing in another directory, but these are not
> distributed to users so cannot be used for production diagnosis. Systemic
> fail, advice needed.

There are actually two ways to do this.

One model is the dummy_seclabel module. The build system arranges for that to be available when running the core regression tests, so they can use it. And the dummy_seclabel test does. There are actually a couple of other loadable modules that are used by the core regression tests in this kind of way (refint and autoinc, from contrib/spi).

The other model is what I'll call the test_decoding model. Since the core code can't be tested without a module, we have a module. But then the tests are in the module's directory (contrib/test_decoding/{sql,expected}), not the core regression tests.

In general, I have a mild preference for the second model. It seems more scalable, and keeps the core tests quick to run, which is appropriate for more obscure tests that are unlikely to break very often. But the first model can also be done, as shown by the fact that we have in fact done it several times.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 10 August 2015 at 13:55, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Aug 10, 2015 at 2:02 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On another review I suggested we add a function to core to allow it to be
>> used in regression tests. A long debate ensued, deciding that we must be
>> consistent and put diagnostic functions in contrib. My understanding is that
>> we are not allowed to write regression tests that use contrib modules, yet
>> the consistent place to put diagnostic functions is contrib - therefore, we
>> are never allowed to write tests utilizing diagnostic functions. We are
>> allowed to put modules for testing in another directory, but these are not
>> distributed to users so cannot be used for production diagnosis. Systemic
>> fail, advice needed.
>
> There are actually two ways to do this.
>
> One model is the dummy_seclabel module. The build system arranges for
> that to be available when running the core regression tests, so they
> can use it. And the dummy_seclabel test does. There are actually a
> couple of other loadable modules that are used by the core regression
> tests in this kind of way (refint and autoinc, from contrib/spi).
>
> The other model is what I'll call the test_decoding model. Since the
> core code can't be tested without a module, we have a module. But
> then the tests are in the module's directory
> (contrib/test_decoding/{sql,expected}) not the core regression tests.
>
> In general, I have a mild preference for the second model. It seems
> more scalable, and keeps the core tests quick to run, which is
> appropriate for more obscure tests that are unlikely to break very
> often. But the first model can also be done, as shown by the fact that
> we have in fact done it several times.
Neither of those uses a diagnostic function that also has value in production environments - for logical decoding we added something to core specifically to allow this: pg_recvlogical.

Anyway, resolving this isn't important anymore because I wish to pursue a different mechanism for freezing, but it's possible I hit the same issue.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Aug 10, 2015 at 07:02:17AM +0100, Simon Riggs wrote:
> Almost every patch I review has either zero or insufficient tests.
>
> If we care about robustness, then we must discuss tests. Here are my two
> recent experiences:
>
> I agree we could do with x10 as many tests, but that doesn't mean all tests
> make sense, so there needs to be some limiting principles to avoid adding
> pointless test cases. I recently objected to adding a test case to
> Isolation tester for the latest ALTER TABLE patch because in that case
> there is no concurrent behavior to test. There is already a regression test
> that tests lock levels for those statements, so in my view it is more
> sensible to modify the existing test than to add a whole new isolation test
> that does nothing more than demonstrate the lock levels work as described,
> which we already knew.

Committers press authors to delete tests more often than we press them to resubmit with more tests. No wonder so many patches have insufficient tests; we treat those patches more favorably, on average. I have no objective principles for determining whether a test is pointlessly redundant, but I think the principles should become roughly 10x more permissive than the (unspecified) ones we've been using.

> On another review I suggested we add a function to core to allow it to be
> used in regression tests. A long debate ensued, deciding that we must be
> consistent and put diagnostic functions in contrib. My understanding is
> that we are not allowed to write regression tests that use contrib modules,
> yet the consistent place to put diagnostic functions is contrib -
> therefore, we are never allowed to write tests utilizing diagnostic
> functions. We are allowed to put modules for testing in another directory,
> but these are not distributed to users so cannot be used for production
> diagnosis. Systemic fail, advice needed.

If you want a user-visible function for production diagnosis, set aside the fact that you wish to use it in a test, and find the best place for it. Then, place the tests with the function. (That is, if the function belongs in contrib for other reasons, put tests calling it in the contrib module itself.) Place non-user-visible test support functions in regress.c, or use one of the options Robert described.

Thanks,
nm
On 12 August 2015 at 03:10, Noah Misch <noah@leadboat.com> wrote:
>> On another review I suggested we add a function to core to allow it to be
>> used in regression tests. A long debate ensued, deciding that we must be
>> consistent and put diagnostic functions in contrib. My understanding is
>> that we are not allowed to write regression tests that use contrib modules,
>> yet the consistent place to put diagnostic functions is contrib -
>> therefore, we are never allowed to write tests utilizing diagnostic
>> functions. We are allowed to put modules for testing in another directory,
>> but these are not distributed to users so cannot be used for production
>> diagnosis. Systemic fail, advice needed.
>
> If you want a user-visible function for production diagnosis, set aside the
> fact that you wish to use it in a test, and find the best place for it. Then,
> place the tests with the function. (That is, if the function belongs in
> contrib for other reasons, put tests calling it in the contrib module itself.)
> Place non-user-visible test support functions in regress.c, or use one of the
> options Robert described.
That helps, thanks.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Aug 12, 2015 at 3:10 AM, Noah Misch <noah@leadboat.com> wrote:
> Committers press authors to delete tests more often than we press them to
> resubmit with more tests. No wonder so many patches have insufficient tests;
> we treat those patches more favorably, on average. I have no objective
> principles for determining whether a test is pointlessly redundant, but I
> think the principles should become roughly 10x more permissive than the
> (unspecified) ones we've been using.

I would suggest the metric should be "if this test fails is it more likely to be noise due to an intentional change in behaviour or more likely to be a bug?"

The only time I've seen pushback against tests is when the test author made valiant efforts to test every codepath and the expected output embeds the precise behaviour of the current code as "correct". Even when patches have extensive tests I don't recall seeing much pushback (though I've been having trouble keeping up with the list in recent months) if the tests are written in a way that they will only fail if there's a bug, even if behaviour changes in unrelated ways.

--
greg
On Wed, Aug 12, 2015 at 10:46 AM, Greg Stark <stark@mit.edu> wrote:
> The only time I've seen pushback against tests is when the test author
> made valiant efforts to test every codepath and the expected output
> embeds the precise behaviour of the current code as "correct". Even
> when patches have extensive tests I don't recall seeing much pushback
> (though I've been having trouble keeping up with the list in recent
> months) if the tests are written in a way that they will only fail if
> there's a bug, even if behaviour changes in unrelated ways.

Really? I think Noah's description of how less testing is in effect incentivized by committers is totally accurate. No patch author is going to dig their heels in over the objections of a committer when the complaint is about brevity of tests.

This resistance to adding tests seems quite short sighted to me, especially when the concern is about queries that will each typically take less than 1ms to execute. Like Noah, I think that it would be very helpful to simply be more inclusive of additional tests that don't increase test coverage by as much as each query in a minimal subset. I am not at all convinced by arguments about the cost of maintaining tests when a simple behavioral change occurs.

--
Peter Geoghegan
One trouble I face when adding tests is that sometimes they require hooks in the code, to test for race conditions. In BRIN I cannot test some code paths without resorting to adding breakpoints in GDB, for instance. If there's no support for such in the core code, it's essentially impossible to add meaningful tests for certain code paths.

I toyed with the notion of adding hook points (such that a test module can put the backend to sleep until the other backend acknowledges hitting the race condition window); I decided not to because it seems a more general discussion is necessary first about the desirability of such. As I recall we needed this in SKIP LOCKED, and of course for multixacts also.

I agree with the general idea that it's worthwhile to add lots more tests than we currently have, both for the current code and for future patches. Surely we don't need to reach the point where every single nuance of every single user interface is verified, to the point that tests are such a large burden to maintain as is being suggested.

--
Álvaro Herrera
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Aug 12, 2015 at 2:04 PM, Peter Geoghegan <pg@heroku.com> wrote:
> This resistance to adding tests seems quite short sighted to me,
> especially when the concern is about queries that will each typically
> take less than 1ms to execute. Like Noah, I think that it would be
> very helpful to simply be more inclusive of additional tests that
> don't increase test coverage by as much as each query in a minimal
> subset. I am not at all convinced by arguments about the cost of
> maintaining tests when a simple behavioral change occurs.

I've removed tests from patches that in my opinion were unlikely to fail either (a) for any reason or (b) for any reason other than an intentional change, and I think that's a reasonable thing to do. However, I still think it's a good idea, and useful, to try to expand the code coverage we get from 'make check'.

However, the bigger issue, IMHO, is the stuff that can't be tested via pg_regress, e.g. because it needs hooks, like what Alvaro is talking about, or because it needs a custom testing framework. Recovery, for example, really needs a lot more testing, as we talked about at PGCon. If we just expand what 'make check' covers and don't deal with those kinds of things, we will be improving our test coverage but maybe not all that much.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
* Robert Haas (robertmhaas@gmail.com) wrote:
> On Wed, Aug 12, 2015 at 2:04 PM, Peter Geoghegan <pg@heroku.com> wrote:
> > This resistance to adding tests seems quite short sighted to me,
> > especially when the concern is about queries that will each typically
> > take less than 1ms to execute. Like Noah, I think that it would be
> > very helpful to simply be more inclusive of additional tests that
> > don't increase test coverage by as much as each query in a minimal
> > subset. I am not at all convinced by arguments about the cost of
> > maintaining tests when a simple behavioral change occurs.
>
> I've removed tests from patches that in my opinion were unlikely to
> fail either (a) for any reason or (b) for any reason other than an
> intentional change, and I think that's a reasonable thing to do.
> However, I still think it's a good idea, and useful, to try to expand
> the code coverage we get from 'make check'. However, the bigger
> issue, IMHO, is the stuff that can't be tested via pg_regress, e.g.
> because it needs hooks, like what Alvaro is talking about, or because
> it needs a custom testing framework. Recovery, for example, really
> needs a lot more testing, as we talked about at PGCon. If we just
> expand what 'make check' covers and don't deal with those kinds of
> things, we will be improving our test coverage but maybe not all that
> much.

Agreed, and this is something which I said we'd work to help with and which we have some things to show now, for those interested.

The regression tests included in pgBackRest (available here: https://github.com/pgmasters/backrest) go through a number of different recovery tests. There's vagrant configs for a few different VMs too (CentOS 6, CentOS 7, Ubuntu 12.04 and Ubuntu 14.04), which is what we've been testing.

We're working to continue expanding those tests and will also be adding tests for replication and promotion in the future.
Eventually, we plan to write a buildfarm module for pgBackRest, to allow it to be run in the same manner as the regular buildfarm animals, with the results posted.

David, feel free to correct me if I've misconstrued anything above.

Thanks!

Stephen
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Aug 12, 2015 at 2:04 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> This resistance to adding tests seems quite short sighted to me,
>> especially when the concern is about queries that will each typically
>> take less than 1ms to execute. Like Noah, I think that it would be
>> very helpful to simply be more inclusive of additional tests that
>> don't increase test coverage by as much as each query in a minimal
>> subset. I am not at all convinced by arguments about the cost of
>> maintaining tests when a simple behavioral change occurs.

> I've removed tests from patches that in my opinion were unlikely to
> fail either (a) for any reason or (b) for any reason other than an
> intentional change, and I think that's a reasonable thing to do.

FWIW, I've objected in the past to tests that would significantly increase the runtime of "make check", unless I thought they were especially valuable (which enumerating every minor behavior of a feature patch generally isn't IMO). I still think that that's an important consideration: every second you add to "make check" is multiplied many times over when you consider how many developers run that how many times a day.

We've talked about having some sort of second rank of tests that people wouldn't necessarily run before committing, and that would be allowed to eat more time than the core regression tests would. I think that might be a valuable direction to pursue if people start submitting very bulky tests.

regards, tom lane
On Wed, Aug 12, 2015 at 7:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> FWIW, I've objected in the past to tests that would significantly
> increase the runtime of "make check", unless I thought they were
> especially valuable (which enumerating every minor behavior of a
> feature patch generally isn't IMO). I still think that that's an
> important consideration: every second you add to "make check" is
> multiplied many times over when you consider how many developers
> run that how many times a day.
>
> We've talked about having some sort of second rank of tests that
> people wouldn't necessarily run before committing, and that would
> be allowed to eat more time than the core regression tests would.
> I think that might be a valuable direction to pursue if people start
> submitting very bulky tests.

Maybe. Adding a whole new test suite is significantly more administratively complex, because the BF client has to get updated to run it. And if expected outputs in that test suite change very often at all, then committers will have to run it before committing anyway.

The value of a core regression suite that takes less time to run has to be weighed against the possibility that a better core regression suite might cause us to find more bugs before committing. That could easily be worth the price in runtime.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Aug 13, 2015 at 5:54 AM, Stephen Frost wrote:
> The regression tests included in pgBackRest (available here:
> https://github.com/pgmasters/backrest) go through a number of different
> recovery tests. There's vagrant configs for a few different VMs too
> (CentOS 6, CentOS 7, Ubuntu 12.04 and Ubuntu 14.04) which is what we've
> been testing.
> We're working to continue expanding those tests and will also be adding
> tests for replication and promotion in the future. Eventually, we plan
> to write a buildfarm module for pgBackRest, to allow it to be run in the
> same manner as the regular buildfarm animals with the results posted.

Interesting. Do you mind if I pick up from it some ideas for the in-core replication test suite based on TAP stuff? That's still in the works for the next CF.

--
Michael
* Michael Paquier (michael.paquier@gmail.com) wrote:
> On Thu, Aug 13, 2015 at 5:54 AM, Stephen Frost wrote:
> > The regression tests included in pgBackRest (available here:
> > https://github.com/pgmasters/backrest) go through a number of different
> > recovery tests. There's vagrant configs for a few different VMs too
> > (CentOS 6, CentOS 7, Ubuntu 12.04 and Ubuntu 14.04) which is what we've
> > been testing.
> > We're working to continue expanding those tests and will also be adding
> > tests for replication and promotion in the future. Eventually, we plan
> > to write a buildfarm module for pgBackRest, to allow it to be run in the
> > same manner as the regular buildfarm animals with the results posted.
>
> Interesting. Do you mind if I pick up from it some ideas for the
> in-core replication test suite based on TAP stuff? That's still in the
> works for the next CF.

Certainly don't mind at all, entirely open source under the MIT license.

Thanks!

Stephen
On Wed, Aug 12, 2015 at 9:24 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Michael Paquier (michael.paquier@gmail.com) wrote:
>> On Thu, Aug 13, 2015 at 5:54 AM, Stephen Frost wrote:
>> > The regression tests included in pgBackRest (available here:
>> > https://github.com/pgmasters/backrest) go through a number of different
>> > recovery tests. There's vagrant configs for a few different VMs too
>> > (CentOS 6, CentOS 7, Ubuntu 12.04 and Ubuntu 14.04) which is what we've
>> > been testing.
>> > We're working to continue expanding those tests and will also be adding
>> > tests for replication and promotion in the future. Eventually, we plan
>> > to write a buildfarm module for pgBackRest, to allow it to be run in the
>> > same manner as the regular buildfarm animals with the results posted.
>>
>> Interesting. Do you mind if I pick up from it some ideas for the
>> in-core replication test suite based on TAP stuff? That's still in the
>> works for the next CF.
>
> Certainly don't mind at all, entirely open source under the MIT
> license.

Why not the PG license? It would be nicer if we didn't have to worry about license contamination here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
* Robert Haas (robertmhaas@gmail.com) wrote:
> On Wed, Aug 12, 2015 at 9:24 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > * Michael Paquier (michael.paquier@gmail.com) wrote:
> >> On Thu, Aug 13, 2015 at 5:54 AM, Stephen Frost wrote:
> >> > The regression tests included in pgBackRest (available here:
> >> > https://github.com/pgmasters/backrest) go through a number of different
> >> > recovery tests. There's vagrant configs for a few different VMs too
> >> > (CentOS 6, CentOS 7, Ubuntu 12.04 and Ubuntu 14.04) which is what we've
> >> > been testing.
> >> > We're working to continue expanding those tests and will also be adding
> >> > tests for replication and promotion in the future. Eventually, we plan
> >> > to write a buildfarm module for pgBackRest, to allow it to be run in the
> >> > same manner as the regular buildfarm animals with the results posted.
> >>
> >> Interesting. Do you mind if I pick up from it some ideas for the
> >> in-core replication test suite based on TAP stuff? That's still in the
> >> works for the next CF.
> >
> > Certainly don't mind at all, entirely open source under the MIT
> > license.
>
> Why not the PG license? It would be nicer if we didn't have to worry
> about license contamination here.

There is no conflict between the licenses, and pgBackRest is not part of, nor maintained as part of, the PostgreSQL project. Not that I'm against that changing, certainly, but it was independently developed, and I wouldn't ask the community to take on maintaining something which wasn't developed following the community process, not to mention that I imagine some would probably want it rewritten in C.

We clearly have dependencies, to the extent that there's concern about licenses, on much more restrictive and interesting licenses, and there currently isn't even any real dependency here; there would only be one if the PGDG decided to fork pgBackRest and start maintaining it independently.

Thanks!

Stephen
> FWIW, I've objected in the past to tests that would significantly > increase the runtime of "make check", unless I thought they were > especially valuable (which enumerating every minor behavior of a > feature patch generally isn't IMO). I still think that that's an > important consideration: every second you add to "make check" is > multiplied many times over when you consider how many developers > run that how many times a day. Sure. These are developer's tests run 50 times per day just to check that nothing was just completly broken. It's just a kind of test. I agree that having ISN conversions tested everytime does not make much sense, especially as the source file is touched every two years. On the other hand, when I tried to fix ISN bugs and figure out that there was no single tests for the module, then it is hard to spot regressions, so the tests should be there even if they are not run often: Testing the obvious is useful when you start meddling in the code. > We've talked about having some sort of second rank of tests that > people wouldn't necessarily run before committing, and that would > be allowed to eat more time than the core regression tests would. > I think that might be a valuable direction to pursue if people start > submitting very bulky tests. Yep. For regression tests, the list of tests run is maintained in the *_schedule files. There could be a "large_parallel_schedule" which included more tests. This is already more or less the case, as there are "big*" targets which run numeric_big in addition to the others. This could be expanded and taken into account by the build farm. I recall test submissions which were rejected on the ground of 'it takes 1 second of my time every day' and is 'not very useful as the feature is known to work'. 
I think a key change needed is for committers to be more open to such additions, with the discussion focusing instead on (1) are the tests portable, (2) do the tests cover features not already tested, and (3) should they be in the default list of tests run under "make check". But some should be accepted in *core* nevertheless, and not pushed out into the bin. -- Fabien.
On Thu, Aug 13, 2015 at 1:31 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Aug 12, 2015 at 7:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> FWIW, I've objected in the past to tests that would significantly
> increase the runtime of "make check", unless I thought they were
> especially valuable (which enumerating every minor behavior of a
> feature patch generally isn't IMO). I still think that that's an
> important consideration: every second you add to "make check" is
> multiplied many times over when you consider how many developers
> run that how many times a day.
>
> We've talked about having some sort of second rank of tests that
> people wouldn't necessarily run before committing, and that would
> be allowed to eat more time than the core regression tests would.
> I think that might be a valuable direction to pursue if people start
> submitting very bulky tests.
Maybe. Adding a whole new test suite is significantly more
administratively complex, because the BF client has to get updated to
run it. And if expected outputs in that test suite change very often
at all, then committers will have to run it before committing anyway.
We could always do that the other way - meaning put everything in "make check", and invent a "make quickcheck" for developers with the smaller subset.
The value of a core regression suite that takes less time to run has
to be weighed against the possibility that a better core regression
suite might cause us to find more bugs before committing. That could
easily be worth the price in runtime.
Or have a quickcheck you run "all the time" and then run the bigger one once before committing perhaps?
On Wed, Aug 12, 2015 at 11:23 PM, Magnus Hagander <magnus@hagander.net> wrote: >> The value of a core regression suite that takes less time to run has >> to be weighed against the possibility that a better core regression >> suite might cause us to find more bugs before committing. That could >> easily be worth the price in runtime. > > > Or have a quickcheck you run "all the time" and then run the bigger one once > before committing perhaps? I favor splitting the regression tests to add "all the time" and "before commit" targets as you describe. I think that once the facility is there, we can determine over time how expansive that second category gets to be. -- Peter Geoghegan
On 8/12/15 9:24 PM, Stephen Frost wrote: > * Michael Paquier (michael.paquier@gmail.com) wrote: >> On Thu, Aug 13, 2015 at 5:54 AM, Stephen Frost wrote: >>> The regression tests included in pgBackRest (available here: >>> https://github.com/pgmasters/backrest) go through a number of different >>> recovery tests. There's vagrant configs for a few different VMs too >>> (CentOS 6, CentOS 7, Ubuntu 12.04 and Ubuntu 14.04) which is what we've >>> been testing. >>> We're working to continue expanding those tests and will also be adding >>> tests for replication and promotion in the future. Eventually, we plan >>> to write a buildfarm module for pgBackRest, to allow it to be run in the >>> same manner as the regular buildfarm animals with the results posted. >> >> Interesting. Do you mind if I pick up from it some ideas for the >> in-core replication test suite based on TAP stuff? That's still in the >> works for the next CF. > > Certainly don't mind at all, entirely open source under the MIT > license. Absolutely, and I'm always looking to add new tests, so some cross-pollination may be beneficial as well. Currently the regression tests work with 9.0 - 9.5alpha1 (and back to 8.3, actually). The pause_on_recovery_target test is currently disabled for 9.5 and I have not implemented recovery_target_action or recovery_target = immediate yet. I haven't tested alpha2 yet but it should work unless catalog or other IDs have changed. -- -David david@pgmasters.net
On 8/12/15 9:32 PM, Robert Haas wrote: > On Wed, Aug 12, 2015 at 9:24 PM, Stephen Frost <sfrost@snowman.net> wrote: >> * Michael Paquier (michael.paquier@gmail.com) wrote: >>> Interesting. Do you mind if I pick up from it some ideas for the >>> in-core replication test suite based on TAP stuff? That's still in the >>> works for the next CF. >> >> Certainly don't mind at all, entirely open source under the MIT >> license. > > Why not the PG license? It would be nicer if we didn't have to worry > about license contamination here. There are actually a few reasons I chose the MIT license: 1) It's one of the most permissive licenses around. 2) I originally had plans to extend backrest to other database systems. Nearly two years into development I don't think that sounds like a great idea anymore but it was the original plan. 3) It's common for GitHub projects and backrest has lived there its entire life. I'm not against a license change in theory though I can't see why it matters very much. -- -David david@pgmasters.net
On 2015-08-13 09:32:02 -0400, David Steele wrote: > On 8/12/15 9:32 PM, Robert Haas wrote: > >On Wed, Aug 12, 2015 at 9:24 PM, Stephen Frost <sfrost@snowman.net> wrote: > >>Certainly don't mind at all, entirely open source under the MIT > >>license. > > > >Why not the PG license? It would be nicer if we didn't have to worry > >about license contamination here. I don't think MIT is particularly problematic, it's rather similar to a 3 clause BSD and both are pretty similar to PG's license. > There are actually a few reasons I chose the MIT license: > > 1) It's one of the most permissive licenses around. > 2) I originally had plans to extend backrest to other database systems. > Nearly two years into development I don't think that sounds like a great > idea anymore but it was the original plan. I don't see the difference to/with postgres' license there. Andres
On 8/13/15 9:55 AM, Andres Freund wrote: > On 2015-08-13 09:32:02 -0400, David Steele wrote: >> On 8/12/15 9:32 PM, Robert Haas wrote: >>> On Wed, Aug 12, 2015 at 9:24 PM, Stephen Frost <sfrost@snowman.net> wrote: >>>> Certainly don't mind at all, entirely open source under the MIT >>>> license. >>> >>> Why not the PG license? It would be nicer if we didn't have to worry >>> about license contamination here. > > I don't think MIT is particularly problematic, it's rather similar to a > 3 clause BSD and both are pretty similar to PG's license. > >> There are actually a few reasons I chose the MIT license: >> >> 1) It's one of the most permissive licenses around. > >> 2) I originally had plans to extend backrest to other database systems. >> Nearly two years into development I don't think that sounds like a great >> idea anymore but it was the original plan. > > I don't see the difference to/with postgres' license there. Perhaps it's just me, but it was really about perception. If I extended BackRest to work with MySQL (shudders) then I thought it would be weird if it used the PostgreSQL license. Anyway, BackRest is now pretty solidly pgBackRest, and I don't have any immediate plans to port it to other database systems, so the point is probably moot. -- -David david@pgmasters.net
On 8/13/15 1:31 AM, Peter Geoghegan wrote: > On Wed, Aug 12, 2015 at 11:23 PM, Magnus Hagander <magnus@hagander.net> wrote: >>> The value of a core regression suite that takes less time to run has >>> to be weighed against the possibility that a better core regression >>> suite might cause us to find more bugs before committing. That could >>> easily be worth the price in runtime. >> >> >> Or have a quickcheck you run "all the time" and then run the bigger one once >> before committing perhaps? > > I favor splitting the regression tests to add "all the time" and > "before commit" targets as you describe. I think that once the > facility is there, we can determine over time how expansive that > second category gets to be. I don't know how many folks work in a github fork of Postgres, but anyone that does could run slow checks on every single push via Travis-CI. [1] is an example of that. That wouldn't work directly since it depends on Peter Eisentraut's scripts [2] that pull from apt.postgresql.org, but presumably it wouldn't be too hard to get tests running directly out of a Postgres repo. [1] https://travis-ci.org/decibel/variant [2] See wget URLs in https://github.com/decibel/variant/blob/master/.travis.yml -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Data in Trouble? Get it in Treble! http://BlueTreble.com
On 13 August 2015 at 00:31, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Aug 12, 2015 at 7:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> FWIW, I've objected in the past to tests that would significantly
> increase the runtime of "make check", unless I thought they were
> especially valuable (which enumerating every minor behavior of a
> feature patch generally isn't IMO). I still think that that's an
> important consideration: every second you add to "make check" is
> multiplied many times over when you consider how many developers
> run that how many times a day.
>
> We've talked about having some sort of second rank of tests that
> people wouldn't necessarily run before committing, and that would
> be allowed to eat more time than the core regression tests would.
> I think that might be a valuable direction to pursue if people start
> submitting very bulky tests.
Maybe. Adding a whole new test suite is significantly more
administratively complex, because the BF client has to get updated to
run it. And if expected outputs in that test suite change very often
at all, then committers will have to run it before committing anyway.
The value of a core regression suite that takes less time to run has
to be weighed against the possibility that a better core regression
suite might cause us to find more bugs before committing. That could
easily be worth the price in runtime.
Seems like a simple fix. We maintain all regression tests in full, but keep slow tests in separate files accessed only by a different schedule.
make check == fast-parallel_schedule
make check-full == parallel_schedule
Probably easier to make one schedule call the other, so we don't duplicate anything.
Tom gets his fast schedule, others get their full schedule.
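To make the proposal concrete, here is a rough sketch, using the existing pg_regress schedule-file format but hypothetical file names and test lists. Since pg_regress accepts the --schedule option more than once, the full target can simply pass both files rather than duplicating entries:

```
# src/test/regress/fast-parallel_schedule (hypothetical): the quick tests
test: boolean char name varchar text int2 int4 int8 oid float4 float8

# src/test/regress/slow_parallel_schedule (hypothetical): the bulky tests
test: numeric_big

# Makefile targets (sketch):
#   check      -> fast schedule only
#   check-full -> fast schedule plus the slow one
check: all
	$(pg_regress_check) --schedule=$(srcdir)/fast-parallel_schedule
check-full: all
	$(pg_regress_check) --schedule=$(srcdir)/fast-parallel_schedule \
	    --schedule=$(srcdir)/slow_parallel_schedule
```

Since the fast file is a strict prefix of what check-full runs, a buildfarm animal running check-full covers everything the developer target does.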
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 8/14/15 12:11 AM, Jim Nasby wrote: >> >> I favor splitting the regression tests to add "all the time" and >> "before commit" targets as you describe. I think that once the >> facility is there, we can determine over time how expansive that >> second category gets to be. > > I don't know how many folks work in a github fork of Postgres, but > anyone that does could run slow checks on every single push via Travis-CI. I set up a simple example of this with 64 variations of TAP tests, BLKSZ, and WAL block size. Unfortunately, to make this work you have to commit a .travis.yml file to your fork. build: https://travis-ci.org/decibel/postgres/builds/75692344 .travis.yml: https://github.com/decibel/postgres/blob/master/.travis.yml Looks like we might have some problems with BLKSZ != 8... -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Data in Trouble? Get it in Treble! http://BlueTreble.com
On 2015-08-15 03:35, Jim Nasby wrote: > > I setup a simple example of this with 64 variations of TAP tests, BLKSZ > and WAL blocksize. Unfortunately to make this work you have to commit a > .travis.yml file to your fork. > > build: https://travis-ci.org/decibel/postgres/builds/75692344 > .travis.yml: https://github.com/decibel/postgres/blob/master/.travis.yml > > Looks like we might have some problems with BLKSZ != 8... I went through several of those and ISTM that most test failures are to be expected: a) order of resulting rows is different for some of the joins b) planner behaves differently because of different number of pages c) tablesample returns different results because it uses ctids as random source d) toaster behaves differently because of different threshold which is calculated from BLKSZ e) max size of btree value is exceeded with BLKSZ = 4 in the test that tries to create deep enough btree We could fix a) by adding ORDER BY to those queries but I don't see how to fix the rest easily or at all without sacrificing some test coverage. However I see one real error in the BLKSZ = 4 tests - two of the merge join queries in equivclass die with "ERROR: could not find commutator for operator" ( example build output https://travis-ci.org/decibel/postgres/jobs/75692377 ). I didn't yet investigate what's causing this. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
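For failure class (a), the fix Petr mentions is mechanical; a minimal illustration with hypothetical tables, not taken from the failing tests themselves:

```sql
-- Without ORDER BY, row order depends on the chosen plan, which can
-- change when BLKSZ alters the planner's page-count estimates:
SELECT t1.a, t2.b FROM t1 JOIN t2 USING (id);

-- Pinning the order makes the expected output stable across block sizes,
-- at the cost of no longer exercising the unsorted output path:
SELECT t1.a, t2.b FROM t1 JOIN t2 USING (id) ORDER BY t1.a, t2.b;
```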
On Wed, Aug 12, 2015 at 06:46:19PM +0100, Greg Stark wrote: > On Wed, Aug 12, 2015 at 3:10 AM, Noah Misch <noah@leadboat.com> wrote: > > Committers press authors to delete tests more often than we press them to > > resubmit with more tests. No wonder so many patches have insufficient tests; > > we treat those patches more favorably, on average. I have no objective > > principles for determining whether a test is pointlessly redundant, but I > > think the principles should become roughly 10x more permissive than the > > (unspecified) ones we've been using. > > I would suggest the metric should be "if this test fails is it more > likely to be noise due to an intentional change in behaviour or more > likely to be a bug?" When I've just spent awhile implementing a behavior change, the test diffs are a comforting sight. They confirm that the test suite exercises the topic I just changed. Furthermore, most tests today do not qualify under this stringent metric you suggest. The nature of pg_regress opposes it. I sometimes hear a myth that tests catch the bugs their authors anticipated. We have tests like that (opr_sanity comes to mind), but much test-induced bug discovery is serendipitous. To give a recent example, Peter Eisentraut didn't write src/bin tests to reveal the bug that led to commit d73d14c.
On Fri, Aug 14, 2015 at 12:47:49PM +0100, Simon Riggs wrote: > On 13 August 2015 at 00:31, Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Aug 12, 2015 at 7:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > We've talked about having some sort of second rank of tests that > > > people wouldn't necessarily run before committing, and that would > > > be allowed to eat more time than the core regression tests would. > > > I think that might be a valuable direction to pursue if people start > > > submitting very bulky tests. > > > > Maybe. Adding a whole new test suite is significantly more > > administratively complex, because the BF client has to get updated to > > run it. And if expected outputs in that test suite change very often > > at all, then committers will have to run it before committing anyway. > > > > The value of a core regression suite that takes less time to run has > > to be weighed against the possibility that a better core regression > > suite might cause us to find more bugs before committing. That could > > easily be worth the price in runtime. > > Seems like a simple fix. We maintain all regression tests in full, but keep > slow tests in separate files accessed only by a different schedule. > > make check == fast-parallel_schedule > make check-full == parallel_schedule +1 for a split, though I would do "make quickcheck" and "make check". Using fewer tests should be a conscious decision, and "check" is the widely-known Makefile target. In particular, non-hackers building production binaries need the thorough test battery. (As a bonus, the buildfarm wouldn't miss a beat.)
On Sun, Aug 16, 2015 at 7:33 AM, Noah Misch <noah@leadboat.com> wrote: > When I've just spent awhile implementing a behavior change, the test diffs are > a comforting sight. They confirm that the test suite exercises the topic I > just changed. Furthermore, most tests today do not qualify under this > stringent metric you suggest. The nature of pg_regress opposes it. It's not a comforting sight for me. It means I need to change the test results and then of course the tests will pass. So they didn't really test anything and I don't really know whether the changes broke anything. Any test I had to adjust for my change was effectively worthless. This is precisely my point about pg_regress and why it's the reason there's pressure not to have extensive tests. It's useful to have some end-to-end black box tests like this but it's self-limiting. You can't test many details without tying the tests to specific behaviour. I have worked with insanely extensive testing suites that didn't have this problem. But they were unit tests for code that was structured around testability. When the API is designed to be testable and you have good infrastructure for mocking up the environment and comparing function results in a much narrower way than just looking at text output, you can test the correctness of each function without tying the test to the specific behaviour. *That* gives a real comfort. When you change a function to behave entirely differently and can still run all the tests and see that all the code that depends on it still operates correctly, then you know you haven't broken anything. In that case it actually *speeds up* development rather than slowing it down. It lets newbie developers hack on code where they don't necessarily know all the downstream dependent code and not be paralyzed by fear that they'll break something. -- greg
On 8/15/15 4:45 AM, Petr Jelinek wrote: > We could fix a) by adding ORDER BY to those queries but I don't see how > to fix the rest easily or at all without sacrificing some test coverage. Hopefully at some point we'll have test frameworks that don't depend on capturing raw psql output, which would presumably make dealing with a lot of this easier. Maybe some of this could be handled by factoring BLKSZ into the queries... A really crude solution would be to have expected output be BLKSZ-dependent where appropriate. No one will remember to test that by hand, but if we have CI set up then maybe it's workable... -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Data in Trouble? Get it in Treble! http://BlueTreble.com
On 8/16/15 8:48 AM, Greg Stark wrote: > On Sun, Aug 16, 2015 at 7:33 AM, Noah Misch <noah@leadboat.com> wrote: >> When I've just spent awhile implementing a behavior change, the test diffs are >> a comforting sight. They confirm that the test suite exercises the topic I >> just changed. Furthermore, most tests today do not qualify under this >> stringent metric you suggest. The nature of pg_regress opposes it. > > It's not a comforting sight for me. It means I need to change the test > results and then of course the tests will pass. So they didn't really > test anything and I don't really know whether the changes broke > anything. Any test I had to adjust for my change was effectively > worthless. > > This is precisely my point about pg_regress and why it's the reason > there's pressure not to have extensive tests. It's useful to have some > end-to-end black box tests like this but it's self-limiting. You can't > test many details without tying the tests to specific behaviour. > > I have worked with insanely extensive testing suites that didn't have > this problem. But they were unit tests for code that was structured > around testability. When the API is designed to be testable and you > have good infrastructure for mocking up the environment and comparing > function results in a much narrower way than just looking at text > output you can test the correctness of each function without tying the > test to the specific behaviour. > > *That* gives a real comfort. When you change a function to behave > entirely differently and can still run all the tests and see that all > the code that depends on it still operate correctly then you know you > haven't broken anything. In that case it actually *speeds up* > development rather than slowing it down. It lets newbie developers > hack on code where they don't necessarily know all the downstream > dependent code and not be paralyzed by fear that they'll break > something. 
As someone who oversaw a database test suite with almost 500 test files (each testing multiple cases), I completely agree. Our early framework was actually similar to pg_regress and that worked "OK" up to about 200 test files, but it got very unwieldy very quickly. We switched to pgTap and it was far easier to work with. I suspect any effort to significantly improve Postgres test coverage is doomed until there's an alternative to pg_regress. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Data in Trouble? Get it in Treble! http://BlueTreble.com
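For readers unfamiliar with pgTap: it runs assertions inside the database and emits TAP output, so a test compares values rather than diffing raw psql text, which sidesteps the row-order and formatting fragility discussed above. A minimal sketch, assuming the pgtap extension is installed:

```sql
BEGIN;
CREATE EXTENSION IF NOT EXISTS pgtap;
-- declare how many assertions this file will run
SELECT plan(2);
-- is(have, want, description): compares two values
SELECT is(2 + 2, 4, 'integer addition works');
-- ok(boolean, description): asserts an arbitrary condition
SELECT ok(EXISTS (SELECT 1 FROM pg_class WHERE relname = 'pg_class'),
          'pg_class is visible in the catalog');
SELECT * FROM finish();
ROLLBACK;
```

The ROLLBACK leaves the database untouched, so the same suite can be rerun against a live cluster.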
On Mon, Aug 17, 2015 at 02:04:40PM -0500, Jim Nasby wrote: > On 8/16/15 8:48 AM, Greg Stark wrote: > >On Sun, Aug 16, 2015 at 7:33 AM, Noah Misch <noah@leadboat.com> wrote: > >>When I've just spent awhile implementing a behavior change, the test diffs are > >>a comforting sight. They confirm that the test suite exercises the topic I > >>just changed. Furthermore, most tests today do not qualify under this > >>stringent metric you suggest. The nature of pg_regress opposes it. > > > >It's not a comforting sight for me. It means I need to change the test > >results and then of course the tests will pass. So they didn't really > >test anything and I don't really know whether the changes broke > >anything. Any test I had to adjust for my change was effectively > >worthless. > > > >This is precisely my point about pg_regress and why it's the reason > >there's pressure not to have extensive tests. It's useful to have some > >end-to-end black box tests like this but it's self-limiting. You can't > >test many details without tying the tests to specific behaviour. > > > >I have worked with insanely extensive testing suites that didn't have > >this problem. But they were unit tests for code that was structured > >around testability. When the API is designed to be testable and you > >have good infrastructure for mocking up the environment and comparing > >function results in a much narrower way than just looking at text > >output you can test the correctness of each function without tying the > >test to the specific behaviour. > > > >*That* gives a real comfort. When you change a function to behave > >entirely differently and can still run all the tests and see that all > >the code that depends on it still operate correctly then you know you > >haven't broken anything. In that case it actually *speeds up* > >development rather than slowing it down. 
> >It lets newbie developers hack on code where they don't necessarily know all the downstream dependent code and not be paralyzed by fear that they'll break something. > > As someone who oversaw a database test suite with almost 500 test files (each testing multiple cases), I completely agree. Our early framework was actually similar to pg_regress and that worked "OK" up to about 200 test files, but it got very unwieldy very quickly. We switched to pgTap and it was far easier to work with. My own position is based on having maintained a pg_regress suite an order of magnitude larger than that. I don't know why that outcome was so different. > I suspect any effort to significantly improve Postgres test coverage is > doomed until there's an alternative to pg_regress. There is the src/test/perl/TestLib.pm harness.
On Tue, Aug 18, 2015 at 6:57 AM, Noah Misch <noah@leadboat.com> wrote: > My own position is based on having maintained a pg_regress suite an order of > magnitude larger than that. I don't know why that outcome was so different. Comparing the size of test suites by these numbers is impossible because people put more or fewer tests in each schedule file. I would be more interested in the run-time as a metric but even that is fallible as I've seen individual tests that took 30min to run. And does your pg_regress test suite actually find many bugs or does it mainly detect when functionality has changed and require updating expected results to match? >> I suspect any effort to significantly improve Postgres test coverage is >> doomed until there's an alternative to pg_regress. > > There is the src/test/perl/TestLib.pm harness. Sadly I think the test suite is only half the battle. The coding style of Postgres predates modern test suite systems and makes it hard to test. Most functions require a specific environment set up that would be laborious and difficult to do in any sane way. Even something as self-contained as tuplesort would be difficult to test without the whole operator class and syscache mechanisms initialized and populated. And that's an easy case, imagine trying to test individual functions in the planner without doing a complete planner run on a query. -- greg
On Tue, Aug 18, 2015 at 02:03:19PM +0100, Greg Stark wrote: > On Tue, Aug 18, 2015 at 6:57 AM, Noah Misch <noah@leadboat.com> wrote: > >> I suspect any effort to significantly improve Postgres test > >> coverage is doomed until there's an alternative to pg_regress. > > > > There is the src/test/perl/TestLib.pm harness. > > Sadly I think the test suite is only half the battle. The coding > style of Postgres predates modern test suite systems and makes it > hard to test. Most functions require a specific environment set up > that would be laborious and difficult to do in any sane way. Even > something as self-contained as tuplesort would be difficult to test > without the whole operator class and syscache mechanisms initialized > and populated. And that's an easy case, imagine trying to test > individual functions in the planner without doing a complete planner > run on a query. I'm given to understand that this tight coupling is necessary for performance. Are you saying that it could be unwound, or that testing strategies mostly need to take it into account, or...? Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Tue, Aug 18, 2015 at 2:16 PM, David Fetter <david@fetter.org> wrote: > I'm given to understand that this tight coupling is necessary for > performance. Are you saying that it could be unwound, or that testing > strategies mostly need to take it into account, or...? I'm just saying that we shouldn't expect to find a magic bullet test framework that solves all these problems. Without restructuring code, which I don't think is really feasible, we won't be able to have good unit test coverage for most existing code. It might be more practical to start using such a new tool for new code only. Then the new code could be structured in ways that allow the environment to be mocked more easily and the results observed more easily. -- greg
On Tue, Aug 18, 2015 at 04:54:07PM +0100, Greg Stark wrote: > On Tue, Aug 18, 2015 at 2:16 PM, David Fetter <david@fetter.org> wrote: > > I'm given to understand that this tight coupling is necessary for > > performance. Are you saying that it could be unwound, or that > > testing strategies mostly need to take it into account, or...? > > I'm just saying that we shouldn't expect to find a magic bullet test > framework that solves all these problems. Without restructuring > code, which I don't think is really feasible, we won't be able to > have good unit test coverage for most existing code. > > It might be more practical to start using such a new tool for new > code only. Then the new code could be structured in ways that allow > the environment to be mocked more easily and the results observed > more easily. Great! Do we have examples of such tools and code bases structured to accommodate them that we'd like to use for reference, or at least for inspiration? Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Tue, Aug 18, 2015 at 02:03:19PM +0100, Greg Stark wrote: > On Tue, Aug 18, 2015 at 6:57 AM, Noah Misch <noah@leadboat.com> wrote: > > My own position is based on having maintained a pg_regress suite an order of > > magnitude larger than that. I don't know why that outcome was so different. > And does your pg_regress test suite actually find many bugs or does it > mainly detect when functionality has changed and require updating > expected results to match? It found plenty of bugs and required plenty of expected output updates. > >> I suspect any effort to significantly improve Postgres test coverage is > >> doomed until there's an alternative to pg_regress. > > > > There is the src/test/perl/TestLib.pm harness. > > Sadly I think the test suite is only half the battle. Certainly.
On Tue, Aug 18, 2015 at 3:32 PM, David Fetter <david@fetter.org> wrote:
On Tue, Aug 18, 2015 at 04:54:07PM +0100, Greg Stark wrote:
> On Tue, Aug 18, 2015 at 2:16 PM, David Fetter <david@fetter.org> wrote:
> > I'm given to understand that this tight coupling is necessary for
> > performance. Are you saying that it could be unwound, or that
> > testing strategies mostly need to take it into account, or...?
>
> I'm just saying that we shouldn't expect to find a magic bullet test
> framework that solves all these problems. Without restructuring
> code, which I don't think is really feasible, we won't be able to
> have good unit test coverage for most existing code.
>
> It might be more practical to start using such a new tool for new
> code only. Then the new code could be structured in ways that allow
> the environment to be mocked more easily and the results observed
> more easily.
Great!
Do we have examples of such tools and code bases structured to
accommodate them that we'd like to use for reference, or at least for
inspiration?
+1 on that. It would be helpful to see successful examples. Especially ones written in C.
I can't really figure out what success looks like just from reading the descriptions.
Cheers,
Jeff