Thread: Test code is worth the space

Test code is worth the space

From: Noah Misch
Date:
We've too often criticized patches for carrying many lines/bytes of test case
additions.  Let's continue to demand debuggable, secure tests that fail only
when something is wrong, but let's stop pushing for test minimalism.  Such
objections may improve the individual patch, but that doesn't make up for the
chilling effect on test contributions.  I remember clearly the first time I
submitted thorough test coverage with a feature.  Multiple committers attacked
it in the name of brevity.  PostgreSQL would be better off with 10x its
current test bulk, even if the average line of test code were considerably
less important than we expect today.

Thanks,
nm



Re: Test code is worth the space

From: Peter Geoghegan
Date:
On Sat, Aug 8, 2015 at 9:47 AM, Noah Misch <noah@leadboat.com> wrote:
> We've too often criticized patches for carrying many lines/bytes of test case
> additions.  Let's continue to demand debuggable, secure tests that fail only
> when something is wrong, but let's stop pushing for test minimalism.  Such
> objections may improve the individual patch, but that doesn't make up for the
> chilling effect on test contributions.  I remember clearly the first time I
> submitted thorough test coverage with a feature.  Multiple committers attacked
> it in the name of brevity.  PostgreSQL would be better off with 10x its
> current test bulk, even if the average line of test code were considerably
> less important than we expect today.

I strongly agree. I am fond of the example of external sorting, which
at present has exactly zero test coverage, even though in production
those code paths are exercised all the time.

I think that there needs to be a way of running an extended set of
regression tests. I could definitely respect the desire for minimalism
when it comes to adding tests to the regression tests proper if there
was an extended set of tests that could be run during development less
frequently. I thought about doing the extended set as a satellite
project, but that may not be workable.

-- 
Peter Geoghegan



Re: Test code is worth the space

From: Josh Berkus
Date:
On 08/08/2015 12:24 PM, Peter Geoghegan wrote:
> I think that there needs to be a way of running an extended set of
> regression tests. I could definitely respect the desire for minimalism
> when it comes to adding tests to the regression tests proper if there
> was an extended set of tests that could be run during development less
> frequently. I thought about doing the extended set as a satellite
> project, but that may not be workable.

There already is, isn't there?  All of those named sets of regression
tests which aren't run by default.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Test code is worth the space

From: Andrew Dunstan
Date:
On 08/08/2015 05:09 PM, Josh Berkus wrote:
> On 08/08/2015 12:24 PM, Peter Geoghegan wrote:
>> I think that there needs to be a way of running an extended set of
>> regression tests. I could definitely respect the desire for minimalism
>> when it comes to adding tests to the regression tests proper if there
>> was an extended set of tests that could be run during development less
>> frequently. I thought about doing the extended set as a satellite
>> project, but that may not be workable.
> There already is, isn't there?  All of those named sets of regression
> tests which aren't run by default.
>

Exactly. And there is nothing to stop us expanding those. For example, 
it might be useful to have a suite that provably touched every code 
path, or something close to it.

Incidentally, making the buildfarm client run extra sets of tests in the 
main tree is almost completely trivial. Making it run tests from 
somewhere else, such as a different git repo, is only slightly harder.

cheers

andrew





Re: Test code is worth the space

From: Greg Stark
Date:
On Sat, Aug 8, 2015 at 8:24 PM, Peter Geoghegan <pg@heroku.com> wrote:
> I think that there needs to be a way of running an extended set of
> regression tests. I could definitely respect the desire for minimalism

The larger expense in having extensive test suites is the cost to
maintain them. With our current test framework tests have to be fairly
carefully written to produce output that isn't very fragile and
sensitive to minor changes in the code. In practice this is what
really drives the push towards minimal tests. If we tried testing
every code path right now -- which I tried to do once for the TOAST
code -- what you'll find is that what you're really testing is not
that the code is correct but that it does exactly what it does today.
At that point the test failures become entirely noise meaning
"something changed" rather than signal meaning "something's broken".

A better test framework can go a long way towards fixing this. It
would be much less of a problem if we had a unit test framework rather
than only black box integration tests. That would allow us to test
that every function in tuplestore.c meets its contract, not just that
some SQL query produces output that's correct and we think happened to
go through some code path we were interested in. There are many code
paths that will be hard to arrange to reach from SQL and impossible to
have any confidence will continue to be reached in the future when the
behaviour is intentionally changed.

That said, I don't think anyone would object to adding some regression
tests that at least test basic correctness of the sorting code. That's
a pretty embarrassing gap and iirc the only reason for it is that it
would make the regression tests sensitive to collation locale
settings.

-- 
greg



Re: Test code is worth the space

From: Simon Riggs
Date:
On 8 August 2015 at 17:47, Noah Misch <noah@leadboat.com> wrote:
> We've too often criticized patches for carrying many lines/bytes of test case
> additions.  Let's continue to demand debuggable, secure tests that fail only
> when something is wrong, but let's stop pushing for test minimalism.  Such
> objections may improve the individual patch, but that doesn't make up for the
> chilling effect on test contributions.  I remember clearly the first time I
> submitted thorough test coverage with a feature.  Multiple committers attacked
> it in the name of brevity.  PostgreSQL would be better off with 10x its
> current test bulk, even if the average line of test code were considerably
> less important than we expect today.

Almost every patch I review has either zero or insufficient tests.

If we care about robustness, then we must discuss tests. Here are my two recent experiences:

I agree we could do with x10 as many tests, but that doesn't mean all tests make sense, so there needs to be some limiting principles to avoid adding pointless test cases. I recently objected to adding a test case to Isolation tester for the latest ALTER TABLE patch because in that case there is no concurrent behavior to test. There is already a regression test that tests lock levels for those statements, so in my view it is more sensible to modify the existing test than to add a whole new isolation test that does nothing more than demonstrate the lock levels work as described, which we already knew.

On another review I suggested we add a function to core to allow it to be used in regression tests. A long debate ensued, deciding that we must be consistent and put diagnostic functions in contrib. My understanding is that we are not allowed to write regression tests that use contrib modules, yet the consistent place to put diagnostic functions is contrib - therefore, we are never allowed to write tests utilizing diagnostic functions. We are allowed to put modules for testing in another directory, but these are not distributed to users so cannot be used for production diagnosis. Systemic fail, advice needed.
 
--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Test code is worth the space

From: Robert Haas
Date:
On Mon, Aug 10, 2015 at 2:02 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Almost every patch I review has either zero or insufficient tests.
>
> If we care about robustness, then we must discuss tests. Here are my two
> recent experiences:
>
> I agree we could do with x10 as many tests, but that doesn't mean all tests
> make sense, so there needs to be some limiting principles to avoid adding
> pointless test cases. I recently objected to adding a test case to Isolation
> tester for the latest ALTER TABLE patch because in that case there is no
> concurrent behavior to test. There is already a regression test that tests
> lock levels for those statements, so in my view it is more sensible to
> modify the existing test than to add a whole new isolation test that does
> nothing more than demonstrate the lock levels work as described, which we
> already knew.
>
> On another review I suggested we add a function to core to allow it to be
> used in regression tests. A long debate ensued, deciding that we must be
> consistent and put diagnostic functions in contrib. My understanding is that
> we are not allowed to write regression tests that use contrib modules, yet
> the consistent place to put diagnostic functions is contrib - therefore, we
> are never allowed to write tests utilizing diagnostic functions. We are
> allowed to put modules for testing in another directory, but these are not
> distributed to users so cannot be used for production diagnosis. Systemic
> fail, advice needed.

There are actually two ways to do this.

One model is the dummy_seclabel module.  The build system arranges for
that to be available when running the core regression tests, so they
can use it.  And the dummy_seclabel test does.  There are actually a
couple of other loadable modules that are used by the core regression
tests in this kind of way (refint and autoinc, from contrib/spi).

The other model is what I'll call the test_decoding model.  Since the
core code can't be tested without a module, we have a module.  But
then the tests are in the module's directory
(contrib/test_decoding/{sql,expected}) not the core regression tests.

In general, I have a mild preference for the second model.  It seems
more scalable, and keeps the core tests quick to run, which is
appropriate for more obscure tests that are unlikely to break very
often.  But the first model can also be done, as shown by the fact that
we have in fact done it several times.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Test code is worth the space

From: Simon Riggs
Date:
On 10 August 2015 at 13:55, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Aug 10, 2015 at 2:02 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On another review I suggested we add a function to core to allow it to be
>> used in regression tests. A long debate ensued, deciding that we must be
>> consistent and put diagnostic functions in contrib. My understanding is that
>> we are not allowed to write regression tests that use contrib modules, yet
>> the consistent place to put diagnostic functions is contrib - therefore, we
>> are never allowed to write tests utilizing diagnostic functions. We are
>> allowed to put modules for testing in another directory, but these are not
>> distributed to users so cannot be used for production diagnosis. Systemic
>> fail, advice needed.
>
> There are actually two ways to do this.
>
> One model is the dummy_seclabel module.  The build system arranges for
> that to be available when running the core regression tests, so they
> can use it.  And the dummy_seclabel test does.  There are actually a
> couple of other loadable modules that are used by the core regression
> tests in this kind of way (refint and autoinc, from contrib/spi).
>
> The other model is what I'll call the test_decoding model.  Since the
> core code can't be tested without a module, we have a module.  But
> then the tests are in the module's directory
> (contrib/test_decoding/{sql,expected}) not the core regression tests.
>
> In general, I have a mild preference for the second model.  It seems
> more scalable, and keeps the core tests quick to run, which is
> appropriate for more obscure tests that are unlikely to break very
> often.  But the first model can also be done, as shown by the fact that
> we have in fact done it several times.

Neither of those uses a diagnostic function that also has value in production environments - for logical decoding we added something to core specifically to allow this: pg_recvlogical.

Anyway, resolving this isn't important anymore because I wish to pursue a different mechanism for freezing, but it's possible I'll hit the same issue.
 
--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Test code is worth the space

From: Noah Misch
Date:
On Mon, Aug 10, 2015 at 07:02:17AM +0100, Simon Riggs wrote:
> Almost every patch I review has either zero or insufficient tests.
> 
> If we care about robustness, then we must discuss tests. Here are my two
> recent experiences:
> 
> I agree we could do with x10 as many tests, but that doesn't mean all tests
> make sense, so there needs to be some limiting principles to avoid adding
> pointless test cases. I recently objected to adding a test case to
> Isolation tester for the latest ALTER TABLE patch because in that case
> there is no concurrent behavior to test. There is already a regression test
> that tests lock levels for those statements, so in my view it is more
> sensible to modify the existing test than to add a whole new isolation test
> that does nothing more than demonstrate the lock levels work as described,
> which we already knew.

Committers press authors to delete tests more often than we press them to
resubmit with more tests.  No wonder so many patches have insufficient tests;
we treat those patches more favorably, on average.  I have no objective
principles for determining whether a test is pointlessly redundant, but I
think the principles should become roughly 10x more permissive than the
(unspecified) ones we've been using.

> On another review I suggested we add a function to core to allow it to be
> used in regression tests. A long debate ensued, deciding that we must be
> consistent and put diagnostic functions in contrib. My understanding is
> that we are not allowed to write regression tests that use contrib modules,
> yet the consistent place to put diagnostic functions is contrib -
> therefore, we are never allowed to write tests utilizing diagnostic
> functions. We are allowed to put modules for testing in another directory,
> but these are not distributed to users so cannot be used for production
> diagnosis. Systemic fail, advice needed.

If you want a user-visible function for production diagnosis, set aside the
fact that you wish to use it in a test, and find the best place for it.  Then,
place the tests with the function.  (That is, if the function belongs in
contrib for other reasons, put tests calling it in the contrib module itself.)

Place non-user-visible test support functions in regress.c, or use one of the
options Robert described.

Thanks,
nm



Re: Test code is worth the space

From: Simon Riggs
Date:
On 12 August 2015 at 03:10, Noah Misch <noah@leadboat.com> wrote:
>> On another review I suggested we add a function to core to allow it to be
>> used in regression tests. A long debate ensued, deciding that we must be
>> consistent and put diagnostic functions in contrib. My understanding is
>> that we are not allowed to write regression tests that use contrib modules,
>> yet the consistent place to put diagnostic functions is contrib -
>> therefore, we are never allowed to write tests utilizing diagnostic
>> functions. We are allowed to put modules for testing in another directory,
>> but these are not distributed to users so cannot be used for production
>> diagnosis. Systemic fail, advice needed.
>
> If you want a user-visible function for production diagnosis, set aside the
> fact that you wish to use it in a test, and find the best place for it.  Then,
> place the tests with the function.  (That is, if the function belongs in
> contrib for other reasons, put tests calling it in the contrib module itself.)
>
> Place non-user-visible test support functions in regress.c, or use one of the
> options Robert described.

That helps, thanks. 

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Test code is worth the space

From: Greg Stark
Date:
On Wed, Aug 12, 2015 at 3:10 AM, Noah Misch <noah@leadboat.com> wrote:
> Committers press authors to delete tests more often than we press them to
> resubmit with more tests.  No wonder so many patches have insufficient tests;
> we treat those patches more favorably, on average.  I have no objective
> principles for determining whether a test is pointlessly redundant, but I
> think the principles should become roughly 10x more permissive than the
> (unspecified) ones we've been using.


I would suggest the metric should be "if this test fails is it more
likely to be noise due to an intentional change in behaviour or more
likely to be a bug?"

The only time I've seen pushback against tests is when the test author
made valiant efforts to test every codepath and the expected output
embeds the precise behaviour of the current code as "correct". Even
when patches have extensive tests I don't recall seeing much pushback
(though I've been having trouble keeping up with the list in recent
months) if the tests are written in a way that they will only fail if
there's a bug, even if behaviour changes in unrelated ways.

-- 
greg



Re: Test code is worth the space

From: Peter Geoghegan
Date:
On Wed, Aug 12, 2015 at 10:46 AM, Greg Stark <stark@mit.edu> wrote:
> The only time I've seen pushback against tests is when the test author
> made valiant efforts to test every codepath and the expected output
> embeds the precise behaviour of the current code as "correct". Even
> when patches have extensive tests I don't recall seeing much pushback
> (though I've been having trouble keeping up with the list in recent
> months) if the tests are written in a way that they will only fail if
> there's a bug, even if behaviour changes in unrelated ways.

Really? I think Noah's description of how less testing is in effect
incentivized by committers is totally accurate. No patch author is
going to dig their heels in over the objections of a committer when
the complaint is about brevity of tests.

This resistance to adding tests seems quite short sighted to me,
especially when the concern is about queries that will each typically
take less than 1ms to execute. Like Noah, I think that it would be
very helpful to simply be more inclusive of additional tests that
don't increase test coverage by as much as each query in a minimal
subset. I am not at all convinced by arguments about the cost of
maintaining tests when a simple behavioral change occurs.

-- 
Peter Geoghegan



Re: Test code is worth the space

From: Alvaro Herrera
Date:
One trouble I face when adding tests is that sometimes they require
hooks in the code, to test for race conditions.  In BRIN I cannot test
some code paths without resorting to adding breakpoints in GDB, for
instance.  If there's no support for such in the core code, it's
essentially impossible to add meaningful test for certain code paths.  I
toyed with the notion of adding hook points (such that a test module can
put the backend to sleep until the other backend acknowledges hitting
the race condition window); I decided not to because it seems a more
general discussion is necessary first about the desirability of such.

As I recall we needed this in SKIP LOCKED and of course for multixacts
also.

I agree with the general idea that it's worthwhile to add lots more
tests than we currently have, both for the current code and for future
patches.  Surely we don't need to reach the point where every single
nuance of every single user interface is verified to the point that
tests are such a large burden to maintain as is being suggested.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Test code is worth the space

From: Robert Haas
Date:
On Wed, Aug 12, 2015 at 2:04 PM, Peter Geoghegan <pg@heroku.com> wrote:
> This resistance to adding tests seems quite short sighted to me,
> especially when the concern is about queries that will each typically
> take less than 1ms to execute. Like Noah, I think that it would be
> very helpful to simply be more inclusive of additional tests that
> don't increase test coverage by as much as each query in a minimal
> subset. I am not at all convinced by arguments about the cost of
> maintaining tests when a simple behavioral change occurs.

I've removed tests from patches that in my opinion were unlikely to
fail either (a) for any reason or (b) for any reason other than an
intentional change, and I think that's a reasonable thing to do.
However, I still think it's a good idea, and useful, to try to expand
the code coverage we get from 'make check'.  However, the bigger
issue, IMHO, is the stuff that can't be tested via pg_regress, e.g.
because it needs hooks, like what Alvaro is talking about, or because
it needs a custom testing framework.  Recovery, for example, really
needs a lot more testing, as we talked about at PGCon.  If we just
expand what 'make check' covers and don't deal with those kinds of
things, we will be improving our test coverage but maybe not all that
much.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Test code is worth the space

From: Stephen Frost
Date:
* Robert Haas (robertmhaas@gmail.com) wrote:
> On Wed, Aug 12, 2015 at 2:04 PM, Peter Geoghegan <pg@heroku.com> wrote:
> > This resistance to adding tests seems quite short sighted to me,
> > especially when the concern is about queries that will each typically
> > take less than 1ms to execute. Like Noah, I think that it would be
> > very helpful to simply be more inclusive of additional tests that
> > don't increase test coverage by as much as each query in a minimal
> > subset. I am not at all convinced by arguments about the cost of
> > maintaining tests when a simple behavioral change occurs.
>
> I've removed tests from patches that in my opinion were unlikely to
> fail either (a) for any reason or (b) for any reason other than an
> intentional change, and I think that's a reasonable thing to do.
> However, I still think it's a good idea, and useful, to try to expand
> the code coverage we get from 'make check'.  However, the bigger
> issue, IMHO, is the stuff that can't be tested via pg_regress, e.g.
> because it needs hooks, like what Alvaro is talking about, or because
> it needs a custom testing framework.  Recovery, for example, really
> needs a lot more testing, as we talked about at PGCon.  If we just
> expand what 'make check' covers and don't deal with those kinds of
> things, we will be improving our test coverage but maybe not all that
> much.

Agreed, and this is something which I said we'd work to help with and
which we have some things to show now, for those interested.

The regression tests included in pgBackRest (available here:
https://github.com/pgmasters/backrest) go through a number of different
recovery tests.  There's vagrant configs for a few different VMs too
(CentOS 6, CentOS 7, Ubuntu 12.04 and Ubuntu 14.04) which is what we've
been testing.

We're working to continue expanding those tests and will also be adding
tests for replication and promotion in the future.  Eventually, we plan
to write a buildfarm module for pgBackRest, to allow it to be run in the
same manner as the regular buildfarm animals with the results posted.

David, feel free to correct me if I've misconstrued anything above.
Thanks!
    Stephen

Re: Test code is worth the space

From: Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Aug 12, 2015 at 2:04 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> This resistance to adding tests seems quite short sighted to me,
>> especially when the concern is about queries that will each typically
>> take less than 1ms to execute. Like Noah, I think that it would be
>> very helpful to simply be more inclusive of additional tests that
>> don't increase test coverage by as much as each query in a minimal
>> subset. I am not at all convinced by arguments about the cost of
>> maintaining tests when a simple behavioral change occurs.

> I've removed tests from patches that in my opinion were unlikely to
> fail either (a) for any reason or (b) for any reason other than an
> intentional change, and I think that's a reasonable thing to do.

FWIW, I've objected in the past to tests that would significantly
increase the runtime of "make check", unless I thought they were
especially valuable (which enumerating every minor behavior of a
feature patch generally isn't IMO).  I still think that that's an
important consideration: every second you add to "make check" is
multiplied many times over when you consider how many developers
run that how many times a day.

We've talked about having some sort of second rank of tests that
people wouldn't necessarily run before committing, and that would
be allowed to eat more time than the core regression tests would.
I think that might be a valuable direction to pursue if people start
submitting very bulky tests.
        regards, tom lane



Re: Test code is worth the space

From: Robert Haas
Date:
On Wed, Aug 12, 2015 at 7:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> FWIW, I've objected in the past to tests that would significantly
> increase the runtime of "make check", unless I thought they were
> especially valuable (which enumerating every minor behavior of a
> feature patch generally isn't IMO).  I still think that that's an
> important consideration: every second you add to "make check" is
> multiplied many times over when you consider how many developers
> run that how many times a day.
>
> We've talked about having some sort of second rank of tests that
> people wouldn't necessarily run before committing, and that would
> be allowed to eat more time than the core regression tests would.
> I think that might be a valuable direction to pursue if people start
> submitting very bulky tests.

Maybe.  Adding a whole new test suite is significantly more
administratively complex, because the BF client has to get updated to
run it.  And if expected outputs in that test suite change very often
at all, then committers will have to run it before committing anyway.

The value of a core regression suite that takes less time to run has
to be weighed against the possibility that a better core regression
suite might cause us to find more bugs before committing.  That could
easily be worth the price in runtime.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Test code is worth the space

From: Michael Paquier
Date:
On Thu, Aug 13, 2015 at 5:54 AM, Stephen Frost wrote:
> The regression tests included in pgBackRest (available here:
> https://github.com/pgmasters/backrest) go through a number of different
> recovery tests.  There's vagrant configs for a few different VMs too
> (CentOS 6, CentOS 7, Ubuntu 12.04 and Ubuntu 14.04) which is what we've
> been testing.
> We're working to continue expanding those tests and will also be adding
> tests for replication and promotion in the future.  Eventually, we plan
> to write a buildfarm module for pgBackRest, to allow it to be run in the
> same manner as the regular buildfarm animals with the results posted.

Interesting. Do you mind if I pick up from it some ideas for the
in-core replication test suite based on TAP stuff? That's still in the
works for the next CF.
-- 
Michael



Re: Test code is worth the space

From: Stephen Frost
Date:
* Michael Paquier (michael.paquier@gmail.com) wrote:
> On Thu, Aug 13, 2015 at 5:54 AM, Stephen Frost wrote:
> > The regression tests included in pgBackRest (available here:
> > https://github.com/pgmasters/backrest) go through a number of different
> > recovery tests.  There's vagrant configs for a few different VMs too
> > (CentOS 6, CentOS 7, Ubuntu 12.04 and Ubuntu 14.04) which is what we've
> > been testing.
> > We're working to continue expanding those tests and will also be adding
> > tests for replication and promotion in the future.  Eventually, we plan
> > to write a buildfarm module for pgBackRest, to allow it to be run in the
> > same manner as the regular buildfarm animals with the results posted.
>
> Interesting. Do you mind if I pick up from it some ideas for the
> in-core replication test suite based on TAP stuff? That's still in the
> works for the next CF.

Certainly don't mind at all, entirely open source under the MIT
license.
Thanks!
    Stephen

Re: Test code is worth the space

From: Robert Haas
Date:
On Wed, Aug 12, 2015 at 9:24 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Michael Paquier (michael.paquier@gmail.com) wrote:
>> On Thu, Aug 13, 2015 at 5:54 AM, Stephen Frost wrote:
>> > The regression tests included in pgBackRest (available here:
>> > https://github.com/pgmasters/backrest) go through a number of different
>> > recovery tests.  There's vagrant configs for a few different VMs too
>> > (CentOS 6, CentOS 7, Ubuntu 12.04 and Ubuntu 14.04) which is what we've
>> > been testing.
>> > We're working to continue expanding those tests and will also be adding
>> > tests for replication and promotion in the future.  Eventually, we plan
>> > to write a buildfarm module for pgBackRest, to allow it to be run in the
>> > same manner as the regular buildfarm animals with the results posted.
>>
>> Interesting. Do you mind if I pick up from it some ideas for the
>> in-core replication test suite based on TAP stuff? That's still in the
>> works for the next CF.
>
> Certainly don't mind at all, entirely open source under the MIT
> license.

Why not the PG license?  It would be nicer if we didn't have to worry
about license contamination here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Test code is worth the space

From: Stephen Frost
Date:
* Robert Haas (robertmhaas@gmail.com) wrote:
> On Wed, Aug 12, 2015 at 9:24 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > * Michael Paquier (michael.paquier@gmail.com) wrote:
> >> On Thu, Aug 13, 2015 at 5:54 AM, Stephen Frost wrote:
> >> > The regression tests included in pgBackRest (available here:
> >> > https://github.com/pgmasters/backrest) go through a number of different
> >> > recovery tests.  There's vagrant configs for a few different VMs too
> >> > (CentOS 6, CentOS 7, Ubuntu 12.04 and Ubuntu 14.04) which is what we've
> >> > been testing.
> >> > We're working to continue expanding those tests and will also be adding
> >> > tests for replication and promotion in the future.  Eventually, we plan
> >> > to write a buildfarm module for pgBackRest, to allow it to be run in the
> >> > same manner as the regular buildfarm animals with the results posted.
> >>
> >> Interesting. Do you mind if I pick up from it some ideas for the
> >> in-core replication test suite based on TAP stuff? That's still in the
> >> works for the next CF.
> >
> > Certainly don't mind at all, entirely open source under the MIT
> > license.
>
> Why not the PG license?  It would be nicer if we didn't have to worry
> about license contamination here.

There is no conflict between the licenses and pgBackRest is not part of
nor maintained as part of the PostgreSQL project.

Not that I'm against that changing, certainly, but it was independently
developed and I wouldn't ask the community to take on maintaining
something which wasn't developed following the community process, not to
mention that I imagine some would probably want it rewritten in C.

We clearly have dependencies, to the extent that there's concern about
licenses, on much more restrictive and interesting licenses, and there
currently isn't even any real dependency here; there would only be one
if the PGDG decided to fork pgBackRest and start maintaining it
independently.
Thanks!
    Stephen

Re: Test code is worth the space

From
Fabien COELHO
Date:
> FWIW, I've objected in the past to tests that would significantly
> increase the runtime of "make check", unless I thought they were
> especially valuable (which enumerating every minor behavior of a
> feature patch generally isn't IMO).  I still think that that's an
> important consideration: every second you add to "make check" is
> multiplied many times over when you consider how many developers
> run that how many times a day.

Sure. These are developers' tests, run 50 times per day just to check that 
nothing was completely broken. It's just one kind of test.

I agree that having ISN conversions tested every time does not make much 
sense, especially as the source file is touched every two years. On the 
other hand, when I tried to fix ISN bugs I found that there was not a 
single test for the module, which made it hard to spot regressions, so 
the tests should be there even if they are not run often: testing the 
obvious is useful when you start meddling in the code.

> We've talked about having some sort of second rank of tests that
> people wouldn't necessarily run before committing, and that would
> be allowed to eat more time than the core regression tests would.
> I think that might be a valuable direction to pursue if people start
> submitting very bulky tests.

Yep.

For regression tests, the list of tests run is maintained in the 
*_schedule files. There could be a "large_parallel_schedule" which 
included more tests. This is already more or less the case, as there are 
"big*" targets which run numeric_big in addition to the others.
This could be expanded and taken into account by the build farm.

I recall test submissions which were rejected on the grounds that 'it 
takes 1 second of my time every day' and 'is not very useful as the 
feature is known to work'.

I think that a key change needed is for committers to be more open to 
such additions, with the discussion focusing instead on (1) are the 
tests portable, (2) do the tests cover features not already tested, and 
(3) should they be in the default list of tests run under "make check". 
But some should be accepted in *core* nevertheless, and not pushed out 
into the bin.

-- 
Fabien.



Re: Test code is worth the space

From
Magnus Hagander
Date:
On Thu, Aug 13, 2015 at 1:31 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Aug 12, 2015 at 7:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> FWIW, I've objected in the past to tests that would significantly
> increase the runtime of "make check", unless I thought they were
> especially valuable (which enumerating every minor behavior of a
> feature patch generally isn't IMO).  I still think that that's an
> important consideration: every second you add to "make check" is
> multiplied many times over when you consider how many developers
> run that how many times a day.
>
> We've talked about having some sort of second rank of tests that
> people wouldn't necessarily run before committing, and that would
> be allowed to eat more time than the core regression tests would.
> I think that might be a valuable direction to pursue if people start
> submitting very bulky tests.

Maybe.  Adding a whole new test suite is significantly more
administratively complex, because the BF client has to get updated to
run it.  And if expected outputs in that test suite change very often
at all, then committers will have to run it before committing anyway.


We could always do that the other way - meaning put everything in "make check", and invent a "make quickcheck" for developers with the smaller subset.



The value of a core regression suite that takes less time to run has
to be weighed against the possibility that a better core regression
suite might cause us to find more bugs before committing.  That could
easily be worth the price in runtime.


Or have a quickcheck you run "all the time" and then run the bigger one once before committing perhaps? 


--

Re: Test code is worth the space

From
Peter Geoghegan
Date:
On Wed, Aug 12, 2015 at 11:23 PM, Magnus Hagander <magnus@hagander.net> wrote:
>> The value of a core regression suite that takes less time to run has
>> to be weighed against the possibility that a better core regression
>> suite might cause us to find more bugs before committing.  That could
>> easily be worth the price in runtime.
>
>
> Or have a quickcheck you run "all the time" and then run the bigger one once
> before committing perhaps?

I favor splitting the regression tests to add "all the time" and
"before commit" targets as you describe. I think that once the
facility is there, we can determine over time how expansive that
second category gets to be.

-- 
Peter Geoghegan



Re: Test code is worth the space

From
David Steele
Date:
On 8/12/15 9:24 PM, Stephen Frost wrote:
> * Michael Paquier (michael.paquier@gmail.com) wrote:
>> On Thu, Aug 13, 2015 at 5:54 AM, Stephen Frost wrote:
>>> The regression tests included in pgBackRest (available here:
>>> https://github.com/pgmasters/backrest) go through a number of different
>>> recovery tests.  There's vagrant configs for a few different VMs too
>>> (CentOS 6, CentOS 7, Ubuntu 12.04 and Ubuntu 14.04) which is what we've
>>> been testing.
>>> We're working to continue expanding those tests and will also be adding
>>> tests for replication and promotion in the future.  Eventually, we plan
>>> to write a buildfarm module for pgBackRest, to allow it to be run in the
>>> same manner as the regular buildfarm animals with the results posted.
>>
>> Interesting. Do you mind if I pick up from it some ideas for the
>> in-core replication test suite based on TAP stuff? That's still in the
>> works for the next CF.
>
> Certainly don't mind at all, entirely open source under the MIT
> license.

Absolutely, and I'm always looking to add new tests, so some 
cross-pollination may be beneficial as well.

Currently the regression tests work with 9.0 - 9.5alpha1 (and back to 
8.3, actually).  The pause_on_recovery_target test is currently disabled 
for 9.5 and I have not implemented recovery_target_action or 
recovery_target = immediate yet.  I haven't tested alpha2 yet but it 
should work unless catalog or other IDs have changed.

-- 
-David
david@pgmasters.net



Re: Test code is worth the space

From
David Steele
Date:
On 8/12/15 9:32 PM, Robert Haas wrote:
> On Wed, Aug 12, 2015 at 9:24 PM, Stephen Frost <sfrost@snowman.net> wrote:
>> * Michael Paquier (michael.paquier@gmail.com) wrote:
>>> Interesting. Do you mind if I pick up from it some ideas for the
>>> in-core replication test suite based on TAP stuff? That's still in the
>>> works for the next CF.
>>
>> Certainly don't mind at all, entirely open source under the MIT
>> license.
>
> Why not the PG license?  It would be nicer if we didn't have to worry
> about license contamination here.

There are actually a few reasons I chose the MIT license:

1) It's one of the most permissive licenses around.

2) I originally had plans to extend backrest to other database systems.
Nearly two years into development I don't think that sounds like a
great idea anymore but it was the original plan.

3) It's common for GitHub projects and backrest has lived there its 
entire life.

I'm not against a license change in theory though I can't see why it 
matters very much.

-- 
-David
david@pgmasters.net



Re: Test code is worth the space

From
Andres Freund
Date:
On 2015-08-13 09:32:02 -0400, David Steele wrote:
> On 8/12/15 9:32 PM, Robert Haas wrote:
> >On Wed, Aug 12, 2015 at 9:24 PM, Stephen Frost <sfrost@snowman.net> wrote:
> >>Certainly don't mind at all, entirely open source under the MIT
> >>license.
> >
> >Why not the PG license?  It would be nicer if we didn't have to worry
> >about license contamination here.

I don't think MIT is particularly problematic; it's rather similar to a
3-clause BSD, and both are pretty similar to PG's license.

> There are actually a few reasons I chose the MIT license:
> 
> 1) It's one of the most permissive licenses around.

> 2) I originally had plans to extend backrest to other database systems.
> Nearly two years into development I don't think that sounds like a great
> idea anymore but it was the original plan.

I don't see the difference to/with postgres' license there.

Andres



Re: Test code is worth the space

From
David Steele
Date:
On 8/13/15 9:55 AM, Andres Freund wrote:
> On 2015-08-13 09:32:02 -0400, David Steele wrote:
>> On 8/12/15 9:32 PM, Robert Haas wrote:
>>> On Wed, Aug 12, 2015 at 9:24 PM, Stephen Frost <sfrost@snowman.net> wrote:
>>>> Certainly don't mind at all, entirely open source under the MIT
>>>> license.
>>>
>>> Why not the PG license?  It would be nicer if we didn't have to worry
>>> about license contamination here.
>
> I don't think MIT is particularly problematic, it's rather similar to a
> 3 clause BSD and both are pretty similar to PG's license.
>
>> There are actually a few reasons I chose the MIT license:
>>
>> 1) It's one of the most permissive licenses around.
>
>> 2) I originally had plans to extend backrest to other database systems.
>> Nearly two years into development I don't think that sounds like a great
>> idea anymore but it was the original plan.
>
> I don't see the difference to/with postgres' license there.

Perhaps it's just me, but it was really about perception.  If I extended 
BackRest to work with MySQL (shudders) then I thought it would be weird 
if it used the PostgreSQL license.

Anyway, BackRest is now pretty solidly pgBackRest and I don't have any 
immediate plans to port it to other database systems, so the point is 
probably moot.

-- 
-David
david@pgmasters.net



Re: Test code is worth the space

From
Jim Nasby
Date:
On 8/13/15 1:31 AM, Peter Geoghegan wrote:
> On Wed, Aug 12, 2015 at 11:23 PM, Magnus Hagander <magnus@hagander.net> wrote:
>>> The value of a core regression suite that takes less time to run has
>>> to be weighed against the possibility that a better core regression
>>> suite might cause us to find more bugs before committing.  That could
>>> easily be worth the price in runtime.
>>
>>
>> Or have a quickcheck you run "all the time" and then run the bigger one once
>> before committing perhaps?
>
> I favor splitting the regression tests to add "all the time" and
> "before commit" targets as you describe. I think that once the
> facility is there, we can determine over time how expansive that
> second category gets to be.

I don't know how many folks work in a github fork of Postgres, but 
anyone that does could run slow checks on every single push via 
Travis-CI. [1] is an example of that. That wouldn't work directly since 
it depends on Peter Eisentraut's scripts [2] that pull from 
apt.postgresql.org, but presumably it wouldn't be too hard to get tests 
running directly out of a Postgres repo.

[1] https://travis-ci.org/decibel/variant
[2] See wget URLs in 
https://github.com/decibel/variant/blob/master/.travis.yml
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Test code is worth the space

From
Simon Riggs
Date:
On 13 August 2015 at 00:31, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Aug 12, 2015 at 7:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> FWIW, I've objected in the past to tests that would significantly
> increase the runtime of "make check", unless I thought they were
> especially valuable (which enumerating every minor behavior of a
> feature patch generally isn't IMO).  I still think that that's an
> important consideration: every second you add to "make check" is
> multiplied many times over when you consider how many developers
> run that how many times a day.
>
> We've talked about having some sort of second rank of tests that
> people wouldn't necessarily run before committing, and that would
> be allowed to eat more time than the core regression tests would.
> I think that might be a valuable direction to pursue if people start
> submitting very bulky tests.

Maybe.  Adding a whole new test suite is significantly more
administratively complex, because the BF client has to get updated to
run it.  And if expected outputs in that test suite change very often
at all, then committers will have to run it before committing anyway.

The value of a core regression suite that takes less time to run has
to be weighed against the possibility that a better core regression
suite might cause us to find more bugs before committing.  That could
easily be worth the price in runtime.

Seems like a simple fix. We maintain all regression tests in full, but keep slow tests in separate files accessed only by a different schedule. 

make check == fast-parallel_schedule

make check-full == parallel_schedule

Probably easier to make one schedule call the other, so we don't duplicate anything.

Tom gets his fast schedule, others get their full schedule.
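
To make the proposal concrete, pg_regress schedule files are plain text
in which each "test:" line is one parallel group, so the split could be
a sketch like the following (the file names follow Simon's suggestion,
and the test names below are illustrative examples, not a real list;
numeric_big is the one existing "big" test mentioned above):

```
# fast-parallel_schedule -- the subset run by "make check"
test: boolean char name varchar
test: create_table create_index

# parallel_schedule -- everything, run by "make check-full".
# Slow tests appear only here, after the fast groups.
test: boolean char name varchar
test: create_table create_index
test: numeric_big
```

The Makefile side would then be two targets invoking pg_regress with
different --schedule arguments, so nothing is duplicated beyond the
schedule lists themselves.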

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Test code is worth the space

From
Jim Nasby
Date:
On 8/14/15 12:11 AM, Jim Nasby wrote:
>>
>> I favor splitting the regression tests to add "all the time" and
>> "before commit" targets as you describe. I think that once the
>> facility is there, we can determine over time how expansive that
>> second category gets to be.
>
> I don't know how many folks work in a github fork of Postgres, but
> anyone that does could run slow checks on every single push via Travis-CI.

I set up a simple example of this with 64 variations of TAP tests, BLKSZ 
and WAL block size. Unfortunately, to make this work you have to commit 
a .travis.yml file to your fork.

build: https://travis-ci.org/decibel/postgres/builds/75692344
.travis.yml: https://github.com/decibel/postgres/blob/master/.travis.yml
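
For reference, the core of such a file is small. A minimal hand-sketched
variant (not the actual file linked above; the env matrix values are
illustrative, though --with-blocksize and --with-wal-blocksize are real
configure options) might look like:

```yaml
language: c
env:
  # one build per combination of block sizes
  - BLKSZ=4 WAL_BLKSZ=4
  - BLKSZ=8 WAL_BLKSZ=8
script:
  - ./configure --with-blocksize=$BLKSZ --with-wal-blocksize=$WAL_BLKSZ
  - make && make check
```

Travis then runs one build job per env line, which is what produces the
matrix of results in the build linked above.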

Looks like we might have some problems with BLKSZ != 8...
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Test code is worth the space

From
Petr Jelinek
Date:
On 2015-08-15 03:35, Jim Nasby wrote:
>
> I setup a simple example of this with 64 variations of TAP tests, BLKSZ
> and WAL blocksize. Unfortunately to make this work you have to commit a
> .travis.yml file to your fork.
>
> build: https://travis-ci.org/decibel/postgres/builds/75692344
> .travis.yml: https://github.com/decibel/postgres/blob/master/.travis.yml
>
> Looks like we might have some problems with BLKSZ != 8...

I went through several of those and ISTM that most test failures are to 
be expected:
a) order of resulting rows is different for some of the joins
b) planner behaves differently because of different number of pages
c) tablesample returns different results because it uses ctids as random 
source
d) toaster behaves differently because of different threshold which is 
calculated from BLKSZ
e) max size of btree value is exceeded with BLKSZ = 4 in the test that 
tries to create deep enough btree

We could fix a) by adding ORDER BY to those queries but I don't see how 
to fix the rest easily or at all without sacrificing some test coverage.
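
The a) class of fix is mechanical. For instance, a join test whose row
order depends on plan choice (hypothetical tables, for illustration)
becomes block-size-independent once the order is pinned down:

```sql
-- Before: row order is whatever the chosen join strategy produces,
-- which can differ when a different BLKSZ changes the plan.
-- After: the expected output is stable regardless of the plan.
SELECT t1.id, t2.val
FROM t1 JOIN t2 USING (id)
ORDER BY t1.id, t2.val;
```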

However I see one real error in the BLKSZ = 4 tests - two of the merge 
join queries in equivclass die with "ERROR:  could not find commutator 
for operator" ( example build output 
https://travis-ci.org/decibel/postgres/jobs/75692377 ). I didn't yet 
investigate what's causing this.


-- 
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Test code is worth the space

From
Noah Misch
Date:
On Wed, Aug 12, 2015 at 06:46:19PM +0100, Greg Stark wrote:
> On Wed, Aug 12, 2015 at 3:10 AM, Noah Misch <noah@leadboat.com> wrote:
> > Committers press authors to delete tests more often than we press them to
> > resubmit with more tests.  No wonder so many patches have insufficient tests;
> > we treat those patches more favorably, on average.  I have no objective
> > principles for determining whether a test is pointlessly redundant, but I
> > think the principles should become roughly 10x more permissive than the
> > (unspecified) ones we've been using.
> 
> I would suggest the metric should be "if this test fails is it more
> likely to be noise due to an intentional change in behaviour or more
> likely to be a bug?"

When I've just spent awhile implementing a behavior change, the test diffs are
a comforting sight.  They confirm that the test suite exercises the topic I
just changed.  Furthermore, most tests today do not qualify under this
stringent metric you suggest.  The nature of pg_regress opposes it.

I sometimes hear a myth that tests catch the bugs their authors anticipated.
We have tests like that (opr_sanity comes to mind), but much test-induced bug
discovery is serendipitous.  To give a recent example, Peter Eisentraut didn't
write src/bin tests to reveal the bug that led to commit d73d14c.



Re: Test code is worth the space

From
Noah Misch
Date:
On Fri, Aug 14, 2015 at 12:47:49PM +0100, Simon Riggs wrote:
> On 13 August 2015 at 00:31, Robert Haas <robertmhaas@gmail.com> wrote:
> > On Wed, Aug 12, 2015 at 7:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > We've talked about having some sort of second rank of tests that
> > > people wouldn't necessarily run before committing, and that would
> > > be allowed to eat more time than the core regression tests would.
> > > I think that might be a valuable direction to pursue if people start
> > > submitting very bulky tests.
> >
> > Maybe.  Adding a whole new test suite is significantly more
> > administratively complex, because the BF client has to get updated to
> > run it.  And if expected outputs in that test suite change very often
> > at all, then committers will have to run it before committing anyway.
> >
> > The value of a core regression suite that takes less time to run has
> > to be weighed against the possibility that a better core regression
> > suite might cause us to find more bugs before committing.  That could
> > easily be worth the price in runtime.
> 
> Seems like a simple fix. We maintain all regression tests in full, but keep
> slow tests in separate files accessed only by a different schedule.
> 
> make check == fast-parallel_schedule
> make check-full == parallel_schedule

+1 for a split, though I would do "make quickcheck" and "make check".  Using
fewer tests should be a conscious decision, and "check" is the widely-known
Makefile target.  In particular, non-hackers building production binaries need
the thorough test battery.  (As a bonus, the buildfarm wouldn't miss a beat.)



Re: Test code is worth the space

From
Greg Stark
Date:
On Sun, Aug 16, 2015 at 7:33 AM, Noah Misch <noah@leadboat.com> wrote:
> When I've just spent awhile implementing a behavior change, the test diffs are
> a comforting sight.  They confirm that the test suite exercises the topic I
> just changed.  Furthermore, most tests today do not qualify under this
> stringent metric you suggest.  The nature of pg_regress opposes it.

It's not a comforting sight for me. It means I need to change the test
results and then of course the tests will pass. So they didn't really
test anything and I don't really know whether the changes broke
anything. Any test I had to adjust for my change was effectively
worthless.

This is precisely my point about pg_regress and why it's the reason
there's pressure not to have extensive tests. It's useful to have some
end-to-end black box tests like this but it's self-limiting. You can't
test many details without tying the tests to specific behaviour.

I have worked with insanely extensive testing suites that didn't have
this problem. But they were unit tests for code that was structured
around testability. When the API is designed to be testable and you
have good infrastructure for mocking up the environment and comparing
function results in a much narrower way than just looking at text
output you can test the correctness of each function without tying the
test to the specific behaviour.

*That* gives a real comfort. When you change a function to behave
entirely differently and can still run all the tests and see that all
the code that depends on it still operate correctly then you know you
haven't broken anything. In that case it actually *speeds up*
development rather than slowing it down. It lets newbie developers
hack on code where they don't necessarily know all the downstream
dependent code and not be paralyzed by fear that they'll break
something.

-- 
greg



Re: Test code is worth the space

From
Jim Nasby
Date:
On 8/15/15 4:45 AM, Petr Jelinek wrote:
> We could fix a) by adding ORDER BY to those queries but I don't see how
> to fix the rest easily or at all without sacrificing some test coverage.

Hopefully at some point we'll have test frameworks that don't depend on 
capturing raw psql output, which presumably makes dealing with a lot of 
this easier.

Maybe some of this could be handled by factoring BLKSZ into the queries...

A really crude solution would be to have expected output be 
BLKSZ-dependent where appropriate. No one will remember to test that by 
hand, but if we have CI set up then maybe it's workable...
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Test code is worth the space

From
Jim Nasby
Date:
On 8/16/15 8:48 AM, Greg Stark wrote:
> On Sun, Aug 16, 2015 at 7:33 AM, Noah Misch <noah@leadboat.com> wrote:
>> When I've just spent awhile implementing a behavior change, the test diffs are
>> a comforting sight.  They confirm that the test suite exercises the topic I
>> just changed.  Furthermore, most tests today do not qualify under this
>> stringent metric you suggest.  The nature of pg_regress opposes it.
>
> It's not a comforting sight for me. It means I need to change the test
> results and then of course the tests will pass. So they didn't really
> test anything and I don't really know whether the changes broke
> anything. Any test I had to adjust for my change was effectively
> worthless.
>
> This is precisely my point about pg_regress and why it's the reason
> there's pressure not to have extensive tests. It's useful to have some
> end-to-end black box tests like this but it's self-limiting. You can't
> test many details without tying the tests to specific behaviour.
>
> I have worked with insanely extensive testing suites that didn't have
> this problem. But they were unit tests for code that was structured
> around testability. When the API is designed to be testable and you
> have good infrastructure for mocking up the environment and comparing
> function results in a much narrower way than just looking at text
> output you can test the correctness of each function without tying the
> test to the specific behaviour.
>
> *That* gives a real comfort. When you change a function to behave
> entirely differently and can still run all the tests and see that all
> the code that depends on it still operate correctly then you know you
> haven't broken anything. In that case it actually *speeds up*
> development rather than slowing it down. It lets newbie developers
> hack on code where they don't necessarily know all the downstream
> dependent code and not be paralyzed by fear that they'll break
> something.

As someone who oversaw a database test suite with almost 500 test files 
(each testing multiple cases), I completely agree. Our early framework 
was actually similar to pg_regress and that worked "OK" up to about 200 
test files, but it got very unwieldy very quickly. We switched to pgTap 
and it was far easier to work with.

I suspect any effort to significantly improve Postgres test coverage is 
doomed until there's an alternative to pg_regress.
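
For illustration, a minimal pgTAP test file asserts results directly
instead of diffing captured psql output (the queries here are invented
examples, not tests from that suite):

```sql
-- pgTAP tests run inside a transaction and leave no state behind
BEGIN;
SELECT plan(2);

-- ok() asserts a boolean with a description, TAP-style
SELECT ok(2 + 2 = 4, 'arithmetic sanity');

-- results_eq() compares two result sets, so no raw output is captured
SELECT results_eq(
    $$ SELECT i FROM generate_series(1, 3) AS s(i) $$,
    $$ VALUES (1), (2), (3) $$,
    'generate_series returns 1..3');

SELECT * FROM finish();
ROLLBACK;
```

Because the expectations are assertions rather than byte-for-byte
output, incidental changes (row order, formatting) don't force expected
file churn the way pg_regress diffs do.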
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Test code is worth the space

From
Noah Misch
Date:
On Mon, Aug 17, 2015 at 02:04:40PM -0500, Jim Nasby wrote:
> On 8/16/15 8:48 AM, Greg Stark wrote:
> >On Sun, Aug 16, 2015 at 7:33 AM, Noah Misch <noah@leadboat.com> wrote:
> >>When I've just spent awhile implementing a behavior change, the test diffs are
> >>a comforting sight.  They confirm that the test suite exercises the topic I
> >>just changed.  Furthermore, most tests today do not qualify under this
> >>stringent metric you suggest.  The nature of pg_regress opposes it.
> >
> >It's not a comforting sight for me. It means I need to change the test
> >results and then of course the tests will pass. So they didn't really
> >test anything and I don't really know whether the changes broke
> >anything. Any test I had to adjust for my change was effectively
> >worthless.
> >
> >This is precisely my point about pg_regress and why it's the reason
> >there's pressure not to have extensive tests. It's useful to have some
> >end-to-end black box tests like this but it's self-limiting. You can't
> >test many details without tying the tests to specific behaviour.
> >
> >I have worked with insanely extensive testing suites that didn't have
> >this problem. But they were unit tests for code that was structured
> >around testability. When the API is designed to be testable and you
> >have good infrastructure for mocking up the environment and comparing
> >function results in a much narrower way than just looking at text
> >output you can test the correctness of each function without tying the
> >test to the specific behaviour.
> >
> >*That* gives a real comfort. When you change a function to behave
> >entirely differently and can still run all the tests and see that all
> >the code that depends on it still operate correctly then you know you
> >haven't broken anything. In that case it actually *speeds up*
> >development rather than slowing it down. It lets newbie developers
> >hack on code where they don't necessarily know all the downstream
> >dependent code and not be paralyzed by fear that they'll break
> >something.
> 
> As someone who oversaw a database test suite with almost 500 test files
> (each testing multiple cases), I completely agree. Our early framework was
> actually similar to pg_regress and that worked "OK" up to about 200 test
> files, but it got very unwieldy very quickly. We switched to pgTap and it
> was far easier to work with.

My own position is based on having maintained a pg_regress suite an order of
magnitude larger than that.  I don't know why that outcome was so different.

> I suspect any effort to significantly improve Postgres test coverage is
> doomed until there's an alternative to pg_regress.

There is the src/test/perl/TestLib.pm harness.



Re: Test code is worth the space

From
Greg Stark
Date:
On Tue, Aug 18, 2015 at 6:57 AM, Noah Misch <noah@leadboat.com> wrote:
> My own position is based on having maintained a pg_regress suite an order of
> magnitude larger than that.  I don't know why that outcome was so different.

Comparing the size of test suites by these numbers is impossible
because people put more or fewer tests in each schedule file. I would
be more interested in the run-time as a metric but even that is
fallible as I've seen individual tests that took 30min to run.

And does your pg_regress test suite actually find many bugs or does it
mainly detect when functionality has changed and require updating
expected results to match?


>> I suspect any effort to significantly improve Postgres test coverage is
>> doomed until there's an alternative to pg_regress.
>
> There is the src/test/perl/TestLib.pm harness.

Sadly I think the test suite is only half the battle. The coding style
of Postgres predates modern test suite systems and makes it hard to
test. Most functions require a specific environment set up that would
be laborious and difficult to do in any sane way. Even something as
self-contained as tuplesort would be difficult to test without the
whole operator class and syscache mechanisms initialized and
populated. And that's an easy case, imagine trying to test individual
functions in the planner without doing a complete planner run on a
query.

-- 
greg



Re: Test code is worth the space

From
David Fetter
Date:
On Tue, Aug 18, 2015 at 02:03:19PM +0100, Greg Stark wrote:
> On Tue, Aug 18, 2015 at 6:57 AM, Noah Misch <noah@leadboat.com> wrote:

> >> I suspect any effort to significantly improve Postgres test
> >> coverage is doomed until there's an alternative to pg_regress.
> >
> > There is the src/test/perl/TestLib.pm harness.
> 
> Sadly I think the test suite is only half the battle. The coding
> style of Postgres predates modern test suite systems and makes it
> hard to test. Most functions require a specific environment set up
> that would be laborious and difficult to do in any sane way. Even
> something as self-contained as tuplesort would be difficult to test
> without the whole operator class and syscache mechanisms initialized
> and populated. And that's an easy case, imagine trying to test
> individual functions in the planner without doing a complete planner
> run on a query.

I'm given to understand that this tight coupling is necessary for
performance.  Are you saying that it could be unwound, or that testing
strategies mostly need to take it into account, or...?

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Test code is worth the space

From
Greg Stark
Date:
On Tue, Aug 18, 2015 at 2:16 PM, David Fetter <david@fetter.org> wrote:
> I'm given to understand that this tight coupling is necessary for
> performance.  Are you saying that it could be unwound, or that testing
> strategies mostly need to take it into account, or...?

I'm just saying that we shouldn't expect to find a magic bullet test
framework that solves all these problems. Without restructuring code,
which I don't think is really feasible, we won't be able to have good
unit test coverage for most existing code.

It might be more practical to start using such a new tool for new code
only. Then the new code could be structured in ways that allow the
environment to be mocked more easily and the results observed more
easily.


-- 
greg



Re: Test code is worth the space

From
David Fetter
Date:
On Tue, Aug 18, 2015 at 04:54:07PM +0100, Greg Stark wrote:
> On Tue, Aug 18, 2015 at 2:16 PM, David Fetter <david@fetter.org> wrote:
> > I'm given to understand that this tight coupling is necessary for
> > performance.  Are you saying that it could be unwound, or that
> > testing strategies mostly need to take it into account, or...?
> 
> I'm just saying that we shouldn't expect to find a magic bullet test
> framework that solves all these problems. Without restructuring
> code, which I don't think is really feasible, we won't be able to
> have good unit test coverage for most existing code.
> 
> It might be more practical to start using such a new tool for new
> code only. Then the new code could be structured in ways that allow
> the environment to be mocked more easily and the results observed
> more easily.

Great!

Do we have examples of such tools and code bases structured to
accommodate them that we'd like to use for reference, or at least for
inspiration?

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Test code is worth the space

From
Noah Misch
Date:
On Tue, Aug 18, 2015 at 02:03:19PM +0100, Greg Stark wrote:
> On Tue, Aug 18, 2015 at 6:57 AM, Noah Misch <noah@leadboat.com> wrote:
> > My own position is based on having maintained a pg_regress suite an order of
> > magnitude larger than that.  I don't know why that outcome was so different.

> And does your pg_regress test suite actually find many bugs or does it
> mainly detect when functionality has changed and require updating
> expected results to match?

It found plenty of bugs and required plenty of expected output updates.

> >> I suspect any effort to significantly improve Postgres test coverage is
> >> doomed until there's an alternative to pg_regress.
> >
> > There is the src/test/perl/TestLib.pm harness.
> 
> Sadly I think the test suite is only half the battle.

Certainly.



Re: Test code is worth the space

From
Jeff Janes
Date:

On Tue, Aug 18, 2015 at 3:32 PM, David Fetter <david@fetter.org> wrote:
> On Tue, Aug 18, 2015 at 04:54:07PM +0100, Greg Stark wrote:
> > On Tue, Aug 18, 2015 at 2:16 PM, David Fetter <david@fetter.org> wrote:
> > > I'm given to understand that this tight coupling is necessary for
> > > performance.  Are you saying that it could be unwound, or that
> > > testing strategies mostly need to take it into account, or...?
> >
> > I'm just saying that we shouldn't expect to find a magic bullet test
> > framework that solves all these problems. Without restructuring
> > code, which I don't think is really feasible, we won't be able to
> > have good unit test coverage for most existing code.
> >
> > It might be more practical to start using such a new tool for new
> > code only. Then the new code could be structured in ways that allow
> > the environment to be mocked more easily and the results observed
> > more easily.
>
> Great!
>
> Do we have examples of such tools and code bases structured to
> accommodate them that we'd like to use for reference, or at least for
> inspiration?

+1 on that.  It would be helpful to see successful examples, especially ones written in C.

I can't really figure out what success looks like just from reading the descriptions.

Cheers,

Jeff