Thread: Unit testing

Unit testing

From
Gavin Sherry
Date:
Hi guys,

For the last few weeks Neil and I have been discussing unit testing as
a means of testing Postgres more rigorously. A brief overview of the ideas
behind unit testing can be found here:
http://www.extremeprogramming.org/rules/unittests.html.

The basic idea is that you write small(ish) tests which test an individual
function. The tests encode the logical rules which the function should
follow: for a particular input, the function should produce a particular
output.

Unit tests give us two things: 1) a lot more granularity than
regression tests, which may have a call path of many hundreds of
functions. We can pass functions atypical input, which they should be able
to deal with, but which they would not receive from the regression test
system. 2) For developers working on new functionality, unit tests will be
a complement to the regression test suite, allowing them to quickly see
whether their modifications to existing functions break anything.

Importantly, unit tests should have a minimal impact upon the source.
Tests for the functions of a particular file can be placed in a separate file.
We can create a separate unit test target, as we do for regression
tests. The only problem I can think of is that many of the functions we
would want to test are static. We could just sed the static qualifiers away,
however, relying on the rule that all static functions are defined with
"static" at the start of the line (^static).

There are frameworks which simplify the creation of test suites. These
frameworks generally provide a handful of facilities: an assertion,
kind of like Assert(), except that when the assertion fails it generates
output (unit test name, function being tested, sometimes more information)
in a standard format; a queueing system, so that individual tests can be
cobbled together into a suite; and, in some cases, the ability to run a
test in a child process so that segmentation violations can be detected
and so that a memory overrun doesn't stomp all over the rest of the test
run's address space. These frameworks usually come as a header and a C
file which can easily be compiled into the source tree to run the tests.
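
To make this concrete, a minimal homegrown version of those facilities
might look something like the sketch below. All of the names here are
invented for illustration; they are not taken from any particular
framework.

	#include <stdio.h>

	static int failures = 0;

	/* report a failure in a standard format, but keep running */
	#define UT_ASSERT(test_name, cond) \
		do { \
			if (!(cond)) \
			{ \
				printf("FAIL: %s (%s:%d): %s\n", \
					   (test_name), __FILE__, __LINE__, #cond); \
				failures++; \
			} \
		} while (0)

	static void
	test_example(void)
	{
		UT_ASSERT("test_example", 2 + 2 == 4);
	}

	int
	main(void)
	{
		/* a "suite" is just a list of test functions run in order */
		test_example();

		if (failures > 0)
			printf("%d test failure(s)\n", failures);
		else
			printf("all tests passed\n");
		return failures > 0 ? 1 : 0;
	}

The frameworks mentioned below add the child-process trick and nicer
reporting on top of essentially this idea.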

There are two frameworks I am aware of: cutest and Check.

CuTest, which you can find here: http://cutest.sourceforge.net/, is a
very lightweight unit test framework. It provides an API to assert that
output, in various formats, matches a certain rule, and it allows you to
queue tests into a suite. I think the API is a little more complex than
necessary. Whereas with a standard Assert() in C we produce a test with a
boolean result, CuTest can do the tests itself (ie, if you want to assert
on a string comparison or an integer equality, CuTest provides functions
to actually do those operations). I think this is ugly and I guess we
could just use the standard boolean test.
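
For example, inside a test function that gets handed a CuTest *tc
(going from memory of the CuTest docs, so treat the exact spellings as
approximate):

	/* CuTest-style dedicated assertion */
	CuAssertIntEquals(tc, 4, result);

	/* versus spelling out the boolean test yourself */
	CuAssert(tc, "2 + 2 should be 4", result == 4);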

It is licensed under a zlib-style license which is, I believe, BSD compatible.

Check (http://check.sourceforge.net) is a library with the same
functionality as CuTest, but it also allows test designers to run each test
in a child process, as well as to define general setup and teardown
functions for a range of tests. The latter is useful if setting up some
state for a unit test is non-trivial (in terms of time) and would be
required for a subset of the test suite. The actual API is much more like a
C Assert() than CuTest's. It is LGPL.

We could of course write our own, which might give us some more freedom.

The point is that the project can greatly benefit from this kind of low-level
testing. I'm happy to kick it off by working out the integration of a unit
test framework, or by developing a framework which suits us -- depending on
what others think. I think I'll also start off with some low-hanging fruit
by testing non-trivial functions in utils/adt/.
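
As a taste of what I have in mind for utils/adt/, a test for int4pl()
might look roughly like this, reusing the hypothetical UT_ASSERT macro
sketched above. This is only a sketch: a real test program would still
need enough of the backend environment set up for elog() and palloc()
to behave, which is part of what we would have to work out.

	#include "postgres.h"
	#include "fmgr.h"
	#include "utils/builtins.h"

	static void
	test_int4pl(void)
	{
		Datum	sum;

		/* call the function through the fmgr interface, as the backend would */
		sum = DirectFunctionCall2(int4pl,
								  Int32GetDatum(2),
								  Int32GetDatum(3));
		UT_ASSERT("test_int4pl", DatumGetInt32(sum) == 5);
	}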

Neil and I will also be developing unit tests for the stored procedure
code we're working on for 8.1.

Ideas, comments, criticisms?

Thanks,

Gavin


Re: Unit testing

From
"Neil Conway"
Date:
[ apologies if this mail is poorly formatted, posted via webmail ]

Gavin Sherry said:
> For the last few weeks Neil and I have been discussing unit testing as
> a means of testing Postgres more rigorously.

I should note that we've also been looking at some other ideas, including
different approaches to testing, static analysis, and model checking.

> The only problem I can think of is that many of the functions we
> would want to test are static. We could just sed the static qualifiers away,
> however, relying on the rule that all static functions are defined with
> "static" at the start of the line (^static).

Another approach would be to have a "configure" flag to enable unit
testing that would define "static" to nothing when enabled. It would be
nice to have access to the prototypes of static functions while writing
unit tests: we could either do without that, or have a script to generate
the header files automatically.
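
The crude version of that would be something along these lines (the flag
name is made up):

	#ifdef UNIT_TESTING
	/*
	 * Make file-local functions visible to test modules.  Note this also
	 * strips "static" from local variables, silently changing their
	 * semantics, so it is blunter than it looks.
	 */
	#define static
	#endif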

BTW, I think that unit testing could probably only be enabled when a
configure flag is specified in any case: unfortunately the changes needed
to implement it may be rather invasive.

> Whereas with a standard Assert() in C we produce a test with a
> boolean result, CuTest can do the tests itself (ie, if you want to assert
> on a string comparison or an integer equality, CuTest provides functions
> to actually do those operations). I think this is ugly and I guess we
> could just use the standard boolean test.

I don't think it's ugly. FWIW, CuTest probably uses that approach because
it is what most of the SUnit-derived testing frameworks do. You can always
use CuAssert() if you want to write the rest of the assertion condition
yourself.

I think one challenge Gavin didn't mention is how easy (or not) it will be
to write unit tests for deeply-internal parts of the backend: unit testing
utils/adt and the like is all well and good, but there really isn't much
point if that's all we can test. The problem with testing the guts of the
backend is that a given backend function typically requires an enormous
amount of state -- it will often make some pretty specific assumptions
about the environment in which it is executing. I'm not sure if there is a
simple way to solve this -- writing the first few deep-internals unit
tests is probably going to be pretty painful. But I think there are a few
reasons to be optimistic:

- once we've written a few such tests, we can begin to see the
initialization / setup code that is required by multiple tests, and
refactor this out into separate functions in the backend. Eventually, the
code that is invoked to do _real_ backend startup would be just another
client of the same set of shared initialization functions that are used to
initialize the environment for unit tests.

- we don't need to write tests for the *entire* source tree before we
begin to see some payback. Once we have a good test suite for a specific
component (say, utils/adt or FE libpq), developers should be able to see
the gains (and hassles) of unit testing; if people like it it should be
easy to incrementally add more tests.

One final note: it is a hassle to unit test a 300-line function because of
all the different code paths and error conditions such a function usually
has. I think a natural pattern will be to test small bits of functionality
and refactor as you go: rather than trying to test a huge function, we
ought to pull a distinct piece of functionality out and into its own
function, which will be much easier to unit test by itself. So unit
testing and refactoring the code into smaller, more granular
functions tend to go hand in hand. Now, we can debate about whether the
resulting functions are in good style (I strongly believe they are), but I
thought I'd add that.
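
To pick a completely made-up example: if a big function validates an
identifier somewhere in its middle, we would pull that check out into
something like

	/* hypothetical helper extracted from a much larger function */
	static bool
	identifier_is_valid(const char *name)
	{
		return name != NULL && name[0] != '\0' && strlen(name) < NAMEDATALEN;
	}

which a unit test can then exercise directly, without constructing all
the state the original 300-line function needs.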

-Neil




Re: Unit testing

From
Andrew Dunstan
Date:

Neil Conway wrote:

>Another approach would be to have a "configure" flag to enable unit
>testing that would define "static" to nothing when enabled.
>
>[...]
>
>So unit testing and refactoring the code into smaller, more granular
>functions tend to go hand in hand. Now, we can debate about whether the
>resulting functions are in good style (I strongly believe they are), but I
>thought I'd add that.

A few thoughts.

1. Small functions are good. My personal rule of thumb is that if I 
can't read it all in one screenful it might be too big (and yes, I know 
I have broken that rule sometimes).
As time goes on, we seem to have a habit of adding bits and pieces of 
stuff inline rather than in a separate function. Short story: many years 
ago I was confronted with a 1000-line "if" statement with many levels, 
and was told that it was to avoid the overhead of function calls (on an 
architecture where the overhead was known to be very low, and where the 
compiler supported inlining anyway). It cost me days and days of work to 
find the right places to stuff my mods.

2. Won't dissolving away "static" cause naming conflicts?

3. Unit testing frameworks are best suited to component-based 
architectures, ISTM. I'm not sure that one would fit Postgres very well. 
Retrofitting unit testing is a lot harder than starting out doing it 
from day 1.

cheers

andrew


Re: Unit testing

From
Neil Conway
Date:
Andrew Dunstan wrote:
> 2. Won't dissolving away "static" cause naming conflicts?

It might, yes. Those can be resolved, I think. I don't see a good reason 
why function names can't be unique across the source tree; at the very 
least, it means less irritation for anyone using tags.

> 3. Unit testing frameworks are best suited to component-based 
> architectures, ISTM. I'm not sure that one would fit Postgres very well.

Can you elaborate?

> Retrofitting unit testing is a lot harder than starting out doing it 
> from day 1.

Granted, but I don't think that implies that retrofitting isn't worth 
the effort.

-Neil


Re: Unit testing

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> 2. Won't dissolving away "static" cause naming conflicts?

Most likely (and I for one will for sure resist any attempt to force
global uniqueness on static names).  It seems that that whole issue
is easily avoided though ... just #include the source file under test
into the unit-test module for it, instead of compiling them separately.
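
That is, roughly (file names only for illustration):

	/* test_int8.c --- standalone test program, never linked into the backend */
	#include "utils/adt/int8.c"		/* the statics come along for the ride */

	/* ... test code here can call int8.c's static functions directly ... */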

> 3. Unit testing frameworks are best suited to component-based 
> architectures, ISTM. I'm not sure that one would fit Postgres very well. 

I have strong doubts about the usefulness of this too, but if Gavin and
Neil want to invest some time in trying it, I won't stand in their way.

One thing I don't particularly want is a bunch of invasive code changes,
at least in advance of seeing convincing proof that this will be a big win
for us.  The bits about "we'll just refactor the code till we like it"
are raising some red flags for me --- I think that that is at least as
likely to introduce new bugs as find existing ones.
        regards, tom lane


Re: Unit testing

From
Andrew Dunstan
Date:

Neil Conway wrote:

>> 3. Unit testing frameworks are best suited to component-based
>> architectures, ISTM. I'm not sure that one would fit Postgres very well.
>
> Can you elaborate?

With objects that are relatively standalone, you can instantiate them 
easily and plug them into a testing framework. The more interdependent 
things are, the harder it is.

I think a concrete example of what you're suggesting would help a lot. 
Pick one small area, show the changes needed, and how you would use the 
testing setup.

cheers

andrew


Re: Unit testing

From
Greg Stark
Date:
Neil Conway <neilc@samurai.com> writes:

> Andrew Dunstan wrote:
> > 2. Won't dissolving away "static" cause naming conflicts?
> 
> It might, yes. Those can be resolved, I think. I don't see a good reason why
> function names can't be unique across the source tree; at the very least, it
> means less irritation for anyone using tags.

You can just compile all the objects normally, and compile the one object
you're going to test with static #defined away.

But it seems to me that most of the really hard bugs to find involve subtle
interactions between functions and the state of the database.

You wouldn't be able to find errors in the semantics of xids for example, or
in the WAL logic that didn't cover some corner case. Or race conditions
between backends...

Unit testing, especially at the level of detail of functions, is a mostly
bankrupt idea. It tests the very things that are easiest to track down. Where
it can come in handy is if you have entire modules with well-defined external
interfaces, like the storage manager for example; then you can test them very
thoroughly -- possibly including scenarios that don't even come up in postgres.

But the storage manager is a bad example since it's pretty solid and doesn't
change much. I'm not sure transaction id management or management of locks and
so on are really well defined modules at a high enough level to be testing
anything significant.

-- 
greg



Re: Unit testing

From
Neil Conway
Date:
On Tue, 2004-10-12 at 00:43, Tom Lane wrote:
> Most likely (and I for one will for sure resist any attempt to force
> global uniqueness on static names).

You're right that the issue can be avoided easily enough, but what need
is there _not_ to have globally unique function names?

-Neil




Re: Unit testing

From
Neil Conway
Date:
On Tue, 2004-10-12 at 05:08, Greg Stark wrote:
> But it seems to me that most of the really hard bugs to find involve subtle
> interactions between functions and the state of the database.
> 
> You wouldn't be able to find errors in the semantics of xids for example, or
> in the WAL logic that didn't cover some corner case. Or race conditions
> between backends...

Going into this, these were precisely the kinds of bugs that Gavin and I
wanted to be able to find via some kind of automated QA. I agree that
unit tests aren't ideal for finding these kinds of bugs (although I
don't think they are useless), but what better technique is there?
Regression tests are certainly ineffective at best. Static analysis is
best for finding superficial bugs or enforcing invariants that are easy
to verify at compile-time, so even if there were good open source static
analysis tools I don't think they would be that helpful.

Model checking has some promise[1], but (a) it requires a substantial
amount of work to model check a program, even if we use a tool that will
automatically extract the model for us (e.g. CMC); (b) I'm not aware of a
good open source model checking tool; and (c) I'm skeptical that model
checking in general is mature enough that it is useful outside academia.

-Neil

[1] e.g. http://www.stanford.edu/~engler/osdi04-fisc.pdf



Re: Unit testing

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> On Tue, 2004-10-12 at 00:43, Tom Lane wrote:
>> Most likely (and I for one will for sure resist any attempt to force
>> global uniqueness on static names).

> You're right that the issue can be avoided easily enough, but what need
> is there _not_ to have globally unique function names?

To me that's pretty much in the you've-got-to-be-kidding domain.  The
reason static functions and local name scoping were invented was exactly
to avoid having to ensure every single name is unique across a whole
project.  The overhead of avoiding duplicates swamps any possible
benefit.

There is another problem with it, which is static variables.  While the
linker should warn about duplicate global code symbols, it's quite
likely to think duplicate global variable declarations should be merged,
thereby silently changing the semantics (and introducing hard-to-find
bugs).  Not to mention the extent of semantics change in

	void foo() { static int i = 0; ... }

So you'd have to be very very careful about just which occurrences of
"static" you removed.  I don't think I'd trust a "column 1" heuristic.

The real bottom line here is that the entire objective of the exercise
is to find bugs ... and we don't really expect it to find a lot of them,
just a few more than our existing methods find.  So adding even a small
probability of introducing new bugs may do serious damage to the
cost/benefit ratio.  Thus I'm pretty skeptical of any part of the
proposal that says to make nontrivial alterations to the existing code.
        regards, tom lane


Re: Unit testing

From
Gavin Sherry
Date:
On Mon, 11 Oct 2004, Tom Lane wrote:

> Neil Conway <neilc@samurai.com> writes:
> > On Tue, 2004-10-12 at 00:43, Tom Lane wrote:
> >> Most likely (and I for one will for sure resist any attempt to force
> >> global uniqueness on static names).
>
> > You're right that the issue can be avoided easily enough, but what need
> > is there _not_ to have globally unique function names?
>
> To me that's pretty much in the you've-got-to-be-kidding domain.  The
> reason static functions and local name scoping were invented was exactly
> to avoid having to ensure every single name is unique across a whole
> project.  The overhead of avoiding duplicates swamps any possible
> benefit.

I agree. I think we can use #include "foo.c", and in any situation where we
*may* run into duplicate statics, a few lines of sed magic should be
enough. Thus, we would have no impact on the existing code.

Gavin


Re: Unit testing

From
Gavin Sherry
Date:
On Tue, 12 Oct 2004, Neil Conway wrote:

> On Tue, 2004-10-12 at 05:08, Greg Stark wrote:
> > But it seems to me that most of the really hard bugs to find involve subtle
> > interactions between functions and the state of the database.
> >
> > You wouldn't be able to find errors in the semantics of xids for example, or
> > in the WAL logic that didn't cover some corner case. Or race conditions
> > between backends...
>
> Going into this, these were precisely the kinds of bugs that Gavin and I
> wanted to be able to find via some kind of automated QA. I agree that
> unit tests aren't ideal for finding these kinds of bugs (although I
> don't think they are useless), but what better technique is there?
> Regression tests are certainly ineffective at best. Static analysis is
> best for finding superficial bugs or enforcing invariants that are easy
> to verify at compile-time, so even if there were good open source static
> analysis tools I don't think they would be that helpful.

I agree. The fact is, doing nothing new means that we will not find bugs
which the regression test system is unsuited to finding. So the questions
become: will adding unit tests i) find enough current bugs (presumably
quite a small number), and ii) assist with future development enough to be
worth the time spent on the tests themselves? I'm unsure of the worth of
arguing this in the abstract. I'll start looking at the amount of time
involved in testing functions in different parts of the source and report
back in the coming weeks.

> Model checking has some promise[1], but (a) it requires a substantial
> amount of work to model check a program, even if we use a tool that will
> automatically extract the model for us (e.g. CMC); (b) I'm not aware of a
> good open source model checking tool; and (c) I'm skeptical that model
> checking in general is mature enough that it is useful outside academia.

Given infinite resources, I'd say it would be worthwhile building a model
checking system analogous to Engler's for Postgres. That is, see if each
data modification operation (ie, additional to WAL) via 'all' possible
paths was recoverable and that the recovery placed the system in the state
expected. Based on the information in Engler's paper, I'd say this would
take a few man months for a very clued up developer. Anyone out there
looking for a thesis project? :-)

Gavin


Re: Unit testing

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> On Tue, 2004-10-12 at 05:08, Greg Stark wrote:
>> You wouldn't be able to find errors in the semantics of xids for example, or
>> in the WAL logic that didn't cover some corner case. Or race conditions
>> between backends...

> Going into this, these were precisely the kinds of bugs that Gavin and I
> wanted to be able to find via some kind of automated QA. I agree that
> unit tests aren't ideal for finding these kinds of bugs (although I
> don't think they are useless), but what better technique is there?
> Regression tests are certainly ineffective at best.

Ahem.  Our *existing* regression tests are fairly ineffective, but
that's because neither the test cases nor the engine are designed to
cover concurrent behavior at all; if anything they go out of their way
to avoid stressing concurrent behavior, in order to get perfectly
constant results.

We've speculated in the past about building a test harness that could
step multiple backends through a concurrent script.  Building something
like that, perhaps with some extra frammishes such as being able to
automatically vary the relative timing of operations, is the best
testing idea that I've heard about.  Also you could extend it to force
crashes at varying points in the sequence and check for successful
recovery (which could catch WAL omissions such as Greg was worrying
about).  You could probably build this on top of an existing tool like
"expect".

While you've not said much about exactly what you have in mind for your
unit-test scheme, I doubt it will be of any value at all if it doesn't
provide ways to test concurrent behavior.  I'm also quite concerned
about the cost of building scaffolding that will allow individual
modules to be tested outside the context of a live backend; and more
than a bit dubious about the effectiveness of tests in a scaffolding
environment instead of a real one.
        regards, tom lane