Thread: Unit testing
Hi guys,

For the last few weeks Neil and I have been discussing unit testing as a means of testing Postgres more rigorously. A brief overview of the ideas behind unit testing can be found here: http://www.extremeprogramming.org/rules/unittests.html. The basic idea is that you write small(ish) tests which each test an individual function. The tests encode the logical rules which the function should follow: that is, for this particular input, a particular output.

Unit tests give us two things: 1) a lot more granularity than regression tests, which may have a call path of many hundreds of functions. We can pass functions atypical input, which they should be able to deal with, but which they would not receive from the regression test system. 2) For developers working on new functionality, unit tests will be a complement to the regression test suite which allows them to quickly see if their modifications break existing functions.

Importantly, unit tests should have a minimal impact upon the source. Tests for the functions of a particular file can be placed in a separate file, and we can create a separate unit test target, like we do for regression tests. The only problem I can think of is that many of the functions we would want to test are static. We could just sed the static keyword away, using the rule that all static functions are defined with ^static (ie, "static" at the start of a line).

There are frameworks which simplify the creation of test suites. These frameworks provide a handful of features, generally: an assertion, kind of like Assert(), except that when the assertion fails it generates output (unit test name, function being tested, sometimes more information) in a standard format; a queueing system, so that individual tests can be cobbled together into a suite; and, in some cases, the ability to run each test in a child process, so that segmentation violations can be detected and so that a memory overrun doesn't stomp all over the rest of the test driver's address space too.
These frameworks usually come as a header and a C file which can easily be compiled into the source to run the tests. There are two frameworks I am aware of: CuTest and Check.

CuTest, which you can find here: http://cutest.sourceforge.net/, is a very lightweight unit test framework. It provides an API to assert that output, in various formats, matches a certain rule, and it allows you to queue tests into a suite. I think the API is a little more complex than necessary. Whereas with a standard Assert() in C we produce a test with a boolean result, CuTest can do the tests itself (ie, if you want to assert on a string comparison or an integer equality, CuTest provides functions to actually do those operations). I think this is ugly, and I guess we could just use the standard boolean test. It is licensed under a zlib style license which is, I believe, BSD compatible.

Check (http://check.sourceforge.net) is a library which provides the same functionality as CuTest but also allows test designers to run each test in a child process, as well as to have general setup and teardown functions for a range of tests. The latter is useful if setting up some state for a unit test is non-trivial (in terms of time) and would be required for a subset of the test suite. The actual API is much more like a C Assert() than CuTest's. It is LGPL.

We could of course write our own, which may give us some more freedom. The point is that the project can greatly benefit from this kind of low level testing. I'm happy to kick it off by working out the integration of a unit test framework, or by developing a framework which suits us -- depending on what others think. I think I'll also start off with some low hanging fruit by testing non-trivial functions in utils/adt/. Neil and I will also be developing unit tests for the stored procedure code we're working on for 8.1.

Ideas, comments, criticisms?

Thanks,

Gavin
[ apologies if this mail is poorly formatted, posted via webmail ]

Gavin Sherry said:
> For the last few weeks Neil and I have been discussing unit testing as
> a means of testing Postgres more rigorously.

I should note that we've also been looking at some other ideas, including different approaches to testing, static analysis, and model checking.

> The only problem I can think of is that many of the functions we
> would want to test are static. We could just sed the static keyword away,
> using the rule that all static functions are defined with ^static.

Another approach would be to have a "configure" flag to enable unit testing that would define "static" to nothing when enabled. It would be nice to have access to the prototypes of static functions while writing unit tests: we could either do without that, or have a script to generate the header files automatically.

BTW, I think that unit testing could probably only be enabled when a configure flag is specified in any case: unfortunately the changes needed to implement it may be rather invasive.

> Whereas with a standard Assert() in C we produce a test with a
> boolean result, CuTest can do the tests itself (ie, if you want to assert
> on a string comparison or an integer equality, CuTest provides functions
> to actually do those operations). I think this is ugly, and I guess we
> could just use the standard boolean test.

I don't think it's ugly. FWIW, CuTest probably uses that approach because it is what most of the SUnit-derived testing frameworks do. You can always use CuAssert() if you want to write the rest of the assertion condition yourself.

I think one challenge Gavin didn't mention is how easy (or not) it will be to write unit tests for deeply-internal parts of the backend: unit testing utils/adt and the like is all well and good, but there really isn't much point if that's all we can test.
The problem with testing the guts of the backend is that a given backend function typically requires an enormous amount of state -- it will often make some pretty specific assumptions about the environment in which it is executing. I'm not sure if there is a simple way to solve this -- writing the first few deep-internals unit tests is probably going to be pretty painful. But I think there are a few reasons to be optimistic:

- once we've written a few such tests, we can begin to see the initialization / setup code that is required by multiple tests, and refactor this out into separate functions in the backend. Eventually, the code that is invoked to do _real_ backend startup would be just another client of the same set of shared initialization functions that are used to initialize the environment for unit tests.

- we don't need to write tests for the *entire* source tree before we begin to see some payback. Once we have a good test suite for a specific component (say, utils/adt or FE libpq), developers should be able to see the gains (and hassles) of unit testing; if people like it, it should be easy to incrementally add more tests.

One final note: it is a hassle to unit test a 300 line function because of all the different code paths and error conditions such a function usually has. I think a natural pattern will be to test small bits of functionality and refactor as you go: rather than trying to test a huge function, we ought to pull a distinct piece of functionality out into its own function, which will be much easier to unit test by itself. So unit testing and refactoring the code into smaller, more granular functions tend to go hand in hand. Now, we can debate about whether the resulting functions are in good style (I strongly believe they are), but I thought I'd add that.

-Neil
Neil Conway wrote:
> So unit testing and refactoring the code into smaller, more granular
> functions tend to go hand in hand.

A few thoughts.

1. Small functions are good.
My personal rule of thumb is that if I can't read it all in one screenful it might be too big (and yes, I know I have broken that rule sometimes). As time goes on, we seem to have a habit of adding bits and pieces of stuff inline rather than in a separate function. Short story: many years ago I was confronted with a 1000-line "if" statement nested many levels deep, and was told that it was to avoid the overhead of function calls (on an architecture where the overhead was known to be very low, and where the compiler supported inlining anyway). It cost me days and days of work to find the right places to stuff my mods.

2. Won't dissolving away "static" cause naming conflicts?

3. Unit testing frameworks are best suited to component-based architectures, ISTM. I'm not sure that one would fit Postgres very well. Retrofitting unit testing is a lot harder than starting out doing it from day 1.

cheers

andrew
Andrew Dunstan wrote:
> 2. Won't dissolving away "static" cause naming conflicts?

It might, yes. Those can be resolved, I think. I don't see a good reason why function names can't be unique across the source tree; at the very least, it means less irritation for anyone using tags.

> 3. Unit testing frameworks are best suited to component-based
> architectures, ISTM. I'm not sure that one would fit Postgres very well.

Can you elaborate?

> Retrofitting unit testing is a lot harder than starting out doing it
> from day 1.

Granted, but I don't think that implies that retrofitting isn't worth the effort.

-Neil
Andrew Dunstan <andrew@dunslane.net> writes:
> 2. Won't dissolving away "static" cause naming conflicts?

Most likely (and I for one will for sure resist any attempt to force global uniqueness on static names). It seems that that whole issue is easily avoided though ... just #include the source file under test into the unit-test module for it, instead of compiling them separately.

> 3. Unit testing frameworks are best suited to component-based
> architectures, ISTM. I'm not sure that one would fit Postgres very well.

I have strong doubts about the usefulness of this too, but if Gavin and Neil want to invest some time in trying it, I won't stand in their way. One thing I don't particularly want is a bunch of invasive code changes, at least in advance of seeing convincing proof that this will be a big win for us. The bits about "we'll just refactor the code till we like it" are raising some red flags for me --- I think that that is at least as likely to introduce new bugs as find existing ones.

regards, tom lane
Neil Conway wrote:
>> 3. Unit testing frameworks are best suited to component-based
>> architectures, ISTM. I'm not sure that one would fit Postgres very well.
>
> Can you elaborate?

With objects that are relatively standalone, you can instantiate them easily and plug them into a testing framework. The more interdependent things are, the harder it is.

I think a concrete example of what you're suggesting would help a lot. Pick one small area, show the changes needed, and how you would use the testing setup.

cheers

andrew
Neil Conway <neilc@samurai.com> writes:
> Andrew Dunstan wrote:
>> 2. Won't dissolving away "static" cause naming conflicts?
>
> It might, yes. Those can be resolved, I think. I don't see a good reason
> why function names can't be unique across the source tree; at the very
> least, it means less irritation for anyone using tags.

You can just compile all the objects normally, and compile the one object you're going to test with static #defined away.

But it seems to me that most of the really hard bugs to find involve subtle interactions between functions and the state of the database. You wouldn't be able to find errors in the semantics of xids, for example, or in WAL logic that didn't cover some corner case. Or race conditions between backends...

Unit testing, especially at the level of detail of individual functions, is a mostly bankrupt idea. It tests the very things that are easiest to track down. Where it can come in handy is if you have entire modules with well defined external interfaces, like the storage manager for example; then you can test them very thoroughly -- possibly including scenarios that don't even come up in Postgres. But the storage manager is a bad example, since it's pretty solid and doesn't change much. I'm not sure transaction id management or management of locks and so on are really well defined modules at a high enough level to be testing anything significant.

-- greg
On Tue, 2004-10-12 at 00:43, Tom Lane wrote:
> Most likely (and I for one will for sure resist any attempt to force
> global uniqueness on static names).

You're right that the issue can be avoided easily enough, but what need is there _not_ to have globally unique function names?

-Neil
On Tue, 2004-10-12 at 05:08, Greg Stark wrote:
> But it seems to me that most of the really hard bugs to find involve subtle
> interactions between functions and the state of the database.
>
> You wouldn't be able to find errors in the semantics of xids for example, or
> in the WAL logic that didn't cover some corner case. Or race conditions
> between backends...

Going into this, these were precisely the kinds of bugs that Gavin and I wanted to be able to find via some kind of automated QA. I agree that unit tests aren't ideal for finding these kinds of bugs (although I don't think they are useless), but what better technique is there? Regression tests are certainly ineffective at best. Static analysis is best for finding superficial bugs or enforcing invariants that are easy to verify at compile-time, so even if there were good open source static analysis tools I don't think they would be that helpful.

Model checking has some promise[1], but (a) it requires a substantial amount of work to model check a program, even if we use a tool that will automatically extract the model for us (e.g. CMC); (b) I'm not aware of a good open source model checking tool; (c) I'm skeptical that model checking in general is mature enough to be useful outside academia.

-Neil

[1] e.g. http://www.stanford.edu/~engler/osdi04-fisc.pdf
Neil Conway <neilc@samurai.com> writes:
> On Tue, 2004-10-12 at 00:43, Tom Lane wrote:
>> Most likely (and I for one will for sure resist any attempt to force
>> global uniqueness on static names).
>
> You're right that the issue can be avoided easily enough, but what need
> is there _not_ to have globally unique function names?

To me that's pretty much in the you've-got-to-be-kidding domain. The reason static functions and local name scoping were invented was exactly to avoid having to ensure every single name is unique across a whole project. The overhead of avoiding duplicates swamps any possible benefit.

There is another problem with it, which is static variables. While the linker should warn about duplicate global code symbols, it's quite likely to think duplicate global variable declarations should be merged, thereby silently changing the semantics (and introducing hard-to-find bugs). Not to mention the extent of the semantics change in

	void foo() { static int i = 0; ... }

So you'd have to be very very careful about just which occurrences of "static" you removed. I don't think I'd trust a "column 1" heuristic.

The real bottom line here is that the entire objective of the exercise is to find bugs ... and we don't really expect it to find a lot of them, just a few more than our existing methods find. So adding even a small probability of introducing new bugs may do serious damage to the cost/benefit ratio. Thus I'm pretty skeptical of any part of the proposal that says to make nontrivial alterations to the existing code.

regards, tom lane
On Mon, 11 Oct 2004, Tom Lane wrote:
> To me that's pretty much in the you've-got-to-be-kidding domain. The
> reason static functions and local name scoping were invented was exactly
> to avoid having to ensure every single name is unique across a whole
> project. The overhead of avoiding duplicates swamps any possible benefit.

I agree. I think we can use #include "foo.c", and in any situation where we *may* run into duplicate statics, a few lines of sed magic should be enough. Thus, we would have no impact on the existing code.

Gavin
On Tue, 12 Oct 2004, Neil Conway wrote:
> Going into this, these were precisely the kinds of bugs that Gavin and I
> wanted to be able to find via some kind of automated QA. I agree that
> unit tests aren't ideal for finding these kinds of bugs (although I
> don't think they are useless), but what better technique is there?
> Regression tests are certainly ineffective at best.

I agree. The fact is, doing nothing new means that we will not find the bugs which the regression test system is unsuited to finding. This makes the questions: will adding unit tests (i) find enough current bugs (presumably, quite a small number); and (ii) assist with development in the future to the extent that it is worth spending the time working on the tests themselves?

I'm unsure of the worth of arguing this in the abstract. I'll start looking at the amount of time involved in testing functions in different parts of the source and then report in the coming weeks.

> Model checking has some promise[1], but (a) it requires a substantial
> amount of work to model check a program, even if we use a tool that will
> automatically extract the model for us (e.g. CMC); (b) I'm not aware of a
> good open source model checking tool; (c) I'm skeptical that model
> checking in general is mature enough to be useful outside academia.
Given infinite resources, I'd say it would be worthwhile building a model checking system analogous to Engler's for Postgres. That is, see if each data modification operation (ie, additional to WAL) via 'all' possible paths was recoverable and that the recovery placed the system in the state expected. Based on the information in Engler's paper, I'd say this would take a few man months for a very clued up developer. Anyone out there looking for a thesis project? :-) Gavin
Neil Conway <neilc@samurai.com> writes:
> On Tue, 2004-10-12 at 05:08, Greg Stark wrote:
>> You wouldn't be able to find errors in the semantics of xids for example, or
>> in the WAL logic that didn't cover some corner case. Or race conditions
>> between backends...
>
> Going into this, these were precisely the kinds of bugs that Gavin and I
> wanted to be able to find via some kind of automated QA. I agree that
> unit tests aren't ideal for finding these kinds of bugs (although I
> don't think they are useless), but what better technique is there?
> Regression tests are certainly ineffective at best.

Ahem. Our *existing* regression tests are fairly ineffective, but that's because neither the test cases nor the engine are designed to cover concurrent behavior at all; if anything they go out of their way to avoid stressing concurrent behavior, in order to get perfectly constant results.

We've speculated in the past about building a test harness that could step multiple backends through a concurrent script. Building something like that, perhaps with some extra frammishes such as being able to automatically vary the relative timing of operations, is the best testing idea that I've heard about. Also you could extend it to force crashes at varying points in the sequence and check for successful recovery (which could catch WAL omissions such as Greg was worrying about). You could probably build this on top of an existing tool like "expect".

While you've not said much about exactly what you have in mind for your unit-test scheme, I doubt it will be of any value at all if it doesn't provide ways to test concurrent behavior. I'm also quite concerned about the cost of building scaffolding that will allow individual modules to be tested outside the context of a live backend; and more than a bit dubious about the effectiveness of tests in a scaffolding environment instead of a real one.

regards, tom lane