Thread: Bug in bttext_abbrev_convert()
Commits b181a919 and arguably c79b6413 added bugs to
bttext_abbrev_convert() in the process of fixing some others. In the
master branch, bttext_abbrev_convert() can leak memory when the C
locale happens to be used and we must detoast, which is unacceptable
for about the same reason that it's unacceptable for a standard B-Tree
comparator routine. There could also be a use-after-free issue for
large strings that are detoasted, because bttext_abbrev_convert()
hashes memory which might already be freed (when hashing the
authoritative value).

Attached patch fixes these issues.

As we all know, the state of automated testing is pretty lamentable.
This is the kind of thing that we could catch more easily in the
future if better infrastructure were in place. I caught this by
eyeballing bttext_abbrev_convert() with slightly fresher eyes than the
last time I looked.

-- 
Peter Geoghegan
On Mon, Jun 29, 2015 at 7:47 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Commits b181a919 and arguably c79b6413 added bugs to
> bttext_abbrev_convert() in the process of fixing some others. In the
> master branch, bttext_abbrev_convert() can leak memory when the C
> locale happens to be used and we must detoast, which is unacceptable
> for about the same reason that it's unacceptable for a standard B-Tree
> comparator routine. There could also be a use-after-free issue for
> large strings that are detoasted, because bttext_abbrev_convert()
> hashes memory which might already be freed (when hashing the
> authoritative value).
>
> Attached patch fixes these issues.
>
> As we all know, the state of automated testing is pretty lamentable.
> This is the kind of thing that we could catch more easily in the
> future if better infrastructure were in place. I caught this by
> eyeballing bttext_abbrev_convert() with slightly fresher eyes than the
> last time I looked.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 6/29/15 6:47 PM, Peter Geoghegan wrote:
> As we all know, the state of automated testing is pretty lamentable.
> This is the kind of thing that we could catch more easily in the
> future if better infrastructure were in place. I caught this by
> eyeballing bttext_abbrev_convert() with slightly fresher eyes than the
> last time I looked.

Isn't this the kind of thing Coverity's supposed to find?

-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com
On Tue, Jun 30, 2015 at 5:25 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> Isn't this the kind of thing Coverity's supposed to find?

I don't know, but in general I'm not very excited about static
analysis tools. The best thing they have going for them is that
they're available, and don't require test coverage in the way that
running the regression tests with Valgrind enabled does.

There is no real testing of sorting in the regression tests. It would
be nice to have a way of generating a large and varied selection of
sort operations programmatically, to catch this kind of thing.
pg_regress is not really up to it. The same applies to various other
cases where having a lot of "expected" output makes using pg_regress
infeasible.

-- 
Peter Geoghegan
On Wed, Jul 1, 2015 at 9:36 AM, Peter Geoghegan <pg@heroku.com> wrote:
> On Tue, Jun 30, 2015 at 5:25 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>> Isn't this the kind of thing Coverity's supposed to find?
>
> I don't know, but in general I'm not very excited about static
> analysis tools. The best things that they have going for them is that
> they're available, and don't require test coverage in the same way as
> running the regression tests with Valgrind enabled.

Well, yes, it should have complained about that, if it were a perfect
tool able to emulate all code paths with all possible variable values.
But Coverity is not actually that perfect, and sometimes it misses the
shot, like here. In my experience, when you look at the reports of a
static tool, you first need to look at other code paths that share
similarities with the problem reported, and you will see that most of
the time what the static tool saw is just the tip of the iceberg. The
human factor is decisive here.

> There is no real testing of sorting in the regression tests. It would
> be nice to have a way of generating a large and varied selection of
> sort operations programmatically, to catch this kind of thing.
> pg_regress is not really up to it. The same applies to various other
> cases where having a lot of "expected" output makes using pg_regress
> infeasible.

Well, the issue is twofold here:
1) Having a buildfarm member running Valgrind would greatly help.
2) This code path is not used at all AFAIK in the test suite, so we
definitely lack regression tests here. In your opinion, what sort data
set would be large enough to trigger this code path? The idea is not
to make the regression test suite take too much time because of it,
and not to leave the server created by pg_regress with too large a
PGDATA folder. For example, could building a btree index with the
right data set trigger it on at least one platform?

Regards,
-- 
Michael
On Tue, Jun 30, 2015 at 9:39 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
>> There is no real testing of sorting in the regression tests. It would
>> be nice to have a way of generating a large and varied selection of
>> sort operations programmatically, to catch this kind of thing.
>> pg_regress is not really up to it. The same applies to various other
>> cases where having a lot of "expected" output makes using pg_regress
>> infeasible.
>
> Well, the issue is twofold here:
> 1) Having a buildfarm member running Valgrind would greatly help.

Not sure that I agree with that. It wouldn't hurt, but people are
running Valgrind often enough these days that I think it's unlikely
that having it run consistently actually adds all that much. Maybe it
would make a difference to do it on a 32-bit platform, since most of
us have not used a 32-bit machine as our primary development
environment in about a decade, and we probably miss some things
because of that. If you run Valgrind on Postgres master these days,
you actually have a pretty good chance of not finding any issues,
which is great.

shared_buffers support for Valgrind would also help a lot (e.g. making
it so that referencing a shared buffer that isn't pinned causes a
Valgrind error -- Noah and I discussed this a while back; should be
doable). If you're looking for a new project, may I highly recommend
working on that. :-)

> 2) This code path is not used at all AFAIK in the test suite, so we
> definitely lack regression tests here. In your opinion, what sort data
> set would be large enough to trigger this code path? The idea is not
> to make the regression test suite take too much time because of it,
> and not to leave the server created by pg_regress with too large a
> PGDATA folder. For example, could building a btree index with the
> right data set trigger it on at least one platform?

Maybe.
The trick would be constructing a case where many different code paths
are covered, including the many different permutations and
combinations of how an external sort can go (number of runs, merge
steps, datatypes, etc).

In general, I think that there is a lot of value to be added by
someone making it their explicit goal to increase test coverage, as
measured by a tool like gcov (plus subjective expert analysis, of
course), particularly when it comes to things like memory corruption
bugs (hopefully including shared memory corruption bugs, and hopefully
including recovery). If someone were doing that in a focused way, then
the code path where we must explicitly pfree() (in the case of this
bug) would probably have coverage, and then Valgrind probably would
catch this. As long as that has to happen as part of adding things to
the standard regression test suite (particularly with sorts), a suite
which is expected to run quickly, and as long as we're entirely
relying on pg_regress, we will not make progress here. Maybe if there
were an extended regression test suite that was explicitly about
meeting a code coverage goal we'd do better.

Yes, I think mechanically increasing code coverage is useful as an end
in itself (although I do accept that meeting a particular coverage
percentage is often not especially useful). Increasing coverage has
led me to the right question before, just as static analysis has done
that for you. For example, the effort of increasing coverage can find
dead code, and you can intuitively get a sense of where to look for
bugs manually by determining mechanically which code paths are hard to
hit, or are naturally seldom hit.

It would be nice to always have an HTML report from gcov available on
the internet. That would be something useful to automate, IMV.
Watching that regress over time might provide useful insight, but I
only use gcov a couple of times a year, so that's not going to happen
on its own.

-- 
Peter Geoghegan
On Tue, Jun 30, 2015 at 10:25 PM, Peter Geoghegan <pg@heroku.com> wrote:
> It would be nice to always have a html report from gcov always
> available on the internet. That would be something useful to automate,
> IMV. Watching that regress over time might provide useful insight, but
> I only use gcov a couple of times a year, so that's not going to
> happen on its own.

I generated such a report just now, and noticed this:

          1137 :     tuplesort_performsort(btspool->sortstate);
    214   1131 :     if (btspool2)
    215      0 :         tuplesort_performsort(btspool2->sortstate);
    216        :
    217   1131 :     wstate.heap = btspool->heap;
    218   1131 :     wstate.index = btspool->index;

The regression tests have zero coverage for this
tuplesort_performsort() "btspool2" case. That's a fairly common case
to have no coverage for, and that took me all of 5 minutes to find.

-- 
Peter Geoghegan
On Tue, Jun 30, 2015 at 10:35 PM, Peter Geoghegan <pg@heroku.com> wrote:
> The regression tests have zero coverage for this
> tuplesort_performsort() "btspool2" case. That's a fairly common case
> to have no coverage for, and that took me all of 5 minutes to find.

BTW, I looked here because I added a bunch of sortsupport stuff to
_bt_load() in 9.5. It appears that that new code is entirely without
coverage.

-- 
Peter Geoghegan
On Wed, Jul 1, 2015 at 2:38 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Tue, Jun 30, 2015 at 10:35 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> The regression tests have zero coverage for this
>> tuplesort_performsort() "btspool2" case. That's a fairly common case
>> to have no coverage for, and that took me all of 5 minutes to find.
>
> BTW, I looked here because I added a bunch of sortsupport stuff to
> _bt_load() in 9.5. It appears that that new code is entirely without
> coverage.

That's not cool. A patch for the src/test/regress area would be welcome.

-- 
Michael
Peter Geoghegan wrote:
> It would be nice to always have a html report from gcov always
> available on the internet. That would be something useful to automate,
> IMV.

http://pgci.eisentraut.org/jenkins/job/postgresql_master_coverage/Coverage/

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Peter Geoghegan wrote:
>> It would be nice to always have a html report from gcov always
>> available on the internet. That would be something useful to
>> automate, IMV.
>
> http://pgci.eisentraut.org/jenkins/job/postgresql_master_coverage/Coverage/

What would it take to get something like that which uses the
check-world target instead of just the check target? Without the
additional tests (like the isolation tests), some of these numbers
don't reflect the coverage of regularly run tests. While it is of
some interest what coverage we get on the 30 second `make check`
runs, I would be a lot more interested in what our coverage is when
the full 8.5 minute set of regression tests we have in git is run.

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Jul 6, 2015 at 12:27 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> What would it take to get something like that which uses the
> check-world target instead of just the check target? Without the
> additional tests (like the isolation tests), some of these numbers
> don't reflect the coverage of regularly run tests. While it is of
> some interest what coverage we get on the 30 second `make check`
> runs, I would be a lot more interested in what our coverage is when
> the full 8.5 minute set of regression tests we have in git is run.

I agree. Obviously in some cases the coverage is bad because
concurrency is naturally required. It's even more glaring that
recovery codepaths are always all-red in coverage reports.

-- 
Peter Geoghegan
Kevin Grittner wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>> Peter Geoghegan wrote:
>>
>>> It would be nice to always have a html report from gcov always
>>> available on the internet. That would be something useful to
>>> automate, IMV.
>>
>> http://pgci.eisentraut.org/jenkins/job/postgresql_master_coverage/Coverage/
>
> What would it take to get something like that which uses the
> check-world target instead of just the check target?

I suggest CC'ing Peter as a first measure. I already suggested this (or
something similar) to him months ago.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Jul 6, 2015 at 12:52 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> I suggest CC'ing Peter as a first measure. I already suggested this (or
> something similar) to him months ago.

This would be a worthwhile effort. tuplesort.c only has 46% coverage.
There is no coverage for functions that I know are used all the time
in production, like dumptuples(), or ExecHashIncreaseNumBatches().

We should make increasing test coverage an explicit goal. Ideally,
there'd be a lower quality set of tests that fill in certain gaps in
code coverage, but are used less frequently, and may take much longer
to run. Simply having the code executed will allow tools like Valgrind
to catch bugs.

-- 
Peter Geoghegan
On 7/6/15 3:52 PM, Alvaro Herrera wrote:
> Kevin Grittner wrote:
>> Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>>> Peter Geoghegan wrote:
>>>
>>>> It would be nice to always have a html report from gcov always
>>>> available on the internet. That would be something useful to
>>>> automate, IMV.
>>>
>>> http://pgci.eisentraut.org/jenkins/job/postgresql_master_coverage/Coverage/
>>
>> What would it take to get something like that which uses the
>> check-world target instead of just the check target?
>
> I suggest CC'ing Peter as a first measure. I already suggested this (or
> something similar) to him months ago.

I would like to do that, but the current host has no more capacity and
the hoster is complaining, so I need to look for a new hosting solution
before I can expand what is being built.
On 2015-07-19 12:35:46 -0400, Peter Eisentraut wrote:
> I would like to do that, but the current host has no more capacity and
> the hoster is complaining, so I need to look for a new hosting solution
> before I can expand what is being built.

Perhaps this could be moved into a virtual machine hosted by the
postgres project infrastructure? What kind of resources do you
currently have and what would you need?

Andres
On Sun, Jul 19, 2015 at 9:53 AM, Andres Freund <andres@anarazel.de> wrote:
> Perhaps this could be moved into a virtual machine hosted by the
> postgres project infrastructure?

I would definitely be willing to start an external project to provide
additional coverage tests if I thought someone was inclined to run
them in this kind of environment. There *is* a place for scattergun
tests that just exercise rarely hit code paths (or even frequently hit
code paths that are not tested by the regression tests currently,
often due to requiring non-trivial amounts of data). Running the tests
with tools like Valgrind will catch a certain class of bug, like the
one reported and fixed on this thread.

The advantage of an external project is that being a committer isn't
tied to being a Postgres committer, and that it allows us to have
broader requirements to run all the tests. For example, it would be
acceptable to write some tests in Perl, and others in Python,
according to individual preference. Or, individual scripts could have
requirements that are only practical to satisfy with a
language-specific package manager like Python's pip, which would never
fly with the standard regression tests. I can imagine that kind of
thing being useful for generating a test case involving very large
hash joins or sorts, where procedural generation of the tests is
easier than having hundreds of megabytes of dummy data stored in git.
Weren't early versions of isolationtester written in Python + Twisted?

Noah suggested that Valgrind be extended to detect when a buffer is
read from despite not being pinned. This seems like a good idea to me,
but unless and until we exercise the code paths where this might be
happening, which tend to involve complex issues with concurrency, we
won't get the full benefit of automated detection. I recall noticing
such a bug when reviewing the B-Tree page split patch that went into
9.4.
But the relevant code path isn't exercised at all, and so even if
Valgrind could have detected it, it probably wouldn't have (the
P_INCOMPLETE_SPLIT() path within _bt_moveright() has no coverage).

I think that a lot of us are doing stuff like this privately already
to some extent, but we don't know about it, because it's never
submitted; the requirements are too high.

-- 
Peter Geoghegan