Thread: autovacuum causing numerous regression-test failures
I think we shall have to reconsider that patch to turn it on by default. So far I've seen two categories of failure:

* manual ANALYZE issued by regression tests fails because autovac is analyzing the same table concurrently.
* contrib tests fail in their repeated drop/create database operations because autovac is connected to that database. (pl tests presumably have the same issue.)

There are probably more symptoms we have not seen yet. In the long run it would be good to figure out fixes to make these problems not happen, but I'm not putting that on the must-fix-for-8.2 list.

BTW, it would sure be nice to know what happened here:
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=wasp&dt=2006-08-28%2017:05:01

LOG: autovacuum process (PID 26315) was terminated by signal 11
LOG: terminating any other active server processes

but even if there was a core file, it got wiped out immediately by the next "DROP DATABASE" command :-(. This one does look like a must-fix, if we can find out what happened.

regards, tom lane
Tom Lane wrote:
> I think we shall have to reconsider that patch to turn it on by
> default. So far I've seen two categories of failure:

So we turn autovacuum off for the regression test instance.

> * manual ANALYZE issued by regression tests fails because autovac is
> analyzing the same table concurrently.

Or we put manual exceptions for the affected tables into pg_autovacuum.

> * contrib tests fail in their repeated drop/create database
> operations because autovac is connected to that database. (pl tests
> presumably have the same issue.)

I opine that when a database is to be dropped, the connections should be cut.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Hi,

When persisting a holdable cursor at COMMIT time, we currently choose to rewind the executor and re-scan the whole result set into the tuplestore so that we can scroll backwards later on. Then we reposition the cursor to the position it was at.

However, unless I am missing something, this seems to be done always, even if the cursor is not scrollable. I suppose adding a simple conditional or two in PersistHoldablePortal() in portalcmds.c could save the rescan, and avoid filling up the tuplestore with tuples that will never be looked at, in the case where we never want to scroll back.

Anyway, this is definitely not critical, but it should save some time and space in those specific situations.

Regards,
Alon.
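A minimal sketch of the conditional Alon describes, using a toy model rather than PostgreSQL's real Portal and tuplestore code: ToyCursor, ToyStore, and persist_holdable() are hypothetical names invented for illustration. A scrollable cursor "rewinds" and materializes every row, then skips the rows already fetched; a non-scrollable one materializes only the rows the client has not fetched yet.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_STORE_CAP 64

/* Hypothetical, simplified cursor: a result set plus a read position. */
typedef struct {
    const int *rows;   /* full result set the query would produce */
    size_t nrows;
    size_t pos;        /* rows already fetched by the client */
    bool scrollable;   /* declared SCROLL? */
} ToyCursor;

/* Hypothetical stand-in for the tuplestore a held cursor reads from. */
typedef struct {
    int rows[TOY_STORE_CAP];
    size_t nrows;
    size_t read_pos;   /* index the next forward FETCH will return */
} ToyStore;

/* Materialize the cursor at commit; returns the number of rows stored. */
static size_t persist_holdable(const ToyCursor *cur, ToyStore *store)
{
    /* Scrollable: "rewind" and keep every row so backward FETCH works.
       Non-scrollable: keep only the rows not fetched yet. */
    size_t start = cur->scrollable ? 0 : cur->pos;

    store->nrows = 0;
    for (size_t i = start; i < cur->nrows && store->nrows < TOY_STORE_CAP; i++)
        store->rows[store->nrows++] = cur->rows[i];

    /* Reposition: skip the already-fetched rows only if we stored them. */
    store->read_pos = cur->scrollable ? cur->pos : 0;
    return store->nrows;
}
```

In both cases the next forward FETCH returns the same row; only the scrollable case pays for storing rows that have already been read.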
Peter Eisentraut <peter_e@gmx.net> writes:
> Tom Lane wrote:
>> I think we shall have to reconsider that patch to turn it on by
>> default. So far I've seen two categories of failure:

> So we turn autovacuum off for the regression test instance.

Not a solution for "make installcheck", unless you are proposing adding the ability to suppress autovac per-database. Which would be a good new feature ... for 8.3.

>> * manual ANALYZE issued by regression tests fails because autovac is
>> analyzing the same table concurrently.

> Or we put manual exceptions for the affected tables into pg_autovacuum.

New feature? Or does that capability exist already?

>> * contrib tests fail in their repeated drop/create database
>> operations because autovac is connected to that database. (pl tests
>> presumably have the same issue.)

> I opine that when a database is to be dropped, the connections should be
> cut.

Sure, but that's another thing that we're not going to start designing and implementing four weeks after feature freeze.

I didn't complain about your proposing, two weeks after feature freeze, that we turn autovac on by default, because I assumed (same as you no doubt) that it would be a trivial one-liner change. It is becoming clear that that is not the case, and I don't think it makes any sense from a project-management standpoint to try to flush the problems out at this time in the release cycle. We have more than enough problems to fix for 8.2 already. Let's try to do this early in the 8.3 cycle instead.

regards, tom lane
Tom Lane wrote:
> > So we turn autovacuum off for the regression test instance.
>
> Not a solution for "make installcheck",

Well, for "make installcheck" we don't have any control over whether autovacuum has been turned on or off manually anyway. If you are concerned about build farm reliability, the build farm scripts can surely be made to initialize or start the instance in a particular way.

Another option might be to turn off stats_row_level on the fly.

> > Or we put manual exceptions for the affected tables into
> > pg_autovacuum.
>
> New feature? Or does that capability exist already?

I haven't ever used the pg_autovacuum table, but the documentation certainly makes one believe that this is possible.

> > I opine that when a database is to be dropped, the connections
> > should be cut.
>
> Sure, but that's another thing that we're not going to start
> designing and implementing four weeks after feature freeze.

Right.

> clear that that is not the case, and I don't think it makes any sense
> from a project-management standpoint to try to flush the problems out
> at this time in the release cycle. We have more than enough problems
> to fix for 8.2 already. Let's try to do this early in the 8.3 cycle
> instead.

Let's just consider some of the options a bit more closely, and if they don't work, we'll revert it.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
I wrote:
> BTW, it would sure be nice to know what happened here:
> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=wasp&dt=2006-08-28%2017:05:01
> LOG: autovacuum process (PID 26315) was terminated by signal 11

I was able to cause autovac to crash by repeating the contrib/intarray regression test enough times in a row. The cause is not specific to autovac; it's a generic bug created by my recent patch to add "waiting" status to pg_stat_activity. If we block on a lock during InitPostgres, then the stats stuff isn't ready yet ... oops. Patch committed.

The other issues remain problems, however.

regards, tom lane
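To make the failure mode concrete, here is a minimal sketch of the kind of guard involved. It is not the committed patch, and ToyBackendStatus, toy_stats_init(), and toy_report_waiting() are hypothetical names. The point is that a routine reporting "waiting" status can run while a backend is still blocked inside its startup sequence, so it must tolerate its status slot not existing yet instead of dereferencing a NULL pointer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-backend status slot; a simplified stand-in only. */
typedef struct {
    bool waiting;
} ToyBackendStatus;

/* Stays NULL until the (hypothetical) stats setup step has run. */
static ToyBackendStatus *my_status = NULL;

static void toy_stats_init(ToyBackendStatus *slot)
{
    slot->waiting = false;
    my_status = slot;
}

/* Report lock-wait state; returns false if the report had to be skipped. */
static bool toy_report_waiting(bool waiting)
{
    if (my_status == NULL)   /* blocked on a lock before stats are set up */
        return false;        /* skip quietly instead of crashing */
    my_status->waiting = waiting;
    return true;
}
```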
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=osprey&dt=2006-08-28%2016:00:17
shows another autovac-induced failure mode:

! psql: FATAL: sorry, too many clients already

initdb is choosing max_connections = 20 on this machine, which is sufficient to run the parallel regression tests by themselves, but not regression tests plus autovac. IIRC initdb will go down to 10 or so connections before deciding it's hopeless. I don't really want to change that behavior, because it might make it impossible to initdb at all on a small machine. But there probably needs to be a way for pg_regress to set a floor on the acceptable max_connections setting while initializing the test instance for "make check".

This also ties into the recent discussions about whether autovac needs its own reserved backend slots. Which, again, sounds to me like a fine idea for 8.3 work.

regards, tom lane
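One way to picture the suggested floor: initdb steps down through candidate max_connections values until one works, and a caller such as pg_regress could pass a minimum below which the result is rejected. This is a sketch under assumptions, not initdb's code: the candidate list, choose_max_connections(), and the machine_limit parameter (standing in for initdb's real trial server start) are all hypothetical.

```c
#include <stddef.h>

/* Illustrative candidate list, largest first (assumed, not quoted from initdb). */
static const int trial_conns[] = {100, 50, 40, 30, 20, 10};

/*
 * Pick the largest candidate the machine can support (modelled here by
 * machine_limit), but refuse anything below min_conns.
 * min_conns == 0 means no floor, i.e. plain initdb behaviour.
 * Returns -1 on failure.
 */
static int choose_max_connections(int machine_limit, int min_conns)
{
    size_t n = sizeof(trial_conns) / sizeof(trial_conns[0]);

    for (size_t i = 0; i < n; i++) {
        int c = trial_conns[i];
        if (c < min_conns)
            return -1;        /* below the caller's floor: give up loudly */
        if (c <= machine_limit)
            return c;         /* first (largest) value that fits */
    }
    return -1;                /* nothing fit at all */
}
```

On a machine whose limit falls between 20 and 30, the plain call settles on 20, while a pg_regress-style floor of 30 fails at setup time instead of producing an instance that later runs out of slots.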
On Mon, 2006-08-28 at 15:21 -0400, Tom Lane wrote:
> We have more than enough problems to fix for 8.2 already. Let's
> try to do this early in the 8.3 cycle instead.

I agree -- I think this is exactly the sort of change that is best made at the beginning of a development cycle, so that there's a whole cycle's worth of testing to ensure it plays nicely with the rest of the system.

-Neil
Neil Conway wrote:
> On Mon, 2006-08-28 at 15:21 -0400, Tom Lane wrote:
> > We have more than enough problems to fix for 8.2 already. Let's
> > try to do this early in the 8.3 cycle instead.
>
> I agree -- I think this is exactly the sort of change that is best made
> at the beginning of a development cycle, so that there's a whole cycle's
> worth of testing to ensure it plays nicely with the rest of the system.

On the other hand, the bug Tom found on DROP OWNED a couple of weeks ago was introduced right at the start of this development cycle, which tells us that our testing of the development branch is not very exhaustive.

But I agree anyway.

--
Alvaro Herrera                         http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Peter Eisentraut wrote:
> Tom Lane wrote:
>> Not a solution for "make installcheck",
>
> Well, for "make installcheck" we don't have any control over whether
> autovacuum has been turned on or off manually anyway. If you are
> concerned about build farm reliability, the build farm scripts can
> surely be made to initialize or start the instance in a particular way.
>
> Another option might be to turn off stats_row_level on the fly.

I'm sure I'm missing some of the subtleties of make installcheck issues, but autovacuum can be enabled / disabled on the fly just as easily as stats_row_level, so I don't see the difference?

>>> Or we put manual exceptions for the affected tables into
>>> pg_autovacuum.
>> New feature? Or does that capability exist already?
>
> I haven't ever used the pg_autovacuum table but the documentation
> certainly makes one believe that this is possible.

Right, if it doesn't work, that would certainly be a bug. This feature was included in the original integration into the backend during the 8.0 dev cycle.

> Let's just consider some of the options a bit more closely, and if they
> don't work, we'll revert it.

Agreed.
"Matthew T. O'Connor" <matthew@zeut.net> writes:
>> Tom Lane wrote:
>>> Not a solution for "make installcheck",

> I'm sure I'm missing some of the subtleties of make installcheck issues,
> but autovacuum can be enabled / disabled on the fly just as easily as
> stats_row_level, so I don't see the difference?

Well, "just as easily" means "edit postgresql.conf and SIGHUP", which is not an option available to "make installcheck", even if we thought that an invasive change of the server configuration would be acceptable for it to do. It's conceivable that we could invent a per-database autovac-off variable controlled by, say, ALTER DATABASE SET ... but we haven't got one today.

My objection here is basically that this proposal passed on the assumption that it would be very nearly zero effort to make it happen. We are now finding out that we have a fair amount of work to do if we want autovac to not mess up the regression tests, and I think that has to mean that the proposal goes back on the shelf until 8.3 development starts. We are already overcommitted in terms of the stuff that was submitted *before* feature freeze.

regards, tom lane
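The three levels of control being discussed can be laid out as a small decision function: the global switch (postgresql.conf), a per-table exception (which pg_autovacuum rows already allow), and a per-database switch that, per Tom, does not exist yet. This is a hypothetical sketch; should_autovacuum(), ToyTableOverride, and ToyDatabaseSettings are invented names, and the per-database flag models a feature that was only an idea for 8.3 at this point.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-table override, loosely modelled on a pg_autovacuum row. */
typedef struct {
    unsigned int relid;
    bool enabled;
} ToyTableOverride;

/* Hypothetical per-database setting; no such knob exists as of this thread. */
typedef struct {
    bool autovac_enabled;
} ToyDatabaseSettings;

/* Decide whether autovacuum should touch relid, most general level first. */
static bool should_autovacuum(bool global_on, const ToyDatabaseSettings *db,
                              const ToyTableOverride *overrides, size_t n,
                              unsigned int relid)
{
    if (!global_on || !db->autovac_enabled)
        return false;                       /* switched off above table level */
    for (size_t i = 0; i < n; i++)
        if (overrides[i].relid == relid)
            return overrides[i].enabled;    /* explicit per-table exception */
    return true;                            /* default: eligible */
}
```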
Tom Lane wrote:
>
> My objection here is basically that this proposal passed on the
> assumption that it would be very nearly zero effort to make it happen.
> We are now finding out that we have a fair amount of work to do if we
> want autovac to not mess up the regression tests, and I think that has
> to mean that the proposal goes back on the shelf until 8.3 development
> starts. We are already overcommitted in terms of the stuff that was
> submitted *before* feature freeze.
>

Kicking out autovacuum as default is a disaster; it took far too long to get it into the backend already (wasn't it planned for 8.0?). You are discussing this on the basis of the regression tests, which obviously run on installations that do _not_ represent standard recommended installations. Running vacuum regularly, using cron or similar, has been required for ages now. The regression tests have to deal with that default situation in one way or another (which might well mean "these tables don't need vacuum" or "this instance doesn't need vacuum"). IMHO blaming autovacuum for the test failures reverses cause and effect.

Missing vacuum was probably a reason for the poor performance of many newbie pgsql installations (and I must admit that I missed installing the cron job myself from time to time, though I _knew_ it was needed). As Magnus already pointed out, all win32 installations have it on by default, to keep them on the safe side. Disabling it for modules a "retail" user will never launch appears to be overreacting.

I can positively confirm that disabling autovacuum with a pg_autovacuum row does work; I'm using it in production.

Regards,
Andreas
Andreas Pflug <pgadmin@pse-consulting.de> writes:
> Tom Lane wrote:
>> My objection here is basically that this proposal passed on the
>> assumption that it would be very nearly zero effort to make it happen.

> Kicking out autovacuum as default is a disaster; it took far too long to
> get it into the backend already (wasn't it planned for 8.0?).

If it's so "disastrous" to not have it, why wasn't it even proposed until two weeks after feature freeze?

Sorry, I'm not buying this argument.

regards, tom lane
On Tuesday, 29 August 2006 11:14, Andreas Pflug wrote:
> already pointed out, all win32 installations have it on by default, to
> keep them on the safe side. Disabling it for modules a "retail" user
> will never launch appears to be overreacting.

Well, the really big problem is that autovacuum may be connected to a database when you want to drop it. (There may be related problems, like vacuuming a template database at the wrong time. I'm not sure how that is handled.) I think this is not only a problem specific to the regression testing but a potential problem in deployment. I have opined earlier on how I think that should behave properly, but we're not going to change that in 8.2.

The other problems that were mentioned are pretty easy to work around by setting stats_row_level to off on the fly, but that doesn't stop autovacuum from connecting.

The good thing is that we have collected plenty of interesting data in the last 24 hours, which will make for plenty of development work next time around. :)

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Tom Lane wrote:
> Andreas Pflug <pgadmin@pse-consulting.de> writes:
>
>> Tom Lane wrote:
>>
>>> My objection here is basically that this proposal passed on the
>>> assumption that it would be very nearly zero effort to make it happen.
>
>> Kicking out autovacuum as default is a disaster; it took far too long to
>> get it into the backend already (wasn't it planned for 8.0?).
>
> If it's so "disastrous" to not have it, why wasn't it even proposed
> until two weeks after feature freeze?

To me, this proposal was just too obvious, for reasons already discussed earlier.

Regards,
Andreas
Peter Eisentraut wrote:
> On Tuesday, 29 August 2006 11:14, Andreas Pflug wrote:
>
>> already pointed out, all win32 installations have it on by default, to
>> keep them on the safe side. Disabling it for modules a "retail" user
>> will never launch appears to be overreacting.
>
> Well, the really big problem is that autovacuum may be connected to a database
> when you want to drop it. (There may be related problems, like vacuuming a
> template database at the wrong time. I'm not sure how that is handled.) I
> think this is not only a problem specific to the regression testing
> but a potential problem in deployment. I have opined earlier on how I think
> that should behave properly, but we're not going to change that in 8.2.

Don't these issues hit a cron-scheduled vacuum as well?

Regards,
Andreas
Folks,

My vote is with Peter and Tom on not putting it in. We needed to discuss/test this well before feature freeze if we really wanted to do it.

Here's what needs to be resolved:
a) make autovacuum play nicely with the regression tests
b) come up with default threshold/multiplier values that are backed by test data

--
Josh Berkus
PostgreSQL @ Sun
San Francisco
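Item (b) refers to the two knobs in autovacuum's trigger rule as the documentation describes it: a table is vacuumed once its count of obsolete (dead) tuples exceeds a base threshold plus a scale factor (the "multiplier") times the table's tuple count. The globals are autovacuum_vacuum_threshold and autovacuum_vacuum_scale_factor, with per-table values possible via pg_autovacuum; ANALYZE follows the same shape with its own pair. The sketch below only encodes that formula; the function names are hypothetical, and the numbers in the usage note are illustrative, not proposed defaults.

```c
#include <stdbool.h>

/*
 * vacuum threshold = base threshold + scale factor * number of tuples
 */
static double vacuum_threshold(double base_thresh, double scale_factor,
                               double reltuples)
{
    return base_thresh + scale_factor * reltuples;
}

/* True when the dead-tuple count exceeds the threshold above. */
static bool needs_vacuum(double dead_tuples, double base_thresh,
                         double scale_factor, double reltuples)
{
    return dead_tuples > vacuum_threshold(base_thresh, scale_factor, reltuples);
}
```

For example, with a base of 50 and a scale factor of 0.2, a 1000-row table is vacuumed once it has more than 250 dead tuples; choosing defaults means picking that pair so large and small tables both behave sensibly.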