Thread: autovacuum causing numerous regression-test failures

autovacuum causing numerous regression-test failures

From
Tom Lane
Date:
I think we shall have to reconsider that patch to turn it on by default.
So far I've seen two categories of failure:

* manual ANALYZE issued by regression tests fails because autovac is
analyzing the same table concurrently.

* contrib tests fail in their repeated drop/create database operations
because autovac is connected to that database.  (pl tests presumably
have same issue.)

There are probably more symptoms we have not seen yet.

In the long run it would be good to figure out fixes to make these
problems not happen, but I'm not putting that on the must-fix-for-8.2
list.

BTW, it would sure be nice to know what happened here:
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=wasp&dt=2006-08-28%2017:05:01

LOG:  autovacuum process (PID 26315) was terminated by signal 11
LOG:  terminating any other active server processes

but even if there was a core file, it got wiped out immediately by
the next "DROP DATABASE" command :-(.  This one does look like a
must-fix, if we can find out what happened.
        regards, tom lane


Re: autovacuum causing numerous regression-test failures

From
Peter Eisentraut
Date:
Tom Lane wrote:
> I think we shall have to reconsider that patch to turn it on by
> default. So far I've seen two categories of failure:

So we turn autovacuum off for the regression test instance.

> * manual ANALYZE issued by regression tests fails because autovac is
> analyzing the same table concurrently.

Or we put manual exceptions for the affected tables into pg_autovacuum.

> * contrib tests fail in their repeated drop/create database
> operations because autovac is connected to that database.  (pl tests
> presumably have same issue.)

I opine that when a database is to be dropped, the connections should be 
cut.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Unnecessary rescan for non scrollable holdable cursors

From
"Alon Goldshuv"
Date:
Hi,

When persisting a holdable cursor at COMMIT time we currently choose to
rewind the executor and re-scan the whole result set into the tuplestore in
order to be able to scroll backwards later on. We then reposition the
cursor to the position we were in. However, unless I am missing something,
this seems to be done always, even if the cursor is not scrollable. I
suppose adding a simple conditional or two in PersistHoldablePortal() in
portalcmds.c could save the rescan and filling up the tuplestore with tuples
that will never be looked at, in the case that we never want to scroll back.

Anyway, definitely not critical, but should save some time and space in
those specific situations.

Regards,
Alon.
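
The conditional proposed above can be illustrated with a toy model. This is
only a sketch in Python, not the code in portalcmds.c: the function name
persist_holdable, the list standing in for the tuplestore, and the integer
rows are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def persist_holdable(rows: Sequence[int], pos: int,
                     scrollable: bool) -> Tuple[List[int], int]:
    """Toy model of persisting a holdable cursor at COMMIT.

    rows       -- the full result set the executor would produce (stand-in)
    pos        -- number of rows already fetched through the cursor
    scrollable -- whether backward scrolling may be requested later
    Returns (tuplestore contents, cursor position within the tuplestore).
    """
    if scrollable:
        # Behaviour described in the message: rewind, rescan the whole
        # result set, then reposition to where the cursor was.
        return list(rows), pos
    # Proposed shortcut: nobody can scroll back, so keep only the rows
    # not yet fetched and start at the front of the store.
    return list(rows[pos:]), 0


store, p = persist_holdable(range(10), 7, scrollable=False)
print(len(store), store[p])  # prints "3 7"
```

In both branches the next row fetched is the same; the non-scrollable branch
simply stores fewer tuples.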




Re: autovacuum causing numerous regression-test failures

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> Tom Lane wrote:
>> I think we shall have to reconsider that patch to turn it on by
>> default. So far I've seen two categories of failure:

> So we turn autovacuum off for the regression test instance.

Not a solution for "make installcheck", unless you are proposing adding
the ability to suppress autovac per-database.  Which would be a good
new feature ... for 8.3.

>> * manual ANALYZE issued by regression tests fails because autovac is
>> analyzing the same table concurrently.

> Or we put manual exceptions for the affected tables into pg_autovacuum.

New feature?  Or does that capability exist already?

>> * contrib tests fail in their repeated drop/create database
>> operations because autovac is connected to that database.  (pl tests
>> presumably have same issue.)

> I opine that when a database is to be dropped, the connections should be 
> cut.

Sure, but that's another thing that we're not going to start designing
and implementing four weeks after feature freeze.

I didn't complain about your proposing two weeks after feature freeze
that we turn autovac on by default, because I assumed (same as you no
doubt) that it would be a trivial one-liner change.  It is becoming
clear that that is not the case, and I don't think it makes any sense
from a project-management standpoint to try to flush the problems out
at this time in the release cycle.  We have more than enough problems
to fix for 8.2 already.  Let's try to do this early in the 8.3 cycle
instead.
        regards, tom lane


Re: autovacuum causing numerous regression-test failures

From
Peter Eisentraut
Date:
Tom Lane wrote:
> > So we turn autovacuum off for the regression test instance.
>
> Not a solution for "make installcheck",

Well, for "make installcheck" we don't have any control over whether 
autovacuum has been turned on or off manually anyway.  If you are 
concerned about build farm reliability, the build farm scripts can 
surely be made to initialize or start the instance in a particular way.

Another option might be to turn off stats_row_level on the fly.

> > Or we put manual exceptions for the affected tables into
> > pg_autovacuum.
>
> New feature?  Or does that capability exist already?

I haven't ever used the pg_autovacuum table but the documentation 
certainly makes one believe that this is possible.

> > I opine that when a database is to be dropped, the connections
> > should be cut.
>
> Sure, but that's another thing that we're not going to start
> designing and implementing four weeks after feature freeze.

Right.

> clear that that is not the case, and I don't think it makes any sense
> from a project-management standpoint to try to flush the problems out
> at this time in the release cycle.  We have more than enough problems
> to fix for 8.2 already.  Let's try to do this early in the 8.3 cycle
> instead.

Let's just consider some of the options a bit more closely, and if they 
don't work, we'll revert it.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: autovacuum causing numerous regression-test failures

From
Tom Lane
Date:
I wrote:
> BTW, it would sure be nice to know what happened here:
> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=wasp&dt=2006-08-28%2017:05:01
> LOG:  autovacuum process (PID 26315) was terminated by signal 11

I was able to cause autovac to crash by repeating the contrib/intarray
regression test enough times in a row.  The cause is not specific
to autovac; it's a generic bug created by my recent patch to add
"waiting" status to pg_stat_activity.  If we block on a lock during
InitPostgres then the stats stuff isn't ready yet ... oops.
Patch committed.

The other issues remain problems however.
        regards, tom lane


Re: autovacuum causing numerous regression-test failures

From
Tom Lane
Date:
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=osprey&dt=2006-08-28%2016:00:17
shows another autovac-induced failure mode:

! psql: FATAL:  sorry, too many clients already

initdb is choosing max_connections = 20 on this machine, which is
sufficient to run the parallel regression tests by themselves,
but not regression tests plus autovac.

IIRC initdb will go down to 10 or so connections before deciding
it's hopeless.  I don't really want to change that behavior because
it might make it impossible to initdb at all on a small machine.
But probably there needs to be a way for pg_regress to set a floor
on the acceptable max_connections setting while initializing the
test instance for "make check".

This also ties into the recent discussions about whether autovac needs
its own reserved backend slots.  Which, again, sounds to me like a fine
idea for 8.3 work.
        regards, tom lane
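
A floor check of the kind described above might look like the following
minimal Python sketch. It is not pg_regress code: the function names and the
floor value passed in are hypothetical, and the parser only understands
plain unquoted integer settings in the text of postgresql.conf.

```python
import re
from typing import Optional


def configured_max_connections(conf_text: str) -> Optional[int]:
    """Return the last active (uncommented) max_connections value, if any."""
    value = None
    for line in conf_text.splitlines():
        # Lines starting with '#' do not match, so commented-out
        # defaults are ignored; a later setting overrides an earlier one.
        m = re.match(r"[ \t]*max_connections[ \t]*=?[ \t]*(\d+)", line)
        if m:
            value = int(m.group(1))
    return value


def meets_floor(conf_text: str, floor: int) -> bool:
    """True if the configured max_connections is at least `floor`."""
    value = configured_max_connections(conf_text)
    return value is not None and value >= floor
```

A test driver could refuse to start "make check" when meets_floor() returns
False, instead of failing later with "too many clients already".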


Re: autovacuum causing numerous regression-test failures

From
Neil Conway
Date:
On Mon, 2006-08-28 at 15:21 -0400, Tom Lane wrote:
> We have more than enough problems to fix for 8.2 already.  Let's
> try to do this early in the 8.3 cycle instead.

I agree -- I think this is exactly the sort of change that is best made
at the beginning of a development cycle, so that there's a whole cycle's
worth of testing to ensure it plays nicely with the rest of the system.

-Neil




Re: autovacuum causing numerous regression-test failures

From
Alvaro Herrera
Date:
Neil Conway wrote:
> On Mon, 2006-08-28 at 15:21 -0400, Tom Lane wrote:
> > We have more than enough problems to fix for 8.2 already.  Let's
> > try to do this early in the 8.3 cycle instead.
> 
> I agree -- I think this is exactly the sort of change that is best made
> at the beginning of a development cycle, so that there's a whole cycle's
> worth of testing to ensure it plays nicely with the rest of the system.

On the other hand, the bug Tom found on DROP OWNED a couple of weeks ago
was introduced right at the start of this development cycle, which tells
us that our testing of the development branch is not very exhaustive.
But I agree anyway.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: autovacuum causing numerous regression-test failures

From
"Matthew T. O'Connor"
Date:
Peter Eisentraut wrote:
> Tom Lane wrote:
>> Not a solution for "make installcheck",
> 
> Well, for "make installcheck" we don't have any control over whether 
> autovacuum has been turned on or off manually anyway.  If you are 
> concerned about build farm reliability, the build farm scripts can 
> surely be made to initialize or start the instance in a particular way.
> 
> Another option might be to turn off stats_row_level on the fly.

I'm sure I'm missing some of the subtleties of make installcheck issues, 
but autovacuum can be enabled / disabled on the fly just as easily as 
stats_row_level, so I don't see the difference?

>>> Or we put manual exceptions for the affected tables into
>>> pg_autovacuum.
>> New feature?  Or does that capability exist already?
> 
> I haven't ever used the pg_autovacuum table but the documentation 
> certainly makes one believe that this is possible.

Right, if it doesn't work, that would certainly be a bug.  This feature 
was included during the original integration into the backend during the 
8.0 dev cycle.

> Let's just consider some of the options a bit more closely, and if they 
> don't work, we'll revert it.

Agreed.



Re: autovacuum causing numerous regression-test failures

From
Tom Lane
Date:
"Matthew T. O'Connor" <matthew@zeut.net> writes:
>> Tom Lane wrote:
>>> Not a solution for "make installcheck",

> I'm sure I'm missing some of the subtleties of make installcheck issues, 
> but autovacuum can be enabled / disabled on the fly just as easily as 
> stats_row_level, so I don't see the difference?

Well, "just as easily" means "edit postgresql.conf and SIGHUP", which is
not an option available to "make installcheck", even if we thought that
an invasive change of the server configuration would be acceptable for
it to do.  It's conceivable that we could invent a per-database
autovac-off variable controlled by, say, ALTER DATABASE SET ... but we
haven't got one today.

My objection here is basically that this proposal passed on the
assumption that it would be very nearly zero effort to make it happen.
We are now finding out that we have a fair amount of work to do if we
want autovac to not mess up the regression tests, and I think that has
to mean that the proposal goes back on the shelf until 8.3 development
starts.  We are already overcommitted in terms of the stuff that was
submitted *before* feature freeze.
        regards, tom lane
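
For concreteness, "edit postgresql.conf and SIGHUP" could look like the
minimal Python sketch below. It assumes direct access to the server's data
directory, which is exactly what "make installcheck" lacks. The helper names
set_guc, read_postmaster_pid and disable_autovacuum are hypothetical; the
sketch handles only the "name = value" form of a setting and relies on the
postmaster's PID being on the first line of postmaster.pid.

```python
import os
import re
import signal


def set_guc(conf_text: str, name: str, value: str) -> str:
    """Return conf_text with `name = value`, replacing any active setting."""
    # Requiring '=' keeps "autovacuum" from matching "autovacuum_naptime".
    pattern = re.compile(
        r"^[ \t]*" + re.escape(name) + r"[ \t]*=.*$", re.MULTILINE)
    new_line = f"{name} = {value}"
    if pattern.search(conf_text):
        return pattern.sub(new_line, conf_text)
    # No active setting found: append one, adding a newline if needed.
    sep = "" if conf_text.endswith("\n") or not conf_text else "\n"
    return conf_text + sep + new_line + "\n"


def read_postmaster_pid(data_dir: str) -> int:
    """Read the postmaster PID from the first line of postmaster.pid."""
    with open(os.path.join(data_dir, "postmaster.pid")) as f:
        return int(f.readline().strip())


def disable_autovacuum(data_dir: str) -> None:
    """Rewrite postgresql.conf, then ask the postmaster to reload it."""
    conf_path = os.path.join(data_dir, "postgresql.conf")
    with open(conf_path) as f:
        text = f.read()
    with open(conf_path, "w") as f:
        f.write(set_guc(text, "autovacuum", "off"))
    # SIGHUP makes the postmaster re-read its configuration (Unix only).
    os.kill(read_postmaster_pid(data_dir), signal.SIGHUP)
```

Both steps need file-system access to the data directory and permission to
signal the server, neither of which a client-side test run can assume.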


Re: autovacuum causing numerous regression-test failures

From
Andreas Pflug
Date:
Tom Lane wrote:
> My objection here is basically that this proposal passed on the
> assumption that it would be very nearly zero effort to make it happen.
> We are now finding out that we have a fair amount of work to do if we
> want autovac to not mess up the regression tests, and I think that has
> to mean that the proposal goes back on the shelf until 8.3 development
> starts.  We are already overcommitted in terms of the stuff that was
> submitted *before* feature freeze.

Kicking out autovacuum as default is a disaster, it took far too long to
get in the backend already (wasn't it planned for 8.0?).
You discuss this on the basis of the regression tests, which obviously
run on installations that do _not_ represent standard recommended
installations. It has been required for ages now to have vacuum running
regularly, using cron or so. The regression tests have to deal with that
default situation, in one way or the other (which might well mean "these
tables don't need vacuum" or "this instance doesn't need vacuum"). IMHO
blaming autovacuum for the test failures reverses cause and effect.

Missing vacuum was probably a reason for poor performance of many newbie
pgsql installations  (and I must admit that I missed installing the cron
job myself from time to time, though I _knew_ it was needed). As Magnus
already pointed out, all win32 installations have it on by default, to
take them to the safe side. Disabling it for modules a "retail" user
will never launch appears overreacting.

I can confirm that disabling autovacuum with a
pg_autovacuum row does work; I'm using it in production.

Regards,
Andreas



Re: autovacuum causing numerous regression-test failures

From
Tom Lane
Date:
Andreas Pflug <pgadmin@pse-consulting.de> writes:
> Tom Lane wrote:
>> My objection here is basically that this proposal passed on the
>> assumption that it would be very nearly zero effort to make it happen.

> Kicking out autovacuum as default is a disaster, it took far too long to
> get in the backend already (wasn't it planned for 8.0?).

If it's so "disastrous" to not have it, why wasn't it even proposed
until two weeks after feature freeze?  Sorry, I'm not buying this
argument.
        regards, tom lane


Re: autovacuum causing numerous regression-test failures

From
Peter Eisentraut
Date:
On Tuesday, 29 August 2006 11:14, Andreas Pflug wrote:
> already pointed out, all win32 installations have it on by default, to
> take them to the safe side. Disabling it for modules a "retail" user
> will never launch appears overreacting.

Well, the really big problem is that autovacuum may be connected to a database 
when you want to drop it.  (There may be related problems like vacuuming a 
template database at the wrong time.  I'm not sure how that is handled.)  I 
think this is not only a problem that is specific to the regression testing 
but a potential problem in deployment.  I have opined earlier how I think 
that should behave properly, but we're not going to change that in 8.2.

The other problems that were mentioned are pretty easy to work around by 
setting stats_row_level to off on the fly, but that doesn't stop autovacuum 
from connecting.

The good thing is that we have collected plenty of interesting data in the 
last 24 hours which will make for plenty of development work next time 
around. :)

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: autovacuum causing numerous regression-test failures

From
Andreas Pflug
Date:
Tom Lane wrote:
> Andreas Pflug <pgadmin@pse-consulting.de> writes:
>> Tom Lane wrote:
>>> My objection here is basically that this proposal passed on the
>>> assumption that it would be very nearly zero effort to make it happen.
>
>> Kicking out autovacuum as default is a disaster, it took far too long to
>> get in the backend already (wasn't it planned for 8.0?).
>
> If it's so "disastrous" to not have it, why wasn't it even proposed
> until two weeks after feature freeze?

To me, this proposal was just too obvious, for reasons already discussed
earlier.

Regards,
Andreas



Re: autovacuum causing numerous regression-test failures

From
Andreas Pflug
Date:
Peter Eisentraut wrote:
> On Tuesday, 29 August 2006 11:14, Andreas Pflug wrote:
>> already pointed out, all win32 installations have it on by default, to
>> take them to the safe side. Disabling it for modules a "retail" user
>> will never launch appears overreacting.
>
> Well, the really big problem is that autovacuum may be connected to a database 
> when you want to drop it.  (There may be related problems like vacuuming a 
> template database at the wrong time.  I'm not sure how that is handled.)  I 
> think this is not only a problem that is specific to the regression testing 
> but a potential problem in deployment.  I have opined earlier how I think 
> that should behave properly, but we're not going to change that in 8.2.

Don't these issues hit a cron-scheduled vacuum as well?

Regards,
Andreas



Re: autovacuum causing numerous regression-test failures

From
Josh Berkus
Date:
Folks,

My vote is with Peter and Tom on not putting it in.  We needed to discuss/test 
this well before feature freeze if we really wanted to do it.

Here's what needs to be resolved:
a) make autovacuum play nice with the regression tests
b) come up with default threshold/multiplier values which are backed by test 
data

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco
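
For reference, the threshold/multiplier pair in (b) combines as documented
for autovacuum: a table is vacuumed once its obsolete tuples exceed a base
threshold plus a scale factor times the table's tuple count. The Python
sketch below only illustrates that arithmetic; the function names are
hypothetical, and the numbers in the example are illustrative values, not
proposed or actual defaults.

```python
def autovacuum_threshold(base_threshold: int, scale_factor: float,
                         reltuples: float) -> float:
    """Threshold = base threshold + scale factor * number of tuples."""
    return base_threshold + scale_factor * reltuples


def needs_vacuum(obsolete_tuples: int, base_threshold: int,
                 scale_factor: float, reltuples: float) -> bool:
    """True once obsolete tuples exceed the computed threshold."""
    limit = autovacuum_threshold(base_threshold, scale_factor, reltuples)
    return obsolete_tuples > limit


# Illustrative only: base 100, multiplier 0.5, table of 10000 tuples.
print(autovacuum_threshold(100, 0.5, 10000))  # prints 5100.0
```

Test data for (b) would amount to measuring how bloat and vacuum overhead
respond as these two inputs are varied across table sizes.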