Thread: autovacuum and immediate shutdown issues

autovacuum and immediate shutdown issues

From
Brad Nicholson
Date:
If you issue an immediate shutdown to the database, autovacumm will not
process tables that should be vacuumed until manually re-analyzed.

PG 8.3.8

Relevant settings:

autovacuum = on
log_autovacuum_min_duration = 0
autovacuum_max_workers = 3
autovacuum_naptime = 1min
autovacuum_vacuum_threshold = 10
autovacuum_analyze_threshold = 10
autovacuum_vacuum_scale_factor = 0.05
autovacuum_analyze_scale_factor = 0.05


test=# CREATE TABLE foo (id int);
test=# INSERT INTO foo SELECT generate_series(1,1000);

auto-analyze runs as expected

test=# SELECT last_autovacuum, last_autoanalyze from pg_stat_user_tables
where relname ='foo';
-[ RECORD 1 ]----+------------------------------
last_autovacuum  |
last_autoanalyze | 2009-10-19 14:14:47.791658+00


test=# DELETE FROM foo WHERE id IN (SELECT id FROM foo LIMIT 100);

test=# SELECT * from pgstattuple('foo');
-[ RECORD 1 ]------+------
table_len          | 32768
tuple_count        | 900
tuple_len          | 25200
tuple_percent      | 76.9
dead_tuple_count   | 100
dead_tuple_len     | 2800
dead_tuple_percent | 8.54
free_space         | 656
free_percent       | 2

dead_tuple_percent > 5% - autovacuum runs as expected on next pass:

test=# SELECT last_autovacuum, last_autoanalyze from pg_stat_user_tables
where relname ='foo';
-[ RECORD 1 ]----+------------------------------
last_autovacuum  | 2009-10-19 14:16:47.910177+00
last_autoanalyze | 2009-10-19 14:14:47.791658+00


--repopulate table
test=# TRUNCATE foo;
test=# INSERT INTO foo SELECT generate_series(1,1000);

Wait for autoanalyze, then before next run of autovacuum

test=# DELETE FROM foo WHERE id IN (SELECT id FROM foo LIMIT 100);
pg_ctl -D data -m immediate stop

restart the postmaster
stats are gone due to the immediate shutdown.

test=# SELECT last_autovacuum, last_autoanalyze from pg_stat_user_tables
where relname ='foo';
-[ RECORD 1 ]----+-
last_autovacuum  |
last_autoanalyze |


test=# SELECT * from pgstattuple('foo');
-[ RECORD 1 ]------+------
table_len          | 32768
tuple_count        | 900
tuple_len          | 25200
tuple_percent      | 76.9
dead_tuple_count   | 100
dead_tuple_len     | 2800
dead_tuple_percent | 8.54
free_space         | 656
free_percent       | 2

dead_tuple_percent > 5% of table is dead, autovacuum should pick it up,
but it doesn't (yes, I have waited longer enough).
autoanalyze does not process the table.

Autovacuum will not process this table again until one of the following
two conditions are met:

1: Manually analyze the table
2: an additional 5% (autovacuum_vacuum_scale_factor) of the tuples in
the are dirtied - which amounts to 10% (autovacuum_vacuum_scale_factor *
2) of the actual table being dirtied before it gets vacuumed.

In case 2 - reports from pgstattuple are odd.  After deleting one row.

test=# DELETE FROM foo WHERE id IN (SELECT id FROM foo LIMIT 1);
DELETE 1

test=# SELECT * from pgstattuple('foo');
-[ RECORD 1 ]------+------
table_len          | 32768
tuple_count        | 899
tuple_len          | 25172
tuple_percent      | 76.82
dead_tuple_count   | 1
dead_tuple_len     | 28
dead_tuple_percent | 0.09
free_space         | 3456
free_percent       | 10.55

Stats start over.

The DB should be able to recover cleanly from an immediate shutdown
IMHO.  If the stats are no longer there, I would expect autoanalyze to
run and regenerate them.  This is the same behaviour as when a new table
is created and populated.


A few questions

1: Is this expected behaviour, or a bug?
2: If not a bug, why does autoanalyze not process these tables.  It will
process newly loaded tables when they do not have any stats.
3: What is the best work around for this?  When our HA solution triggers
a DB shutdown, we want it to be immediate.  Currently we run a manual
analyze post recovery, this is bad for really for large databases.
4: after restart, why does pgstattuple shoe dead_tuple_percent = 8.54,
but after deleting one row, it shows dead_tuple_percent = 0.09?
5: on the missing stats - does this mean my query plans are potentially
bad until the stats are regenerated?


--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.



Re: autovacuum and immediate shutdown issues

From
Tom Lane
Date:
Brad Nicholson <bnichols@ca.afilias.info> writes:
> If you issue an immediate shutdown to the database, autovacumm will not
> process tables that should be vacuumed until manually re-analyzed.

AFAICS this is an unsurprising consequence of flushing stats on a crash.
If you don't like it, avoid immediate shutdowns --- they are not
especially good practice in any case.

> 3: What is the best work around for this?  When our HA solution triggers
> a DB shutdown, we want it to be immediate.

That seems like a fundamentally stupid idea, unless you are unconcerned
with the time and cost of getting the DB running again, which seemingly
you are.

            regards, tom lane

Re: autovacuum and immediate shutdown issues

From
Brad Nicholson
Date:
On Mon, 2009-10-19 at 12:07 -0400, Tom Lane wrote:
> Brad Nicholson <bnichols@ca.afilias.info> writes:
> > If you issue an immediate shutdown to the database, autovacumm will not
> > process tables that should be vacuumed until manually re-analyzed.
>
> AFAICS this is an unsurprising consequence of flushing stats on a crash.
> If you don't like it, avoid immediate shutdowns --- they are not
> especially good practice in any case.
>
> > 3: What is the best work around for this?  When our HA solution triggers
> > a DB shutdown, we want it to be immediate.
>
> That seems like a fundamentally stupid idea, unless you are unconcerned
> with the time and cost of getting the DB running again, which seemingly
> you are.
>

I disagree that this is fundamentally stupid.  We are talking about a
situation where the server is about to die, HA solution kicks in and
moves it to standby.

If we wait for a clean shutdown instead, and the server dies before it
completes (which is entirely possible), Postgres crashes and the exact
same behaviour will happen.  It also means that if any server crashes
(HA aside, shutdown method aside), the database will come up, but
functionality may be impacted until manual intervention.

At the very least. shouldn't autoanalyze not correct the lack of
statistics?  To me, this looks like the database will not come up
cleanly after crashing.

--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.



Re: autovacuum and immediate shutdown issues

From
Tom Lane
Date:
Brad Nicholson <bnichols@ca.afilias.info> writes:
> On Mon, 2009-10-19 at 12:07 -0400, Tom Lane wrote:
>> That seems like a fundamentally stupid idea, unless you are unconcerned
>> with the time and cost of getting the DB running again, which seemingly
>> you are.

> I disagree that this is fundamentally stupid.  We are talking about a
> situation where the server is about to die, HA solution kicks in and
> moves it to standby.

Moving it to standby immediately is a good idea, but it does not follow
that you need to hit the DB over the head with a hammer.  A fast-mode
shutdown seems perfectly adequate.  If it isn't, you're going to need
nontrivial recovery effort anyhow.

            regards, tom lane

Re: autovacuum and immediate shutdown issues

From
Scott Marlowe
Date:
On Mon, Oct 19, 2009 at 10:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Brad Nicholson <bnichols@ca.afilias.info> writes:
>> On Mon, 2009-10-19 at 12:07 -0400, Tom Lane wrote:
>>> That seems like a fundamentally stupid idea, unless you are unconcerned
>>> with the time and cost of getting the DB running again, which seemingly
>>> you are.
>
>> I disagree that this is fundamentally stupid.  We are talking about a
>> situation where the server is about to die, HA solution kicks in and
>> moves it to standby.
>
> Moving it to standby immediately is a good idea, but it does not follow
> that you need to hit the DB over the head with a hammer.  A fast-mode
> shutdown seems perfectly adequate.  If it isn't, you're going to need
> nontrivial recovery effort anyhow.

All of this is completely besides the point that a database that's
been shutdown immediately / had the power cord yanked comes back up
and doesn't start autovacuuming automatically, which seems a
non-optimal behaviour.

Re: autovacuum and immediate shutdown issues

From
Brad Nicholson
Date:
On Mon, 2009-10-19 at 10:53 -0600, Scott Marlowe wrote:
> On Mon, Oct 19, 2009 at 10:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Brad Nicholson <bnichols@ca.afilias.info> writes:
> >> On Mon, 2009-10-19 at 12:07 -0400, Tom Lane wrote:
> >>> That seems like a fundamentally stupid idea, unless you are unconcerned
> >>> with the time and cost of getting the DB running again, which seemingly
> >>> you are.
> >
> >> I disagree that this is fundamentally stupid.  We are talking about a
> >> situation where the server is about to die, HA solution kicks in and
> >> moves it to standby.
> >
> > Moving it to standby immediately is a good idea, but it does not follow
> > that you need to hit the DB over the head with a hammer.  A fast-mode
> > shutdown seems perfectly adequate.  If it isn't, you're going to need
> > nontrivial recovery effort anyhow.
>
> All of this is completely besides the point that a database that's
> been shutdown immediately / had the power cord yanked comes back up
> and doesn't start autovacuuming automatically, which seems a
> non-optimal behaviour.

It's also not going to endear us very much to the VLDB crowd - it will
amounts to a multi-hour crash recovery for those folks while analyze
regenerates statistics.

--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.



Re: autovacuum and immediate shutdown issues

From
Scott Marlowe
Date:
On Mon, Oct 19, 2009 at 11:06 AM, Brad Nicholson
<bnichols@ca.afilias.info> wrote:
> On Mon, 2009-10-19 at 10:53 -0600, Scott Marlowe wrote:
>> On Mon, Oct 19, 2009 at 10:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> > Brad Nicholson <bnichols@ca.afilias.info> writes:
>> >> On Mon, 2009-10-19 at 12:07 -0400, Tom Lane wrote:
>> >>> That seems like a fundamentally stupid idea, unless you are unconcerned
>> >>> with the time and cost of getting the DB running again, which seemingly
>> >>> you are.
>> >
>> >> I disagree that this is fundamentally stupid.  We are talking about a
>> >> situation where the server is about to die, HA solution kicks in and
>> >> moves it to standby.
>> >
>> > Moving it to standby immediately is a good idea, but it does not follow
>> > that you need to hit the DB over the head with a hammer.  A fast-mode
>> > shutdown seems perfectly adequate.  If it isn't, you're going to need
>> > nontrivial recovery effort anyhow.
>>
>> All of this is completely besides the point that a database that's
>> been shutdown immediately / had the power cord yanked comes back up
>> and doesn't start autovacuuming automatically, which seems a
>> non-optimal behaviour.
>
> It's also not going to endear us very much to the VLDB crowd - it will
> amounts to a multi-hour crash recovery for those folks while analyze
> regenerates statistics.

But this would be AOK behaviour for small transactional databases?

Again, besides the point, but important.  The real point is a database
that doesn't run autovac after an emergency shutdown is broken by
design, and not just for one use case.

Re: autovacuum and immediate shutdown issues

From
Brad Nicholson
Date:
On Mon, 2009-10-19 at 11:16 -0600, Scott Marlowe wrote:
> On Mon, Oct 19, 2009 at 11:06 AM, Brad Nicholson
> <bnichols@ca.afilias.info> wrote:
> > On Mon, 2009-10-19 at 10:53 -0600, Scott Marlowe wrote:
> >> On Mon, Oct 19, 2009 at 10:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> > Brad Nicholson <bnichols@ca.afilias.info> writes:
> >> >> On Mon, 2009-10-19 at 12:07 -0400, Tom Lane wrote:
> >> >>> That seems like a fundamentally stupid idea, unless you are unconcerned
> >> >>> with the time and cost of getting the DB running again, which seemingly
> >> >>> you are.
> >> >
> >> >> I disagree that this is fundamentally stupid.  We are talking about a
> >> >> situation where the server is about to die, HA solution kicks in and
> >> >> moves it to standby.
> >> >
> >> > Moving it to standby immediately is a good idea, but it does not follow
> >> > that you need to hit the DB over the head with a hammer.  A fast-mode
> >> > shutdown seems perfectly adequate.  If it isn't, you're going to need
> >> > nontrivial recovery effort anyhow.
> >>
> >> All of this is completely besides the point that a database that's
> >> been shutdown immediately / had the power cord yanked comes back up
> >> and doesn't start autovacuuming automatically, which seems a
> >> non-optimal behaviour.
> >
> > It's also not going to endear us very much to the VLDB crowd - it will
> > amounts to a multi-hour crash recovery for those folks while analyze
> > regenerates statistics.
>
> But this would be AOK behaviour for small transactional databases?

Defiantly not.

> Again, besides the point, but important.  The real point is a database
> that doesn't run autovac after an emergency shutdown is broken by
> design, and not just for one use case.

This behaviour is also undocumented AFAIK.  I would bet that a lot of
users would have no idea that they are in this state post
crash-recovery.

--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.



Re: autovacuum and immediate shutdown issues

From
Tom Lane
Date:
Scott Marlowe <scott.marlowe@gmail.com> writes:
> All of this is completely besides the point that a database that's
> been shutdown immediately / had the power cord yanked comes back up
> and doesn't start autovacuuming automatically, which seems a
> non-optimal behaviour.

It'll start as soon as you've modified enough rows.  The absolute worst
case behavior is that table bloat reaches twice the level it would have
otherwise, or pg_statistic data becomes twice as out of date as it would
have otherwise.

Now, if your server's MTBF is less than the autovac interval, you could
indeed have an accumulating problem ... but I suggest that in that
situation you've got other issues to fix.

            regards, tom lane

Re: autovacuum and immediate shutdown issues

From
Scott Marlowe
Date:
On Mon, Oct 19, 2009 at 11:27 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Scott Marlowe <scott.marlowe@gmail.com> writes:
>> All of this is completely besides the point that a database that's
>> been shutdown immediately / had the power cord yanked comes back up
>> and doesn't start autovacuuming automatically, which seems a
>> non-optimal behaviour.
>
> It'll start as soon as you've modified enough rows.  The absolute worst
> case behavior is that table bloat reaches twice the level it would have
> otherwise, or pg_statistic data becomes twice as out of date as it would
> have otherwise.

That could be a pretty bad worst case scenario for certain types of
tables / usage patterns.

How bad can the affect of out of date pg_statistic data be?  Is it
likely to turn a hash agg into a nested loop and take a query from
seconds to minutes?  If so, then that's pretty bad.

> Now, if your server's MTBF is less than the autovac interval, you could
> indeed have an accumulating problem ... but I suggest that in that
> situation you've got other issues to fix.

True.  Still very much beside the point.

Re: autovacuum and immediate shutdown issues

From
Christophe Pettus
Date:
On Oct 19, 2009, at 11:26 AM, Scott Marlowe wrote:
> That could be a pretty bad worst case scenario for certain types of
> tables / usage patterns.

Given that (presumably) the database server is not failing repeatedly
without some kind of operator notification, isn't it at least somewhat
reasonable to just make "start an vacuum upon recovery from an
immediate shutdown" an operational procedure, rather than something PG
does automatically?

--
-- Christophe Pettus
    xof@thebuild.com


Re: autovacuum and immediate shutdown issues

From
Brad Nicholson
Date:
On Mon, 2009-10-19 at 11:35 -0700, Christophe Pettus wrote:
> On Oct 19, 2009, at 11:26 AM, Scott Marlowe wrote:
> > That could be a pretty bad worst case scenario for certain types of
> > tables / usage patterns.
>
> Given that (presumably) the database server is not failing repeatedly
> without some kind of operator notification, isn't it at least somewhat
> reasonable to just make "start an vacuum upon recovery from an
> immediate shutdown" an operational procedure, rather than something PG
> does automatically?
>


It's not a vacuum you want, it's an analyze.  Once the stats are back,
autovacuum will vacuum accordingly.

autoanalyze will automatically analyze new tables when they don't have
stats.  It seems logical that it should handle this case where the table
also does not have stats.

--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.



Re: autovacuum and immediate shutdown issues

From
Tom Lane
Date:
Brad Nicholson <bnichols@ca.afilias.info> writes:
> autoanalyze will automatically analyze new tables when they don't have
> stats.  It seems logical that it should handle this case where the table
> also does not have stats.

It will autoanalyze once a sufficient number of inserts have occurred.
The effect of a crash is just to reset the inserts-since-last-analyze
counters.  You really haven't made the case why that's so awful that
we need to do things that are unpleasant along other dimensions in order
to avoid it.  (The only ways of avoiding it that I can see would result
in very significant overhead added to the stats collection mechanism.)

            regards, tom lane

Re: autovacuum and immediate shutdown issues

From
Brad Nicholson
Date:
On Mon, 2009-10-19 at 15:01 -0400, Tom Lane wrote:
> Brad Nicholson <bnichols@ca.afilias.info> writes:
> > autoanalyze will automatically analyze new tables when they don't have
> > stats.  It seems logical that it should handle this case where the table
> > also does not have stats.
>
> It will autoanalyze once a sufficient number of inserts have occurred.
> The effect of a crash is just to reset the inserts-since-last-analyze
> counters.  You really haven't made the case why that's so awful that
> we need to do things that are unpleasant along other dimensions in order
> to avoid it.  (The only ways of avoiding it that I can see would result
> in very significant overhead added to the stats collection mechanism.)

What about the question that Scott and I both asked - what about query
plans.  Will they be affected?

If so, you could have a table that does not get updated, that no longer
generates relevant query plans, and won't until manual intervention.

--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.



Re: autovacuum and immediate shutdown issues

From
Brad Nicholson
Date:
On Mon, 2009-10-19 at 15:09 -0400, Brad Nicholson wrote:
> On Mon, 2009-10-19 at 15:01 -0400, Tom Lane wrote:
> > Brad Nicholson <bnichols@ca.afilias.info> writes:
> > > autoanalyze will automatically analyze new tables when they don't have
> > > stats.  It seems logical that it should handle this case where the table
> > > also does not have stats.
> >
> > It will autoanalyze once a sufficient number of inserts have occurred.
> > The effect of a crash is just to reset the inserts-since-last-analyze
> > counters.  You really haven't made the case why that's so awful that
> > we need to do things that are unpleasant along other dimensions in order
> > to avoid it.  (The only ways of avoiding it that I can see would result
> > in very significant overhead added to the stats collection mechanism.)
>
> What about the question that Scott and I both asked - what about query
> plans.  Will they be affected?
>
> If so, you could have a table that does not get updated, that no longer
> generates relevant query plans, and won't until manual intervention.
>

Actually, isn't there also a condition (although unlikely to occur) here
that could result in transaction wraparound?
--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.



Re: autovacuum and immediate shutdown issues

From
Tom Lane
Date:
Brad Nicholson <bnichols@ca.afilias.info> writes:
> What about the question that Scott and I both asked - what about query
> plans.  Will they be affected?

No, they should be the same as what you were getting just before the
crash.  The planner only looks at pg_statistic which is a whole
different thing.

            regards, tom lane