Thread: Autovacuum keeps vacuuming a table disabled in pg_autovacuum

Autovacuum keeps vacuuming a table disabled in pg_autovacuum

From
Csaba Nagy
Date:
Hi all,

System: postgres 8.1.3p, not a released version but a CVS checkout
somewhere after 8.1.3 was released.

I have a table (quite big one, 2 columns, 2166381 DB relpages, 364447136
reltuples as reported by pg_class) which is mostly inserted into, very
rarely deleted from.

I also have a primary key on the combination of the 2 columns with the
corresponding PK index.

Initially I left this table out of the pg_autovacuum table, as like I
said it is never updated, rarely deleted from, and I have a weekly
complete DB vacuum scheduled which should theoretically be enough so
autovacuum will never touch it except maybe for analyzing, which is OK.

Recently however I discovered that my weekly DB wide vacuum started to
crash due to an error on vacuuming this table:

ERROR:  failed to re-find parent key in "pk_table"

This turned out to be a problem for autovacuum too, which spent lots of
time vacuuming this table, only to (presumably) crash and start it all
over next time (I can't explain otherwise the table bloating going on on
this DB box, which translates in slow operation).

I think I have seen some bug reports on this list with similar errors
and I guess it is fixed in 8.2, and I have planned a move to 8.2 anyway
next month, so I thought I just take this table out from autovacuum's
list.

So I did:

insert into pg_autovacuum (
    vacrelid, enabled,
    vac_base_thresh, vac_scale_factor,
    anl_base_thresh, anl_scale_factor,
    vac_cost_delay, vac_cost_limit
) values (
    (select oid from pg_class where relname='table_name'),
    false,
    5000, 0.2,
    10000, 0.2,
    -1, -1
);

Which indeed inserted one row in pg_autovacuum.

Trouble: the autovacuum daemon is still taking that table and vacuums
it... how do I know ? So:

select l.pid,c.relname from pg_locks l, pg_class c where l.pid not in
(select distinct procpid from pg_stat_activity) and l.relation=c.oid and
c.relkind='r';

  pid  |  relname
-------+------------
 16317 | table_name
(1 row)

postgres@dbname:~$ ps auxww|grep auto
postgres 16317  0.8  5.8 436008 240656 ?     D    13:26   0:55 postgres:
autovacuum process   dbname

So, the autovacuum's process is locking the table, which I interpret as
autvacuum vacuuming it.

Question: is the autovacuum daemon somehow caching it's table list ? Can
I reset somehow this ? I tried killing it's process, but it doesn't
work, next time it took the table again.

Thanks,
Csaba.



Re: Autovacuum keeps vacuuming a table disabled in pg_autovacuum

From
Tom Lane
Date:
Csaba Nagy <nagy@ecircle-ag.com> writes:
> System: postgres 8.1.3p, not a released version but a CVS checkout
> somewhere after 8.1.3 was released.

Updating to 8.1.6 or later would fix your "failed to re-find parent key"
problem.

> Trouble: the autovacuum daemon is still taking that table and vacuums
> it... how do I know ?

If that table has a TOAST table you might need to mark the toast table
as disabled too.  Or maybe it's forcing a vacuum because it's worried
about XID wraparound?

            regards, tom lane

Re: Autovacuum keeps vacuuming a table disabled in pg_autovacuum

From
Csaba Nagy
Date:
> If that table has a TOAST table you might need to mark the toast table
> as disabled too.  Or maybe it's forcing a vacuum because it's worried
> about XID wraparound?

OK, that might be the case, as I guess there are tables which were not
successfully vacuumed in the last few months (DB wide vacuum never
finished, and this system was stable enough before that I only noticed
this problem recently when performance started to degrade considerably).
Is there a way I can check the imminence of XID wraparound ?

Thanks,
Csaba.



Re: Autovacuum keeps vacuuming a table disabled in pg_autovacuum

From
Alvaro Herrera
Date:
Csaba Nagy wrote:

> Trouble: the autovacuum daemon is still taking that table and vacuums
> it... how do I know ? So:
>
> select l.pid,c.relname from pg_locks l, pg_class c where l.pid not in
> (select distinct procpid from pg_stat_activity) and l.relation=c.oid and
> c.relkind='r';
>
>   pid  |  relname
> -------+------------
>  16317 | table_name
> (1 row)
>
> postgres@dbname:~$ ps auxww|grep auto
> postgres 16317  0.8  5.8 436008 240656 ?     D    13:26   0:55 postgres:
> autovacuum process   dbname

Probably autovacuum is worried about Xid wraparound so it tries to do a
database-wide vacuum.  In these cases it ignores the pg_autovacuum
disable.

Try reindexing the table; that should make the problem go away.  Also,
update to the latest of the 8.1 branch ASAP.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Autovacuum keeps vacuuming a table disabled in pg_autovacuum

From
Csaba Nagy
Date:
On Fri, 2007-06-01 at 15:58, Alvaro Herrera wrote:
> Try reindexing the table; that should make the problem go away.  Also,
> update to the latest of the 8.1 branch ASAP.

Reindexing won't work, it would mean hours of downtime. I plan to move
the DB to 8.2 via slony in ~2-3 weeks, that should take care of the
bloating too.

Regarding upgrade, I have to use the CVS version because I have some
local patches to apply. Now what is more recommended, use the latest
version on the REL8_1_STABLE branch or stick with REL8_1_9 ? I checked
and there are quite a few changed files after REL8_1_9.

Thanks,
Csaba.



Re: Autovacuum keeps vacuuming a table disabled in pg_autovacuum

From
Erik Jones
Date:
On Jun 1, 2007, at 9:19 AM, Csaba Nagy wrote:

> On Fri, 2007-06-01 at 15:58, Alvaro Herrera wrote:
>> Try reindexing the table; that should make the problem go away.
>> Also,
>> update to the latest of the 8.1 branch ASAP.
>
> Reindexing won't work, it would mean hours of downtime. I plan to move
> the DB to 8.2 via slony in ~2-3 weeks, that should take care of the
> bloating too.
>
> Regarding upgrade, I have to use the CVS version because I have some
> local patches to apply. Now what is more recommended, use the latest
> version on the REL8_1_STABLE branch or stick with REL8_1_9 ? I checked
> and there are quite a few changed files after REL8_1_9.

How would reindexing a table imply hours of downtime?

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com



Re: Autovacuum keeps vacuuming a table disabled in pg_autovacuum

From
Csaba Nagy
Date:
> How would reindexing a table imply hours of downtime?

Simply, all meaningful activities on this system will sooner or later
insert into this table :-)

So given that we use a connection pool we end up pretty soon with all
connections waiting for an insert on this table, and at that point
nothing else will work either.

Cheers,
Csaba.


Re: Autovacuum keeps vacuuming a table disabled in pg_autovacuum

From
Alvaro Herrera
Date:
Erik Jones wrote:
> On Jun 1, 2007, at 9:19 AM, Csaba Nagy wrote:
>
> >On Fri, 2007-06-01 at 15:58, Alvaro Herrera wrote:
> >>Try reindexing the table; that should make the problem go away.
> >>Also,
> >>update to the latest of the 8.1 branch ASAP.
> >
> >Reindexing won't work, it would mean hours of downtime. I plan to move
> >the DB to 8.2 via slony in ~2-3 weeks, that should take care of the
> >bloating too.

I meant that reindex would probably fix the "unable to find parent"
problem.  I am not sure if the index is permanently corrupt, and so you
need a reindex anyway, or rather it's just that VACUUM is unable to work
with it and you just need to update past 8.1.6 in order for vacuum to be
able to work.

peeks the logs ...  Ah, found it:

2006-11-01 16:50  tgl

    * src/backend/access/nbtree/: README (1.8.6.1), nbtinsert.c
      (1.127.2.2), nbtpage.c (1.88.2.2):

Fix "failed to re-find parent key" btree VACUUM failure by tweaking
_bt_pagedel to recover from the failure: just search the whole parent level
if searching to the right fails.  This does nothing for the underlying problem
that index keys became out-of-order in the grandparent level.  However, we
believe that there is no other consequence worse than slightly inefficient
searching, so this narrow patch seems like the safest solution for the back
branches.

You don't need to reindex, just update.

> How would reindexing a table imply hours of downtime?

Because reindexing takes an exclusive lock on the table.

> >Regarding upgrade, I have to use the CVS version because I have some
> >local patches to apply. Now what is more recommended, use the latest
> >version on the REL8_1_STABLE branch or stick with REL8_1_9 ? I checked
> >and there are quite a few changed files after REL8_1_9.

HEAD from the branch is certainly better as it contains some extra
fixes (particularly one for a bug introduced in 8.1.9 which may bite you
if you just pick that one ... see about bug #3116)

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Autovacuum keeps vacuuming a table disabled in pg_autovacuum

From
Csaba Nagy
Date:
[snip]
> You don't need to reindex, just update.

I was pretty sure I've seen the error reported and the fix too, the
thing is relatively harmless except the vacuum troubles. And considering
that the table itself does not need vacuuming, it would be even more
harmless if it wouldn't block other stuff and if XID wraparound wouldn't
be a problem.

> HEAD from the branch is certainly better as it contains some extra
> fixes (particularly one for a bug introduced in 8.1.9 which may bite you
> if you just pick that one ... see about bug #3116)

OK, I'll go with head.

Thanks,
Csaba.



Re: Autovacuum keeps vacuuming a table disabled in pg_autovacuum

From
Alvaro Herrera
Date:
Csaba Nagy wrote:
> [snip]
> > You don't need to reindex, just update.
>
> I was pretty sure I've seen the error reported and the fix too, the
> thing is relatively harmless except the vacuum troubles. And considering
> that the table itself does not need vacuuming, it would be even more
> harmless if it wouldn't block other stuff and if XID wraparound wouldn't
> be a problem.

Huh, why do you say that it doesn't need any vacuuming?

--
Alvaro Herrera                          Developer, http://www.PostgreSQL.org/
"Always assume the user will do much worse than the stupidest thing
you can imagine."                                (Julien PUYDT)

Re: Autovacuum keeps vacuuming a table disabled in pg_autovacuum

From
Csaba Nagy
Date:
On Fri, 2007-06-01 at 17:40, Alvaro Herrera wrote:
> Huh, why do you say that it doesn't need any vacuuming?

This table is only inserted and rarely deleted, so other than the XID
wraparound vacuuming it is not critical. Of course it will need vacuum
finally, but it can survive months without it - or till the XID
wraparound occurs, which I can't estimate yet if in danger or not... how
could I check that ? (I can probably find the answer if looking harder
though).

Cheers,
Csaba.


Re: Autovacuum keeps vacuuming a table disabled in pg_autovacuum

From
Alvaro Herrera
Date:
Csaba Nagy wrote:
> On Fri, 2007-06-01 at 17:40, Alvaro Herrera wrote:
> > Huh, why do you say that it doesn't need any vacuuming?
>
> This table is only inserted and rarely deleted, so other than the XID
> wraparound vacuuming it is not critical. Of course it will need vacuum
> finally, but it can survive months without it - or till the XID
> wraparound occurs, which I can't estimate yet if in danger or not... how
> could I check that ? (I can probably find the answer if looking harder
> though).

select age(datfrozenxid) from pg_database where datname = 'your database'

2 billions and you are screwed.  Autovacuum starts panicking way before
that, to have enough slack.

--
Alvaro Herrera                  http://www.amazon.com/gp/registry/5ZYLFMCVHXC
"Un poeta es un mundo encerrado en un hombre" (Victor Hugo)

Re: Autovacuum keeps vacuuming a table disabled in pg_autovacuum

From
Csaba Nagy
Date:
> select age(datfrozenxid) from pg_database where datname = 'your database'
>
> 2 billions and you are screwed.  Autovacuum starts panicking way before
> that, to have enough slack.

dbname=# select age(datfrozenxid) from pg_database where datname = 'dbname';
    age
------------
 1648762992
(1 row)

Sooo... looks like time to quickly upgrade to 8.1 head.

Thanks for the help,
Csaba.