Thread: Autovacuum keeps vacuuming a table disabled in pg_autovacuum
Hi all, System: postgres 8.1.3p, not a released version but a CVS checkout somewhere after 8.1.3 was released. I have a table (quite big one, 2 columns, 2166381 DB relpages, 364447136 reltuples as reported by pg_class) which is mostly inserted into, very rarely deleted from. I also have a primary key on the combination of the 2 columns with the corresponding PK index. Initially I left this table out of the pg_autovacuum table, as like I said it is never updated, rarely deleted from, and I have a weekly complete DB vacuum scheduled which should theoretically be enough so autovacuum will never touch it except maybe for analyzing, which is OK. Recently however I discovered that my weekly DB wide vacuum started to crash due to an error on vacuuming this table: ERROR: failed to re-find parent key in "pk_table" This turned out to be a problem for autovacuum too, which spent lots of time vacuuming this table, only to (presumably) crash and start it all over next time (I can't explain otherwise the table bloating going on on this DB box, which translates in slow operation). I think I have seen some bug reports on this list with similar errors and I guess it is fixed in 8.2, and I have planned a move to 8.2 anyway next month, so I thought I just take this table out from autovacuum's list. So I did: insert into pg_autovacuum ( vacrelid, enabled, vac_base_thresh, vac_scale_factor, anl_base_thresh, anl_scale_factor, vac_cost_delay, vac_cost_limit ) values ( (select oid from pg_class where relname='table_name'), false, 5000, 0.2, 10000, 0.2, -1, -1 ); Which indeed inserted one row in pg_autovacuum. Trouble: the autovacuum daemon is still taking that table and vacuums it... how do I know ? So: select l.pid,c.relname from pg_locks l, pg_class c where l.pid not in (select distinct procpid from pg_stat_activity) and l.relation=c.oid and c.relkind='r'; pid | relname -------+------------ 16317 | table_name (1 row) postgres@dbname:~$ ps auxww|grep auto postgres 16317 0.8 5.8 436008 240656 ? D 13:26 0:55 postgres: autovacuum process dbname So, the autovacuum's process is locking the table, which I interpret as autvacuum vacuuming it. Question: is the autovacuum daemon somehow caching it's table list ? Can I reset somehow this ? I tried killing it's process, but it doesn't work, next time it took the table again. Thanks, Csaba.
Csaba Nagy <nagy@ecircle-ag.com> writes: > System: postgres 8.1.3p, not a released version but a CVS checkout > somewhere after 8.1.3 was released. Updating to 8.1.6 or later would fix your "failed to re-find parent key" problem. > Trouble: the autovacuum daemon is still taking that table and vacuums > it... how do I know ? If that table has a TOAST table you might need to mark the toast table as disabled too. Or maybe it's forcing a vacuum because it's worried about XID wraparound? regards, tom lane
> If that table has a TOAST table you might need to mark the toast table > as disabled too. Or maybe it's forcing a vacuum because it's worried > about XID wraparound? OK, that might be the case, as I guess there are tables which were not successfully vacuumed in the last few months (DB wide vacuum never finished, and this system was stable enough before that I only noticed this problem recently when performance started to degrade considerably). Is there a way I can check the imminence of XID wraparound ? Thanks, Csaba.
Csaba Nagy wrote: > Trouble: the autovacuum daemon is still taking that table and vacuums > it... how do I know ? So: > > select l.pid,c.relname from pg_locks l, pg_class c where l.pid not in > (select distinct procpid from pg_stat_activity) and l.relation=c.oid and > c.relkind='r'; > > pid | relname > -------+------------ > 16317 | table_name > (1 row) > > postgres@dbname:~$ ps auxww|grep auto > postgres 16317 0.8 5.8 436008 240656 ? D 13:26 0:55 postgres: > autovacuum process dbname Probably autovacuum is worried about Xid wraparound so it tries to do a database-wide vacuum. In these cases it ignores the pg_autovacuum disable. Try reindexing the table; that should make the problem go away. Also, update to the latest of the 8.1 branch ASAP. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Fri, 2007-06-01 at 15:58, Alvaro Herrera wrote: > Try reindexing the table; that should make the problem go away. Also, > update to the latest of the 8.1 branch ASAP. Reindexing won't work, it would mean hours of downtime. I plan to move the DB to 8.2 via slony in ~2-3 weeks, that should take care of the bloating too. Regarding upgrade, I have to use the CVS version because I have some local patches to apply. Now what is more recommended, use the latest version on the REL8_1_STABLE branch or stick with REL8_1_9 ? I checked and there are quite a few changed files after REL8_1_9. Thanks, Csaba.
On Jun 1, 2007, at 9:19 AM, Csaba Nagy wrote: > On Fri, 2007-06-01 at 15:58, Alvaro Herrera wrote: >> Try reindexing the table; that should make the problem go away. >> Also, >> update to the latest of the 8.1 branch ASAP. > > Reindexing won't work, it would mean hours of downtime. I plan to move > the DB to 8.2 via slony in ~2-3 weeks, that should take care of the > bloating too. > > Regarding upgrade, I have to use the CVS version because I have some > local patches to apply. Now what is more recommended, use the latest > version on the REL8_1_STABLE branch or stick with REL8_1_9 ? I checked > and there are quite a few changed files after REL8_1_9. How would reindexing a table imply hours of downtime? Erik Jones Software Developer | Emma® erik@myemma.com 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com
> How would reindexing a table imply hours of downtime? Simply, all meaningful activities on this system will sooner or later insert into this table :-) So given that we use a connection pool we end up pretty soon with all connections waiting for an insert on this table, and at that point nothing else will work either. Cheers, Csaba.
Erik Jones wrote: > On Jun 1, 2007, at 9:19 AM, Csaba Nagy wrote: > > >On Fri, 2007-06-01 at 15:58, Alvaro Herrera wrote: > >>Try reindexing the table; that should make the problem go away. > >>Also, > >>update to the latest of the 8.1 branch ASAP. > > > >Reindexing won't work, it would mean hours of downtime. I plan to move > >the DB to 8.2 via slony in ~2-3 weeks, that should take care of the > >bloating too. I meant that reindex would probably fix the "unable to find parent" problem. I am not sure if the index is permanently corrupt, and so you need a reindex anyway, or rather it's just that VACUUM is unable to work with it and you just need to update past 8.1.6 in order for vacuum to be able to work. peeks the logs ... Ah, found it: 2006-11-01 16:50 tgl * src/backend/access/nbtree/: README (1.8.6.1), nbtinsert.c (1.127.2.2), nbtpage.c (1.88.2.2): Fix "failed to re-find parent key" btree VACUUM failure by tweaking _bt_pagedel to recover from the failure: just search the whole parent level if searching to the right fails. This does nothing for the underlying problem that index keys became out-of-order in the grandparent level. However, we believe that there is no other consequence worse than slightly inefficient searching, so this narrow patch seems like the safest solution for the back branches. You don't need to reindex, just update. > How would reindexing a table imply hours of downtime? Because reindexing takes an exclusive lock on the table. > >Regarding upgrade, I have to use the CVS version because I have some > >local patches to apply. Now what is more recommended, use the latest > >version on the REL8_1_STABLE branch or stick with REL8_1_9 ? I checked > >and there are quite a few changed files after REL8_1_9. HEAD from the branch is certainly better as it contains some extra fixes (particularly one for a bug introduced in 8.1.9 which may bite you if you just pick that one ... see about bug #3116) -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
[snip] > You don't need to reindex, just update. I was pretty sure I've seen the error reported and the fix too, the thing is relatively harmless except the vacuum troubles. And considering that the table itself does not need vacuuming, it would be even more harmless if it wouldn't block other stuff and if XID wraparound wouldn't be a problem. > HEAD from the branch is certainly better as it contains some extra > fixes (particularly one for a bug introduced in 8.1.9 which may bite you > if you just pick that one ... see about bug #3116) OK, I'll go with head. Thanks, Csaba.
Csaba Nagy wrote: > [snip] > > You don't need to reindex, just update. > > I was pretty sure I've seen the error reported and the fix too, the > thing is relatively harmless except the vacuum troubles. And considering > that the table itself does not need vacuuming, it would be even more > harmless if it wouldn't block other stuff and if XID wraparound wouldn't > be a problem. Huh, why do you say that it doesn't need any vacuuming? -- Alvaro Herrera Developer, http://www.PostgreSQL.org/ "Always assume the user will do much worse than the stupidest thing you can imagine." (Julien PUYDT)
On Fri, 2007-06-01 at 17:40, Alvaro Herrera wrote: > Huh, why do you say that it doesn't need any vacuuming? This table is only inserted and rarely deleted, so other than the XID wraparound vacuuming it is not critical. Of course it will need vacuum finally, but it can survive months without it - or till the XID wraparound occurs, which I can't estimate yet if in danger or not... how could I check that ? (I can probably find the answer if looking harder though). Cheers, Csaba.
Csaba Nagy wrote: > On Fri, 2007-06-01 at 17:40, Alvaro Herrera wrote: > > Huh, why do you say that it doesn't need any vacuuming? > > This table is only inserted and rarely deleted, so other than the XID > wraparound vacuuming it is not critical. Of course it will need vacuum > finally, but it can survive months without it - or till the XID > wraparound occurs, which I can't estimate yet if in danger or not... how > could I check that ? (I can probably find the answer if looking harder > though). select age(datfrozenxid) from pg_database where datname = 'your database' 2 billions and you are screwed. Autovacuum starts panicking way before that, to have enough slack. -- Alvaro Herrera http://www.amazon.com/gp/registry/5ZYLFMCVHXC "Un poeta es un mundo encerrado en un hombre" (Victor Hugo)
> select age(datfrozenxid) from pg_database where datname = 'your database' > > 2 billions and you are screwed. Autovacuum starts panicking way before > that, to have enough slack. dbname=# select age(datfrozenxid) from pg_database where datname = 'dbname'; age ------------ 1648762992 (1 row) Sooo... looks like time to quickly upgrade to 8.1 head. Thanks for the help, Csaba.