Autovaccum failure with Postgres 8.3.14? (max-fsm_pages issue?) - Mailing list pgsql-admin

From Subhasis Bhattacharya
Subject Autovaccum failure with Postgres 8.3.14? (max-fsm_pages issue?)
Msg-id CANOU=99aSuPuHw_LK7fLyaRnh_3bpGQnTDcPB6SEFCo23pg_hA@mail.gmail.com
Responses Re: Autovaccum failure with Postgres 8.3.14? (max-fsm_pages issue?)  (Craig Ringer <craig@2ndQuadrant.com>)
List pgsql-admin

Hi All,

Recently, a Postgres database embedded in our product ran into a series of problems at a customer site, as follows:

For about 24 hours, the Postgres log file logged errors like:

2012-09-24 00:00:12 GMTLOG:  could not rename temporary statistics file "global/pgstat.tmp" to "global/pgstat.stat": Input/output error
2012-09-24 00:00:13 GMTLOG:  could not rename temporary statistics file "global/pgstat.tmp" to "global/pgstat.stat": Input/output error
2012-09-24 00:00:15 GMTLOG:  could not rename temporary statistics file "global/pgstat.tmp" to "global/pgstat.stat": Input/output error
2012-09-24 00:00:16 GMTLOG:  could not rename temporary statistics file "global/pgstat.tmp" to "global/pgstat.stat": Input/output error
2012-09-24 00:00:20 GMTLOG:  could not rename temporary statistics file "global/pgstat.tmp" to "global/pgstat.stat": Input/output error
2012-09-24 00:00:21 GMTLOG:  could not rename temporary statistics file "global/pgstat.tmp" to "global/pgstat.stat": Input/output error
2012-09-24 00:00:22 GMTLOG:  could not rename temporary statistics file "global/pgstat.tmp" to "global/pgstat.stat": Input/output error
2012-09-24 00:00:22 GMTLOG:  could not rename temporary statistics file "global/pgstat.tmp" to "global/pgstat.stat": Input/output error
2012-09-24 00:00:23 GMTLOG:  could not rename temporary statistics file "global/pgstat.tmp" to "global/pgstat.stat": Input/output error
2012-09-24 00:00:24 GMTLOG:  could not rename temporary statistics file "global/pgstat.tmp" to "global/pgstat.stat": Input/output error

....
It's an extremely volatile database, with rows deleted very often.

After about 24 hours we saw this:
2012-10-01 00:19:21 GMTWARNING:  relation "pg_toast.pg_toast_16509" contains more than "max_fsm_pages" pages with useful free space
2012-10-01 00:19:21 GMTHINT:  Consider using VACUUM FULL on this relation or increasing the configuration parameter "max_fsm_pages".
2012-10-01 00:57:12 GMTWARNING:  relation "pbs.job_attr" contains more than "max_fsm_pages" pages with useful free space
2012-10-01 00:57:12 GMTHINT:  Consider using VACUUM FULL on this relation or increasing the configuration parameter "max_fsm_pages".
2012-10-01 01:09:30 GMTWARNING:  relation "pg_toast.pg_toast_16509" contains more than "max_fsm_pages" pages with useful free space
2012-10-01 01:09:30 GMTHINT:  Consider using VACUUM FULL on this relation or increasing the configuration parameter "max_fsm_pages".
2012-10-01 01:18:45 GMTERROR:  canceling autovacuum task
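
To see which relations trigger the "max_fsm_pages" warning most often, the log lines can be tallied with a small script. This is only a rough sketch: it assumes every line uses the same prefix as the excerpt above (timestamp, then "GMT" with the severity attached directly to it), and the names summarize, LINE_RE, FSM_RE and SAMPLE are made up for illustration.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt: "<timestamp> GMT<LEVEL>:  <message>"
LINE_RE = re.compile(
    r'^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) GMT(?P<level>[A-Z]+):\s+(?P<msg>.*)$'
)
# The free space map warning names the affected relation in double quotes.
FSM_RE = re.compile(
    r'^relation "(?P<rel>[^"]+)" contains more than "max_fsm_pages" pages'
)


def summarize(lines):
    """Count log lines per severity and max_fsm_pages warnings per relation."""
    levels = Counter()
    fsm_relations = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not follow the assumed prefix
        levels[m.group("level")] += 1
        fsm = FSM_RE.match(m.group("msg"))
        if fsm:
            fsm_relations[fsm.group("rel")] += 1
    return levels, fsm_relations


# A few lines copied from the log excerpt above.
SAMPLE = [
    '2012-10-01 00:19:21 GMTWARNING:  relation "pg_toast.pg_toast_16509" contains more than "max_fsm_pages" pages with useful free space',
    '2012-10-01 00:19:21 GMTHINT:  Consider using VACUUM FULL on this relation or increasing the configuration parameter "max_fsm_pages".',
    '2012-10-01 00:57:12 GMTWARNING:  relation "pbs.job_attr" contains more than "max_fsm_pages" pages with useful free space',
    '2012-10-01 01:09:30 GMTWARNING:  relation "pg_toast.pg_toast_16509" contains more than "max_fsm_pages" pages with useful free space',
    '2012-10-01 01:18:45 GMTERROR:  canceling autovacuum task',
]

levels, fsm_relations = summarize(SAMPLE)
for rel, count in fsm_relations.most_common():
    print(rel, count)
```

In practice the script would be given the whole log file (for example, summarize(open(path)) with path pointing at the server log) rather than a hand-copied sample.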


So it looks like autovacuum encountered a situation it could not handle and was asking for a manual VACUUM FULL. From then on, queries ran very slowly, since autovacuum kept starting up and cancelling itself.

My questions to the Postgres gurus or developers:

- I read that the autovacuum daemon depends on the statistics collector to do its job properly. Could it be that, because the collector could not update the pgstat file for 24 hours or so, autovacuum did NOT do its job, leading to a situation that autovacuum could no longer handle?

- In a normal situation, should we assume that autovacuum can do its job without ever needing a manual VACUUM FULL?

Thanks and Regards,
Subhasis Bhattacharya
