Thread: Autovacuum stopped running
I had a very strange occurrence 2 days ago where autovacuum appears to have stopped running.
I did not see any error messages in pg_log. The problem caused a file which is usually around 50Mb but which gets updated extensively to grow to 105GB, which in turn brought the server’s performance down to unusable levels.
Restarting Postgres got the autovacuum running again, but was just wondering what could have cause this.
PostgreSQL 8.4.3 amd64 on FreeBSD 7.2.
"Benjamin Krajmalnik" <kraj@illumen.com> wrote: > I had a very strange occurrence 2 days ago where autovacuum > appears to have stopped running. What did you see which led you to that conclusion? Did ps show the 'autovacuum launcher process'? > I did not see any error messages in pg_log. The problem caused a > file which is usually around 50Mb but which gets updated > extensively to grow to 105GB, which in turn brought the server's > performance down to unusable levels. Were there any long-running transactions showing in pg_stat_activity? (That would include transactions showing '<IDLE> in transaction'.) -Kevin
TO be truthful, I did not check that. I arrived at the conclusion because although I have 10 autovacuum threads, none of the tables had been autoanalyzed or autovacuumed in hours (and due to the nature of the activity they do so every minute). If I ever see this behavior now I know what to check for :) Thx. > -----Original Message----- > From: Kevin Grittner [mailto:Kevin.Grittner@wicourts.gov] > Sent: Saturday, May 01, 2010 8:27 AM > To: Benjamin Krajmalnik; pgsql-admin@postgresql.org > Subject: Re: [ADMIN] Autovacuum stopped running > > "Benjamin Krajmalnik" <kraj@illumen.com> wrote: > > I had a very strange occurrence 2 days ago where autovacuum > > appears to have stopped running. > > What did you see which led you to that conclusion? Did ps show the > 'autovacuum launcher process'? > > > I did not see any error messages in pg_log. The problem caused a > > file which is usually around 50Mb but which gets updated > > extensively to grow to 105GB, which in turn brought the server's > > performance down to unusable levels. > > Were there any long-running transactions showing in > pg_stat_activity? (That would include transactions showing '<IDLE> > in transaction'.) > > -Kevin
"Benjamin Krajmalnik" <kraj@illumen.com> writes: > TO be truthful, I did not check that. > I arrived at the conclusion because although I have 10 autovacuum > threads, none of the tables had been autoanalyzed or autovacuumed in > hours (and due to the nature of the activity they do so every minute). > If I ever see this behavior now I know what to check for :) Thx. We've seen one or two previous reports that suggested that the autovac launcher might have stopped launching workers, but it hasn't been reproducible. Are you using any nonstandard parameters for autovacuum? regards, tom lane
# - Cost-Based Vacuum Delay - #vacuum_cost_delay = 0ms # 0-100 milliseconds #vacuum_cost_page_hit = 1 # 0-10000 credits #vacuum_cost_page_miss = 10 # 0-10000 credits #vacuum_cost_page_dirty = 20 # 0-10000 credits vacuum_cost_limit = 10000 # 1-10000 credits #----------------------------------------------------------------------- ------- # AUTOVACUUM PARAMETERS #----------------------------------------------------------------------- ------- #autovacuum = on # Enable autovacuum subprocess? 'on' # requires track_counts to also be on. #log_autovacuum_min_duration = -1 # -1 disables, 0 logs all actions and # their durations, > 0 logs only # actions running at least this number # of milliseconds. autovacuum_max_workers = 10 # max number of autovacuum subprocesses #autovacuum_naptime = 1min # time between autovacuum runs #autovacuum_vacuum_threshold = 50 # min number of row updates before # vacuum #autovacuum_analyze_threshold = 50 # min number of row updates before # analyze #autovacuum_vacuum_scale_factor = 0.2 # fraction of table size before vacuum #autovacuum_analyze_scale_factor = 0.1 # fraction of table size before analyze #autovacuum_freeze_max_age = 200000000 # maximum XID age before forced vacuum # (change requires restart) #autovacuum_vacuum_cost_delay = 20ms # default vacuum cost delay for # autovacuum, in milliseconds; # -1 means use vacuum_cost_delay #autovacuum_vacuum_cost_limit = -1 # default vacuum cost limit for # autovacuum, -1 means use # vacuum_cost_limit > -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Sent: Saturday, May 01, 2010 5:10 PM > To: Benjamin Krajmalnik > Cc: Kevin Grittner; pgsql-admin@postgresql.org > Subject: Re: [ADMIN] Autovacuum stopped running > > "Benjamin Krajmalnik" <kraj@illumen.com> writes: > > TO be truthful, I did not check that. > > I arrived at the conclusion because although I have 10 autovacuum > > threads, none of the tables had been autoanalyzed or autovacuumed in > > hours (and due to the nature of the activity they do so every > minute). > > If I ever see this behavior now I know what to check for :) Thx. > > We've seen one or two previous reports that suggested that the autovac > launcher might have stopped launching workers, but it hasn't been > reproducible. Are you using any nonstandard parameters for autovacuum? > > regards, tom lane
OK. This happened again on another server with these same settings. I did not see any long running transaction, and the autovacuum launch process was running. > -----Original Message----- > From: Benjamin Krajmalnik > Sent: Saturday, May 01, 2010 5:22 PM > To: 'Tom Lane' > Cc: Kevin Grittner; pgsql-admin@postgresql.org > Subject: RE: [ADMIN] Autovacuum stopped running > > # - Cost-Based Vacuum Delay - > > #vacuum_cost_delay = 0ms # 0-100 milliseconds > #vacuum_cost_page_hit = 1 # 0-10000 credits > #vacuum_cost_page_miss = 10 # 0-10000 credits > #vacuum_cost_page_dirty = 20 # 0-10000 credits > vacuum_cost_limit = 10000 # 1-10000 credits > > #---------------------------------------------------------------------- > -------- > # AUTOVACUUM PARAMETERS > #---------------------------------------------------------------------- > -------- > > #autovacuum = on # Enable autovacuum subprocess? 'on' > # requires track_counts to also be on. > #log_autovacuum_min_duration = -1 # -1 disables, 0 logs all actions > and > # their durations, > 0 logs only > # actions running at least this number > # of milliseconds. > autovacuum_max_workers = 10 # max number of autovacuum > subprocesses > #autovacuum_naptime = 1min # time between autovacuum runs > #autovacuum_vacuum_threshold = 50 # min number of row updates before > # vacuum > #autovacuum_analyze_threshold = 50 # min number of row updates before > # analyze > #autovacuum_vacuum_scale_factor = 0.2 # fraction of table size > before vacuum > #autovacuum_analyze_scale_factor = 0.1 # fraction of table size > before analyze > #autovacuum_freeze_max_age = 200000000 # maximum XID age before > forced vacuum > # (change requires restart) > #autovacuum_vacuum_cost_delay = 20ms # default vacuum cost delay > for > # autovacuum, in milliseconds; > # -1 means use vacuum_cost_delay > #autovacuum_vacuum_cost_limit = -1 # default vacuum cost limit for > # autovacuum, -1 means use > # vacuum_cost_limit > > > > -----Original Message----- > > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > > Sent: Saturday, May 01, 2010 5:10 PM > > To: Benjamin Krajmalnik > > Cc: Kevin Grittner; pgsql-admin@postgresql.org > > Subject: Re: [ADMIN] Autovacuum stopped running > > > > "Benjamin Krajmalnik" <kraj@illumen.com> writes: > > > TO be truthful, I did not check that. > > > I arrived at the conclusion because although I have 10 autovacuum > > > threads, none of the tables had been autoanalyzed or autovacuumed > in > > > hours (and due to the nature of the activity they do so every > > minute). > > > If I ever see this behavior now I know what to check for :) Thx. > > > > We've seen one or two previous reports that suggested that the > autovac > > launcher might have stopped launching workers, but it hasn't been > > reproducible. Are you using any nonstandard parameters for > autovacuum? > > > > regards, tom lane
Benjamin Krajmalnik wrote: > OK. > This happened again on another server with these same settings. > I did not see any long running transaction, and the autovacuum launch > process was running. Did you check the server log for FATAL or ERROR lines? Maybe it's dying trying to vacuum some table and isn't ever able to get to the next ones. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
"Benjamin Krajmalnik" <kraj@illumen.com> wrote: > This happened again on another server with these same settings. > I did not see any long running transaction, and the autovacuum > launch process was running. Like last time, all ten autovacuum workers were running? What does CPU and disk activity look like in this state? The contents of pg_locks might yield a clue. If you have all ten workers showing on the task list, but they don't seem to be making much progress, connecting to each with gdb (or equivalent) and getting a backtrace might be useful. -Kevin
"Benjamin Krajmalnik" <kraj@illumen.com> writes: > This happened again on another server with these same settings. > I did not see any long running transaction, and the autovacuum launch > process was running. Hmm ... could you strace the launcher for a couple of minutes to see whether it's doing anything? Also, I've forgotten exactly which PG version you're running? regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > Also, I've forgotten exactly which PG version you're running? The original post said: PostgreSQL 8.4.3 amd64 on FreeBSD 7.2. -Kevin
Tom, 1. I had to restart the postgres server, otherwise I would have been in deep problems. 2. I just tried installing strace from the ports system - unfortunately it appears that FreeBSD only has it for i38s, not amd64. Any alternatives so that if this happens again I can take care of this? > > Hmm ... could you strace the launcher for a couple of minutes to see > whether it's doing anything? > > Also, I've forgotten exactly which PG version you're running? > > regards, tom lane
I did not find any errors pertaining to the autovacuum task. In both cases we had a data-centric issue which was causing excessive errors on the database (we have a timestamp field and an external utility which we use to collect product keys was sending the date as "dd/mm/yyyy hh:mm:ss" for some entries. We modified our processing script to catch this condition and rewrite it to "mm/dd/yyyy hh:mm:ss" on the server script, and I will be modifying the agent to send it in as "yyyymmdd hh:mm:ss" in our next agent release to bullet proof it. > -----Original Message----- > From: Alvaro Herrera [mailto:alvherre@commandprompt.com] > Sent: Monday, May 03, 2010 7:48 AM > To: Benjamin Krajmalnik > Cc: Tom Lane; pgsql-admin@postgresql.org > Subject: Re: [ADMIN] Autovacuum stopped running > > Benjamin Krajmalnik wrote: > > OK. > > This happened again on another server with these same settings. > > I did not see any long running transaction, and the autovacuum launch > > process was running. > > Did you check the server log for FATAL or ERROR lines? Maybe it's > dying > trying to vacuum some table and isn't ever able to get to the next > ones. > > > -- > Alvaro Herrera > http://www.CommandPrompt.com/ > PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Benjamin Krajmalnik wrote: > I did not find any errors pertaining to the autovacuum task. Hmm. I think it would be good to find out what is the launcher doing, if anything. If you strace it, do you see it sending signals to postmaster? -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.