Thread: Autovacuum stopped running

Autovacuum stopped running

From
"Benjamin Krajmalnik"
Date:

I had a very strange occurrence 2 days ago where autovacuum appears to have stopped running.

I did not see any error messages in pg_log.  The problem caused a file which is usually around 50Mb but which gets updated extensively to grow to 105GB, which in turn brought the server’s performance down to unusable levels.

Restarting Postgres got the autovacuum running again, but was just wondering what could have cause this.

 

PostgreSQL 8.4.3 amd64 on FreeBSD 7.2.

 

Re: Autovacuum stopped running

From
"Kevin Grittner"
Date:
"Benjamin Krajmalnik" <kraj@illumen.com> wrote:
> I had a very strange occurrence 2 days ago where autovacuum
> appears to have stopped running.

What did you see which led you to that conclusion?  Did ps show the
'autovacuum launcher process'?

> I did not see any error messages in pg_log.  The problem caused a
> file which is usually around 50Mb but which gets updated
> extensively to grow to 105GB, which in turn brought the server's
> performance down to unusable levels.

Were there any long-running transactions showing in
pg_stat_activity?  (That would include transactions showing '<IDLE>
in transaction'.)

-Kevin

Re: Autovacuum stopped running

From
"Benjamin Krajmalnik"
Date:
TO be truthful, I did not check that.
I arrived at the conclusion because although I have 10 autovacuum
threads, none of the tables had been autoanalyzed or autovacuumed in
hours (and due to the nature of the activity they do so every minute).
If I ever see this behavior now I know what to check for :) Thx.

> -----Original Message-----
> From: Kevin Grittner [mailto:Kevin.Grittner@wicourts.gov]
> Sent: Saturday, May 01, 2010 8:27 AM
> To: Benjamin Krajmalnik; pgsql-admin@postgresql.org
> Subject: Re: [ADMIN] Autovacuum stopped running
>
> "Benjamin Krajmalnik" <kraj@illumen.com> wrote:
> > I had a very strange occurrence 2 days ago where autovacuum
> > appears to have stopped running.
>
> What did you see which led you to that conclusion?  Did ps show the
> 'autovacuum launcher process'?
>
> > I did not see any error messages in pg_log.  The problem caused a
> > file which is usually around 50Mb but which gets updated
> > extensively to grow to 105GB, which in turn brought the server's
> > performance down to unusable levels.
>
> Were there any long-running transactions showing in
> pg_stat_activity?  (That would include transactions showing '<IDLE>
> in transaction'.)
>
> -Kevin

Re: Autovacuum stopped running

From
Tom Lane
Date:
"Benjamin Krajmalnik" <kraj@illumen.com> writes:
> TO be truthful, I did not check that.
> I arrived at the conclusion because although I have 10 autovacuum
> threads, none of the tables had been autoanalyzed or autovacuumed in
> hours (and due to the nature of the activity they do so every minute).
> If I ever see this behavior now I know what to check for :) Thx.

We've seen one or two previous reports that suggested that the autovac
launcher might have stopped launching workers, but it hasn't been
reproducible.  Are you using any nonstandard parameters for autovacuum?

            regards, tom lane

Re: Autovacuum stopped running

From
"Benjamin Krajmalnik"
Date:
# - Cost-Based Vacuum Delay -

#vacuum_cost_delay = 0ms        # 0-100 milliseconds
#vacuum_cost_page_hit = 1        # 0-10000 credits
#vacuum_cost_page_miss = 10        # 0-10000 credits
#vacuum_cost_page_dirty = 20        # 0-10000 credits
vacuum_cost_limit = 10000        # 1-10000 credits

#-----------------------------------------------------------------------
-------
# AUTOVACUUM PARAMETERS
#-----------------------------------------------------------------------
-------

#autovacuum = on            # Enable autovacuum subprocess?
'on'
                    # requires track_counts to also
be on.
#log_autovacuum_min_duration = -1    # -1 disables, 0 logs all
actions and
                    # their durations, > 0 logs only
                    # actions running at least this
number
                    # of milliseconds.
autovacuum_max_workers = 10        # max number of autovacuum
subprocesses
#autovacuum_naptime = 1min        # time between autovacuum runs
#autovacuum_vacuum_threshold = 50    # min number of row updates
before
                    # vacuum
#autovacuum_analyze_threshold = 50    # min number of row updates
before
                    # analyze
#autovacuum_vacuum_scale_factor = 0.2    # fraction of table size before
vacuum
#autovacuum_analyze_scale_factor = 0.1    # fraction of table size before
analyze
#autovacuum_freeze_max_age = 200000000    # maximum XID age before forced
vacuum
                    # (change requires restart)
#autovacuum_vacuum_cost_delay = 20ms    # default vacuum cost delay for
                    # autovacuum, in milliseconds;
                    # -1 means use vacuum_cost_delay
#autovacuum_vacuum_cost_limit = -1    # default vacuum cost limit for
                    # autovacuum, -1 means use
                    # vacuum_cost_limit


> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Saturday, May 01, 2010 5:10 PM
> To: Benjamin Krajmalnik
> Cc: Kevin Grittner; pgsql-admin@postgresql.org
> Subject: Re: [ADMIN] Autovacuum stopped running
>
> "Benjamin Krajmalnik" <kraj@illumen.com> writes:
> > TO be truthful, I did not check that.
> > I arrived at the conclusion because although I have 10 autovacuum
> > threads, none of the tables had been autoanalyzed or autovacuumed in
> > hours (and due to the nature of the activity they do so every
> minute).
> > If I ever see this behavior now I know what to check for :) Thx.
>
> We've seen one or two previous reports that suggested that the autovac
> launcher might have stopped launching workers, but it hasn't been
> reproducible.  Are you using any nonstandard parameters for
autovacuum?
>
>             regards, tom lane

Re: Autovacuum stopped running

From
"Benjamin Krajmalnik"
Date:
OK.
This happened again on another server with these same settings.
I did not see any long running transaction, and the autovacuum launch
process was running.


> -----Original Message-----
> From: Benjamin Krajmalnik
> Sent: Saturday, May 01, 2010 5:22 PM
> To: 'Tom Lane'
> Cc: Kevin Grittner; pgsql-admin@postgresql.org
> Subject: RE: [ADMIN] Autovacuum stopped running
>
> # - Cost-Based Vacuum Delay -
>
> #vacuum_cost_delay = 0ms        # 0-100 milliseconds
> #vacuum_cost_page_hit = 1        # 0-10000 credits
> #vacuum_cost_page_miss = 10        # 0-10000 credits
> #vacuum_cost_page_dirty = 20        # 0-10000 credits
> vacuum_cost_limit = 10000        # 1-10000 credits
>
>
#----------------------------------------------------------------------
> --------
> # AUTOVACUUM PARAMETERS
>
#----------------------------------------------------------------------
> --------
>
> #autovacuum = on            # Enable autovacuum subprocess?
'on'
>                     # requires track_counts to also
be on.
> #log_autovacuum_min_duration = -1    # -1 disables, 0 logs all
actions
> and
>                     # their durations, > 0 logs only
>                     # actions running at least this
number
>                     # of milliseconds.
> autovacuum_max_workers = 10        # max number of autovacuum
> subprocesses
> #autovacuum_naptime = 1min        # time between autovacuum runs
> #autovacuum_vacuum_threshold = 50    # min number of row updates
before
>                     # vacuum
> #autovacuum_analyze_threshold = 50    # min number of row updates
before
>                     # analyze
> #autovacuum_vacuum_scale_factor = 0.2    # fraction of table size
> before vacuum
> #autovacuum_analyze_scale_factor = 0.1    # fraction of table size
> before analyze
> #autovacuum_freeze_max_age = 200000000    # maximum XID age before
> forced vacuum
>                     # (change requires restart)
> #autovacuum_vacuum_cost_delay = 20ms    # default vacuum cost delay
> for
>                     # autovacuum, in milliseconds;
>                     # -1 means use vacuum_cost_delay
> #autovacuum_vacuum_cost_limit = -1    # default vacuum cost limit for
>                     # autovacuum, -1 means use
>                     # vacuum_cost_limit
>
>
> > -----Original Message-----
> > From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> > Sent: Saturday, May 01, 2010 5:10 PM
> > To: Benjamin Krajmalnik
> > Cc: Kevin Grittner; pgsql-admin@postgresql.org
> > Subject: Re: [ADMIN] Autovacuum stopped running
> >
> > "Benjamin Krajmalnik" <kraj@illumen.com> writes:
> > > TO be truthful, I did not check that.
> > > I arrived at the conclusion because although I have 10 autovacuum
> > > threads, none of the tables had been autoanalyzed or autovacuumed
> in
> > > hours (and due to the nature of the activity they do so every
> > minute).
> > > If I ever see this behavior now I know what to check for :) Thx.
> >
> > We've seen one or two previous reports that suggested that the
> autovac
> > launcher might have stopped launching workers, but it hasn't been
> > reproducible.  Are you using any nonstandard parameters for
> autovacuum?
> >
> >             regards, tom lane

Re: Autovacuum stopped running

From
Alvaro Herrera
Date:
Benjamin Krajmalnik wrote:
> OK.
> This happened again on another server with these same settings.
> I did not see any long running transaction, and the autovacuum launch
> process was running.

Did you check the server log for FATAL or ERROR lines?  Maybe it's dying
trying to vacuum some table and isn't ever able to get to the next ones.


--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Autovacuum stopped running

From
"Kevin Grittner"
Date:
"Benjamin Krajmalnik" <kraj@illumen.com> wrote:

> This happened again on another server with these same settings.
> I did not see any long running transaction, and the autovacuum
> launch process was running.

Like last time, all ten autovacuum workers were running?  What does
CPU and disk activity look like in this state?  The contents of
pg_locks might yield a clue.  If you have all ten workers showing on
the task list, but they don't seem to be making much progress,
connecting to each with gdb (or equivalent) and getting a backtrace
might be useful.

-Kevin

Re: Autovacuum stopped running

From
Tom Lane
Date:
"Benjamin Krajmalnik" <kraj@illumen.com> writes:
> This happened again on another server with these same settings.
> I did not see any long running transaction, and the autovacuum launch
> process was running.

Hmm ... could you strace the launcher for a couple of minutes to see
whether it's doing anything?

Also, I've forgotten exactly which PG version you're running?

            regards, tom lane

Re: Autovacuum stopped running

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Also, I've forgotten exactly which PG version you're running?

The original post said:

PostgreSQL 8.4.3 amd64 on FreeBSD 7.2.

-Kevin

Re: Autovacuum stopped running

From
"Benjamin Krajmalnik"
Date:
Tom,

1.  I had to restart the postgres server, otherwise I would have been in
deep problems.
2.  I just tried installing strace from the ports system - unfortunately
it appears that FreeBSD only has it for i38s, not amd64.  Any
alternatives so that if this happens again I can take care of this?

>
> Hmm ... could you strace the launcher for a couple of minutes to see
> whether it's doing anything?
>
> Also, I've forgotten exactly which PG version you're running?
>
>             regards, tom lane

Re: Autovacuum stopped running

From
"Benjamin Krajmalnik"
Date:
I did not find any errors pertaining to the autovacuum task.
In both cases we had a data-centric issue which was causing excessive
errors on the database (we have a timestamp field and an external
utility which we use to collect product keys was sending the date as
"dd/mm/yyyy hh:mm:ss" for some entries.   We modified our processing
script to catch this condition and rewrite it to "mm/dd/yyyy hh:mm:ss"
on the server script, and I will be modifying the agent to send it in as
"yyyymmdd hh:mm:ss" in our next agent release to bullet proof it.


> -----Original Message-----
> From: Alvaro Herrera [mailto:alvherre@commandprompt.com]
> Sent: Monday, May 03, 2010 7:48 AM
> To: Benjamin Krajmalnik
> Cc: Tom Lane; pgsql-admin@postgresql.org
> Subject: Re: [ADMIN] Autovacuum stopped running
>
> Benjamin Krajmalnik wrote:
> > OK.
> > This happened again on another server with these same settings.
> > I did not see any long running transaction, and the autovacuum
launch
> > process was running.
>
> Did you check the server log for FATAL or ERROR lines?  Maybe it's
> dying
> trying to vacuum some table and isn't ever able to get to the next
> ones.
>
>
> --
> Alvaro Herrera
> http://www.CommandPrompt.com/
> PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Autovacuum stopped running

From
Alvaro Herrera
Date:
Benjamin Krajmalnik wrote:
> I did not find any errors pertaining to the autovacuum task.

Hmm.  I think it would be good to find out what is the launcher doing,
if anything.  If you strace it, do you see it sending signals to
postmaster?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.