Re: Auto-vacuum is not running in 9.1.12 - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Auto-vacuum is not running in 9.1.12
Msg-id 2343.1434559562@sss.pgh.pa.us
In response to Re: Auto-vacuum is not running in 9.1.12  (Haribabu Kommi <kommi.haribabu@gmail.com>)
Responses Re: Auto-vacuum is not running in 9.1.12  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
Haribabu Kommi <kommi.haribabu@gmail.com> writes:
> I can think of a case where the "launcher_determine_sleep" function
> returns a big sleep value because of system time change.
> Because of that it is possible that the launcher is not generating
> workers to do the vacuum. Maybe I am wrong.

I talked with Alvaro about this and we agreed that's most likely what
happened.  The launcher tracks future times-to-wake-up as absolute times,
so shortly after the system clock went backwards, it could have computed
that the next time to wake up was 20 years in the future, and issued a
sleep() call for 20 years.  Fixing the system clock after that would not
have caused it to wake up again.
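
To illustrate the failure mode described above, here is a minimal sketch in
C. It is not the launcher's actual code: the names sleep_until() and
demo_backwards_jump(), the use of whole seconds, and the example clock
values are all assumptions made for illustration. It only shows how a
wake-up time stored as an absolute wall-clock value turns into a huge sleep
once the clock is set backwards.

```c
#include <stdint.h>

/* Hypothetical type, for illustration only; not PostgreSQL code. */
typedef int64_t wall_secs;      /* wall-clock seconds since some epoch */

/*
 * Seconds to sleep until an absolute wake-up time, given the current
 * wall-clock reading.  Never returns a negative value.
 */
int64_t
sleep_until(wall_secs wakeup_at, wall_secs now)
{
    int64_t delta = wakeup_at - now;

    return delta > 0 ? delta : 0;
}

/*
 * A wake-up is scheduled one minute ahead, then the wall clock is set
 * back by 20 years before the sleep length is computed.
 */
int64_t
demo_backwards_jump(void)
{
    const int64_t year = 365LL * 24 * 60 * 60;
    wall_secs   scheduled_at = 45 * year;           /* clock when scheduling */
    wall_secs   wakeup_at = scheduled_at + 60;      /* intended: one minute */
    wall_secs   now_after_jump = scheduled_at - 20 * year;

    /* The result is 20 years plus 60 seconds, not 60 seconds. */
    return sleep_until(wakeup_at, now_after_jump);
}
```

Once a sleep of that length has been issued, correcting the clock does not
shorten it, which is why an external wake-up such as a signal is needed.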

It looks like a SIGHUP (pg_ctl reload) ought to be enough to wake it up,
or of course you could restart the server.

In HEAD this doesn't seem like it could cause an indefinite sleep because
if nothing else, sinval queue overrun would eventually wake the launcher
even without any manual action from the DBA.  But the loop logic is
different in 9.1.

launcher_determine_sleep() does have a minimum sleep time, and it seems
like we could fairly cheaply guard against this kind of scenario by also
enforcing a maximum sleep time (of say 5 or 10 minutes).  Not quite
convinced whether it's worth the trouble though.
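
A rough sketch of what such a guard could look like follows. This is not a
patch against launcher_determine_sleep(): the function clamp_sleep_ms() and
both constants are hypothetical, the minimum shown is an arbitrary
placeholder rather than the real one, and the 5-minute cap is just the
lower of the two values floated above.

```c
#include <stdint.h>

/* Hypothetical limits, for illustration only. */
#define MIN_SLEEP_MS    100                 /* placeholder minimum */
#define MAX_SLEEP_MS    (5 * 60 * 1000)     /* 5 minutes */

/*
 * Clamp a computed sleep time into [MIN_SLEEP_MS, MAX_SLEEP_MS].
 * With the upper bound in place, a bogus wake-up time caused by a
 * clock jump costs at most one MAX_SLEEP_MS nap before the schedule
 * is recomputed against the current clock.
 */
int64_t
clamp_sleep_ms(int64_t requested_ms)
{
    if (requested_ms < MIN_SLEEP_MS)
        return MIN_SLEEP_MS;
    if (requested_ms > MAX_SLEEP_MS)
        return MAX_SLEEP_MS;
    return requested_ms;
}
```

The cost of the cap is an occasional extra wake-up on an otherwise idle
system, which is the trade-off to weigh against the rarity of the problem.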
        regards, tom lane


