Thread: autovacuum does not start in HEAD

autovacuum does not start in HEAD

From
ITAGAKI Takahiro
Date:
I found that autovacuum launcher does not launch any workers in HEAD.

AFAICS, we track the time at which each database is to be vacuumed in the following way:

1. In rebuild_database_list(), we initialize avl_dbase->adl_next_worker
   with (current_time + autovacuum_naptime / nDBs).
2. In do_start_worker(), we skip database entries whose adl_next_worker
   is between current_time and current_time + autovacuum_naptime.
3. If there are no jobs in do_start_worker(), we call rebuild_database_list()
   to rebuild the database entries.

The point is that we use the same range (current_time to current_time +
autovacuum_naptime) in steps 1 and 2. We set adl_next_worker to values in that
range, and then drop all of them in step 2 because their values are in the range.
If there is no database to vacuum, we re-initialize the database list in step 3,
and then the cycle repeats.
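
The cycle described above can be sketched with a small model. This is only an illustration under stated assumptions, not the autovacuum.c code: timestamps are plain millisecond integers, the names model_rebuild, model_skip and model_count_startable are hypothetical, and entries are assumed to be spaced naptime / nDBs apart starting one step after current_time.

```c
#include <stdbool.h>

/* Hypothetical simplified model: timestamps are milliseconds in a long long. */
typedef long long model_ms;

/* Step 1: spread next_worker values across the naptime interval. */
static void
model_rebuild(model_ms now, model_ms naptime, model_ms *next_worker, int ndbs)
{
    model_ms step = naptime / ndbs;

    for (int i = 0; i < ndbs; i++)
        next_worker[i] = now + step * (i + 1);
}

/* Step 2: skip an entry whose next_worker lies in [now, now + naptime]. */
static bool
model_skip(model_ms now, model_ms naptime, model_ms next_worker)
{
    return next_worker >= now && next_worker <= now + naptime;
}

/* Count the entries that step 2 would NOT skip at time "now". */
static int
model_count_startable(model_ms now, model_ms naptime,
                      const model_ms *next_worker, int ndbs)
{
    int startable = 0;

    for (int i = 0; i < ndbs; i++)
        if (!model_skip(now, naptime, next_worker[i]))
            startable++;
    return startable;
}
```

With naptime = 10000 and two databases rebuilt at time 0, the entries land at 5000 and 10000. In this model nothing is startable at time 0 or at time 5000, so if the list is rebuilt each time nothing is startable, no worker is ever launched.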

Or am I missing something?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



Re: autovacuum does not start in HEAD

From
Alvaro Herrera
Date:
ITAGAKI Takahiro wrote:
> I found that autovacuum launcher does not launch any workers in HEAD.
> 
> AFAICS, we track the time to be vacuumed of each database in the following way:
> 
> 1. In rebuild_database_list(), we initialize avl_dbase->adl_next_worker
>    with (current_time + autovacuum_naptime / nDBs).
> 2. In do_start_worker(), we skip database entries that adl_next_worker
>    is between current_time and current_time + autovacuum_naptime.
> 3. If there are no jobs in do_start_worker(), we call rebuild_database_list()
>    to rebuild database entries.
> 
> The point is we use the same range (current_time and current_time +
> autovacuum_naptime) at 1 and 2. We set adl_next_worker with values in the
> range, and drop all of them at 2 because their values are in the range.
> And if there is no database to vacuum, we re-initialize database list at 3,
> then we repeat the cycle.
> 
> Or am I missing something?

Note that rebuild_database_list skips databases that don't have stat
entries.  Maybe that's what is confusing your examination.  When the list
is empty, workers are launched only every naptime seconds, and even then
only databases with stat entries are picked.  All other databases will be
skipped until the max_freeze_age is reached.  Right after an initdb or a
WAL replay, all database stats are deleted.

The point of (1) is to spread the starting of workers in the
autovacuum_naptime interval.

The point of (2) is that we don't want to process a database that was
processed too recently (less than autovacuum_naptime seconds ago).  This
is useful in the cases where databases are dropped, so the launcher is
awakened earlier than what the schedule would say if the dropped
database were not in the list.  It is possible that I confused the
arithmetic in there (because TimestampDifference does not return
negative results so there may be strange corner cases), but the last
time I examined it it was correct.
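
The corner case mentioned above can be seen in a small model. This is a sketch, not the PostgreSQL function itself: model_difference is a hypothetical stand-in that only mirrors the "no negative results" behavior, with timestamps represented as microsecond integers.

```c
#include <stdbool.h>

/* Hypothetical simplified model: timestamps are microseconds in a long long. */
typedef long long model_us;

/* Clamped difference: a stop time at or before the start time yields zero. */
static void
model_difference(model_us start, model_us stop, long *secs, int *usecs)
{
    model_us diff = stop - start;

    if (diff <= 0)
    {
        *secs = 0;
        *usecs = 0;
    }
    else
    {
        *secs = (long) (diff / 1000000);
        *usecs = (int) (diff % 1000000);
    }
}

/* True when the clamped result is zero: stop == start OR stop < start. */
static bool
model_difference_is_zero(model_us start, model_us stop)
{
    long secs;
    int  usecs;

    model_difference(start, stop, &secs, &usecs);
    return secs == 0 && usecs == 0;
}
```

Because the result is clamped, a caller of such a function cannot tell "the two times are equal" apart from "the second time is earlier", which is where strange corner cases could hide.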

The point of (3) is to cover the case where there were no databases
being previously autovacuumed and that may now need vacuuming (i.e. just
after a database got its stat entry).

The fact that some databases may not have stat entries tends to confuse
the logic, both in rebuild_database_list and do_start_worker.  If it's
not documented enough maybe it needs extra clarification in code
comments.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: autovacuum does not start in HEAD

From
ITAGAKI Takahiro
Date:
I wrote:
> I found that autovacuum launcher does not launch any workers in HEAD.

The attached autovacuum-fix.patch could fix the problem. I changed the
test that decides the next autovacuum target to use 'greater or equal'
instead of 'greater'.

The cause was the timer resolution: on some platforms the timer has only
millisecond resolution. We initialize adl_next_worker with current_time
in rebuild_database_list(), but we can get the same value again in
do_start_worker(), because on those low-resolution platforms there is no
measurable difference between the two readings.
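
The effect of timer resolution on a 'greater' versus 'greater or equal' test can be sketched as follows. The patch itself is not reproduced here, so this is only an illustration under assumptions: model_ms_clock is a hypothetical clock that truncates to milliseconds, and the two model_due_* functions are hypothetical stand-ins for the comparison being changed.

```c
#include <stdbool.h>

/* Hypothetical simplified model: timestamps are microseconds in a long long. */
typedef long long model_us;

/* A clock with millisecond resolution: sub-millisecond digits are lost. */
static model_us
model_ms_clock(model_us true_time_us)
{
    return (true_time_us / 1000) * 1000;
}

/* 'greater': due only if the current time is strictly after next_worker. */
static bool
model_due_greater(model_us now, model_us next_worker)
{
    return now > next_worker;
}

/* 'greater or equal': also due when the two readings are identical. */
static bool
model_due_greater_or_equal(model_us now, model_us next_worker)
{
    return now >= next_worker;
}
```

Two readings taken 800 microseconds apart, for example at 1234100 and 1234900, both come back as 1234000 from the millisecond clock. The strict test then reports "not due" even though time has passed, while the inclusive test reports "due".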


Another attached patch, autovacuum-debug.patch, is just for printf debugging.
Without the fix I got the following logs; autovacuum never runs.

# SELECT oid, datname FROM pg_database ORDER BY oid;
  oid  |  datname
-------+-----------
     1 | template1
 11494 | template0
 11495 | postgres
 16384 | bench
(4 rows)

# pgbench bench -s1 -c1 -t100000
[with configurations of autovacuum_naptime = 10s and log_min_messages = debug1]

LOG:  do_start_worker skip : 230863399.250000, 230863399.250000, 230863409.250000
LOG:  rebuild_database_list: db=11495, time=230863404.250000
LOG:  rebuild_database_list: db=16384, time=230863409.250000
DEBUG:  autovacuum: processing database "bench"
LOG:  do_start_worker skip : 230863404.250000, 230863404.250000, 230863414.250000
LOG:  do_start_worker skip : 230863404.250000, 230863409.250000, 230863414.250000
LOG:  rebuild_database_list: db=11495, time=230863409.250000
LOG:  rebuild_database_list: db=16384, time=230863414.250000
LOG:  do_start_worker skip : 230863409.250000, 230863409.250000, 230863419.250000
LOG:  do_start_worker skip : 230863409.250000, 230863414.250000, 230863419.250000
LOG:  rebuild_database_list: db=11495, time=230863414.250000
LOG:  rebuild_database_list: db=16384, time=230863419.250000
...
(no autovacuum activities forever)

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


Attachment

Re: [PATCHES] autovacuum does not start in HEAD

From
Bruce Momjian
Date:
Your patch has been added to the PostgreSQL unapplied patches list at:

    http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: autovacuum does not start in HEAD

From
Alvaro Herrera
Date:
ITAGAKI Takahiro wrote:
> I wrote:
> > I found that autovacuum launcher does not launch any workers in HEAD.
>
> The attached autovacuum-fix.patch could fix the problem. I changed
> to use 'greater or equal' instead of 'greater' at the decision of
> next autovacuum target.

I developed a different fix, which is possible due to the addition of
TimestampDifferenceExceeds to the TimestampTz API.  (Thanks Tom).

It continues to work for me here, but please confirm that it fixes the
bug you reported -- I don't have a low-resolution platform handy.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Attachment

Re: autovacuum does not start in HEAD

From
Alvaro Herrera
Date:
ITAGAKI Takahiro wrote:
> I wrote:
> > I found that autovacuum launcher does not launch any workers in HEAD.
>
> The attached autovacuum-fix.patch could fix the problem. I changed
> to use 'greater or equal' instead of 'greater' at the decision of
> next autovacuum target.

I have committed a patch which might fix this issue in autovacuum.c rev 1.44.
Please retest.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: autovacuum does not start in HEAD

From
ITAGAKI Takahiro
Date:
Alvaro Herrera <alvherre@commandprompt.com> wrote:

> ITAGAKI Takahiro wrote:
> > > I found that autovacuum launcher does not launch any workers in HEAD.
> > 
> > The attached autovacuum-fix.patch could fix the problem. I changed
> > to use 'greater or equal' instead of 'greater' at the decision of
> > next autovacuum target.
> 
> I have committed a patch which might fix this issue in autovacuum.c rev 1.44.
> Please retest.

HEAD (r1.45) is still broken. We skip entries using the test
  adl_next_worker - autovacuum_naptime < current_time <= adl_next_worker,
but the second inequation should be
  adl_next_worker - autovacuum_naptime < current_time < adl_next_worker,
because adl_next_worker can equal current_time.

@@ -1036,8 +1036,8 @@
                 * Skip this database if its next_worker value falls between
                 * the current time and the current time plus naptime.
                 */
-                if (TimestampDifferenceExceeds(current_time,
-                                               dbp->adl_next_worker, 0) &&
+                if (!TimestampDifferenceExceeds(dbp->adl_next_worker,
+                                                current_time, 0) &&
                     !TimestampDifferenceExceeds(current_time,
                                                 dbp->adl_next_worker,
                                                 autovacuum_naptime * 1000))
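
The difference between the two tests shows up only when current_time equals adl_next_worker. The following is a minimal sketch, assuming that TimestampDifferenceExceeds(start, stop, msec) is true when stop - start is at least msec. The model_* names are hypothetical, and all values are plain milliseconds, so naptime is passed directly instead of autovacuum_naptime * 1000.

```c
#include <stdbool.h>

/* Hypothetical simplified model: timestamps are milliseconds in a long long. */
typedef long long model_ms;

/* Assumed semantics: true when stop - start is at least msec. */
static bool
model_diff_exceeds(model_ms start, model_ms stop, model_ms msec)
{
    return (stop - start) >= msec;
}

/* Test before the change: next_worker - naptime < now <= next_worker. */
static bool
model_skip_inclusive(model_ms now, model_ms next_worker, model_ms naptime)
{
    return model_diff_exceeds(now, next_worker, 0) &&
           !model_diff_exceeds(now, next_worker, naptime);
}

/* Test after the change: next_worker - naptime < now < next_worker. */
static bool
model_skip_exclusive(model_ms now, model_ms next_worker, model_ms naptime)
{
    return !model_diff_exceeds(next_worker, now, 0) &&
           !model_diff_exceeds(now, next_worker, naptime);
}
```

With now = next_worker = 5000 and naptime = 10000, the inclusive test skips the database and the exclusive test does not. For now = 4000 both skip, and for now = 6000 neither does.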
 

By the way, why do we need the upper bound to decide the next target?
Can we simplify it to "current_time < adl_next_worker"?

@@ -1033,16 +1033,11 @@
             if (dbp->adl_datid == tmp->adw_datid)
             {
                 /*
-                 * Skip this database if its next_worker value falls between
-                 * the current time and the current time plus naptime.
+                 * Skip this database if its next_worker value is later than
+                 * the current time.
                  */
-                if (TimestampDifferenceExceeds(current_time,
-                                               dbp->adl_next_worker, 0) &&
-                    !TimestampDifferenceExceeds(current_time,
-                                                dbp->adl_next_worker,
-                                                autovacuum_naptime * 1000))
-                    skipit = true;
-
+                skipit = !TimestampDifferenceExceeds(dbp->adl_next_worker,
+                                                     current_time, 0);
                 break;
             }
             elem = DLGetPred(elem);
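
To see where the simplified test and the windowed test disagree, here is a small sketch of the two predicates. It is an illustration only: the model_* names are hypothetical and timestamps are plain millisecond integers rather than TimestampTz values.

```c
#include <stdbool.h>

/* Hypothetical simplified model: timestamps are milliseconds in a long long. */
typedef long long model_ms;

/* Windowed test: skip when next_worker - naptime < now < next_worker. */
static bool
model_skip_windowed(model_ms now, model_ms next_worker, model_ms naptime)
{
    return now < next_worker && (next_worker - now) < naptime;
}

/* Simplified test: skip whenever next_worker is later than now. */
static bool
model_skip_simplified(model_ms now, model_ms next_worker)
{
    return now < next_worker;
}
```

The two predicates agree whenever next_worker is less than naptime ahead of now, or not ahead at all. They differ only when next_worker is naptime or more in the future: with now = 0, next_worker = 25000 and naptime = 10000, the windowed test does not skip the entry, while the simplified test does.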
 

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center




Re: autovacuum does not start in HEAD

From
Alvaro Herrera
Date:
ITAGAKI Takahiro wrote:
> Alvaro Herrera <alvherre@commandprompt.com> wrote:
> 
> > ITAGAKI Takahiro wrote:
> > > > I found that autovacuum launcher does not launch any workers in HEAD.
> > > 
> > > The attached autovacuum-fix.patch could fix the problem. I changed
> > > to use 'greater or equal' instead of 'greater' at the decision of
> > > next autovacuum target.
> > 
> > I have committed a patch which might fix this issue in autovacuum.c rev 1.44.
> > Please retest.
> 
> HEAD (r1.45) is still broken. We skip entries using the test
>   adl_next_worker - autovacuum_naptime < current_time <= adl_next_worker,
> but the second inequation should be
>   adl_next_worker - autovacuum_naptime < current_time < adl_next_worker,
> because adl_next_worker can equal current_time.

Ok, I'll change this.

> By the way, why do we need the upper bounds to decide a next target?
> Can we simplify it to "current_time < adl_next_worker"?

No, we can't take that check out, because otherwise a database could be
skipped forever if it happens to fall behind for some reason (for
example when a new database is created and autovac decides to work on
that one instead of the one that was scheduled).

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support