Thread: BUG #15067: Documentation or behaviour bug with autovacuum thresholds?

BUG #15067: Documentation or behaviour bug with autovacuum thresholds?

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      15067
Logged by:          Greg Clough
Email address:      greg.clough@ipreo.com
PostgreSQL version: 9.6.7
Operating system:   CentOS v7.4
Description:

I think there is a documentation bug in the Autovacuum section:


https://www.postgresql.org/docs/9.6/static/routine-vacuuming.html#AUTOVACUUM


The formula for triggering an autovacuum is listed as:

        vacuum threshold = vacuum base threshold + vacuum scale factor *
number of tuples


But in reality I think it needs “+ 1” in there:

        vacuum threshold = vacuum base threshold + 1 + vacuum scale factor *
number of tuples

Maybe it has been left out for simplicity, or if not then there’s either a
documentation or implementation bug.  I think it has been deliberately
included in the code for safety, so that if both tuning parameters are set
to 0 then it won’t repeatedly vacuum tables with zero changes.


To test this on PostgreSQL v9.6.7 I turned on autovacuum logging, and left
the other parameters set to the defaults:

        postgres=# SELECT version();
                                                         version
                                     

----------------------------------------------------------------------------------------------------------
         PostgreSQL 9.6.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC)
4.8.5 20150623 (Red Hat 4.8.5-16), 64-bit
         (1 row)
        
        postgres=# SHOW log_autovacuum_min_duration;
         log_autovacuum_min_duration 
        -----------------------------
         0
         (1 row)
        
        postgres=# SHOW autovacuum_vacuum_scale_factor;
         autovacuum_vacuum_scale_factor 
        --------------------------------
         0.2
         (1 row)
        
        postgres=# SHOW autovacuum_vacuum_threshold;
         autovacuum_vacuum_threshold 
        -----------------------------
         50
         (1 row)


I then created a table with some dummy data, and set
autovacuum_vacuum_scale_factor = 0, and autovacuum_vacuum_threshold = 1.

        postgres=# CREATE TABLE autovacuum_threshold_test(id int);
        CREATE TABLE
        postgres=# INSERT INTO autovacuum_threshold_test(id) (SELECT
generate_series(1,1000));
        INSERT 0 1000
        postgres=# ALTER TABLE autovacuum_threshold_test SET
(autovacuum_vacuum_scale_factor = 0);
        ALTER TABLE
        postgres=# ALTER TABLE autovacuum_threshold_test SET
(autovacuum_vacuum_threshold = 1);
        ALTER TABLE
        postgres=# \d+ autovacuum_threshold_test;
                      Table "public.autovacuum_threshold_test"
         Column |  Type   | Modifiers | Storage | Stats target | Description


--------+---------+-----------+---------+--------------+-------------
         id     | integer |           | plain   |              | 
        Options: autovacuum_vacuum_threshold=1,
autovacuum_vacuum_scale_factor=0


Issuing a single row update does not trigger an autovacuum:

        postgres=# UPDATE autovacuum_threshold_test SET id = id WHERE id <=
1;
        UPDATE 1


Whereas issuing a 2 row update does:

        postgres=# UPDATE autovacuum_threshold_test SET id = id WHERE id <=
2;
        UPDATE 2
        
        < 2018-02-14 14:16:07.531 GMT > LOG:  automatic vacuum of table
"postgres.public.autovacuum_threshold_test": index scans: 0
            pages: 0 removed, 5 remain, 0 skipped due to pins, 0 skipped
frozen
            tuples: 2 removed, 1000 remain, 0 are dead but not yet removable
            buffer usage: 33 hits, 2 misses, 8 dirtied
            avg read rate: 15.038 MB/s, avg write rate: 60.154 MB/s
            system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec


This behaviour stays constant, even if autovacuum_vacuum_threshold is set to
0, or larger numbers like 500, 1000, etc.  It always adds 1:

        postgres=# ALTER TABLE autovacuum_threshold_test SET
(autovacuum_vacuum_threshold = 500);
        ALTER TABLE
        postgres=# \d+ autovacuum_threshold_test;
                      Table "public.autovacuum_threshold_test"
         Column |  Type   | Modifiers | Storage | Stats target | Description


--------+---------+-----------+---------+--------------+-------------
         id     | integer |           | plain   |              | 
        Options: autovacuum_vacuum_scale_factor=0,
autovacuum_vacuum_threshold=500
        
        postgres=# UPDATE autovacuum_threshold_test SET id = id WHERE id <=
500;
        UPDATE 500
        postgres=# UPDATE autovacuum_threshold_test SET id = id WHERE id <=
501;
        UPDATE 501
        
        < 2018-02-14 14:20:07.583 GMT > LOG:  automatic vacuum of table
"postgres.public.autovacuum_threshold_test": index scans: 0
            pages: 0 removed, 9 remain, 0 skipped due to pins, 0 skipped
frozen
            tuples: 501 removed, 1000 remain, 0 are dead but not yet
removable
            buffer usage: 43 hits, 0 misses, 5 dirtied
            avg read rate: 0.000 MB/s, avg write rate: 15.575 MB/s
            system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec


Possibly there’s something more unusual happening, because if you run an
update at the *exact* autovacuum_vacuum_threshold, and then shortly after
run the same number of updates again, the autovacuum log only shows that one
set of tuples removed.  Shouldn’t it be A + B, as two updates have run?  (so
“tuples: 1000 removed” instead of the “tuples: 500 removed” shown below)

        postgres=# VACUUM FULL autovacuum_threshold_test;
        VACUUM
        postgres=# UPDATE autovacuum_threshold_test SET id = id WHERE id <=
500;
        UPDATE 500
        postgres=# SELECT pg_sleep(30);
         pg_sleep 
        ----------
         
         (1 row)
        
        postgres=# UPDATE autovacuum_threshold_test SET id = id WHERE id
BETWEEN 501 and 1000;
        UPDATE 500
        
        < 2018-02-14 14:30:07.690 GMT > LOG:  automatic vacuum of table
"postgres.public.autovacuum_threshold_test": index scans: 0
            pages: 0 removed, 9 remain, 0 skipped due to pins, 0 skipped
frozen
            tuples: 500 removed, 1000 remain, 0 are dead but not yet
removable
            buffer usage: 41 hits, 2 misses, 4 dirtied
            avg read rate: 5.564 MB/s, avg write rate: 11.129 MB/s
            system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec

Cheers,
Greg Clough
Senior Technology Engineer

Ipreo
Castle House | 37-35 Paul St | London EC2A 4LS


RE: BUG #15067: Documentation or behaviour bug with autovacuumthresholds?

From
Greg Clough
Date:
> Possibly there’s something more unusual happening, because if you run an update at the
> *exact* autovacuum_vacuum_threshold, and then shortly after run the same number of
> updates again, the autovacuum log only shows that one set of tuples removed.  Shouldn’t
> it be A + B, as two updates have run?  (so “tuples: 1000 removed” instead of the
> “tuples: 500 removed” shown below)

Even stranger, is that if you do an update of exactly the threshold, wait, and then do a *SINGLE* row update:

postgres=# VACUUM FULL autovacuum_threshold_test;
VACUUM
postgres=# UPDATE autovacuum_threshold_test SET id = id WHERE id <= 500;
UPDATE 500
postgres=# SELECT pg_sleep(30);
 pg_sleep 
----------
 
(1 row)

postgres=# UPDATE autovacuum_threshold_test SET id = id WHERE id = 501;
UPDATE 1


The resulting autovacuum ran but only removed 1 row, even though the autovacuum_vaccum_threshold is set at 500:

< 2018-02-14 15:20:08.512 GMT > LOG:  automatic vacuum of table "postgres.public.autovacuum_threshold_test": index
scans:0
 
    pages: 0 removed, 7 remain, 0 skipped due to pins, 0 skipped frozen
    tuples: 1 removed, 1000 remain, 0 are dead but not yet removable
    buffer usage: 37 hits, 2 misses, 4 dirtied
    avg read rate: 18.426 MB/s, avg write rate: 36.851 MB/s
    system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec

Greg Clough
Senior Technology Engineer

Ipreo
Castle House | 37-35 Paul St | London EC2A 4LS

RE: BUG #15067: Documentation or behaviour bug with autovacuumthresholds?

From
Greg Clough
Date:
> Even stranger, is that if you do an update of exactly the threshold, wait, and then do a *SINGLE* row update:

I realise this isn't a top priority, but it does seem to allow a violation of the "autovacuum_vacuum_threshold"
parameter. In practice it doesn't seem to be serious, but finding the issue would reduce autovacuuming of tables that
makeexactly (rows * autovacuum_vacuum_threshold  + autovacuum_vacuum_threshold) changes.
 

I presume it's that simple, but I worry it has wider implications.

Cheers,
Greg.

Re: BUG #15067: Documentation or behaviour bug with autovacuum thresholds?

From
Euler Taveira
Date:
2018-02-14 11:44 GMT-03:00 PG Bug reporting form <noreply@postgresql.org>:
> I think there is a documentation bug in the Autovacuum section:
>
No, there isn't.

> https://www.postgresql.org/docs/9.6/static/routine-vacuuming.html#AUTOVACUUM
>
If you read the documentation again you will note the following setence:

... since the last VACUUM exceeds the "vacuum threshold", the table is vacuumed.

"exceed" means "greater than". Hence, documentation is accurate.


-- 
   Euler Taveira                                   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


RE: BUG #15067: Documentation or behaviour bug with autovacuumthresholds?

From
Greg Clough
Date:
> > I think there is a documentation bug in the Autovacuum section:
> No, there isn't.
> "exceed" means "greater than". Hence, documentation is accurate.
> > I think there is a documentation bug in the Autovacuum section:
> No, there isn't.
> "exceed" means "greater than". Hence, documentation is accurate.

Hi Euler,

I had considered that, but if you do *exactly* the same number of updates as the threshold... wait... and then do a
singleupdate, it vacuums a single row, not threshold+1.  In my example:
 

1) autovacuum_vacuum_scale_factor = 0
2) autovacuum_vacuum_threshold = 500
3) Update 500 rows of a 1,000 row table
4) Wait 30 seconds
5) Update 1 row
6) NOTE:  "tuples: 1 removed" in the postgres.log, not "tuples: 501 removed".

< 2018-02-14 15:20:08.512 GMT > LOG:  automatic vacuum of table "postgres.public.autovacuum_threshold_test": index
scans:0
 
    pages: 0 removed, 7 remain, 0 skipped due to pins, 0 skipped frozen
    tuples: 1 removed, 1000 remain, 0 are dead but not yet removable


There are two potential issues with that:

1) The autovacuum_vacuum_threshold is 500, so it should never vacuum anything less... especially not a single tuple.
   (Is that a bug in autovacuum, triggering on less than the threshold?)

2) Nowhere does it ever mention "tuples: 500 removed", so what happened to the originally updated 500 tuples?
   - Did they get vacuumed silently?
     (Is that a bug in autovacuum, where it doesn't report correctly?)
   - Did they skip being vacuumed?
     (Are these rows being lost forever, adding to bloat?)


In short, what happened to the initial autovacuum_vacuum_threshold (500) updates, and why did autovacuum run on a
singlerow?
 

Cheers,
Greg.

On 3/1/18 5:13 AM, Greg Clough wrote:
> 1) autovacuum_vacuum_scale_factor = 0
> 2) autovacuum_vacuum_threshold = 500
> 3) Update 500 rows of a 1,000 row table
> 4) Wait 30 seconds
> 5) Update 1 row
> 6) NOTE:  "tuples: 1 removed" in the postgres.log, not "tuples: 501 removed".
> 
> < 2018-02-14 15:20:08.512 GMT > LOG:  automatic vacuum of table "postgres.public.autovacuum_threshold_test": index
scans:0
 
>     pages: 0 removed, 7 remain, 0 skipped due to pins, 0 skipped frozen
>     tuples: 1 removed, 1000 remain, 0 are dead but not yet removable

That indicates that the vacuum ran when there were only 1000 rows in the 
table, 1 of which had been updated. Obviously that doesn't make sense 
given your example, so I'm wondering if you did a single-row update some 
time before step 3. Note that after step 3 vacuum should report 1500 
tuples, not 1000 (pretty sure that's true regardless of the tuples being 
HOT or not).

I suggest posting a complete test script demonstrating the behavior 
you're seeing.
-- 
Jim Nasby, Chief Data Architect, Austin TX
OpenSCG                 http://OpenSCG.com


RE: BUG #15067: Documentation or behaviour bug with autovacuumthresholds?

From
Greg Clough
Date:
> Obviously that doesn't make sense given your example, so I'm wondering if you did a single-row update some time
beforestep 3.
 

Hi Jim.  No, there were no other updates.  I'm glad you agree this is strange.


> I suggest posting a complete test script demonstrating the behavior you're seeing.

Script attached below.  I think it shows up some definite weirdness in the way autovacuum is running and reporting into
thepostgres.log.  It should work on any basic database (I used v9.6.8 on CentOS v7.4), but ensure "autovacuum_naptime =
5s"so that the short sleeps actually allow autovacuum to run.
 

I ran this script in a terminal with psql, and had the postgres.log tailing in another.  I apologise if there are a lot
ofcombinations, but I'm trying to cover all angles to show it's not a simple corner case.  I have also tried it with
autovacuum_vacuum_scale_factor> 0, and the behaviour is the same... albeit with the autovacuum trigger point increased
by2000 * autovacuum_vacuum_scale_factor.
 

In short, it seems that it will only report the correct "tuples: XX removed" into the postgres.log if cumulative
updatesare done within a single transaction.
 

Cheers,
Greg Clough
Senior Technology Engineer
Ipreo
Castle House | 37-35 Paul St | London EC2A 4LS


--
-- Turn on logging for all autovacuum, and increase autovacuum frequency
--
ALTER SYSTEM SET log_autovacuum_min_duration = 0;

--
-- Capture some system information
--
SELECT version();
SHOW log_autovacuum_min_duration;
SHOW autovacuum_naptime;  -- MUST BE SET TO A LOW VALUE, LIKE "5s"

--
-- Prepare a test table with some data
--
DROP TABLE IF EXISTS autovacuum_threshold_test;
CREATE TABLE autovacuum_threshold_test(id int);
INSERT INTO autovacuum_threshold_test(id) (SELECT generate_series(1,2000));

--
-- Set the autovacuum for the test table
--
ALTER TABLE autovacuum_threshold_test SET (autovacuum_vacuum_scale_factor = 0);
ALTER TABLE autovacuum_threshold_test SET (autovacuum_vacuum_threshold = 500);
\d+ autovacuum_threshold_test;

\echo ============================================================
\echo Trigger autovacuum, should vacuum 500 rows...
\echo ... but it does NOT, then it vacuums "1" row, which is < the threshold
\echo ------------------------------------------------------------
UPDATE autovacuum_threshold_test SET id = id WHERE id <= 500;
SELECT pg_sleep(30);
UPDATE autovacuum_threshold_test SET id = id WHERE id <= 1;
\echo ============================================================

\echo
\echo Wait between tests...
\echo
SELECT pg_sleep(30);
VACUUM FULL autovacuum_threshold_test;

\echo ============================================================
\echo Trigger autovacuum, should vacuum 500 rows...
\echo ... but it does NOT, and without the delay NEITHER are reported
\echo ------------------------------------------------------------
UPDATE autovacuum_threshold_test SET id = id WHERE id <= 500;
UPDATE autovacuum_threshold_test SET id = id WHERE id <= 1;
\echo ============================================================

\echo
\echo Wait between tests...
\echo
SELECT pg_sleep(30);
VACUUM FULL autovacuum_threshold_test;

\echo ============================================================
\echo Trigger autovacuum, should vacuum 499 + 2 = 501 rows...
\echo ... but only the "2" row vacuum is reported, either with or without a sleep
\echo ------------------------------------------------------------
UPDATE autovacuum_threshold_test SET id = id WHERE id <= 499;
-- SELECT pg_sleep(30);
UPDATE autovacuum_threshold_test SET id = id WHERE id BETWEEN 500 and 501;
\echo ============================================================

\echo
\echo Wait between tests...
\echo
SELECT pg_sleep(30);
VACUUM FULL autovacuum_threshold_test;

\echo ============================================================
\echo Trigger autovacuum, should vacuum 200 + 201 + 202 = 603 rows...
\echo ... but it does NOT, it only reports the final 202 row vacuum, either with or without a sleep
\echo ------------------------------------------------------------
UPDATE autovacuum_threshold_test SET id = id WHERE id BETWEEN   1 and 200;
-- SELECT pg_sleep(30);
UPDATE autovacuum_threshold_test SET id = id WHERE id BETWEEN 201 and 401;
-- SELECT pg_sleep(30);
UPDATE autovacuum_threshold_test SET id = id WHERE id BETWEEN 402 and 603;
\echo ============================================================

\echo
\echo Wait between tests...
\echo
SELECT pg_sleep(30);
VACUUM FULL autovacuum_threshold_test;

\echo ============================================================
\echo Trigger autovacuum, should vacuum 500 rows... then do a separate vacuum of 501 rows
\echo ... but it does NOT, it missed the 500 rows entirely, and only reports the 501 row vacuum, either with or without
asleep
 
\echo ------------------------------------------------------------
UPDATE autovacuum_threshold_test SET id = id WHERE id <= 500;
-- SELECT pg_sleep(30);
UPDATE autovacuum_threshold_test SET id = id WHERE id BETWEEN 501 and 1001;
\echo ============================================================

\echo
\echo Wait between tests...
\echo
SELECT pg_sleep(30);
VACUUM FULL autovacuum_threshold_test;

\echo ============================================================
\echo Trigger autovacuum, should vacuum 500 + 501 = 1001 rows...
\echo ... and it DOES, because it is in a single transaction
\echo ------------------------------------------------------------
BEGIN;
UPDATE autovacuum_threshold_test SET id = id WHERE id <= 500;
UPDATE autovacuum_threshold_test SET id = id WHERE id BETWEEN 501 and 1001;
COMMIT;
\echo ============================================================

\echo TESTS FINISHED
--
-- END OF FILE
--

RE: BUG #15067: Documentation or behaviour bug with autovacuumthresholds?

From
Greg Clough
Date:
> > Obviously that doesn't make sense given your example, so I'm wondering if you did a single-row update some time
beforestep 3.
 
>
> Hi Jim.  No, there were no other updates.  I'm glad you agree this is strange.

Using my test case in the last email, can anyone explain why this is actually working "as designed"?  It's probably not
abig issue, but it's a corner case that's still bugging me.
 

Cheers,
Greg.