Thread: BUG #15067: Documentation or behaviour bug with autovacuum thresholds?
BUG #15067: Documentation or behaviour bug with autovacuum thresholds?
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 15067 Logged by: Greg Clough Email address: greg.clough@ipreo.com PostgreSQL version: 9.6.7 Operating system: CentOS v7.4 Description: I think there is a documentation bug in the Autovacuum section: https://www.postgresql.org/docs/9.6/static/routine-vacuuming.html#AUTOVACUUM The formula for triggering an autovacuum is listed as: vacuum threshold = vacuum base threshold + vacuum scale factor * number of tuples But in reality I think it needs “+ 1” in there: vacuum threshold = vacuum base threshold + 1 + vacuum scale factor * number of tuples Maybe it has been left out for simplicity, or if not then there’s either a documentation or implementation bug. I think it has been deliberately included in the code for safety, so that if both tuning parameters are set to 0 then it won’t repeatedly vacuum tables with zero changes. To test this on PostgreSQL v9.6.7 I turned on autovacuum logging, and left the other parameters set to the defaults: postgres=# SELECT version(); version ---------------------------------------------------------------------------------------------------------- PostgreSQL 9.6.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16), 64-bit (1 row) postgres=# SHOW log_autovacuum_min_duration; log_autovacuum_min_duration ----------------------------- 0 (1 row) postgres=# SHOW autovacuum_vacuum_scale_factor; autovacuum_vacuum_scale_factor -------------------------------- 0.2 (1 row) postgres=# SHOW autovacuum_vacuum_threshold; autovacuum_vacuum_threshold ----------------------------- 50 (1 row) I then created a table with some dummy data, and set autovacuum_vacuum_scale_factor = 0, and autovacuum_vacuum_threshold = 1. postgres=# CREATE TABLE autovacuum_threshold_test(id int); CREATE TABLE postgres=# INSERT INTO autovacuum_threshold_test(id) (SELECT generate_series(1,1000)); INSERT 0 1000 postgres=# ALTER TABLE autovacuum_threshold_test SET (autovacuum_vacuum_scale_factor = 0); ALTER TABLE postgres=# ALTER TABLE autovacuum_threshold_test SET (autovacuum_vacuum_threshold = 1); ALTER TABLE postgres=# \d+ autovacuum_threshold_test; Table "public.autovacuum_threshold_test" Column | Type | Modifiers | Storage | Stats target | Description --------+---------+-----------+---------+--------------+------------- id | integer | | plain | | Options: autovacuum_vacuum_threshold=1, autovacuum_vacuum_scale_factor=0 Issuing a single row update does not trigger an autovacuum: postgres=# UPDATE autovacuum_threshold_test SET id = id WHERE id <= 1; UPDATE 1 Whereas issuing a 2 row update does: postgres=# UPDATE autovacuum_threshold_test SET id = id WHERE id <= 2; UPDATE 2 < 2018-02-14 14:16:07.531 GMT > LOG: automatic vacuum of table "postgres.public.autovacuum_threshold_test": index scans: 0 pages: 0 removed, 5 remain, 0 skipped due to pins, 0 skipped frozen tuples: 2 removed, 1000 remain, 0 are dead but not yet removable buffer usage: 33 hits, 2 misses, 8 dirtied avg read rate: 15.038 MB/s, avg write rate: 60.154 MB/s system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec This behaviour stays constant, even if autovacuum_vacuum_threshold is set to 0, or larger numbers like 500, 1000, etc. It always adds 1: postgres=# ALTER TABLE autovacuum_threshold_test SET (autovacuum_vacuum_threshold = 500); ALTER TABLE postgres=# \d+ autovacuum_threshold_test; Table "public.autovacuum_threshold_test" Column | Type | Modifiers | Storage | Stats target | Description --------+---------+-----------+---------+--------------+------------- id | integer | | plain | | Options: autovacuum_vacuum_scale_factor=0, autovacuum_vacuum_threshold=500 postgres=# UPDATE autovacuum_threshold_test SET id = id WHERE id <= 500; UPDATE 500 postgres=# UPDATE autovacuum_threshold_test SET id = id WHERE id <= 501; UPDATE 501 < 2018-02-14 14:20:07.583 GMT > LOG: automatic vacuum of table "postgres.public.autovacuum_threshold_test": index scans: 0 pages: 0 removed, 9 remain, 0 skipped due to pins, 0 skipped frozen tuples: 501 removed, 1000 remain, 0 are dead but not yet removable buffer usage: 43 hits, 0 misses, 5 dirtied avg read rate: 0.000 MB/s, avg write rate: 15.575 MB/s system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec Possibly there’s something more unusual happening, because if you run an update at the *exact* autovacuum_vacuum_threshold, and then shortly after run the same number of updates again, the autovacuum log only shows that one set of tuples removed. Shouldn’t it be A + B, as two updates have run? (so “tuples: 1000 removed” instead of the “tuples: 500 removed” shown below) postgres=# VACUUM FULL autovacuum_threshold_test; VACUUM postgres=# UPDATE autovacuum_threshold_test SET id = id WHERE id <= 500; UPDATE 500 postgres=# SELECT pg_sleep(30); pg_sleep ---------- (1 row) postgres=# UPDATE autovacuum_threshold_test SET id = id WHERE id BETWEEN 501 and 1000; UPDATE 500 < 2018-02-14 14:30:07.690 GMT > LOG: automatic vacuum of table "postgres.public.autovacuum_threshold_test": index scans: 0 pages: 0 removed, 9 remain, 0 skipped due to pins, 0 skipped frozen tuples: 500 removed, 1000 remain, 0 are dead but not yet removable buffer usage: 41 hits, 2 misses, 4 dirtied avg read rate: 5.564 MB/s, avg write rate: 11.129 MB/s system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec Cheers, Greg Clough Senior Technology Engineer Ipreo Castle House | 37-35 Paul St | London EC2A 4LS
> Possibly there’s something more unusual happening, because if you run an update at the > *exact* autovacuum_vacuum_threshold, and then shortly after run the same number of > updates again, the autovacuum log only shows that one set of tuples removed. Shouldn’t > it be A + B, as two updates have run? (so “tuples: 1000 removed” instead of the > “tuples: 500 removed” shown below) Even stranger, is that if you do an update of exactly the threshold, wait, and then do a *SINGLE* row update: postgres=# VACUUM FULL autovacuum_threshold_test; VACUUM postgres=# UPDATE autovacuum_threshold_test SET id = id WHERE id <= 500; UPDATE 500 postgres=# SELECT pg_sleep(30); pg_sleep ---------- (1 row) postgres=# UPDATE autovacuum_threshold_test SET id = id WHERE id = 501; UPDATE 1 The resulting autovacuum ran but only removed 1 row, even though the autovacuum_vaccum_threshold is set at 500: < 2018-02-14 15:20:08.512 GMT > LOG: automatic vacuum of table "postgres.public.autovacuum_threshold_test": index scans:0 pages: 0 removed, 7 remain, 0 skipped due to pins, 0 skipped frozen tuples: 1 removed, 1000 remain, 0 are dead but not yet removable buffer usage: 37 hits, 2 misses, 4 dirtied avg read rate: 18.426 MB/s, avg write rate: 36.851 MB/s system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec Greg Clough Senior Technology Engineer Ipreo Castle House | 37-35 Paul St | London EC2A 4LS
> Even stranger, is that if you do an update of exactly the threshold, wait, and then do a *SINGLE* row update: I realise this isn't a top priority, but it does seem to allow a violation of the "autovacuum_vacuum_threshold" parameter. In practice it doesn't seem to be serious, but finding the issue would reduce autovacuuming of tables that makeexactly (rows * autovacuum_vacuum_threshold + autovacuum_vacuum_threshold) changes. I presume it's that simple, but I worry it has wider implications. Cheers, Greg.
2018-02-14 11:44 GMT-03:00 PG Bug reporting form <noreply@postgresql.org>: > I think there is a documentation bug in the Autovacuum section: > No, there isn't. > https://www.postgresql.org/docs/9.6/static/routine-vacuuming.html#AUTOVACUUM > If you read the documentation again you will note the following setence: ... since the last VACUUM exceeds the "vacuum threshold", the table is vacuumed. "exceed" means "greater than". Hence, documentation is accurate. -- Euler Taveira Timbira - http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
> > I think there is a documentation bug in the Autovacuum section: > No, there isn't. > "exceed" means "greater than". Hence, documentation is accurate. > > I think there is a documentation bug in the Autovacuum section: > No, there isn't. > "exceed" means "greater than". Hence, documentation is accurate. Hi Euler, I had considered that, but if you do *exactly* the same number of updates as the threshold... wait... and then do a singleupdate, it vacuums a single row, not threshold+1. In my example: 1) autovacuum_vacuum_scale_factor = 0 2) autovacuum_vacuum_threshold = 500 3) Update 500 rows of a 1,000 row table 4) Wait 30 seconds 5) Update 1 row 6) NOTE: "tuples: 1 removed" in the postgres.log, not "tuples: 501 removed". < 2018-02-14 15:20:08.512 GMT > LOG: automatic vacuum of table "postgres.public.autovacuum_threshold_test": index scans:0 pages: 0 removed, 7 remain, 0 skipped due to pins, 0 skipped frozen tuples: 1 removed, 1000 remain, 0 are dead but not yet removable There are two potential issues with that: 1) The autovacuum_vacuum_threshold is 500, so it should never vacuum anything less... especially not a single tuple. (Is that a bug in autovacuum, triggering on less than the threshold?) 2) Nowhere does it ever mention "tuples: 500 removed", so what happened to the originally updated 500 tuples? - Did they get vacuumed silently? (Is that a bug in autovacuum, where it doesn't report correctly?) - Did they skip being vacuumed? (Are these rows being lost forever, adding to bloat?) In short, what happened to the initial autovacuum_vacuum_threshold (500) updates, and why did autovacuum run on a singlerow? Cheers, Greg.
On 3/1/18 5:13 AM, Greg Clough wrote: > 1) autovacuum_vacuum_scale_factor = 0 > 2) autovacuum_vacuum_threshold = 500 > 3) Update 500 rows of a 1,000 row table > 4) Wait 30 seconds > 5) Update 1 row > 6) NOTE: "tuples: 1 removed" in the postgres.log, not "tuples: 501 removed". > > < 2018-02-14 15:20:08.512 GMT > LOG: automatic vacuum of table "postgres.public.autovacuum_threshold_test": index scans:0 > pages: 0 removed, 7 remain, 0 skipped due to pins, 0 skipped frozen > tuples: 1 removed, 1000 remain, 0 are dead but not yet removable That indicates that the vacuum ran when there were only 1000 rows in the table, 1 of which had been updated. Obviously that doesn't make sense given your example, so I'm wondering if you did a single-row update some time before step 3. Note that after step 3 vacuum should report 1500 tuples, not 1000 (pretty sure that's true regardless of the tuples being HOT or not). I suggest posting a complete test script demonstrating the behavior you're seeing. -- Jim Nasby, Chief Data Architect, Austin TX OpenSCG http://OpenSCG.com
> Obviously that doesn't make sense given your example, so I'm wondering if you did a single-row update some time beforestep 3. Hi Jim. No, there were no other updates. I'm glad you agree this is strange. > I suggest posting a complete test script demonstrating the behavior you're seeing. Script attached below. I think it shows up some definite weirdness in the way autovacuum is running and reporting into thepostgres.log. It should work on any basic database (I used v9.6.8 on CentOS v7.4), but ensure "autovacuum_naptime = 5s"so that the short sleeps actually allow autovacuum to run. I ran this script in a terminal with psql, and had the postgres.log tailing in another. I apologise if there are a lot ofcombinations, but I'm trying to cover all angles to show it's not a simple corner case. I have also tried it with autovacuum_vacuum_scale_factor> 0, and the behaviour is the same... albeit with the autovacuum trigger point increased by2000 * autovacuum_vacuum_scale_factor. In short, it seems that it will only report the correct "tuples: XX removed" into the postgres.log if cumulative updatesare done within a single transaction. Cheers, Greg Clough Senior Technology Engineer Ipreo Castle House | 37-35 Paul St | London EC2A 4LS -- -- Turn on logging for all autovacuum, and increase autovacuum frequency -- ALTER SYSTEM SET log_autovacuum_min_duration = 0; -- -- Capture some system information -- SELECT version(); SHOW log_autovacuum_min_duration; SHOW autovacuum_naptime; -- MUST BE SET TO A LOW VALUE, LIKE "5s" -- -- Prepare a test table with some data -- DROP TABLE IF EXISTS autovacuum_threshold_test; CREATE TABLE autovacuum_threshold_test(id int); INSERT INTO autovacuum_threshold_test(id) (SELECT generate_series(1,2000)); -- -- Set the autovacuum for the test table -- ALTER TABLE autovacuum_threshold_test SET (autovacuum_vacuum_scale_factor = 0); ALTER TABLE autovacuum_threshold_test SET (autovacuum_vacuum_threshold = 500); \d+ autovacuum_threshold_test; \echo ============================================================ \echo Trigger autovacuum, should vacuum 500 rows... \echo ... but it does NOT, then it vacuums "1" row, which is < the threshold \echo ------------------------------------------------------------ UPDATE autovacuum_threshold_test SET id = id WHERE id <= 500; SELECT pg_sleep(30); UPDATE autovacuum_threshold_test SET id = id WHERE id <= 1; \echo ============================================================ \echo \echo Wait between tests... \echo SELECT pg_sleep(30); VACUUM FULL autovacuum_threshold_test; \echo ============================================================ \echo Trigger autovacuum, should vacuum 500 rows... \echo ... but it does NOT, and without the delay NEITHER are reported \echo ------------------------------------------------------------ UPDATE autovacuum_threshold_test SET id = id WHERE id <= 500; UPDATE autovacuum_threshold_test SET id = id WHERE id <= 1; \echo ============================================================ \echo \echo Wait between tests... \echo SELECT pg_sleep(30); VACUUM FULL autovacuum_threshold_test; \echo ============================================================ \echo Trigger autovacuum, should vacuum 499 + 2 = 501 rows... \echo ... but only the "2" row vacuum is reported, either with or without a sleep \echo ------------------------------------------------------------ UPDATE autovacuum_threshold_test SET id = id WHERE id <= 499; -- SELECT pg_sleep(30); UPDATE autovacuum_threshold_test SET id = id WHERE id BETWEEN 500 and 501; \echo ============================================================ \echo \echo Wait between tests... \echo SELECT pg_sleep(30); VACUUM FULL autovacuum_threshold_test; \echo ============================================================ \echo Trigger autovacuum, should vacuum 200 + 201 + 202 = 603 rows... \echo ... but it does NOT, it only reports the final 202 row vacuum, either with or without a sleep \echo ------------------------------------------------------------ UPDATE autovacuum_threshold_test SET id = id WHERE id BETWEEN 1 and 200; -- SELECT pg_sleep(30); UPDATE autovacuum_threshold_test SET id = id WHERE id BETWEEN 201 and 401; -- SELECT pg_sleep(30); UPDATE autovacuum_threshold_test SET id = id WHERE id BETWEEN 402 and 603; \echo ============================================================ \echo \echo Wait between tests... \echo SELECT pg_sleep(30); VACUUM FULL autovacuum_threshold_test; \echo ============================================================ \echo Trigger autovacuum, should vacuum 500 rows... then do a separate vacuum of 501 rows \echo ... but it does NOT, it missed the 500 rows entirely, and only reports the 501 row vacuum, either with or without asleep \echo ------------------------------------------------------------ UPDATE autovacuum_threshold_test SET id = id WHERE id <= 500; -- SELECT pg_sleep(30); UPDATE autovacuum_threshold_test SET id = id WHERE id BETWEEN 501 and 1001; \echo ============================================================ \echo \echo Wait between tests... \echo SELECT pg_sleep(30); VACUUM FULL autovacuum_threshold_test; \echo ============================================================ \echo Trigger autovacuum, should vacuum 500 + 501 = 1001 rows... \echo ... and it DOES, because it is in a single transaction \echo ------------------------------------------------------------ BEGIN; UPDATE autovacuum_threshold_test SET id = id WHERE id <= 500; UPDATE autovacuum_threshold_test SET id = id WHERE id BETWEEN 501 and 1001; COMMIT; \echo ============================================================ \echo TESTS FINISHED -- -- END OF FILE --
> > Obviously that doesn't make sense given your example, so I'm wondering if you did a single-row update some time beforestep 3. > > Hi Jim. No, there were no other updates. I'm glad you agree this is strange. Using my test case in the last email, can anyone explain why this is actually working "as designed"? It's probably not abig issue, but it's a corner case that's still bugging me. Cheers, Greg.