Thread: Autovacuum Truncation Phase Loop?

Autovacuum Truncation Phase Loop?

From
Creston Jamison
Date:
Hello,

We recently upgraded a 17 TB database from Postgres 9.6 to 12 using pg_upgrade. After this upgrade, we started observing that autovacuum would get in a loop about every 5 seconds for certain tables. This usually happened to be the toast table of the relation. This causes the performance of the table to decrease substantially. A manual VACUUM of the table resolves the issue. Here is an example of what we see in the log:

2020-11-04 16:34:38.131 UTC [892980-1] ERROR:  canceling autovacuum task
2020-11-04 16:34:38.131 UTC [892980-2] CONTEXT:  automatic vacuum of table "x.pg_toast.pg_toast_981540"
2020-11-04 16:34:41.878 UTC [893355-1] ERROR:  canceling autovacuum task
2020-11-04 16:34:41.878 UTC [893355-2] CONTEXT:  automatic vacuum of table "x.pg_toast.pg_toast_981540"
2020-11-04 16:34:45.208 UTC [893972-1] ERROR:  canceling autovacuum task
2020-11-04 16:34:45.208 UTC [893972-2] CONTEXT:  automatic vacuum of table "x.pg_toast.pg_toast_981540"
2020-11-04 16:34:47.635 UTC [894681-1] ERROR:  canceling autovacuum task
2020-11-04 16:34:47.635 UTC [894681-2] CONTEXT:  automatic vacuum of table "x.pg_toast.pg_toast_981540"

Based upon Googling, we suspect it is the truncation step of autovacuum and its ACCESS EXCLUSIVE lock attempt(s). 

This behavior hits our pg_statistic toast table a lot. However, it is randomly happening to other tables infrequently. 

I know that in PG 12 there is the new vacuum_truncate option that we can turn off on a table by table basis. However, that is a concern since this randomly happens to different tables.

I am seeking to understand what the best practice is in this case.

- Should we turn off vacuum_truncate for all tables or just turn off vacuum_truncate for the tables as they happen?
- Should we run a "VACUUM (TRUNCATE TRUE) table_name" manually for the tables where vacuum_truncate is turned off?
- Is there another database level solution I am not aware of?
- Lastly, does anyone know why this started happening after our upgrade from PG 9.6 to 12?

Many thanks,
Creston

Re: Autovacuum Truncation Phase Loop?

From
Creston Jamison
Date:
As a follow-up to this, we attempted to turn off vacuum_truncate on the pg_statistic table to resolve these autovacuum loops, but that results in an error. Is there anyway to turn this off for a system catalog table?

postgres=# ALTER TABLE pg_statistic SET (vacuum_truncate = off, toast.vacuum_truncate = off);
ERROR:  permission denied: "pg_statistic" is a system catalog


Many thanks,
Creston Jamison

On Mon, Nov 9, 2020 at 11:43 AM Creston Jamison <creston.jamison@rubytreesoftware.com> wrote:
Hello,

We recently upgraded a 17 TB database from Postgres 9.6 to 12 using pg_upgrade. After this upgrade, we started observing that autovacuum would get in a loop about every 5 seconds for certain tables. This usually happened to be the toast table of the relation. This causes the performance of the table to decrease substantially. A manual VACUUM of the table resolves the issue. Here is an example of what we see in the log:

2020-11-04 16:34:38.131 UTC [892980-1] ERROR:  canceling autovacuum task
2020-11-04 16:34:38.131 UTC [892980-2] CONTEXT:  automatic vacuum of table "x.pg_toast.pg_toast_981540"
2020-11-04 16:34:41.878 UTC [893355-1] ERROR:  canceling autovacuum task
2020-11-04 16:34:41.878 UTC [893355-2] CONTEXT:  automatic vacuum of table "x.pg_toast.pg_toast_981540"
2020-11-04 16:34:45.208 UTC [893972-1] ERROR:  canceling autovacuum task
2020-11-04 16:34:45.208 UTC [893972-2] CONTEXT:  automatic vacuum of table "x.pg_toast.pg_toast_981540"
2020-11-04 16:34:47.635 UTC [894681-1] ERROR:  canceling autovacuum task
2020-11-04 16:34:47.635 UTC [894681-2] CONTEXT:  automatic vacuum of table "x.pg_toast.pg_toast_981540"

Based upon Googling, we suspect it is the truncation step of autovacuum and its ACCESS EXCLUSIVE lock attempt(s). 

This behavior hits our pg_statistic toast table a lot. However, it is randomly happening to other tables infrequently. 

I know that in PG 12 there is the new vacuum_truncate option that we can turn off on a table by table basis. However, that is a concern since this randomly happens to different tables.

I am seeking to understand what the best practice is in this case.

- Should we turn off vacuum_truncate for all tables or just turn off vacuum_truncate for the tables as they happen?
- Should we run a "VACUUM (TRUNCATE TRUE) table_name" manually for the tables where vacuum_truncate is turned off?
- Is there another database level solution I am not aware of?
- Lastly, does anyone know why this started happening after our upgrade from PG 9.6 to 12?

Many thanks,
Creston


--
Creston Jamison
407-362-6515 x707

Ruby Tree Software, Inc. - https://www.rubytreesoftware.com
15430 County Road 565A Suite V
Groveland, FL 34736

Re: Autovacuum Truncation Phase Loop?

From
Jeff Janes
Date:
On Mon, Nov 9, 2020 at 11:43 AM Creston Jamison <creston.jamison@rubytreesoftware.com> wrote:

2020-11-04 16:34:47.635 UTC [894681-1] ERROR:  canceling autovacuum task
2020-11-04 16:34:47.635 UTC [894681-2] CONTEXT:  automatic vacuum of table "x.pg_toast.pg_toast_981540"

Based upon Googling, we suspect it is the truncation step of autovacuum and its ACCESS EXCLUSIVE lock attempt(s).

No, I think that would lead to different messages explicitly mentioning truncation. (or no messages in most cases, as I think those messages only get logged either for 'vacuum verbose' or when log_min_messages = debug2 or higher).  So something else is going on.

Are these vacuums happening for wrap around?  I don't know why else they would restart so aggressively.  On the other hand if they were for wrap around, they shouldn't be allowing themselves to get canceled so easily in the first place.

You mention a manual vacuum resolves the issue.  For how long does it stay resolved?

Cheers,

Jeff

Re: Autovacuum Truncation Phase Loop?

From
Laurenz Albe
Date:
On Sat, 2020-12-12 at 12:08 -0500, Jeff Janes wrote:
> On Mon, Nov 9, 2020 at 11:43 AM Creston Jamison <creston.jamison@rubytreesoftware.com> wrote:
> > 2020-11-04 16:34:47.635 UTC [894681-1] ERROR:  canceling autovacuum task
> > 2020-11-04 16:34:47.635 UTC [894681-2] CONTEXT:  automatic vacuum of table "x.pg_toast.pg_toast_981540"
> > 
> > Based upon Googling, we suspect it is the truncation step of autovacuum and its ACCESS EXCLUSIVE lock attempt(s).
> > 
> 
> No, I think that would lead to different messages explicitly mentioning truncation.
>  (or no messages in most cases, as I think those messages only get logged either for 
> 'vacuum verbose' or when log_min_messages = debug2 or higher).  So something else is going on.
> 
> Are these vacuums happening for wrap around?  I don't know why else they would
>  restart so aggressively.  On the other hand if they were for wrap around, they
>  shouldn't be allowing themselves to get canceled so easily in the first place.

Right.

Autovacuum gets canceled if it blocks another statement for more than a second.  

Likely candidates are: concurrent manual VACUUM, CREATE/DROP INDEX, CREATE/DROP TRIGGER,
ALTER/DROP TABLE, ALTER TABLE, TRUNCATE, or (most often) an explicit LOCK TABLE.

If that happens all the time for that table, you'll get a problem eventually.

Yours,
Laurenz Albe

-- 
Cybertec | https://www.cybertec-postgresql.com