Re: autovacuum locking question - Mailing list pgsql-performance

From Jeff Janes
Subject Re: autovacuum locking question
Msg-id CAMkU=1zGKub0OjwFYkbcck7-7YmNEbj1HRpzdOMHnAvFY_tppw@mail.gmail.com
In response to RE: autovacuum locking question  (Mike Schanne <mschanne@kns.com>)
Responses Re: autovacuum locking question  (MichaelDBA <MichaelDBA@sqlexec.com>)
List pgsql-performance
On Fri, Dec 6, 2019 at 10:55 AM Mike Schanne <mschanne@kns.com> wrote:

The error is not actually showing up very often (I have 8 occurrences from 11/29 and none since then), so maybe I should not be concerned about it. Other log entries (e.g. long checkpoint sync times) make me suspect we have an I/O bottleneck, so this error may be a symptom rather than the cause.


I think that at the point it is getting cancelled, it has done all the work except truncating the empty pages at the end of the table and reporting the results (for example, updating n_live_tup and n_dead_tup). If this happens every single time (neither last_autovacuum nor last_vacuum ever advances), it will eventually cause problems. So this is mostly a symptom, but not entirely. Simply running a manual vacuum should fix the reporting problem. A manual vacuum is not subject to cancellation; instead, it will detect that it is blocking someone and gracefully bow out. That means it will suspend the truncation but still report its results as normal.
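To check whether the reporting really never advances, one option is to look at pg_stat_user_tables and flag tables whose last_vacuum and last_autovacuum are both missing or older than some cutoff. The following is only a sketch: the query uses the view's real column names, but the helper names (stale_vacuum_tables, manual_vacuum_statement), the dict-shaped rows, and the choice of cutoff are assumptions made for illustration, and running the query against a server is left to whatever driver you use.

```python
from datetime import datetime

# Real view and column names; how the query is executed is left to your driver.
STATS_QUERY = """
SELECT relname, last_vacuum, last_autovacuum, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
ORDER BY relname
"""


def stale_vacuum_tables(rows, cutoff):
    """Return table names whose most recent (auto)vacuum report is missing
    or older than `cutoff`.

    `rows` is assumed to be an iterable of dicts keyed by column name, and
    `cutoff` must be comparable with the timestamps in those rows.
    """
    stale = []
    for row in rows:
        stamps = [
            t for t in (row["last_vacuum"], row["last_autovacuum"])
            if t is not None
        ]
        if not stamps or max(stamps) < cutoff:
            stale.append(row["relname"])
    return stale


def manual_vacuum_statement(table_name):
    """Build a plain VACUUM statement for one table.

    Identifier quoting here is a simple double-quote escape, which is an
    assumption of this sketch rather than a general-purpose quoting routine.
    """
    quoted = '"' + table_name.replace('"', '""') + '"'
    return f"VACUUM {quoted};"
```

A table that shows up as stale on every check is a candidate for the manual vacuum described above.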
 
Reading the table backwards in order to truncate it might be contributing to the I/O problems as well as being a victim of them. Upgrading to v10 might help with this, as it implemented a prefetch where it reads the table forward in 128kB chunks and then jumps backwards one chunk at a time, rather than just reading backwards 8kB at a time.
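The difference between the two access patterns can be illustrated with a toy model of the block read order. This is not PostgreSQL's code: the function names are made up, the chunk size simply reuses the 128kB and 8kB figures mentioned above, aligning chunks to multiples of the chunk size is an assumption of the sketch, and the model walks the whole range instead of stopping at the first non-empty page as a real truncation scan would.

```python
BLOCK_KB = 8      # page size quoted above
CHUNK_KB = 128    # prefetch chunk size quoted above
BLOCKS_PER_CHUNK = CHUNK_KB // BLOCK_KB  # 16 blocks per chunk


def backward_read_order(nblocks):
    """Toy model of the older behaviour: one block at a time, last to first."""
    return list(range(nblocks - 1, -1, -1))


def chunked_read_order(nblocks, chunk=BLOCKS_PER_CHUNK):
    """Toy model of the chunked behaviour: read each chunk forward,
    then jump back to the previous chunk.

    Chunks are aligned to multiples of `chunk` (an assumption of this sketch).
    """
    order = []
    end = nblocks
    while end > 0:
        # Start of the aligned chunk that contains block `end - 1`.
        start = ((end - 1) // chunk) * chunk
        order.extend(range(start, end))
        end = start
    return order
```

For example, with 5 blocks and a chunk of 2 blocks, backward_read_order(5) gives [4, 3, 2, 1, 0], while chunked_read_order(5, chunk=2) gives [4, 2, 3, 0, 1]: within each chunk the reads are sequential, which is friendlier to OS and storage readahead.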

Cheers,

Jeff
