Re: pgsql: Improve autovacuum logging for aggressive andanti-wraparound ru - Mailing list pgsql-hackers

From Andres Freund
Subject Re: pgsql: Improve autovacuum logging for aggressive andanti-wraparound ru
Date
Msg-id 20200324014150.2nyuygbbejzaybip@alap3.anarazel.de
Whole thread Raw
In response to Re: pgsql: Improve autovacuum logging for aggressive andanti-wraparound ru  (Michael Paquier <michael@paquier.xyz>)
Responses Re: pgsql: Improve autovacuum logging for aggressive andanti-wraparound ru  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
Hi,

On 2019-03-29 20:51:38 +0900, Michael Paquier wrote:
> So, coming back to this thread, and studying the problem again, it
> looks that the diagnostic that a non-aggressive, anti-wraparound
> vacuum could be triggered because the worker sees trouble in the
> force because of some activity happening in parallel.  Hence, if we
> face this case, it looks right to skip the vacuum for this relation.
> 
> Attached is an updated patch with a better error message, more
> comments, and the removal of the anti-wraparound non-aggressive log
> which was added in 28a8fa9.  The error message could be better I
> guess.  Suggestions are welcome.

> diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
> index 5c554f9465..82be8c81f3 100644
> --- a/src/backend/access/heap/vacuumlazy.c
> +++ b/src/backend/access/heap/vacuumlazy.c
> @@ -248,6 +248,25 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
>      if (params->options & VACOPT_DISABLE_PAGE_SKIPPING)
>          aggressive = true;
>  
> +    /*
> +     * When running an anti-wraparound vacuum, we expect relfrozenxid to be
> +     * old enough so as aggressive is always set.  If this is not the case,
> +     * it could be possible that another concurrent vacuum process has done
> +     * the work for this relation so that relfrozenxid in relcache has
> +     * already been moved forward enough, causing this vacuum run to be
> +     * non-aggressive.  If that happens, note that this relation no longer
> +     * needs to be vacuumed, so just skip it.
> +     */
> +    if (params->is_wraparound && !aggressive)
> +    {
> +        ereport(LOG,
> +                (errmsg_internal("found vacuum to prevent wraparound of table \"%s.%s.%s\" to be not aggressive, so
skipping",
> +                                 get_database_name(MyDatabaseId),
> +                                 get_namespace_name(RelationGetNamespace(onerel)),
> +                                 RelationGetRelationName(onerel))));
> +        return;
> +    }
> +

Which valid scenario can lead to this? Neither the comment, nor commit
message explain it. Unless you're thinking of scenarios where autovacuum
and manual vacuum are mixed, I don't really see valid reasons? Normally
autovacuum's locking + the table_recheck_autovac() check should prevent
problematic scenarios.

I do see a few scenarios that can trigger this - but they all more or
less are bugs.

It doesn't strike me as a good idea to work around such bugs by silently
neutering heap_vacuum_rel(). The likelihood of that temporarily covering
up more severe problems seems significant - they're likely to then later
bite you with a cluster shutdown.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Additional improvements to extended statistics
Next
From: Fujii Masao
Date:
Subject: Re: pg_stat_progress_basebackup - progress reporting forpg_basebackup, in the server side