Re: pgsql: Improve autovacuum logging for aggressive andanti-wraparound ru - Mailing list pgsql-hackers
| From | Kyotaro HORIGUCHI |
|---|---|
| Subject | Re: pgsql: Improve autovacuum logging for aggressive andanti-wraparound ru |
| Date | |
| Msg-id | 20181009.181536.142257785.horiguchi.kyotaro@lab.ntt.co.jp Whole thread Raw |
| In response to | Re: pgsql: Improve autovacuum logging for aggressive andanti-wraparound ru (Michael Paquier <michael@paquier.xyz>) |
| Responses |
Re: pgsql: Improve autovacuum logging for aggressive andanti-wraparound ru
|
| List | pgsql-hackers |
At Fri, 5 Oct 2018 15:35:04 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20181005063504.GB14664@paquier.xyz>
> On Fri, Oct 05, 2018 at 12:16:03PM +0900, Michael Paquier wrote:
> > So, I have come back to this stuff, and finished with the attached
> > instead, so as the assertion is in a single place. I find that
> > clearer. The comments have also been improved. Thoughts?
>
> And so... I have been looking at committing this thing, and while
> testing in-depth I have been able to trigger a case where an autovacuum
> has been able to be not aggressive but anti-wraparound, which is exactly
> what should not be possible, no? I have simply created an instance with
> autovacuum_freeze_max_age = 200000, then ran pgbench with
> autovacuum_freeze_table_age=200000 set for each table, and also ran
> installcheck-world in parallel. This has been able to trigger the
> assertion pretty quickly.
I investigated it and in short, it can happen.
It is a kind of race consdition between two autovacuum
processes. do_autovacuum() looks into pg_class (using a snapshot)
and vacuum_set_xid_limits() looks into relcache. If concurrent
vacuum happens and one has finished the relation, another gets
relcache invalidation and relfrozenxid is updated. If this
happens between do_autovacuum() and vacuum_set_xid_limits(), the
latter sees newer relfrozenxid than the former. The problem
happens when it moves by more than 5% of
autovacuum_freeze_max_age.
If lazy_vacuum_rel() sees the situation, the relation is already
aggressively vacuumed by a cocurrent worker. We can just ingore
the state safely but also we know that the vacuum is useless.
1. Just allow the case there (and add comment).
Causes redundant anti-wraparound vacuum.
2. Skip the relation by the condition.
I think that we can safely skip the relation in the
case. (attached)
3. Ensure that do_autovacuum always sees the same relfrozenxid
with vacuum_set_xid_limits().
4. Prevent concurrent acuuming of the same relation rigorously,
somehow.
Thoughts?
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 8996d366e9..436d573454 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -249,6 +249,21 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
if (options & VACOPT_DISABLE_PAGE_SKIPPING)
aggressive = true;
+ /*
+ * When is_wraparound, relfrozenxid is old enough and aggressive must be
+ * true here. However, if another concurrent vacuum process has processed
+ * this relation, relfrozenxid in relcache can be moved forward enough so
+ * that aggressive is calculated as false here. This relation no longer
+ * needs to be vacuumed. Bail out.
+ */
+ if (params->is_wraparound && !aggressive)
+ {
+ elog(DEBUG1, "relation %d has been vacuumd ocncurrently, skip",
+ onerel->rd_id);
+ return;
+ }
+
+
vacrelstats = (LVRelStats *) palloc0(sizeof(LVRelStats));
vacrelstats->old_rel_pages = onerel->rd_rel->relpages;
pgsql-hackers by date: