Re: autovacuum prioritization - Mailing list pgsql-hackers

From Robert Haas
Subject Re: autovacuum prioritization
Date
Msg-id CA+TgmoaHFPtZgVSF3RxUzQHz69aAU1w6ekenCaM57pjmP0EMRw@mail.gmail.com
In response to Re: autovacuum prioritization  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: autovacuum prioritization
Re: autovacuum prioritization
List pgsql-hackers
On Mon, Jan 24, 2022 at 11:14 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> I think we need some more parameters to compare bloat vs wraparound.
> I mean in one of your examples in the 2nd paragraph we can say that
> the need-to-start of table A is earlier than table B so it's kind of
> simple.  But when it comes to wraparound vs bloat we need to add some
> weightage to compute how much bloat is considered as bad as
> wraparound.  I think the amount of bloat can not be an absolute number
> but it should be relative w.r.t the total database size or so.  I
> don't think it can be computed w.r.t to the table size because if the
> table is e.g. just 1 GB size and it is 5 times bloated then it is not
> as bad as another 1 TB table which is just 2 times bloated.

Thanks for writing back.

I don't think I agree with the last part of this argument, because
it seems to suppose that the big problem with bloat is that it might
use up disk space, whereas in my experience the big problem with bloat
is that it slows down access to your data. Yet the dead space in some
other table will not have much impact on the speed of access to the
current table. In fact, if most accesses to the table are index scans,
even dead space in the current table may not have much effect, but
sequential scans are bound to notice. It's true that, on a
cluster-wide basis, every dead page is one more page that can
potentially take up space in cache, so in that sense the performance
consequences are global to the whole cluster. However, that effect is
more indirect and takes a long time to become a big problem. The
direct effect of having to read more pages to execute the same query
plan causes problems a lot sooner.

But your broader point that we need to consider how much bloat
represents a problem is a really good one. In the past, one rule that
I've thought about is: if we're vacuuming a table and we're not going
to finish before it needs to be vacuumed again, then we should vacuum
faster (i.e. in effect, increase the cost limit on the fly). That
might still not result in good behavior, but it would at least result
in behavior that is less bad. However, it doesn't really answer the
question of how we decide when to start the very first VACUUM. I don't
really know the answer to that question. The current heuristics result
in estimates of acceptable bloat that are too high in some cases and
too low in others. I've seen tables that got bloated vastly beyond
what autovacuum is configured to tolerate before they caused any real
difficulty, and I know there are other cases where users start to
suffer long before those thresholds are reached.
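
To make the "vacuum faster if we won't finish in time" rule concrete,
here is a minimal sketch in Python. It is not existing autovacuum
code. The function name, its parameters, the 10000 ceiling, and the
assumption that vacuum throughput scales linearly with the cost limit
are all illustrative assumptions.

```python
import math


def adjusted_cost_limit(base_cost_limit, pages_total, pages_done,
                        elapsed_s, seconds_until_next_vacuum_due,
                        max_cost_limit=10000):
    """Return a cost limit for the vacuum that is currently running.

    Hypothetical helper: assumes throughput is proportional to the
    cost limit, which is only a rough approximation.
    """
    if pages_done <= 0 or elapsed_s <= 0:
        return base_cost_limit          # no progress data yet
    rate = pages_done / elapsed_s       # pages per second so far
    remaining_s = (pages_total - pages_done) / rate
    if remaining_s <= seconds_until_next_vacuum_due:
        return base_cost_limit          # on track, leave it alone
    if seconds_until_next_vacuum_due <= 0:
        return max_cost_limit           # already late, go as fast as allowed
    # Speed up by the factor needed to finish just in time.
    speedup = remaining_s / seconds_until_next_vacuum_due
    return min(max_cost_limit, math.ceil(base_cost_limit * speedup))
```

With a base limit of 200, 900 seconds of work left, and the next
vacuum due in 300 seconds, this asks for a limit of 600. As noted
above, that only makes the behavior less bad; it says nothing about
when the first VACUUM should start.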

At the moment, the best idea I have is to use something like the
current algorithm, but treat it as a deadline (keep bloat below this
amount) rather than an initiation criterion (start when you reach this
amount).  But I think that idea is a bit weak; maybe there's something
better out there.
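
One way to read the deadline idea, as a rough Python sketch: keep the
existing threshold formula (autovacuum_vacuum_threshold plus
autovacuum_vacuum_scale_factor times reltuples, shown with the default
values 50 and 0.2), but use it to work out the latest moment a vacuum
can start and still finish before the threshold is crossed. The
function names, the dead-tuple accrual rate, and the estimated vacuum
duration are hypothetical inputs, not statistics PostgreSQL tracks in
this form.

```python
import math


def vacuum_threshold(reltuples, base_threshold=50, scale_factor=0.2):
    """Dead-tuple count at which autovacuum currently triggers."""
    return base_threshold + scale_factor * reltuples


def seconds_until_must_start(reltuples, dead_tuples,
                             dead_tuples_per_s, est_vacuum_s):
    """Latest start time (seconds from now) that still meets the deadline.

    Hypothetical: assumes dead tuples accrue at a constant rate and
    that the vacuum duration can be estimated in advance.
    """
    limit = vacuum_threshold(reltuples)
    if dead_tuples >= limit:
        return 0.0                      # deadline already missed
    if dead_tuples_per_s <= 0:
        return math.inf                 # no bloat accruing, no deadline
    time_to_limit = (limit - dead_tuples) / dead_tuples_per_s
    # Start early enough that the vacuum finishes before the limit.
    return max(0.0, time_to_limit - est_vacuum_s)
```

Under this reading a large, slow-to-vacuum table gets started well
before it reaches the threshold, while a small table can wait until
nearly the last moment.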

> I think we should be thinking of dynamically adjusting priority as
> well.  Because it is possible that when autovacuum started we
> prioritize the table based on some statistics and estimation but
> vacuuming process can take long time and during that some priority
> might change so during the start of the autovacuum if we push all
> table to some priority queue and simply vacuum in that order then we
> might go wrong somewhere.

Yep. I think we should reassess what to do next after each table,
possibly with some exception for really small tables: e.g. if we last
recomputed priorities less than 1 minute ago, don't do it again.
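
A minimal sketch of that loop in Python, assuming hypothetical
callbacks get_candidates() (tables that currently need vacuuming),
score() (higher means more urgent), and vacuum(). None of these names
come from the autovacuum code, and the clock is injectable only so the
behavior can be tested.

```python
import time


def run_worker(get_candidates, score, vacuum,
               now=time.monotonic, min_recompute_s=60.0):
    """Vacuum tables one at a time, re-ranking between tables.

    The ranking is reused if it was computed less than
    min_recompute_s seconds ago, so a run of tiny tables does not
    trigger a full recomputation after each one.
    """
    ranked = []
    last_ranked_at = None
    while True:
        t = now()
        stale = last_ranked_at is None or t - last_ranked_at >= min_recompute_s
        if stale or not ranked:
            # Recompute priorities from fresh statistics.
            ranked = sorted(get_candidates(), key=score, reverse=True)
            last_ranked_at = t
        if not ranked:
            return                      # nothing left to do
        vacuum(ranked.pop(0))           # most urgent table first
```

The loop terminates only if get_candidates() eventually returns
nothing, i.e. vacuumed tables drop out of the candidate set.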

> I think we need to make different priority
> queues based on different factors, for example 1 queue for wraparound
> risk and another for bloat risk.

I don't see why we want multiple queues. We have to answer the
question "what should we do next?" which requires us, in some way, to
funnel everything into a single prioritization.
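
As an illustration only, here is one way a single prioritization could
look in Python: express both risks as a fraction of their respective
limits and take the larger. The choice of max() as the combining rule,
the TableStats type, and the function names are assumptions made for
this sketch; the thread does not settle on any particular formula. The
defaults shown are those of autovacuum_freeze_max_age,
autovacuum_vacuum_threshold, and autovacuum_vacuum_scale_factor.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    name: str
    xid_age: int        # age of relfrozenxid, in transactions
    dead_tuples: int
    reltuples: int


def urgency(t, freeze_max_age=200_000_000,
            base_threshold=50, scale_factor=0.2):
    """One comparable number per table; 1.0 means 'at the limit'."""
    wraparound = t.xid_age / freeze_max_age
    bloat = t.dead_tuples / (base_threshold + scale_factor * t.reltuples)
    # Hypothetical combining rule: whichever risk is closer to its limit.
    return max(wraparound, bloat)


def next_table(tables):
    """Answer 'what should we do next?' from a single ordering."""
    return max(tables, key=urgency, default=None)
```

Whatever the combining rule ends up being, the point is that the
worker compares one value per table rather than choosing between
separate queues.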

-- 
Robert Haas
EDB: http://www.enterprisedb.com


