Re: cost delay brainstorming - Mailing list pgsql-hackers

From Andres Freund
Subject Re: cost delay brainstorming
Date
Msg-id 20240617213719.xls5fdbgylgnjokg@awork3.anarazel.de
In response to cost delay brainstorming  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: cost delay brainstorming
List pgsql-hackers
Hi,

On 2024-06-17 15:39:27 -0400, Robert Haas wrote:
> As I mentioned in my talk at 2024.pgconf.dev, I think that the biggest
> problem with autovacuum as it exists today is that the cost delay is
> sometimes too low to keep up with the amount of vacuuming that needs
> to be done.

I agree it's a big problem, not sure it's *the* problem. But I'm happy to see
it improved anyway, so it doesn't really matter.

One issue around all of this is that we pretty much don't have the tools to
analyze autovacuum behaviour across a larger number of systems in a realistic
way :/.  I find my own view of what precisely the problem is being heavily
swayed by the last few problematic cases I've looked at.



> I think we might be able to get fairly far by observing that if the
> number of running autovacuum workers is equal to the maximum allowable
> number of running autovacuum workers, that may be a sign of trouble,
> and the longer that situation persists, the more likely it is that
> we're in trouble. So, a very simple algorithm would be: If the maximum
> number of workers have been running continuously for more than, say,
> 10 minutes, assume we're falling behind and exempt all workers from
> the cost limit for as long as the situation persists. One could
> criticize this approach on the grounds that it causes a very sudden
> behavior change instead of, say, allowing the rate of vacuuming to
> gradually increase. I'm curious to know whether other people think
> that would be a problem.
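
[Editorial aside: the quoted proposal could be sketched roughly as below. This
is purely illustrative pseudocode in Python, not autovacuum.c logic; the class
and method names are invented for the sketch.]

```python
SATURATION_THRESHOLD_S = 600  # 10 minutes, per the proposal above

class CostDelayGovernor:
    """Sketch of the proposed heuristic: if every allowed autovacuum
    worker slot has been continuously occupied for longer than the
    threshold, assume we're falling behind and disable the cost delay."""

    def __init__(self, max_workers):
        self.max_workers = max_workers
        self.saturated_since = None  # wall-clock time saturation began

    def update(self, running_workers, now):
        """Called periodically with the current worker count."""
        if running_workers >= self.max_workers:
            if self.saturated_since is None:
                self.saturated_since = now
        else:
            # Any idle slot means we're (probably) keeping up; reset.
            self.saturated_since = None

    def cost_delay_disabled(self, now):
        return (self.saturated_since is not None
                and now - self.saturated_since > SATURATION_THRESHOLD_S)
```

Note that, as written, a single momentary dip below max_workers resets the
clock, so the behavior change is indeed the sudden one the proposal worries
about.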

Another issue is that it's easy to fall behind due to cost limits on systems
where autovacuum_max_workers is smaller than the number of busy tables.

IME one common situation is to have a single table that's being vacuumed too
slowly due to cost limits, with everything else keeping up easily.
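To give a sense of scale for how slow "too slowly" can be: under the cost
settings a single worker is capped at fairly modest write throughput. A rough
back-of-envelope (the values used here - 200 / 2ms / 20 - are the defaults in
recent releases, but check them against your version):

```python
# Max work an autovacuum worker can do under cost-based delay:
# it accumulates cost credits and sleeps `delay` once `limit` is reached.
cost_limit = 200        # vacuum_cost_limit (default)
delay_s = 0.002         # autovacuum_vacuum_cost_delay, 2ms (default since PG 12)
cost_page_dirty = 20    # vacuum_cost_page_dirty (default)

credits_per_sec = cost_limit / delay_s                    # 100,000 credits/s
dirty_pages_per_sec = credits_per_sec / cost_page_dirty   # 5,000 pages/s
mb_per_sec = dirty_pages_per_sec * 8192 / 1e6             # ~41 MB/s of dirtied pages
```

For a multi-TB table with lots of dead tuples, ~41 MB/s of dirtying is easy to
fall behind at, even while every other table on the system keeps up fine.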


> I think it might be OK, for a couple of reasons:
>
> 1. I'm unconvinced that the vacuum_cost_delay system actually prevents
> very many problems. I've fixed a lot of problems by telling users to
> raise the cost limit, but virtually never by lowering it. When we
> lowered the delay by an order of magnitude a few releases ago -
> equivalent to increasing the cost limit by an order of magnitude - I
> didn't personally hear any complaints about that causing problems. So
> disabling the delay completely some of the time might just be fine.

I have seen disabling cost limits cause replication setups to fall over
because the amount of WAL increases beyond what can be
replicated/archived/replayed.  It's very easy to hit this when syncrep is
enabled.



> 1a. Incidentally, when I have seen problems because of vacuum running
> "too fast", it's not been because it was using up too much I/O
> bandwidth, but because it's pushed too much data out of cache too
> quickly. A long overnight vacuum can evict a lot of pages from the
> system page cache by morning - the ring buffer only protects our
> shared_buffers, not the OS cache. I don't think this can be fixed by
> rate-limiting vacuum, though: to keep the cache eviction at a level
> low enough that you could be certain of not causing trouble, you'd
> have to limit it to an extremely low rate which would just cause
> vacuuming not to keep up. The cure would be worse than the disease at
> that point.

I've seen versions of this too. Ironically it's often made way worse by ring
buffers, because even if there is space in shared buffers, we'll not move
buffers there, instead putting a lot of pressure on the OS page cache.


> If you like this idea, I'd like to know that, and hear any further
> thoughts you have about how to improve or refine it. If you don't, I'd
> like to know that, too, and any alternatives you can propose,
> especially alternatives that don't require crazy amounts of new
> infrastructure to implement.

I unfortunately don't know what to do about all of this with just the existing
set of metrics :/.


One reason cost limits can be important is that we often schedule autovacuums
on tables in useless ways, over and over. Without cost limits providing *some*
protection against using up all I/O bandwidth / WAL volume, we could end up
doing even worse.


Common causes of such useless vacuums I've seen:

- A long-running transaction prevents increasing relfrozenxid, so we run
  autovacuum over and over on the same relation, using up the whole cost
  budget. This is particularly bad because often we'll not autovacuum anything
  else, building up a larger and larger backlog of actual work.

- Tables where on-access pruning works very well end up being vacuumed far too
  frequently, because our autovacuum scheduling doesn't know about tuples
  having been cleaned up by on-access pruning.

- Larger tables with occasional lock conflicts cause autovacuum to be
  cancelled and restarted from scratch over and over. If that happens before
  the second table scan, this can easily eat up the whole cost budget without
  making any forward progress.


Greetings,

Andres Freund


