Re: Trigger more frequent autovacuums of heavy insert tables - Mailing list pgsql-hackers
From | Melanie Plageman |
---|---|
Subject | Re: Trigger more frequent autovacuums of heavy insert tables |
Date | |
Msg-id | CAAKRu_arp7a-Kiy8DhL21CZrpm0w3CU3JOFei_LUqSShfpCUPg@mail.gmail.com |
In response to | Re: Trigger more frequent autovacuums of heavy insert tables (Nathan Bossart <nathandbossart@gmail.com>) |
List | pgsql-hackers |
On Fri, Feb 7, 2025 at 12:37 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Tue, Oct 22, 2024 at 03:12:53PM -0400, Melanie Plageman wrote:
> > By considering only the unfrozen portion of the table when calculating
> > the vacuum insert threshold, we can trigger vacuums more proactively
> > on insert-heavy tables. This changes the definition of
> > insert_scale_factor to a percentage of "active" table size. The
> > attached patch does this.
>
> I think this is a creative idea.

Indeed. I can't take much credit for it -- Andres suggested this direction during an off-list conversation where I was complaining about how difficult it was to benchmark my vacuum eager scanning patch set [1], because normal vacuums were so rarely triggered for insert-only tables after the first aggressive vacuum.

> My first reaction is to question whether
> it makes sense to have two strategies for this sort of thing:
> autovacuum_vacuum_max_threshold for updates/deletes and this for inserts.
> Perhaps we don't want to more aggressively clean up bloat (except for the
> very largest tables via the hard cap), but we do want to more aggressively
> mark newly-inserted tuples frozen. I'm curious what you think.

The goal with insert-only tables is to get whole pages set frozen in the VM, so the number of pages matters more than the total number of tuples inserted. With updates/deletes, on the other hand, the total amount of garbage (# of tuples) needing cleanup seems more important. My intuition (maybe wrong) is that it is more common to have a bunch of pages with a single (or a few) updates/deletes than it is to have a bunch of pages with a single insert. This patch is mostly meant to trigger vacuums sooner on large insert-only or bulk-loaded tables. Though, it is more common to have a cluster of hot pages than uniformly distributed updates and deletes...

> > I've estimated the unfrozen percentage of the table by adding a new
> > field to pg_class, relallfrozen, which is updated in the same places
> > as relallvisible.
>
> Wouldn't relallvisible be sufficient here? We'll skip all-visible pages
> unless this is an anti-wraparound vacuum, at which point I would think the
> insert threshold goes out the window.

It's a great question, but there are a couple of reasons why I don't think so.

First, I think it might lead to triggering vacuums too often for insert-mostly tables. For those tables, the pages that are not all-visible will largely be just those with data that is new since the last vacuum. If we trigger vacuums based on the % not all-visible, we might decrease the number of cases where we are able to vacuum inserted data and freeze it the first time it is vacuumed -- thereby increasing the total amount of work.

As for your point about us skipping all-visible pages except in anti-wraparound vacuums -- that's not totally true. Autovacuums triggered by the insert or update/delete thresholds, and not by autovacuum_freeze_max_age, can also be aggressive (that's based on vacuum_freeze_table_age), and aggressive vacuums scan all-visible pages. And we actually want to trigger more normal aggressive (non-anti-wrap) vacuums, because anti-wraparound vacuums are not canceled by conflicting lock requests (like those needed by DDL) -- see PROC_VACUUM_FOR_WRAPAROUND in ProcSleep().

We also scan a surprising number of all-visible pages in practice due to SKIP_PAGES_THRESHOLD. I was pretty taken aback while testing [1] by how many all-visible pages we scan because of this optimization. And, I'm planning on merging [1] in the next few days, so this will also increase the number of all-visible pages scanned during normal vacuums.
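To make the shape of the change a bit more concrete, here's a rough sketch (illustrative names and standalone types only, not the actual patch) of scaling the insert threshold by the unfrozen fraction of the table; the existing threshold logic lives in relation_needs_vacanalyze() in autovacuum.c:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Rough sketch only -- not the code from the attached patch.  The idea is
 * to apply the insert scale factor to just the "active" (unfrozen) portion
 * of the table, using the proposed pg_class.relallfrozen alongside
 * relpages and reltuples.
 */
static bool
insert_vacuum_needed(double reltuples,          /* pg_class.reltuples */
                     int32_t relpages,          /* pg_class.relpages */
                     int32_t relallfrozen,      /* proposed pg_class.relallfrozen */
                     int64_t ins_since_vacuum,  /* tuples inserted since last vacuum */
                     int vac_ins_base_thresh,   /* autovacuum_vacuum_insert_threshold */
                     double vac_ins_scale_factor)  /* autovacuum_vacuum_insert_scale_factor */
{
    double      pcnt_unfrozen = 1.0;
    double      vacinsthresh;

    /* Estimated share of the table's pages not yet marked all-frozen in the VM. */
    if (relpages > 0)
        pcnt_unfrozen = 1.0 - ((double) relallfrozen / relpages);

    /*
     * Today's formula is base threshold + scale factor * reltuples.  Here
     * the scale factor only counts the unfrozen portion of the table, so
     * insert-heavy tables are picked up sooner once most of their pages
     * are already frozen.
     */
    vacinsthresh = vac_ins_base_thresh +
        vac_ins_scale_factor * reltuples * pcnt_unfrozen;

    return ins_since_vacuum > vacinsthresh;
}
```

With the default insert scale factor of 0.2, a table whose pages are 90% frozen would cross the threshold after inserts amounting to roughly 2% of reltuples instead of 20%.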
> > More frequent vacuums means each vacuum scans fewer pages, but, more
> > interestingly, the first vacuum after a checkpoint is much more
> > efficient. With the patch, the first vacuum after a checkpoint emits
> > half as many FPIs. You can see that only 18 pages were newly dirtied.
> > So, with the patch, the pages being vacuumed are usually still in
> > shared buffers and still dirty.
>
> Are you aware of any scenarios where your proposed strategy might make
> things worse? From your test results, it sounds like these vacuums ought
> to usually be relatively efficient, so sending insert-only tables to the
> front of the line is normally okay, but maybe that's not always true.

So, of course they aren't exactly at the front of the line, since we autovacuum tables in the order they appear in pg_class. But, I suppose if you spend a bunch of time vacuuming an insert-mostly table you previously would have skipped, instead of some other table, that is effectively prioritizing the insert-mostly tables.

For insert-only/mostly tables, what you are ideally doing is vacuuming more frequently and handling a small number of pages each time you vacuum the relation, so it has a low performance impact. I suppose if you only have a few autovacuum workers and an equal number of massive insert-only tables, you could end up starving other actively updated tables of vacuum resources. But those insert-only tables would have to be vacuumed eventually -- and I imagine that a massive aggressive vacuum of all of the data in those tables would be more disruptive than some extra bloat in your other tables.

I'd be interested to hear whether people with more field experience can imagine starvation scenarios that would be much worse with this patch. What kinds of starvation scenarios do you normally see?

In terms of specific, dramatic differences in behavior (since this wouldn't be hidden behind a GUC), people might be surprised by how soon tables start being vacuumed after a huge COPY FREEZE.

- Melanie