Re: Eagerly scan all-visible pages to amortize aggressive vacuum - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Eagerly scan all-visible pages to amortize aggressive vacuum |
Date | |
Msg-id | bepzdxjhmgueoiddeemzhhgvx4nr2ysrttrgyxidask7m5crzt@sws4c5adn544 Whole thread Raw |
In response to | Re: Eagerly scan all-visible pages to amortize aggressive vacuum (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Eagerly scan all-visible pages to amortize aggressive vacuum
|
List | pgsql-hackers |
Hi, On 2025-01-29 14:12:52 -0500, Melanie Plageman wrote: > From 71f32189aad510b73d221fb0478ffd916e5e5dde Mon Sep 17 00:00:00 2001 > From: Melanie Plageman <melanieplageman@gmail.com> > Date: Mon, 27 Jan 2025 12:23:00 -0500 > Subject: [PATCH v14] Eagerly scan all-visible pages to amortize aggressive > vacuum > > Amortize the cost of an aggressive vacuum by eagerly scanning some > all-visible but not all-frozen pages during normal vacuums. I think it'd be good to explain the problem this is trying to address a bit more (i.e. the goal is to avoid the situation which a lot of work is deferred until an aggressive vacuum, which is then very expensive). > Because the goal is to freeze these all-visible pages, all-visible pages > that are eagerly scanned and set all-frozen in the visibility map are > counted as successful eager freezes and those not frozen are considered > failed eager freezes. I don't really understand this sentence. "Because the goal is to freeze these all-visible pages" doesn't really relate with the rest of the sentence. > + <varlistentry id="guc-vacuum-max-eager-freeze-failure-rate" xreflabel="vacuum_max_eager_freeze_failure_rate"> > + <term><varname>vacuum_max_eager_freeze_failure_rate</varname> (<type>floating point</type>) > + <indexterm> > + <primary><varname>vacuum_max_eager_freeze_failure_rate</varname> configuration parameter</primary> > + </indexterm> > + </term> > + <listitem> > + <para> > + Specifies the maximum fraction of pages that > + <command>VACUUM</command> may scan and <emphasis>fail</emphasis> to set > + all-frozen in the visibility map before disabling eager scanning. A > + value of <literal>0</literal> disables eager scanning altogether. The > + default is <literal>0.03</literal> (3%). > + </para> Fraction of what? > + <para> > + Note that when eager scanning is enabled, successful page freezes > + do not count against this limit, although they are internally > + capped at 20% of the all-visible but not all-frozen pages in the > + relation. Capping successful page freezes helps amortize the > + overhead across multiple normal vacuums. > + </para> What does it mean that they are not counted, but are capped? > + <para> > + If a table is building up a backlog of all-visible but not all-frozen > + pages, a normal vacuum may choose to scan skippable pages in an effort to > + freeze them. Doing so decreases the number of pages the next aggressive > + vacuum must scan. These are referred to as <firstterm>eagerly > + scanned</firstterm> pages. Eager scanning can be tuned to attempt > + to freeze more all-visible pages by increasing > + <xref linkend="guc-vacuum-max-eager-freeze-failure-rate"/>. Even if eager > + scanning has kept the number of all-visible but not all-frozen pages to a > + minimum, most tables still require periodic aggressive vacuuming. > + </para> Maybe mention that the aggressive vacuuming will often be cheaper than without eager freezing, even if necessary? > + * Normal vacuums count all-visible pages eagerly scanned as a success when > + * they are able to set them all-frozen in the VM and as a failure when they > + * are not able to set them all-frozen. Maybe some more punctuation would make this more readable? Or a slight rephrasing? > + * Because we want to amortize the overhead of freezing pages over multiple > + * vacuums, normal vacuums cap the number of successful eager freezes to > + * MAX_EAGER_FREEZE_SUCCESS_RATE of the number of all-visible but not > + * all-frozen pages at the beginning of the vacuum. Once the success cap has > + * been hit, eager scanning is disabled for the remainder of the vacuum of the > + * relation. It also caps the maximum "downside" of freezing eagerly, right? Seems worth mentioning. > + /* > + * Now calculate the eager scan start block. Start at a random spot > + * somewhere within the first eager scan region. This avoids eager > + * scanning and failing to freeze the exact same blocks each vacuum of the > + * relation. > + */ If I understand correctly, we're not really choosing a spot inside the first eager scan region, we determine the bounds of the first region? > @@ -930,16 +1188,21 @@ lazy_scan_heap(LVRelState *vacrel) > vacrel->current_block = InvalidBlockNumber; > vacrel->next_unskippable_block = InvalidBlockNumber; > vacrel->next_unskippable_allvis = false; > + vacrel->next_unskippable_eager_scanned = false; > vacrel->next_unskippable_vmbuffer = InvalidBuffer; > > - while (heap_vac_scan_next_block(vacrel, &blkno, &all_visible_according_to_vm)) > + while (heap_vac_scan_next_block(vacrel, &blkno, &all_visible_according_to_vm, > + &was_eager_scanned)) Pedantic^3: Is past tense really appropriate? We *will* be scanning that page in the body of the loop, right? > @@ -1064,7 +1327,45 @@ lazy_scan_heap(LVRelState *vacrel) > if (got_cleanup_lock) > lazy_scan_prune(vacrel, buf, blkno, page, > vmbuffer, all_visible_according_to_vm, > - &has_lpdead_items); > + &has_lpdead_items, &vm_page_frozen); > + > + /* > + * Count an eagerly scanned page as a failure or a success. > + */ > + if (was_eager_scanned) Hm - how should pages be counted that we couldn't get a lock on? I think right now they'll be counted as a failure, but that doesn't seem quite right. > diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c > index 0ab921a169b..32a1b8c46a1 100644 > --- a/src/backend/postmaster/autovacuum.c > +++ b/src/backend/postmaster/autovacuum.c > @@ -2826,6 +2826,12 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map, > tab->at_params.is_wraparound = wraparound; > tab->at_params.log_min_duration = log_min_duration; > tab->at_params.toast_parent = InvalidOid; > + > + /* > + * Later we check reloptions for vacuum_max_eager_freeze_failure_rate > + * override > + */ > + tab->at_params.max_eager_freeze_failure_rate = vacuum_max_eager_freeze_failure_rate; > tab->at_storage_param_vac_cost_limit = avopts ? > avopts->vacuum_cost_limit : 0; > tab->at_storage_param_vac_cost_delay = avopts ? I'd mention where that is, so that a reader of that comment doesn't have to search around... I think this is pretty close! Greetings, Andres Freund
pgsql-hackers by date: